
Network Operation

The determination of the network weights of the Boltzmann machine is a two-step process. In the first step, the states of the input and output units are held fixed and the states of the hidden units are modified by the simulated annealing process until a steady state is achieved. When this happens, statistics on the frequency with which pairs of units are both on are collected. This process is repeated for all the exemplar pairs several times. In the second step, only the states of the input units are held fixed, and the states of the hidden units and the output units are modified by the simulated annealing process until a steady state is reached. When this happens, statistics on the frequency with which pairs of units are both on are again collected. The statistics collected are used to update the network weights and the procedure is repeated until the weights stabilise.

Suppose the network has $N_i$ input units, denoted $\mu_1, \ldots , \mu_{N_i}$, $N_h$ hidden units, $\mu_{N_i+1}, \ldots , \mu_{N_i+N_h}$, and $N_o$ output units, $\mu_{N_i+N_h+1}, \ldots , \mu_{N_i+N_h+N_o}$. Put $N = N_i + N_h + N_o$. The weights $w_{ij}$ and thresholds $\theta_i$, $1 \leq i,j \leq N$, are initialized to some arbitrary values, subject to $w_{ij} = w_{ji}$ for $i \neq j$, and $w_{ii} = 0$.
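
As a minimal sketch of this initialization, the following Python fragment draws small random values while enforcing $w_{ij} = w_{ji}$ and $w_{ii} = 0$. The function name `init_parameters`, the uniform distribution and the `scale` parameter are assumptions made for illustration; the text only requires that the starting values be arbitrary.

```python
import random

def init_parameters(n_in, n_hid, n_out, scale=0.1, seed=0):
    """Return (weights, thresholds) for N = n_in + n_hid + n_out units."""
    rng = random.Random(seed)
    n = n_in + n_hid + n_out
    # Start from all zeros so that the diagonal w[i][i] stays 0.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assign the same value to both entries to keep w symmetric.
            w[i][j] = w[j][i] = rng.uniform(-scale, scale)
    # Thresholds are unconstrained; the range used here is an assumption.
    theta = [rng.uniform(-scale, scale) for _ in range(n)]
    return w, theta

w, theta = init_parameters(n_in=2, n_hid=3, n_out=1)
```

Filling only the upper triangle and mirroring it is one simple way to guarantee symmetry by construction.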

The following procedure is repeated until the weights stabilise.

The input and output pairs are selected in some order for presentation to the network. The states of the input and output units are fixed at the values determined by each selected input and output pair, the states of the hidden units are set to some random initial values, and the temperature is set to some high value.
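
The clamping step can be sketched as follows, assuming binary (0/1) unit states and the unit ordering used above (inputs, then hidden units, then outputs). The helper name `clamp_pair` is hypothetical.

```python
import random

def clamp_pair(x, y, n_hid, rng):
    """Build the state vector [inputs | hidden | outputs] for one exemplar pair.

    Returns the state and the indices of the units that are free to change
    (here only the hidden units, since inputs and outputs are held fixed).
    """
    hidden = [rng.randint(0, 1) for _ in range(n_hid)]  # random initial values
    state = list(x) + hidden + list(y)
    free = list(range(len(x), len(x) + n_hid))
    return state, free

state, free = clamp_pair([1, 0], [1], n_hid=3, rng=random.Random(0))
```

Returning the list of free indices keeps the later annealing code independent of which units happen to be clamped.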

Simulated annealing is used to bring the network to a steady state. The hidden units are chosen in some random order and, for each chosen unit, the quantity $\Delta E = \sum_j w_{ij} \mu_j(t) - \theta_i$ is computed, where the chosen unit is the $i$th one and $\mu_j(t)$ is the state of the $j$th unit at time $t$. A random number between 0 and 1, denoted $\rho$, is also chosen. The state of the chosen unit is updated according to the following rules:
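
The update rule itself is not reproduced in this text. Purely as an illustration, the sketch below assumes the logistic rule commonly used for Boltzmann machines: the chosen unit is switched on if $\rho < 1/(1 + e^{-\Delta E / T})$ and off otherwise, where $T$ is the current temperature. This rule, the 0/1 state coding, and the names `logistic`, `delta_e` and `update_unit` are assumptions, not taken from the source.

```python
import math
import random

def logistic(x):
    """Numerically stable 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def delta_e(i, w, theta, state):
    """Delta E = sum_j w[i][j] * state[j] - theta[i], as defined in the text."""
    return sum(w[i][j] * state[j] for j in range(len(state))) - theta[i]

def update_unit(i, w, theta, state, temperature, rng):
    """Stochastically update unit i in place (assumed logistic rule)."""
    p_on = logistic(delta_e(i, w, theta, state) / temperature)
    rho = rng.random()  # random number in [0, 1)
    state[i] = 1 if rho < p_on else 0
    return state[i]
```

Under this assumed rule a high temperature makes the choice close to a coin flip, while a low temperature makes the unit follow the sign of $\Delta E$ almost deterministically.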

After cycling through the units a number of times, the temperature parameter is reduced and the procedure is repeated. When the annealing schedule is completed, the network is run for a number of cycles while the states of pairs of units are examined, and the pairs which are both in the on state are recorded.
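
The annealing schedule and the collection of statistics might be organised as in the sketch below. It assumes 0/1 states, a logistic acceptance rule, and that statistics are gathered at the final temperature of the schedule; none of these details is fixed by the text. The names `sweep` and `anneal_and_count` and all parameter names are hypothetical.

```python
import math
import random

def sweep(w, theta, state, free, temperature, rng):
    """Visit every free unit once, in random order, and update it in place."""
    order = list(free)
    rng.shuffle(order)
    for i in order:
        d = sum(w[i][j] * state[j] for j in range(len(state))) - theta[i]
        x = d / temperature
        # Stable logistic function (assumed acceptance rule).
        if x >= 0:
            p_on = 1.0 / (1.0 + math.exp(-x))
        else:
            p_on = math.exp(x) / (1.0 + math.exp(x))
        state[i] = 1 if rng.random() < p_on else 0

def anneal_and_count(w, theta, state, free, schedule,
                     sweeps_per_temp, sample_sweeps, rng):
    """Anneal through `schedule`, then count how often pairs are both on."""
    for temperature in schedule:            # decreasing temperatures
        for _ in range(sweeps_per_temp):
            sweep(w, theta, state, free, temperature, rng)
    n = len(state)
    counts = [[0] * n for _ in range(n)]
    for _ in range(sample_sweeps):          # sampling phase
        sweep(w, theta, state, free, schedule[-1], rng)
        for i in range(n):
            for j in range(i + 1, n):
                if state[i] and state[j]:
                    counts[i][j] += 1
                    counts[j][i] += 1
    return counts
```

A call such as `anneal_and_count(w, theta, state, free, [4.0, 2.0, 1.0], 3, 5, random.Random(0))` would run three sweeps at each of three illustrative temperatures and then record five samples.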

After all the input and output pairs have been presented a number of times, the recorded results from each presentation are used to compute $p_{ij}$, the proportion of the time that both the $i$th and $j$th units were on.
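
One way to turn the recorded counts into proportions is sketched below, assuming each presentation yields a matrix of co-occurrence counts taken over the same number of samples. The function name `cooccurrence_proportions` and its arguments are hypothetical.

```python
def cooccurrence_proportions(count_matrices, samples_per_matrix):
    """Average pairwise 'both on' counts over all presentations.

    count_matrices: one N x N matrix of counts per presentation.
    samples_per_matrix: number of samples behind each matrix (assumed equal).
    """
    n = len(count_matrices[0])
    total = samples_per_matrix * len(count_matrices)
    p = [[0.0] * n for _ in range(n)]
    for counts in count_matrices:
        for i in range(n):
            for j in range(n):
                p[i][j] += counts[i][j] / total
    return p

p = cooccurrence_proportions([[[0, 4], [4, 0]], [[0, 2], [2, 0]]], 4)
# Units 0 and 1 were both on in 6 of the 8 samples, so p[0][1] is 0.75.
```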

The exemplar patterns are now presented to the network again, but this time only the states of the input units are held fixed, while the states of both the hidden units and the output units are changed according to the simulated annealing process. Statistics are collected for the computation of $p^\prime_{ij}$, the proportion of the time that both the $i$th and $j$th units were on when the output units were free.

The weights are now updated. If $p_{ij} - p^\prime_{ij} > 0$, $w_{ij}$ is incremented by a fixed amount. If $p_{ij} - p^\prime_{ij} < 0$, $w_{ij}$ is decremented by a fixed amount.
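
The fixed-step update can be sketched as follows. The step size of 0.01 and the name `update_weights` are assumptions. Thresholds are left unchanged because the text gives no rule for them.

```python
def update_weights(w, p_clamped, p_free, step=0.01):
    """Move each weight by +/- step according to the sign of p - p'.

    p_clamped: proportions with inputs and outputs held fixed (p_ij).
    p_free:    proportions with only inputs held fixed (p'_ij).
    Returns the number of weight pairs that changed, which can serve as
    a simple check for whether the weights have stabilised.
    """
    n = len(w)
    changed = 0
    for i in range(n):
        for j in range(i + 1, n):        # the diagonal stays at zero
            diff = p_clamped[i][j] - p_free[i][j]
            if diff > 0:
                delta = step
            elif diff < 0:
                delta = -step
            else:
                continue
            w[i][j] += delta
            w[j][i] += delta             # preserve symmetry
            changed += 1
    return changed
```

Because only the sign of the difference is used, every adjustment has the same magnitude regardless of how far apart the two proportions are.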

Once the weights stabilize, the network can be used for recall. The states of the input units are fixed at the values determined by the selected input vector and the simulated annealing procedure is applied to the hidden units and the output units. When equilibrium is reached, the output vector can be read off the output units.
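
Recall might be sketched as below, again assuming 0/1 states, the unit ordering inputs, hidden units, outputs, and a logistic acceptance rule that the text does not specify. The function name `recall`, the schedule values and the other parameter names are hypothetical.

```python
import math
import random

def recall(x, w, theta, n_hid, n_out, schedule, sweeps_per_temp, rng):
    """Clamp the inputs to x, anneal the remaining units, return the outputs."""
    n_in = len(x)
    n = n_in + n_hid + n_out
    # Hidden and output units start from random values and are free to change.
    state = list(x) + [rng.randint(0, 1) for _ in range(n - n_in)]
    free = list(range(n_in, n))
    for temperature in schedule:
        for _ in range(sweeps_per_temp):
            rng.shuffle(free)
            for i in free:
                d = sum(w[i][j] * state[j] for j in range(n)) - theta[i]
                z = d / temperature
                # Stable logistic function (assumed acceptance rule).
                if z >= 0:
                    p_on = 1.0 / (1.0 + math.exp(-z))
                else:
                    p_on = math.exp(z) / (1.0 + math.exp(z))
                state[i] = 1 if rng.random() < p_on else 0
    return state[n_in + n_hid:]
```

For example, `recall([1, 0], w, theta, n_hid=3, n_out=1, schedule=[4.0, 2.0, 1.0, 0.5], sweeps_per_temp=10, rng=random.Random(0))` would return a one-element list holding the state of the single output unit.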


Mike Alder
9/19/1997