Of the three procedures named above, RBF_Weights is the most comprehensive one. Here, all necessary initialization tasks (setting link weights and bias) for a fully connected three-layer feedforward network (without shortcut connections) are performed in a single step. The choice of centers (i.e. the link weights between the input and hidden layer) is kept rather simple: the centers are selected evenly from the loaded teaching patterns and assigned to the links of the hidden neurons. The selection assigns the first teaching pattern to the first hidden unit and the last pattern to the last hidden unit; the remaining hidden units receive centers which are picked evenly from the set of teaching patterns. If, for example, 13 teaching patterns are loaded and the hidden layer consists of 5 neurons, then the patterns with numbers 1, 4, 7, 10 and 13 are selected as centers.
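The following short sketch reproduces this selection scheme; the function name and the use of Python/numpy are merely illustrative and not part of the SNNS sources. For 13 patterns and 5 hidden units it yields the patterns 1, 4, 7, 10 and 13.

\begin{verbatim}
import numpy as np

def select_centers(patterns, num_hidden):
    """Pick num_hidden centers evenly from the list of teaching patterns.

    The first pattern goes to the first hidden unit, the last pattern to
    the last hidden unit; the remaining centers are spread evenly in between.
    """
    num_patterns = len(patterns)
    # Evenly spaced pattern indices from the first to the last pattern.
    idx = np.round(np.linspace(0, num_patterns - 1, num_hidden)).astype(int)
    return idx, patterns[idx]

patterns = np.arange(1, 14).reshape(13, 1)   # 13 dummy teaching patterns
idx, centers = select_centers(patterns, 5)
print(idx + 1)                               # -> [ 1  4  7 10 13]
\end{verbatim}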
Before a selected teaching pattern is distributed among the corresponding link weights, it can be modified slightly by a random amount. For this purpose, an initialization parameter (deviation, parameter 5) is set which determines the maximum percentage of deviation allowed to occur randomly. To calculate the deviation, an inverse tangent function is used to approximate a normal distribution, so that small deviations are more probable than large ones. Setting deviation to 1.0 results in a maximum deviation of 100%; if deviation is set to 0, the centers are copied unchanged into the link weights.
A small modification of the centers is recommended for the following reasons: First, the number of hidden units may exceed the number of teaching patterns. In this case it is necessary to break the symmetry which would otherwise result, since this symmetry would render the calculation of the Moore-Penrose inverse matrix impossible. Second, there may be a few anomalous patterns in the set of teaching patterns; these would cause poor initialization results if they were accidentally selected as centers. By adding a small amount of noise, the negative effect of anomalous patterns can be reduced. However, if an exact interpolation is to be performed, no modification of the centers should be allowed.
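As a rough illustration of such a perturbation, the sketch below draws the relative deviation from a bounded, bell-shaped distribution whose cumulative distribution function is an arctan (a truncated Cauchy), so that small deviations dominate; the exact distribution used by SNNS may differ, and all names are hypothetical.

\begin{verbatim}
import numpy as np

def perturb_center(center, deviation, rng=np.random.default_rng()):
    """Randomly disturb a copied center by at most `deviation` * 100 percent.

    The relative perturbation is drawn from a bounded, bell-shaped
    distribution (a truncated Cauchy) whose cumulative distribution
    function is an arctan, so small deviations are much more likely
    than large ones.  deviation == 0 copies the center unchanged.
    """
    if deviation == 0.0:
        return center.copy()
    c = 1.4                                   # < pi/2, controls the peakedness
    u = rng.uniform(-c, c, size=center.shape)
    rel = deviation * np.tan(u) / np.tan(c)   # in [-deviation, +deviation]
    return center * (1.0 + rel)
\end{verbatim}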
The next initialization step is to set the free parameter p of the base function h, i.e. the bias of the hidden neurons. To do this, the initialization parameter bias (p) is copied directly into the bias of all hidden neurons. The appropriate setting of the bias depends strongly on the base function h used and on the properties of the teaching patterns. When the Gaussian function is used, it is recommended to choose the bias so that 5--10% of all hidden neurons are activated during the propagation of every single teaching pattern. If the bias is chosen too small, almost all hidden neurons are activated uniformly during propagation; if it is chosen too large, only the hidden neuron whose center vector corresponds to the currently applied teaching pattern is activated.
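The following sketch can help in choosing the bias: it estimates which fraction of the hidden units responds strongly to the teaching patterns for a candidate value of p. It assumes a Gaussian base function of the form exp(-p * r^2), with r the Euclidean distance between pattern and center, and an arbitrary threshold of 0.5 to decide whether a unit counts as "activated"; both are illustrative assumptions.

\begin{verbatim}
import numpy as np

def activated_fraction(patterns, centers, p, threshold=0.5):
    """Fraction of hidden units whose Gaussian response exceeds `threshold`,
    averaged over all teaching patterns.

    Assumes h(r, p) = exp(-p * r^2); the 0.5 threshold is only a
    hypothetical notion of "activated".
    """
    # Squared distances between every pattern and every center.
    d2 = ((patterns[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    h = np.exp(-p * d2)
    return (h > threshold).mean()

# Increase or decrease p until roughly 5-10% of the hidden units respond.
# frac = activated_fraction(patterns, centers, p=2.0)
\end{verbatim}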
Now the expensive initialization of the links between the hidden and the output layer is actually performed. To do this, the formula which was already presented above is applied:
\[
\vec{w} \;=\; \left( G^{T} G + \lambda\, G_0 \right)^{-1} G^{T}\, \vec{y}
\]
The third initialization parameter (smoothness) represents the value of \(\lambda\) in this formula. The matrices have been extended to allow the automatic computation of an additional constant value b. If there is more than one neuron in the output layer, the following set of functions results:
\[
f_j(\vec{x}) \;=\; \sum_{i=1}^{K} w_{ij}\, h\!\left( \lVert \vec{x} - \vec{t}_i \rVert, p \right) + b_j ,
\qquad j = 1, \ldots, m
\]
The bias of the output neuron(s) is directly set to the calculated value
of b (b_j, respectively). Therefore, it is necessary to choose an
activation function for the output neurons that uses the bias of
the neurons. In the current version of SNNS, the functions
Act_Logistic and Act_IdentityPlusBias implement this feature.
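A compact numpy sketch of this step is given below. It appends a constant column to the matrix of hidden-unit activations so that the offset b is obtained automatically and becomes the bias of the output unit(s). For simplicity it replaces the regularization term of the formula above by a plain ridge penalty smoothness * I and uses the Moore-Penrose pseudoinverse for the solve; all names are illustrative, not those of the SNNS sources.

\begin{verbatim}
import numpy as np

def init_output_weights(H, Y, smoothness):
    """Solve for the hidden-to-output link weights and the output bias.

    H : (num_patterns, num_hidden) activations of the hidden units for
        every teaching pattern.
    Y : (num_patterns, num_outputs) teaching outputs (or their mapped
        net-input values, see the scaling parameters described below).

    A column of ones is appended to H so that a constant offset b is
    computed automatically; it becomes the bias of the output unit(s).
    Note: a plain ridge penalty `smoothness * I` is used here as a
    simplification of the regularization term in the formula above.
    """
    n, k = H.shape
    H_ext = np.hstack([H, np.ones((n, 1))])          # extend by a constant column
    A = H_ext.T @ H_ext + smoothness * np.eye(k + 1)
    # The pseudoinverse echoes the Moore-Penrose inverse mentioned above
    # and also copes with smoothness = 0.
    W_ext = np.linalg.pinv(A) @ (H_ext.T @ Y)        # shape (k+1, num_outputs)
    W, b = W_ext[:-1], W_ext[-1]                     # link weights and output bias
    return W, b
\end{verbatim}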
The activation functions of the output units lead to the remaining two initialization parameters. The initialization procedure assumes a linear activation of the output units: the link weights are calculated so that the weighted sum of the hidden neuron activations equals the teaching output. However, if a sigmoid activation function is used, which is recommended for pattern recognition tasks, this activation function has to be taken into account during initialization. Ideally, the required net input of an output unit would be computed by applying the inverse activation function to the corresponding teaching output. This input value would then take the place of the corresponding component of \(\vec{y}\) during the calculation of the weights. Unfortunately, the inverse activation function is unknown in the general case.
The first and second initialization parameters (0_scale and 1_scale) are a remedy for this dilemma. They define the two control points of a piecewise linear function which approximates the activation function: 0_scale and 1_scale give the net inputs of the output units which produce the teaching outputs 0 and 1. If, for example, the linear activation function Act_IdentityPlusBias is used, the values 0 and 1 have to be used. When using the logistic activation function Act_Logistic, the values -4 and 4 are recommended. If the bias is set to 0, these values lead to final activations of approximately 0.018 (resp. 0.982), which are comparatively good approximations of the desired teaching outputs 0 and 1. The implementation interpolates linearly between the set values of 0_scale and 1_scale, so teaching values which differ from 0 and 1 are also mapped to corresponding input values.
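A minimal sketch of this mapping (hypothetical names) and of the resulting activations for 0_scale = -4 and 1_scale = 4:

\begin{verbatim}
import numpy as np

def map_teaching_output(t, scale0, scale1):
    """Linearly map a teaching output to the net input of the output unit.

    t = 0 is mapped to scale0, t = 1 to scale1; other teaching values are
    interpolated (and extrapolated) linearly.
    """
    return scale0 + t * (scale1 - scale0)

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

# With 0_scale = -4, 1_scale = 4 and bias 0, the final activations come
# close to the desired teaching outputs 0 and 1:
print(logistic(map_teaching_output(np.array([0.0, 1.0]), -4.0, 4.0)))
# -> [0.018 0.982] (approximately)
\end{verbatim}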
Figure: Relation between teaching output, input value and logistic activation
The figure shows the activation of an output unit when the logistic activation function is used. The scale has been chosen in such a way that the teaching outputs 0 and 1 are mapped to the net inputs -2 and 2.
Optimal values for 0_scale and 1_scale cannot be given in general. With the logistic activation function, large scaling values lead to good initialization results but interfere with subsequent training, since the logistic function is then used mainly in its very flat parts. Small scaling values, on the other hand, lead to poorer initialization results but produce good preconditions for additional training.