The Dynamic Decay Adjustment (DDA) Algorithm is an extension of the
RCE-Algorithm (see [Hud92,RCE82]) and offers easy and
constructive training for Radial Basis Function Networks. RBFs
trained with the DDA-Algorithm often achieve classification accuracy
comparable to Multi Layer Perceptrons (MLPs),
but training is significantly faster [BD95].
An RBF trained with the DDA-Algorithm (RBF-DDA) is similar in structure to the common feedforward MLP with one hidden layer and without shortcut connections:
The main differences to an MLP are the activation function and the propagation rule of the hidden layer: instead of a sigmoid or another nonlinear squashing function, RBFs use localized functions, radial Gaussians, as activation function. In addition, the computation of the Euclidean distance to an individual reference vector replaces the scalar product used in MLPs:
If the network receives a vector $\vec{x}$ as input,
\[
  R_i(\vec{x}) = \exp\!\left( - \frac{\|\vec{x} - \vec{r}_i\|^2}{\sigma_i^2} \right)
\]
indicates the activation of one RBF unit with reference vector $\vec{r}_i$ and
standard deviation $\sigma_i$.
The output layer computes the output for each class $c$ as follows:
\[
  f_c(\vec{x}) = \sum_{i=1}^{m} A_i \, R_i(\vec{x})
\]
with $m$ indicating the number of RBFs belonging to the corresponding
class and $A_i$ being the weight of each RBF.
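The two formulas above can be sketched in a few lines of plain Python (function and variable names such as `rbf_activation` are illustrative, not from the paper):

```python
import math

def rbf_activation(x, r, sigma):
    """Gaussian activation R_i(x) = exp(-||x - r_i||^2 / sigma_i^2)."""
    sq_dist = sum((xj - rj) ** 2 for xj, rj in zip(x, r))
    return math.exp(-sq_dist / sigma ** 2)

def class_output(x, prototypes):
    """Output of one class: weighted sum f_c(x) = sum_i A_i * R_i(x)
    over the RBFs belonging to that class."""
    return sum(A * rbf_activation(x, r, sigma) for (r, sigma, A) in prototypes)

# One prototype for class A, centred at (0.5, 0.5), sigma = 0.2, weight A = 1:
protos_class_a = [((0.5, 0.5), 0.2, 1.0)]
print(class_output((0.5, 0.5), protos_class_a))  # → 1.0 (maximal at the centre)
```

Note the local response: an input at the prototype centre yields activation 1, and the activation decays towards zero with growing Euclidean distance.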
An example of a full RBF-DDA is shown in the following figure.
Note that there are no shortcut connections between input
and output units in an RBF-DDA.
Figure: The structure of a Radial Basis Function Network.
In this illustration the weight vector that connects all input units to one hidden unit represents the centre of the Gaussian. The Euclidean distance of the input vector to this reference vector (or prototype) is used as input to the Gaussian, which leads to a local response: if the input vector is close to the prototype, the unit has a high activation; for larger distances the activation is close to zero. Each output unit simply computes a weighted sum of the activations of all RBF units belonging to the corresponding class.
The DDA-Algorithm introduces the idea of distinguishing between
matching and conflicting neighbors in an area of
conflict. Two thresholds $\theta^+$ and $\theta^-$ are
introduced, as illustrated in the following figure.
Figure: One RBF unit as used by the DDA-Algorithm. Two thresholds
are used to define an area of conflict where no
other prototype of a conflicting class is allowed to exist.
In addition, each training pattern has to be in the inner
circle of at least one prototype of the correct class.
Normally, $\theta^+$ is set to be greater than $\theta^-$, which
leads to an area of conflict where neither matching nor
conflicting training patterns are allowed to lie. Using these
thresholds, the algorithm constructs the network dynamically and
adjusts the radii individually.
In short, the DDA-Algorithm works as follows. It is based on two steps. During training, whenever a pattern is misclassified, either a new RBF unit with an initial weight $A = 1$ is introduced (called commit) or the weight of an existing RBF that covers the new pattern is incremented. In both cases the radii of conflicting RBFs (RBFs belonging to the wrong class) are reduced (called shrink). This guarantees that each pattern in the training data is covered by an RBF of the correct class and that no RBF of a conflicting class has an inappropriately high response.
Two parameters are introduced at this stage, a positive threshold
$\theta^+$ and a negative threshold $\theta^-$. To commit
a new prototype, none of the existing RBFs of the correct class may have
an activation above $\theta^+$, and during shrinking no RBF of
a conflicting class is allowed to have an activation above $\theta^-$.
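A minimal sketch of one training step, assuming the Gaussian activations defined earlier (the data layout, the choice of which covering RBF is incremented, and the function names are our own simplifications, not the paper's):

```python
import math

THETA_PLUS, THETA_MINUS = 0.4, 0.2   # threshold values suggested in the text

def activation(x, r, sigma):
    # Gaussian activation exp(-||x - r||^2 / sigma^2)
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, r)) / sigma ** 2)

def train_pattern(x, c, prototypes, sigma_init=1.0):
    """One DDA training step for pattern x of class c.

    prototypes maps a class label to a list of [centre, sigma, weight]."""
    own = prototypes.setdefault(c, [])
    covering = [p for p in own if activation(x, p[0], p[1]) >= THETA_PLUS]
    if covering:
        covering[0][2] += 1.0             # pattern is covered: increment weight
    else:
        own.append([x, sigma_init, 1.0])  # commit: new prototype, weight A = 1
    # shrink: reduce the radius of every conflicting RBF until its
    # activation at x drops to THETA_MINUS
    for k, plist in prototypes.items():
        if k == c:
            continue
        for p in plist:
            if activation(x, p[0], p[1]) > THETA_MINUS:
                dist2 = sum((a - b) ** 2 for a, b in zip(x, p[0]))
                if dist2 > 0:  # a pattern exactly at a conflicting centre is ignored here
                    p[1] = math.sqrt(dist2 / -math.log(THETA_MINUS))
```

The shrink formula follows from solving $\exp(-d^2/\sigma^2) = \theta^-$ for $\sigma$, which gives $\sigma = d / \sqrt{-\ln \theta^-}$ for a conflicting prototype at distance $d$ from the pattern.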
The following figure shows an example that illustrates the first
few training steps of the DDA-Algorithm.
Figure: An example of the DDA-Algorithm: (1) a pattern of class A
is encountered and a new RBF is created; (2) a training pattern of
class B leads to a new prototype for class B and shrinks the
radius of the existing RBF of class A; (3) another pattern of
class B is classified correctly and again shrinks the prototype
of class A; (4) a new pattern of class A introduces another
prototype of that class.
After training is finished, two conditions hold for all input--output
pairs $(\vec{x}, c)$ of the training data: each pattern is covered by at
least one prototype of its own class,
\[
  \exists\, i : \; R_i^c(\vec{x}) \ge \theta^+ ,
\]
and no prototype of a conflicting class has an activation above the
negative threshold,
\[
  \forall\, k \ne c, \; \forall\, j : \; R_j^k(\vec{x}) < \theta^- .
\]
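These two conditions can be checked mechanically. A sketch, using an illustrative data layout where `prototypes` maps a class label to a list of `(centre, sigma, weight)` triples (the helper names are ours, not the paper's):

```python
import math

THETA_PLUS, THETA_MINUS = 0.4, 0.2

def activation(x, r, sigma):
    # Gaussian activation exp(-||x - r||^2 / sigma^2)
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, r)) / sigma ** 2)

def conditions_hold(x, c, prototypes):
    """True iff the pattern (x, c) satisfies both post-training conditions."""
    # 1) covered by at least one prototype of the correct class
    covered = any(activation(x, r, s) >= THETA_PLUS
                  for r, s, _ in prototypes.get(c, []))
    # 2) no prototype of a conflicting class responds above THETA_MINUS
    no_conflict = all(activation(x, r, s) < THETA_MINUS
                      for k, plist in prototypes.items() if k != c
                      for r, s, _ in plist)
    return covered and no_conflict
```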
For all experiments conducted so far, the choice of $\theta^+ = 0.4$
and $\theta^- = 0.2$ led to satisfactory results. In theory these
parameters should depend on the dimensionality of the feature
space, but in practice the values of the two thresholds seem to be
uncritical. Much more important is that the input data is normalized:
due to the radial nature of RBFs, each attribute should be distributed
over an equivalent range. Usually normalization into [0,1] is
sufficient.
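A per-attribute min-max normalization into [0,1] could look like this (a sketch; `normalize_columns` is an illustrative helper, not from the paper):

```python
def normalize_columns(data):
    """Scale each attribute (column) of the data set into [0, 1]."""
    cols = list(zip(*data))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    # constant columns are mapped to 0.0 to avoid division by zero
    return [tuple((v - l) / (h - l) if h > l else 0.0
                  for v, l, h in zip(row, lo, hi))
            for row in data]

# → [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
print(normalize_columns([(0, 10), (5, 20), (10, 30)]))
```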