The Dynamic Decay Adjustment (DDA) Algorithm is an extension of the
RCE-Algorithm (see [Hud92,RCE82]) and offers easy and
constructive training for Radial Basis Function Networks. RBFs
trained with the DDA-Algorithm often achieve classification accuracy
comparable to Multi Layer Perceptrons (MLPs),
but training is significantly faster [BD95].
An RBF trained with the DDA-Algorithm (RBF-DDA) is similar in structure to the common feedforward MLP with one hidden layer and without shortcut connections:
The main differences to an MLP are the activation function and the propagation rule of the hidden layer: instead of a sigmoid or another nonlinear squashing function, RBFs use localized functions, radial Gaussians, as activation function. In addition, the computation of the Euclidean distance to an individual reference vector replaces the scalar product used in MLPs:
If the network receives a vector $\vec{x}$ as input,
\[
  R_i(\vec{x}) = \exp\!\left( - \frac{\|\vec{x} - \vec{r}_i\|^2}{\sigma_i^2} \right)
\]
indicates the activation of one RBF unit with reference vector $\vec{r}_i$ and
standard deviation $\sigma_i$.
The output layer computes the output for each class $c$ as follows:
\[
  f_c(\vec{x}) = \sum_{i=1}^{m} A_i \, R_i(\vec{x})
\]
with $m$ indicating the number of RBFs belonging to the corresponding
class and $A_i$ being the weight of each RBF.
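The two formulas above can be sketched in a few lines of plain Python (function and variable names such as `rbf_activation` are illustrative, not from the paper):

```python
import math

def rbf_activation(x, r, sigma):
    """Gaussian activation R_i(x) = exp(-||x - r_i||^2 / sigma_i^2)."""
    sq_dist = sum((xj - rj) ** 2 for xj, rj in zip(x, r))
    return math.exp(-sq_dist / sigma ** 2)

def class_output(x, prototypes):
    """Output of one class: weighted sum f_c(x) = sum_i A_i * R_i(x)
    over the RBFs belonging to that class."""
    return sum(A * rbf_activation(x, r, sigma) for (r, sigma, A) in prototypes)

# One prototype for class A, centred at (0.5, 0.5), sigma = 0.2, weight A = 1:
protos_class_a = [((0.5, 0.5), 0.2, 1.0)]
print(class_output((0.5, 0.5), protos_class_a))  # → 1.0 (maximal at the centre)
```

Note the local response: an input at the prototype centre yields activation 1, and the activation decays towards zero with growing Euclidean distance.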
An example of a full RBF-DDA is shown in the following figure.
Note that there are no shortcut connections between input
and output units in an RBF-DDA.
Figure: The structure of a Radial Basis Function Network.
In this illustration the weight vector that connects all input units to one hidden unit represents the centre of the Gaussian. The Euclidean distance of the input vector to this reference vector (or prototype) is used as input to the Gaussian, which leads to a local response: if the input vector is close to the prototype, the unit has a high activation; for larger distances the activation is close to zero. Each output unit simply computes a weighted sum of the activations of all RBF units belonging to the corresponding class.
The DDA-Algorithm introduces the idea of distinguishing between
matching and conflicting neighbors in an area of
conflict. Two thresholds $\theta^+$ and $\theta^-$ are
introduced, as illustrated in the following figure.
Figure: One RBF unit as used by the DDA-Algorithm. Two thresholds
are used to define an area of conflict where no
other prototype of a conflicting class is allowed to exist.
In addition, each training pattern has to be in the inner
circle of at least one prototype of the correct class.
Normally, $\theta^+$ is set to be greater than $\theta^-$, which
leads to an area of conflict where neither matching nor
conflicting training patterns are allowed to lie. Using these
thresholds, the algorithm constructs the network dynamically and
adjusts the radii individually.
In short, the DDA-Algorithm works as follows. It is based on two steps. During training, whenever a pattern is misclassified, either a new RBF unit with an initial weight $A = 1$ is introduced (called commit) or the weight of an existing RBF that covers the new pattern is incremented. In both cases the radii of conflicting RBFs (RBFs belonging to the wrong class) are reduced (called shrink). This guarantees that each pattern in the training data is covered by an RBF of the correct class and that no RBF of a conflicting class has an inappropriately high response.
Two parameters are introduced at this stage, a positive threshold
$\theta^+$ and a negative threshold $\theta^-$. To commit
a new prototype, none of the existing RBFs of the correct class may have
an activation above $\theta^+$, and during shrinking no RBF of
a conflicting class is allowed to have an activation above $\theta^-$.
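A minimal sketch of one training step, assuming the Gaussian activations defined earlier (the data layout, the choice of which covering RBF is incremented, and the function names are our own simplifications, not the paper's):

```python
import math

THETA_PLUS, THETA_MINUS = 0.4, 0.2   # threshold values suggested in the text

def activation(x, r, sigma):
    # Gaussian activation exp(-||x - r||^2 / sigma^2)
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, r)) / sigma ** 2)

def train_pattern(x, c, prototypes, sigma_init=1.0):
    """One DDA training step for pattern x of class c.

    prototypes maps a class label to a list of [centre, sigma, weight]."""
    own = prototypes.setdefault(c, [])
    covering = [p for p in own if activation(x, p[0], p[1]) >= THETA_PLUS]
    if covering:
        covering[0][2] += 1.0             # pattern is covered: increment weight
    else:
        own.append([x, sigma_init, 1.0])  # commit: new prototype, weight A = 1
    # shrink: reduce the radius of every conflicting RBF until its
    # activation at x drops to THETA_MINUS
    for k, plist in prototypes.items():
        if k == c:
            continue
        for p in plist:
            if activation(x, p[0], p[1]) > THETA_MINUS:
                dist2 = sum((a - b) ** 2 for a, b in zip(x, p[0]))
                if dist2 > 0:  # a pattern exactly at a conflicting centre is ignored here
                    p[1] = math.sqrt(dist2 / -math.log(THETA_MINUS))
```

The shrink formula follows from solving $\exp(-d^2/\sigma^2) = \theta^-$ for $\sigma$, which gives $\sigma = d / \sqrt{-\ln \theta^-}$ for a conflicting prototype at distance $d$ from the pattern.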
The following figure shows an example that illustrates the first
few training steps of the DDA-Algorithm.
Figure: An example of the DDA-Algorithm: (1) a pattern of class A
is encountered and a new RBF is created; (2) a training pattern of
class B leads to a new prototype for class B and shrinks the
radius of the existing RBF of class A; (3) another pattern of
class B is classified correctly and again shrinks the prototype
of class A; (4) a new pattern of class A introduces another
prototype of that class.
After training is finished, two conditions hold for all input--output
pairs $(\vec{x}, c)$ of the training data: each pattern is covered by at
least one prototype of its own class,
\[
  \exists\, i : \; R_i^c(\vec{x}) \ge \theta^+ ,
\]
and no prototype of a conflicting class has an activation above the
negative threshold,
\[
  \forall\, k \ne c, \; \forall\, j : \; R_j^k(\vec{x}) < \theta^- .
\]
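These two conditions can be checked mechanically. A sketch, using an illustrative data layout where `prototypes` maps a class label to a list of `(centre, sigma, weight)` triples (the helper names are ours, not the paper's):

```python
import math

THETA_PLUS, THETA_MINUS = 0.4, 0.2

def activation(x, r, sigma):
    # Gaussian activation exp(-||x - r||^2 / sigma^2)
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, r)) / sigma ** 2)

def conditions_hold(x, c, prototypes):
    """True iff the pattern (x, c) satisfies both post-training conditions."""
    # 1) covered by at least one prototype of the correct class
    covered = any(activation(x, r, s) >= THETA_PLUS
                  for r, s, _ in prototypes.get(c, []))
    # 2) no prototype of a conflicting class responds above THETA_MINUS
    no_conflict = all(activation(x, r, s) < THETA_MINUS
                      for k, plist in prototypes.items() if k != c
                      for r, s, _ in plist)
    return covered and no_conflict
```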
For all experiments conducted so far, the choice of $\theta^+ = 0.4$
and $\theta^- = 0.2$ led to satisfactory results. In theory these
parameters should depend on the dimensionality of the feature
space, but in practice the values of the two thresholds seem to be
uncritical. Much more important is that the input data is normalized:
due to the radial nature of RBFs, each attribute should be distributed
over an equivalent range. Usually normalization into [0,1] is
sufficient.
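A per-attribute min-max normalization into [0,1] could look like this (a sketch; `normalize_columns` is an illustrative helper, not from the paper):

```python
def normalize_columns(data):
    """Scale each attribute (column) of the data set into [0, 1]."""
    cols = list(zip(*data))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    # constant columns are mapped to 0.0 to avoid division by zero
    return [tuple((v - l) / (h - l) if h > l else 0.0
                  for v, l, h in zip(row, lo, hi))
            for row in data]

# → [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
print(normalize_columns([(0, 10), (5, 20), (10, 30)]))
```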