
Quadratic Neural Nets: issues

The MLP cuts up the space containing a finite binary data set into pieces by hyperplanes. This allows us to fit functions from a family so as to get agreement on the function sample defined by the data set, and the algorithmic process consists of shuffling hyperplanes about the space until they cut the data up in a satisfactory way.

If you were to be given a two dimensional problem with data consisting of noughts and crosses, or black circles and white circles, and invited to place a collection of hyperplanes judiciously so as to separate the classes, you could probably do it faster than a computer using a neural net. For some easy problems, such as the double spiral data set, you could do it in a few seconds, while it is likely to take a mini-supercomputer a few weeks using Back-Propagation on a three layer net. Even given the remarkable powers of the eye-brain combination, this does not inspire much confidence in Back-Prop.

Some reflection on the reasons why Back-Prop is so bad, or a little sensible experimenting, leads you to see that you are hoping a collection of hyperplanes will fall into a nice set of positions, each complementing the others. Why should they? There is nothing at all in Back-Prop to make them do so. Fahlman pointed this out (see the references), and suggested more thoughtful ways of going about the job of partitioning data than taking random walks in a huge state space. Whether the resulting algorithm can reasonably be called a neural net algorithm is open to dispute, but it sure out-performs Back-Prop.

A little more reflection suggests that doing it with hyperplanes is pretty dumb anyway. If you are going to do it using the smallest number of units, you are likely to want to decompose your sets of points into a union of convex sets, as in the case of the articulated four layer net. Why not start off with convex sets and plonk them onto the data? Specht did this for spheres; why not use ellipses in two dimensions, or hyperellipsoids in general? They are all specified by a positive definite quadratic form, or alternatively by a symmetric positive definite matrix. That's no big deal, since defining a simplex, the simplest closed convex shape you can make out of hyperplanes, in dimension n takes n+1 hyperplanes, each of which takes n numbers. This is about twice the amount of information it takes to specify a quadratic form.
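To make the count explicit (a rough tally under the same convention, charging each hyperplane n numbers and ignoring centres and offsets): a simplex in dimension n costs
\[
(n+1)\times n \;=\; n(n+1) \quad \mbox{numbers},
\]
while a symmetric positive definite matrix has only
\[
\frac{n(n+1)}{2} \quad \mbox{independent entries},
\]
which is exactly half as many; hence the factor of about two.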

Which leads to a number of questions, one of which might be: what, if anything, does any of this have to do with neurons in people's heads? Are quadratic forms more plausible than affine discriminants?

I shall argue later that the answer is `yes'.

The next question is: if you want to tell noughts from crosses in the double spiral problem, for instance, how do ellipses help you? The answer is that you have two species of ellipse, red ellipses for the noughts and green ones for the crosses. Then you attract the red ellipses in towards the nought data and the green ones towards the cross points, and they totally ignore each other. A judiciously chosen rule of attraction is then the answer to the next question: how do you train the quadratic neurons? How do you attract them towards the data? I'll go into that a little later on, too. It requires a little thought.
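To fix ideas in the meantime, here is a minimal sketch, not the training rule developed later in this book, of what 'attracting' an ellipse towards the data of its own class could look like: each incoming point pulls the centre of the nearest same-class ellipsoid towards itself and nudges that ellipsoid's covariance matrix towards the point's deviation. The pool size, learning rate and initialisation below are arbitrary choices made purely for illustration.

\begin{verbatim}
# Illustrative sketch only -- not the training rule given later in the book.
# Each data point pulls the centre of the nearest same-class hyperellipsoid
# towards itself and nudges that ellipsoid's covariance matrix towards the
# point's deviation, so the ellipsoids drift onto their own class's data.
import numpy as np

def attract(centres, covs, x, rate=0.05):
    """Update the ellipsoid closest to x in Mahalanobis distance.
    centres is (k, n), covs is (k, n, n); all belong to ONE class."""
    d = x - centres                                   # (k, n) deviations
    maha = np.einsum('ki,kij,kj->k', d, np.linalg.inv(covs), d)
    j = np.argmin(maha)                               # the winning ellipsoid
    centres[j] += rate * d[j]                         # pull its centre to x
    dj = x - centres[j]
    covs[j] += rate * (np.outer(dj, dj) - covs[j])    # reshape it towards x
    return centres, covs

# Run one pool of ellipsoids for the noughts and another for the crosses,
# feeding each pool only its own class's points, so the pools ignore each
# other.  Random points stand in for the O data here.
rng = np.random.default_rng(0)
noughts = rng.normal(size=(200, 2))
centres = noughts[rng.choice(200, size=4, replace=False)].copy()
covs = np.stack([0.5 * np.eye(2) for _ in range(4)])
for x in noughts:
    centres, covs = attract(centres, covs, x)
\end{verbatim}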

Fig. 5.25 shows a partial solution to the double spiral problem which may be obtained four or more orders of magnitude faster than with the classical piecewise affine neural nets; that is to say, it's at least ten thousand times faster. The artist got bored with drawing vilely shaped ellipses and so only put them on the O's, leaving the reader to draw some on the X's if he feels so inclined, after which a small child may be invited to colour them in, yielding an image much like one of those on disk.


  
Figure 5.25: A partial solution to the Double Spiral Data Set.

A case may be made out, and indeed will be made, for the proposition that real, live, wetware neurons do something rather like fitting quadratic or higher order forms to data. The effect of this may be something not too different from an adaptive Probabilistic Neural Net, and it tends to suggest that neurons are actually making local estimates of the density of data. This is an important conception of the function of a neuron, so deserves some extended exegesis. This will be done in a later chapter.

For the present, it suffices to observe that if data comes in through sensors in a stream, extracting information may require an adaptive process somewhat different from the usual statistical methods for computing model parameters. Moreover, although a gaussian pdf is determined by a quadratic form, a quadratic form may be used to define other pdfs, or may be used, as hyperplanes are, simply as a delimiter of different regimes.
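To spell out the familiar case: the gaussian pdf in dimension n is fixed by the positive definite quadratic form $(\mathbf{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})$,
\[
p(\mathbf{x}) \;=\; \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}
\exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right),
\]
while any level set $\{\mathbf{x} : (\mathbf{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}) = c\}$ of the same form is a hyperellipsoid which can serve, just as a hyperplane does, purely as the boundary between regimes.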

This way of looking at things unifies some otherwise disparate matters. One satisfying feature of Quadratic neural nets is that enclosing points of different categories by hyperellipsoids rather than by hyperplanes makes Neural Net methods look a whole lot more like statistics and gaussian mixture modelling, as well as being many thousands of times faster on many problems. A second is that classifying different types of point by having different categories of model neuron responding to them reduces classification to multiple clustering. So function fitting and clustering have at least some common elements from a neural net point of view.
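Continuing the illustrative sketch above (still a sketch under the same assumptions, not the book's method): once each class has its own pool of trained ellipsoids, a new point can be classified by asking which class's nearest ellipsoid, in the Mahalanobis sense, claims it, which is exactly the multiple-clustering view of classification.

\begin{verbatim}
# Classify a point by whichever class's nearest hyperellipsoid claims it,
# measuring "nearest" by Mahalanobis distance.  `pools` maps a class label
# (e.g. 'O' or 'X') to that class's (centres, covs) arrays as trained above.
import numpy as np

def classify(x, pools):
    best_label, best_maha = None, np.inf
    for label, (centres, covs) in pools.items():
        d = x - centres
        maha = np.einsum('ki,kij,kj->k', d, np.linalg.inv(covs), d)
        if maha.min() < best_maha:
            best_label, best_maha = label, maha.min()
    return best_label

# e.g. classify(np.array([0.3, -1.2]),
#               {'O': (centres_O, covs_O), 'X': (centres_X, covs_X)})
\end{verbatim}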

After I've told you how to make positive definite quadratic forms, otherwise known as hyperellipsoids, jump through hoops and find data points, you will never want to use Back Propagation with Multi-layer Perceptrons for classifying data ever again. This will save you a lot of computer time.

