
Decisions: Neural Nets (Old Style)

In the second chapter I suggested ways of turning an image, or a part of an image, into a vector of numbers. In the last chapter, I showed, inter alia, how to model a collection of points in $\mathbb{R}^n$ by gaussians and mixtures of gaussians. If you have two or more categories of point (paint them different colours) in $\mathbb{R}^n$, and if you fit a gaussian or mixture of gaussians to each category, you can use the decision process (also described in the last chapter) to decide, for any new point, to which category it probably belongs.
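To make that decision process concrete, here is a minimal sketch in Python, assuming NumPy and SciPy are available; the toy data, the names, and the equal-priors-and-costs simplification are mine for illustration, not anything from the text. It fits one gaussian per category by maximum likelihood and assigns a new point to the category with the larger posterior.

# Minimal sketch: one gaussian per category, assign a new point to the
# category with the larger posterior (equal costs assumed).
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
class_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))   # category A points
class_b = rng.normal(loc=[3.0, 3.0], scale=1.5, size=(120, 2))   # category B points

def fit_gaussian(points):
    """Maximum-likelihood mean and covariance for one category."""
    return points.mean(axis=0), np.cov(points, rowvar=False)

mu_a, cov_a = fit_gaussian(class_a)
mu_b, cov_b = fit_gaussian(class_b)
prior_a = len(class_a) / (len(class_a) + len(class_b))
prior_b = 1.0 - prior_a

def classify(x):
    """Return the label of the category with the larger posterior at x."""
    p_a = prior_a * multivariate_normal.pdf(x, mean=mu_a, cov=cov_a)
    p_b = prior_b * multivariate_normal.pdf(x, mean=mu_b, cov=cov_b)
    return "A" if p_a > p_b else "B"

print(classify([0.5, 0.2]))   # expected: A
print(classify([2.8, 3.1]))   # expected: B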

It should be clear that in modelling the set of points belonging to one category by gaussians (or indeed any other family of distributions) we are making some assumptions about the nature of the process responsible for producing the data. The assumptions implicit in the gaussian mixture pdf are very modest and amount to supposing only that a large number of small and independent factors are producing unpredictable fluctuations about each of a small number of `ideal' or `template' stereotypes described by the measuring process. This is, frequently, not unreasonable: if we are reading printed text, we can suppose that there are several ideal shapes for a letter /A/, depending on whether it is italic, in Roman or sans-serif font, and that in addition there are small wobbles at the pixel level caused by quantisation and perhaps the noise of scanning. There should be as many gaussians for each letter as there are distinct stereotypes, and each gaussian should describe the perturbations from this ideal. So the approach has some attractions. Moreover, it may be shown that any pdf may be approximated arbitrarily closely by a mixture of gaussians, so even if the production process is more complex than the simple model suggested for characters, it is still possible to feel that the model is defensible.
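As an aside, the `one mixture per letter' idea above can be sketched in a few lines. The sketch below uses scikit-learn's GaussianMixture, an assumed library choice, with hypothetical toy data standing in for vectors measured from images; none of it is from the text. Each character class gets a mixture with one component per presumed stereotype, and a new measurement vector is assigned to the class whose mixture gives it the highest likelihood.

# Sketch: one gaussian mixture per character class, one component per stereotype.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_class_mixture(vectors, n_stereotypes=3):
    """Fit one gaussian per presumed stereotype of a character class."""
    return GaussianMixture(n_components=n_stereotypes,
                           covariance_type="full").fit(vectors)

def classify(x, mixtures):
    """Return the class whose mixture assigns x the highest log-likelihood."""
    x = np.atleast_2d(x)
    return max(mixtures, key=lambda label: mixtures[label].score_samples(x)[0])

# Toy data standing in for vectors measured from images of /A/ and /B/.
rng = np.random.default_rng(1)
vectors_A = rng.normal([0.0, 0.0], 1.0, size=(300, 2))
vectors_B = rng.normal([4.0, 4.0], 1.0, size=(300, 2))
mixtures = {"A": fit_class_mixture(vectors_A), "B": fit_class_mixture(vectors_B)}
print(classify([0.3, -0.2], mixtures))   # expected: A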

If we take two clusters of points, each described by a gaussian, there is, for any choice of costs, a decision hypersurface separating the two regions of $\mathbb{R}^n$ containing the data, as in the figure. This hypersurface is defined by a linear combination of quadratics and hence is itself the zero set of some quadratic function. In the particular case when both clusters have the same covariance matrix, this reduces to a hyperplane. If the covariance matrices are not very different, then a hyperplane between the two regions will still be a fair approximation in the region between the two clusters, which is usually the region we care about. And you can do the sums faster with an affine hyperplane, so why not use hyperplanes to implement decision boundaries? Also, we don't usually have any good grounds for believing the clusters are gaussian anyway, and unless there's a whole lot of data, our estimates of the covariance matrices and centres are probably shaky, so the resulting decision boundary is fairly tentative, and approximating it with a hyperplane is quick and easy. And for more complicated regions, why not use piecewise affine decision boundaries? Add to this the proposition that neurons in the brain implement affine subspaces as decision boundaries, and the romance of neural nets is born.
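To spell the claim out, write $\mu_1, \mu_2$ for the two centres, $\Sigma_1, \Sigma_2$ for the covariance matrices, and absorb the priors and costs into a constant $c$ (the symbols are introduced here only for this aside). The decision boundary is the set of points $x$ with

\[
-\frac{1}{2}(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)
+\frac{1}{2}(x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2)
-\frac{1}{2}\ln\frac{\vert\Sigma_1\vert}{\vert\Sigma_2\vert}+c = 0,
\]

which is quadratic in $x$. When $\Sigma_1=\Sigma_2=\Sigma$, the two $x^T\Sigma^{-1}x$ terms cancel and the boundary reduces to the affine equation

\[
(\mu_1-\mu_2)^T\Sigma^{-1}x
-\frac{1}{2}(\mu_1-\mu_2)^T\Sigma^{-1}(\mu_1+\mu_2)+c = 0,
\]

that is, a hyperplane.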

In this chapter, I shall first outline the history and some of the romance of neural nets. I shall explain the Perceptron convergence algorithm, which tells you how to train a single neuron, and explain how it was modified to deal with networks of more than one neuron, back in the dawn of neural net research. Then I shall discuss the hiatus in Neural Net research caused by, or at least attributed to, Marvin Minsky, and the rebirth of Neural Nets. I shall explain layered nets and the Back-Propagation algorithm. I shall discuss the menagerie of other Artificial Neural Nets (ANNs for short) and indicate where they fit into Pattern Recognition methods, and finally I shall make some comparisons with statistical methods of doing Pattern Recognition.

There was an interesting exchange on the net in 1990 which concerned the relative merits of statistical and neural net (NN) models. Part of it went as follows:

` ... I think NNs are more accessible because the mathematics is so straightforward, and the methods work pretty well even if you don't know what you're doing (as opposed to many statistical techniques that require some expertise to use correctly)' ......

However, it seems just about no one has really attempted a one-to-one sort of comparison using traditional pattern recognition benchmarks. Just about everything I hear and read is anecdotal. Would it be fair to say that ``neural nets'' are more accessible, simply because there is such a plethora of `sexy' user-friendly packages for sale? Or is back-prop (for example) truly a more flexible and widely-applicable algorithm than other statistical methods with uglier-sounding names? If not, it seems to me that most connectionists should be having a bit of a mid-life crisis about now.'

From Usenet News, comp.ai.neural-net, August 1990.

This may be a trifle naive, but it has a refreshing honesty and clarity. The use of packages by people who don't know what they're doing is somewhat worrying: if you don't know what you're doing, you probably shouldn't be doing it. But Statistics has had to put up with Social Scientists (so called) doing frightful things with SPSS and other statistical packages for a long time now. And the request for some convincing arguments in favour of Neural Nets is entirely reasonable and to be commended. The lack of straight and definitive answers quite properly concerned the enquirer.

The question of whether neural nets are the answer to the question of how brains work, the best known way of doing Artificial Intelligence or just the current fad to be exploited by the cynical as a new form of intellectual snake oil, merits serious investigation. The writers tend to be partisan and the evidence confusing. We shall investigate the need for a mid-life crisis in this chapter.



 
Mike Alder
9/19/1997