
Summary of Chapter Five

We started the chapter with a brief and possibly prejudiced account of some of the history of neural nets. We raised the issue of the extent to which they could accomplish wonders beyond the capacity of statistical methods, or whether connectionists should be having a mid-life crisis. We continued by studying the alpha-perceptron, or single-unit neural net, and then investigated the geometry of committee nets (three-layer neural nets, in the conventional usage) and of articulated nets with an extra layer. This allowed us to gain some insight into the principles of chopping up, by hyperplanes, the measurement space for a pattern classification task. We also constructed some simple algorithms which extended the perceptron convergence algorithm for a single unit to the case of these three-layer and four-layer nets. The Back-Propagation algorithm could then be seen in perspective: it was easy to see what it did and how it did it, and to perceive its practical limitations. The Back-Prop algorithm was outlined; the details of the algorithm were brushed off in rather an off-hand manner with a referral to the literature, on the grounds that (a) clever people like you and me could now work it out from first principles if that looked like a good idea, and (b) it looked a pretty bad idea. A digression into function fitting followed, again with the intention of casting doubt on the standard procedures involving multi-layer perceptrons for doing this job, and of shaking any ingenuousness the reader might have about the value of the contribution of pure mathematicians, who are as desperate to publish anything that might be mistaken for an article of practical value as any of us, and a good deal worse equipped to recognise practical value if they should chance to meet it.
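For concreteness, the single-unit case can be sketched in a few lines of code. The following is only an illustration of the perceptron convergence rule in the spirit described above, with invented variable names, an invented learning schedule, and a toy data set; it is not the chapter's own notation, nor the extension to committee or articulated nets.

```python
import numpy as np

def perceptron_train(X, y, epochs=100):
    """Single-unit perceptron convergence rule (illustrative sketch).

    X : (N, n) array of points in the measurement space.
    y : (N,) array of labels in {-1, +1}.
    Returns an augmented weight vector w of length n + 1 defining the
    separating hyperplane w . [x, 1] = 0, if one is found within `epochs` passes.
    """
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])  # augment with a bias component
    w = np.zeros(Xa.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(Xa, y):
            if yi * (w @ xi) <= 0:   # point on the wrong side of (or on) the hyperplane
                w += yi * xi         # nudge the hyperplane towards classifying it correctly
                errors += 1
        if errors == 0:              # every point correctly classified: converged
            break
    return w

# Toy example: two linearly separable clusters in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)
w = perceptron_train(X, y)
```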

The next section considered the value of feed-forward MLPs from the point of view of their effectiveness as models, from the Rissanen data compression perspective. I argued that an MLP accomplishing a binary classification was using hyperplanes to store only the category of the data, and said nothing about where the data lay in the space. It is thus easy to compute the compression effected by a neural net of this type and compare it with the null model, to see if it is actually extracting information about the disposition of the points of a category. We concluded that the number of data points must be more than nkP in order for the model to have any value, where, for a committee net, n is the dimension of the data space, k the number of units, and P the precision, some value between 2 and 64 which is in principle data dependent. The situation was worse for more complex net topologies. This suggested a certain element of frivolity in some artificial neural net publications.
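As a rough numerical illustration of this counting argument: the figures below are invented for the example, and reading P as bits per weight is an interpretation rather than something the bound itself fixes.

```python
def min_useful_sample_size(n, k, P):
    """Rule-of-thumb threshold from the data compression argument:
    a committee net only earns its keep, in the Rissanen sense, when
    the number of data points exceeds n * k * P."""
    return n * k * P

# Invented figures: a 10-dimensional measurement space, a committee of
# 5 units, and P = 16 (read here, as an assumption, as bits per weight).
print(min_useful_sample_size(n=10, k=5, P=16))  # 800: with fewer points the net compresses nothing
```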

A brief survey of neural nets other than MLPs was conducted, the thrust being to expose the underlying ideas rather than to specify the algorithms in any great detail. It was considered sufficient to describe what the things actually did and to outline how they did it, because this is often enough to put one off the whole idea of getting involved with them.

Finally, quadratic neural nets were introduced in order to show that they did what some of the others did, and did it more intelligently and consequently much faster. The case was weakened by omitting details which, it was promised, would emerge later.


Mike Alder
9/19/1997