
General Issues

It is worth contemplating again the nature of the problems which can usefully be tackled by neural nets. As mentioned in chapter one, supervised learning means function fitting. We have some quantity of data represented as points in $\mathbb{R}^n$ for some n, and we have values assigned to each of these points, which may be real values or may be integers representing the categories to which the points belong. A neural net has to be prepared to say what the value or category is for some new point it hasn't seen before. In other words, it has to assign a value to each possible point of the space which is a potential datum, which means it is a map or function defined on the space $\mathbb{R}^n$, or a piece of it, which agrees, more or less, with the values for the data.
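
To make the function-fitting picture concrete, here is a minimal sketch, in Python with numpy (an illustrative choice of language and method, not anything prescribed by the theory): a one-nearest-neighbour rule which extends a handful of labelled points to a category assignment defined on the whole of $\mathbb{R}^2$.

    import numpy as np

    # Labelled data: a few points in R^2 with integer category labels.
    X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
    y = np.array([0, 0, 1, 1])

    def classify(point):
        # The fitted 'function' is defined on all of R^2: every possible
        # point gets the label of its nearest training datum, so the map
        # agrees exactly with the given values on the data itself.
        distances = np.linalg.norm(X - point, axis=1)
        return y[np.argmin(distances)]

    print(classify(np.array([1.2, 1.9])))   # near the first pair  -> 0
    print(classify(np.array([5.5, 8.5])))   # near the second pair -> 1

Nothing here is specifically neural; the point is only that any supervised learner, however elaborate its innards, is in the end a map of this kind.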

Unsupervised learning, by contrast, means finding clusters in the data: deciding when two things belong in the same category without being told what the categories are. The space may be discrete, as when we are looking at 11 by 9 grids of binary pixels, the case where we are given a list of different characters, without being told which each is supposed to be, and ask the system to pick out the different letters without knowing what they are. Or it may be a continuum. Either way we can represent the data as points in $\mathbb{R}^n$ for some n. And inevitably we run into the problem of what `close' means. Clustering is a complicated business which can be accomplished in lots of ways, with nuances I have not discussed. But the two problems, although related, are distinguishable, and the basic framework is worth bearing in mind in what follows.
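
For the continuum case, here is a minimal sketch of one standard cluster finder, the k-means algorithm, again in Python with numpy. Note that it quietly settles the question of what `close' means by choosing Euclidean distance, which is precisely the sort of choice whose nuances I have been glossing over.

    import numpy as np

    def kmeans(X, k, iterations=20, seed=0):
        # Bare-bones k-means: alternate between assigning points to the
        # nearest centre and moving each centre to the mean of its points.
        rng = np.random.default_rng(seed)
        centres = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iterations):
            dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Keep a centre where it is if it has lost all its points.
            centres = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centres[j]
                                for j in range(k)])
        return labels, centres

    # Two blobs in R^2; the algorithm recovers them without being told
    # what the categories are, only that there are two of them.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(6.0, 1.0, (50, 2))])
    labels, centres = kmeans(X, k=2)
    print(centres)   # roughly (0, 0) and (6, 6)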

Unsupervised learning is also referred to in the literature as the behaviour of a self organising system; I suppose there is an image, invoked by this terminology, of a chemical broth evolving through amino acids to proteins to a tribe of jellyfish. Since we are interested in winding up with a program that can respond usefully to a new datum such as an image, and are not proposing simply to watch on our computer screen a sort of Conway's Life game, evolving Truth and Beauty, as a substitute for Saturday afternoon football on television, our interest in self organising systems is confined to finding clusters in the data. What is meant by a `cluster' is left deliberately vague for the moment, but classifying human beings into males and females can be done reasonably well in the height-weight diagram by noting that two bivariate gaussians seem to fit the data in a natural way. And it isn't too unreasonable to suppose that if we measure a rather larger number of parameters by looking at and listening to them, we might find two largely discriminable clusters in the larger representation space. So although we wouldn't have the names of the categories, we would have a system set up for doing a classification or recognition job, having concluded that there are some natural categorisations of the data: the clusters.
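
The two-gaussian idea can itself be sketched in a few lines. The height-weight numbers below are synthetic, invented purely for illustration, and the GaussianMixture fitter from the scikit-learn library is one convenient implementation among many, not the only way to fit a mixture.

    import numpy as np
    from sklearn.mixture import GaussianMixture   # scikit-learn, assumed available

    # Synthetic stand-ins for height (cm) / weight (kg) measurements.
    rng = np.random.default_rng(0)
    group_a = rng.multivariate_normal([162.0, 62.0], [[40.0, 15.0], [15.0, 60.0]], 500)
    group_b = rng.multivariate_normal([176.0, 80.0], [[45.0, 20.0], [20.0, 80.0]], 500)
    X = np.vstack([group_a, group_b])

    # Fit a mixture of two bivariate gaussians to the unlabelled points.
    # The fitted components are the 'clusters': natural categories found
    # without ever being given the category names.
    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
    print(gmm.means_)          # roughly one mean per underlying group
    clusters = gmm.predict(X)  # which gaussian each point most plausibly came from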

There are certainly subtleties to consider at a later stage, but it is as well to focus on these two matters first when considering what a neural net of some type accomplishes. Thus life is made simpler when one works out that the Kohonen Self-Organising Map (SOM) is finding a cluster with some sort of manifold structure, and that the ART of Grossberg is also a cluster finding system. The nets which do associative recall have the problem of mapping new points to one of the older points, or to some output associated with the older points, on which the system has been trained.

Another distinction which may be of practical importance is the `on-line/off-line' classification. Is the data given all at once, or is it coming in serially, one datum at a time, or is it arriving in lumps of many data at once, but not all? Most statistical methodologies presuppose that all the data is there to be operated on in one pass, in order to do either the function fitting or the clustering, while most neural net methodologies assume the data is coming in sequentially. The neural netters therefore tend to use a dynamical process for generating a solution, something like a Kalman filter, which starts with a possibly random initial estimate and then updates it as the data comes in down the line. Bayesian methods lend themselves to this kind of approach, and other methods can be adapted to it. This leads us to study dynamical systems which find clusters, or which fit functions to data.
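
The flavour of such a sequential update is easily shown in miniature. The Python sketch below maintains a running estimate of a mean and corrects it as each datum arrives; the same recursive form, an old estimate plus a shrinking correction, is what a Kalman filter elaborates.

    def make_online_mean(initial=0.0):
        # Sequential estimation: keep a running estimate and nudge it
        # toward each datum as it arrives, rather than waiting until the
        # whole data set is in hand.
        estimate, n = initial, 0
        def update(x):
            nonlocal estimate, n
            n += 1
            estimate += (x - estimate) / n   # shrinking correction step
            return estimate
        return update

    update = make_online_mean()
    for x in [2.0, 4.0, 9.0]:
        print(update(x))   # 2.0, 3.0, 5.0 -- the running mean so far

The off-line statistician computes the same mean in one pass at the end; the on-line version never needs to store the data at all, which is the practical attraction.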

In discussing neural nets in this way, as things which find clusters or fit functions from a family to data, we look at them from the point of view of their function rather than their form. Giving a network diagram while omitting to say what kind of thing the net is trying to do is not entirely helpful. This level of abstraction makes many people extremely unhappy; they feel insecure without concrete entities they know and love. Such people concentrate on details and fail to see the wood for the trees. Their explanations of what they are up to are confusing because it is hard to see what problem they are trying to solve or what kinds of methods they are using. Abstraction for its own sweet sake may be a sterile game for losers, but being able to figure out what kind of thing you are doing can help you do it better, and also allow you to see that two methods that appear to be different are really the same with the names changed.

