
The Kohonen Net

Inspired by results obtained in neurophysiological studies, which show that there is a mapping between the world and adjacent neurons in the human cerebral cortex, Kohonen in the 1980s showed how a set of neurons could plausibly be constructed so as to adapt to extract the shape of a data set. It is important to understand that the method works only for data sets which lie on curves, surfaces, or in general manifolds embedded in $\mathbb{R}^n$. I make a brief diversion to say what an embedded manifold is.

Figure 5.16 shows something that could arise in an image. The eye picks out three distinct curves.


  
Figure 5.16: Three curves in the plane.
[Figure file: 5.curves.ps]


  
Figure 5.17: Three curves in the plane?
[Figure file: 5.dotcurves.ps]

The following figure, 5.17, is a discrete version of its predecessor. It is most simply considered as a finite set of points drawn from three underlying continuous curves. This is closer to real life than the preceding figure, since data sets in practice are finite.

To say that there are three curves underlying the data of figure 5.17 is to claim that there are three `clusters' in the data. Allocating points to the three underlying curves is easy for the eye, except where the curves intersect, and even there the scope for disagreement is limited. So we have a kind of `cluster', picked out easily by the eye, which is very different from the sense of `cluster' which the reader may have formed after the discussion of gaussian mixture models. Nevertheless, the eye sees three distinct `objects' in the image, and the points may be assigned one of three different labels.

Defining the notion of a `smooth curve in the plane' is best done by saying that it is a set of points which is the image of a twice differentiable map from the unit interval, [0,1], into $\mathbb{R}^2$. In order to ensure that it is not a space filling curve and that its existence may be inferred from a finite sample, we may stipulate that there is some bound on the curvature of any such map.
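For instance, the arc of the parabola $y = x^2$ used as an example later in this section is a smooth curve in exactly this sense: it is the image of the twice differentiable map

\begin{displaymath}
\gamma : [0,1] \to \mathbb{R}^2, \qquad \gamma(t) = (2t-1, \, (2t-1)^2),
\end{displaymath}

and the curvature of this map is bounded (it is largest at the vertex of the parabola), so the extra stipulation is satisfied as well.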

Curves are often specified in mathematics implicitly, that is to say as zeros of functions, for example

\begin{displaymath}
\{ (x,y) \in \mathbb{R}^2 : x^2 + y^2 - 1 = 0 \},
\end{displaymath}

which is the unit circle. Note that we have here a smooth map from $\mathbb{R}^2$ to $\mathbb{R}$ which imposes one condition on the two variables, thus leaving a one dimensional object, the circle.

All this also applies to curves in $\mathbb{R}^n$, and also to surfaces in $\mathbb{R}^n$. In implicit representations, we often have k smooth conditions on n variables, which may (but need not) give rise to a smooth (n-k) dimensional manifold.
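As a concrete instance, one smooth condition on three variables,

\begin{displaymath}
\{ (x,y,z) \in \mathbb{R}^3 : x^2 + y^2 + z^2 - 1 = 0 \},
\end{displaymath}

leaves a smooth (3-1) = 2 dimensional manifold, the unit sphere. The qualification `need not' covers degenerate cases: the single smooth condition $x^2 + y^2 + z^2 = 0$ has as its zero set a single point, not a 2-manifold.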

Without going into too much undergraduate geometry: if every sufficiently small open ball in $\mathbb{R}^n$ intersecting a set U does so in a region which can be put into a smooth 1-1 correspondence with an open ball in $\mathbb{R}^q$, then we say that U is a q-dimensional manifold embedded in $\mathbb{R}^n$.

Since there are often constraints of a physical sort on the variables which have been obtained by taking measurements of a system, it is often the case that data sets lie on embedded manifolds. For example, the vocal tract can vary, for an individual speaker, only in a limited number of possible constriction points, and this gives a two dimensional vowel space. If we measure filter bank values to get a discrete representation of the log power spectrum of speech, we may be taking 16 measurements. The vowel space for a speaker may be a two dimensional manifold embedded in this sixteen dimensional space.

The Kohonen process tries to fit a (finite) data set in $\mathbb{R}^n$ by moving a q-dimensional grid towards it. The grid can be thought of as a finite approximation to $I^q$, where I is the unit interval; $I^q$ is called a q-cube. The following example is done in dimension one, but the idea is the same for all dimensions.
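To fix ideas, here is a minimal sketch, in Python, of such a finite approximation to $I^q$; the resolution of the grid is an assumed, purely illustrative choice.

\begin{verbatim}
# A finite approximation to the q-cube I^q: a regular grid of points in [0,1]^q.
import numpy as np

q, per_side = 2, 5                       # illustrative values only
axes = [np.linspace(0.0, 1.0, per_side)] * q
grid = np.stack(np.meshgrid(*axes, indexing='ij'), axis=-1).reshape(-1, q)
# grid now holds per_side**q points of I^q; for q = 1 it is simply a line of
# equally spaced 'neurons'.
\end{verbatim}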

Suppose that we have a set of points in the plane that form a curve; for definiteness, say they lie roughly equally spaced along the curve $y = x^2$. Suppose we have a line of model neurons arranged initially at random in the same space. Let the line consist of `neurons' $\{n_1, n_2, \ldots, n_k\}$, where each neuron is simply a point in the space, but where consecutive points are imagined as joined by line segments.


  
Figure 5.18: Initial (random) state of Kohonen network.
[Figure file: patrecfig5.15a.ps]

Present a point $x_j$ of the data set to the network. The Kohonen rule is to move the closest neuron towards the calling datum, and also to move the adjacent neurons by a smaller amount. `Adjacent' here means adjacent not in the sense of being close in the plane, but in the sense of being neighbours in the ordering of the neurons, which is not at all the same thing in general. We can imagine the ordered line of neurons as being, initially, badly tangled by its random embedding in the plane.

If this process is iterated, with different data points selected to do the calling, the tangle of neurons gradually becomes straightened out; the line is attracted towards the data points and becomes an approximation to them. The two senses of `close', close in $\mathbb{R}^2$ and close in the topology of the space of neurons, become similar. Mathematically speaking, the cleanest way to look at this is to think of the tangled grid of neurons as being the image of a continuous map $f_0$ from the standard q-cube into $\mathbb{R}^n$, or in practice from some finite approximation to such a q-cube consisting of a q-dimensional grid. Then a single point of the data set in $\mathbb{R}^n$ is presented to the system, and the map is modified by being made to move the points of the q-cube closer to the calling point, in such a way as to preserve the continuity of the map. Thus we have the whole data set acting on the continuous maps from the q-cube into $\mathbb{R}^n$ so as to make them fit the data set. The sequence of maps should converge to some satisfactory map eventually, if the algorithm is right.
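The one dimensional process just described can be written down in a few lines. The following is a minimal sketch in Python, not Kohonen's own implementation: the step size, the neighbourhood (one neuron on either side of the winner, moved by half as much) and the number of presentations are assumed values chosen only for illustration.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

# Data: points roughly equally spaced along the curve y = x^2.
x = np.linspace(-1.0, 1.0, 200)
data = np.column_stack([x, x ** 2])

# A line of k 'neurons', each a point in the plane, placed at random.
k = 25
neurons = rng.uniform(-1.0, 1.0, size=(k, 2))

alpha = 0.1                                   # step size (assumed value)
for _ in range(5000):
    p = data[rng.integers(len(data))]         # the calling datum
    winner = np.argmin(np.sum((neurons - p) ** 2, axis=1))  # closest neuron in the plane
    # Move the winner towards the datum, and its neighbours in the *ordering*
    # of the neurons (not its neighbours in the plane) by half as much.
    for j, frac in [(winner, 1.0), (winner - 1, 0.5), (winner + 1, 0.5)]:
        if 0 <= j < k:
            neurons[j] += alpha * frac * (p - neurons[j])
\end{verbatim}

After enough presentations the line of neurons lies roughly along the parabola, with consecutive neurons adjacent along the curve: the two senses of `close' have been brought into agreement.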

Kohonen wrote of the `order' being preserved, although it could also be reversed by the process, if one takes it that the data set is also ordered. It is simpler to note that the mapping from the line of neurons to the line of data points is locally 1-1 or is an immersion in the jargon of the topologists. At least it would be if we had an infinite number of neurons which really lay along a line.

It is not difficult to generalise from curves fitted with a linear grid to more complicated objects. If one has a set of data points in $\mathbb{R}^n$, it may be that they form a random cluster in the space, or it may be that they form a lower dimensional manifold. For example, they could be a swarm of flies all of which are sitting on a large inflated balloon; in this case the dimension of the surface of the balloon is two, less than the original three in which the flies are free to move. A fly which chooses to move only on the surface of the balloon is giving up a whole dimension, just like Obi Wan Kenobi in the Star Wars epic.

The set of points on the surface of a spherical balloon constitutes a manifold, and the flies form a finite subset of this manifold. One can imagine linear submanifolds of $\mathbb{R}^n$ or curved ones, as in the case of the balloon. If there are enough flies, it is easy for the eye to discern the lower dimensional structure, or at least for the eyes, since some parallax is necessary before it can be seen. A manifold of dimension k is a generalisation of the idea of a surface such as a balloon, which is technically described as a 2-manifold. A k-manifold is said to be parametrised when there is some collection of k-dimensional disks which are mapped to it so as to cover it without folding, conceptually like sticking postage stamps over a spherical balloon so as to cover it. Such parametrisations are usually given by explicit maps, an example being the map that sends the interval $(-\pi, \pi)$ onto the unit circle by sending t to $(\cos(t), \sin(t))$. This covers every point of the circle except the point (-1,0), so we send another copy of $(-\pi, \pi)$ to the circle by the map that sends t to $(-\cos(t), \sin(t))$, which has a lot of overlap with the first map. Each map separately is 1-1 and infinitely differentiable, and together they cover, or parametrise, the circle with a pair of one dimensional disks (open intervals). A sphere could likewise be parametrised with a couple of open 2-disks, and it is an easy exercise to find a couple of explicit maps from the interior of $D^2$ to $\mathbb{R}^3$ which give a parametrisation of the 2-sphere.

More generally, then, if we have a set of data points in $\mathbb{R}^n$ lying on a manifold of dimension m, and if we have a grid of model neurons arranged to lie in a cube of dimension q, the Kohonen algorithm may easily be modified to attract the grid to the data and to do as good a job of fitting the data as possible. If q < m, the situation will resemble the case of a line of neurons trying to fit itself onto a plane: a good deal of folding will be necessary and the final fit will be less than adequate. If q > m, it will resemble the case of a square being attracted to a curve, and a good deal of buckling will be necessary. But if q = m, a relatively good fit may be obtained. The general property that the q-cube fits itself without local folding to a q-manifold may be shown to hold, although the term `order preserving' is inappropriate here. Experiments show that manifold structure can be extracted automatically by this means. See the references for Dimension of the Speech Space as a case in point, and Kohonen's algorithm for the numerical parametrisation of manifolds for a discussion of the essential properties of the Kohonen algorithm. It is accomplishing, for a finite point set which comes from a manifold, a sort of parametrisation of part of the manifold, given by a map of the q-disk (or q-cube, or grid) into the containing space.
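To make the generalisation concrete, here is a minimal sketch of the same rule for a q = 2 grid of neurons attracted to data on a 2-manifold, namely points scattered over the unit sphere in $\mathbb{R}^3$ (the balloon of the earlier example). The grid size, step size and the shrinking gaussian neighbourhood are assumed, illustrative choices rather than anything prescribed in the text.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)

# Data: points scattered over the unit sphere, a 2-manifold embedded in R^3.
v = rng.normal(size=(2000, 3))
data = v / np.linalg.norm(v, axis=1, keepdims=True)

# A 10 x 10 grid of neurons, each a point in R^3, placed at random.
side = 10
grid = rng.uniform(-1.0, 1.0, size=(side, side, 3))
rows, cols = np.meshgrid(np.arange(side), np.arange(side), indexing='ij')

for t in range(10000):
    p = data[rng.integers(len(data))]
    d2 = np.sum((grid - p) ** 2, axis=2)
    wi, wj = np.unravel_index(np.argmin(d2), d2.shape)   # winning grid position
    # Neighbourhood measured in *grid* coordinates, shrinking as training
    # proceeds, so the grid first unfolds globally and then fits locally.
    sigma = 3.0 * np.exp(-t / 3000.0)
    h = np.exp(-((rows - wi) ** 2 + (cols - wj) ** 2) / (2.0 * sigma ** 2))
    grid += 0.1 * h[..., None] * (p - grid)
\end{verbatim}

With q = 2 matching the dimension of the manifold, the grid settles onto a patch of the sphere without local folding; with q = 1 it would have to fold repeatedly to cover it.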

We can see this as a sort of structured clustering: if there is a manifold structure instead of a gaussian cluster structure, this system will find it. But what if both occur? As soon as one starts to think about the possible shapes of subsets of $\mathbb{R}^2$ or $\mathbb{R}^3$ made up of finite point sets, one starts to boggle a bit at the prospect of using any one system. And $\mathbb{R}^n$ for higher n leaves the imagination behind fast.

The general device of finding a point (for example a zero of a function) by making the point an attractor of a dynamical system goes back to Newton and the Newton-Raphson method. In the present case, the attractor is a set of points rather than just one, and some aspects of the topological structure are being extracted. It is a method which has some charm, and is susceptible of a range of applications; it is one of the few cases where an idea from the study of Artificial Neural Nets has applications outside its original domain. Chaos theorists are often presented with finite point sets in $\mathbb{R}^n$ which purport to come from a strange attractor, and they try to extract its dimension (not in general an integer, since attractors don't have to be manifolds and usually aren't), so the study of point sets and their shapes is flavour of the month in Dynamical Systems theory as well as for us humble Pattern Recognisers.

The applications to Pattern Classification are more doubtful, however, and the differences between the piecewise affine neural nets and the Kohonen nets are large. But the basic idea has novelty and is not immediately obvious. There are lots of ways of extracting linear or affine structure in data, but when the data is non-linearly arranged in the space, there are few methods available and the Kohonen process is one of them.


Mike Alder
9/19/1997