Inspired by results from neurophysiological studies, which have shown that there is a mapping between the world and adjacent neurons in the human cerebral cortex, Kohonen in the 1980s showed how a set of neurons could plausibly be constructed so as to adapt to extract the shape of a data set. It is important to understand that it works only for data sets which lie on curves, surfaces, or in general manifolds embedded in R^n. I make a brief diversion to say what an embedded manifold is.
Figure 5.16 shows something that could arise in an image. The eye picks out three distinct curves.
The following figure, 5.17, is a discrete version of its predecessor. It is most simply thought of as a finite set of points drawn from three underlying continuous curves. This is closer to real life than the preceding figure, since data sets in real life are finite.
To say that there are three curves underlying the data of figure 5.17 is to claim that there are three `clusters' in the data. Allocating points to the three underlying curves is very easy for the eye, except where the curves intersect, and even there the scope for disagreement is limited. So we have a kind of `cluster', picked out easily by the eye, which is very different from the sense of `cluster' which the reader may have formed after the discussion of gaussian mixture models. Nevertheless, the eye sees three distinct `objects' in the image, and the points may be assigned one of three different labels.
The notion of a `smooth curve in the plane' is best defined by saying that it is a set of points which is the image of a twice differentiable map from the unit interval, [0,1], into R^2. In order to ensure that it is not a space-filling curve, and that its existence may be inferred from a finite sample, we may stipulate that there is some bound on the curvature of any such map.
Curves are often specified in mathematics implicitly, that is to say as zeros of functions; for example, the unit circle is the set of points (x, y) in the plane satisfying x^2 + y^2 - 1 = 0.
All this also applies to curves in R^3, and also to surfaces in R^3. In implicit representations, we often have k smooth conditions on n variables, which may (but need not) give rise to a smooth (n-k) dimensional manifold.
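For instance, with n = 3 and k = 1, the single smooth condition x^2 + y^2 + z^2 - 1 = 0 on the three variables x, y, z gives the unit sphere, a smooth 2-dimensional manifold in R^3, while the equally smooth condition x^2 + y^2 + z^2 = 0 is satisfied only by the origin, which is why the `need not' is there.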
Without going into too much undergraduate geometry, if every sufficiently small open ball in R^n intersecting a set U does so in a region which can be put into a smooth 1-1 correspondence with an open ball in R^q, then we say that U is a q-dimensional manifold embedded in R^n.

Since there are often constraints of a physical sort on the
variables which have been obtained by taking measurements of a
system, it is often the case that data sets lie on embedded
manifolds. For example, the vocal tract can vary, for an individual
speaker, only in a limited number of possible constriction points,
and this gives a two dimensional vowel space. If we measure filter
bank values to get a discrete representation of the log power
spectrum of speech, we may be taking 16 measurements. The vowel
space for a speaker may be a two dimensional manifold embedded
in this sixteen dimensional space.
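As a purely synthetic illustration of data of this kind (the map below is invented for the example and has nothing to do with real vocal tracts; only the numbers 16 and 2 are taken from the text), one can generate points in a 16-dimensional space whose coordinates all depend smoothly on just two hidden parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hidden parameters per sample, standing in for the two degrees of
# freedom of the vowel space.
u = rng.uniform(0.0, 1.0, size=(500, 2))

# An arbitrary smooth map from the unit square into 16-dimensional space:
# each "filter bank" coordinate is a smooth function of the two parameters.
freqs = rng.uniform(0.5, 3.0, size=(16, 2))
phases = rng.uniform(0.0, 2 * np.pi, size=16)
data = np.cos(u @ freqs.T + phases)   # shape (500, 16)

# Although each point has 16 coordinates, the set lies on (the image of)
# a two-parameter surface, not scattered throughout the whole space.
```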
The Kohonen process tries to fit a (finite) data set in R^n by moving a q-dimensional grid towards it. The grid can be thought of as a finite approximation to I^q, where I is the unit interval; I^q is called a q-cube. The following example is done in dimension one, but the idea is the same for all dimensions.
Suppose that we have a set of points in the plane that form a curve, lying, for definiteness, roughly equally spaced along the curve y = x^2. Suppose we have a line of model neurons arranged initially at random in the same space. Let the line consist of `neurons' n_1, n_2, ..., n_k, where each neuron is simply a point in the space, but where consecutive points are imagined as joined by line segments.
Present a point x_j of the data set to the network. The Kohonen rule is to move the closest neuron towards the calling datum, but also to move the adjacent neurons by a smaller amount: adjacent not in the sense of being close in the plane, but in the sense of being neighbours in the ordering of the neurons, which is not at all the same thing in general. We can imagine the ordered line of neurons as being, initially, badly tangled by its random embedding in the plane.
If this process is iterated, with different choices of the data points selected to do the calling, the tangle of neurons gradually straightens out, and the line is attracted towards the data set and becomes an approximation to it.
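A minimal sketch of this one-dimensional case, in Python with numpy; the particular curve, learning rate, neighbourhood width and schedule are all illustrative choices, not part of Kohonen's prescription:

```python
import numpy as np

rng = np.random.default_rng(0)

# Data: points spaced roughly equally along the curve y = x^2 in the plane.
x = np.linspace(-1.0, 1.0, 200)
data = np.column_stack([x, x ** 2])

# A "line" of neurons: k points thrown down at random in the same square.
k = 25
neurons = rng.uniform(-1.0, 1.0, size=(k, 2))

def kohonen_step(neurons, datum, rate=0.1, radius=2.0):
    """Move the closest neuron towards the calling datum, and move its
    neighbours (in the ordering 0..k-1, not in the plane) by smaller amounts."""
    winner = np.argmin(np.linalg.norm(neurons - datum, axis=1))
    # Neighbourhood weight falls off with distance along the line of neurons.
    grid_dist = np.abs(np.arange(len(neurons)) - winner)
    weight = np.exp(-(grid_dist / radius) ** 2)
    neurons += rate * weight[:, None] * (datum - neurons)
    return neurons

# Iterate: present randomly chosen data points; the tangled line gradually
# straightens out and settles along the parabola.
for t in range(5000):
    datum = data[rng.integers(len(data))]
    neurons = kohonen_step(neurons, datum, rate=0.1 * (1 - t / 5000))
```

The essential points are that the winner is chosen by distance in the plane, while the neighbours that get dragged along with it are chosen by position along the line of neurons (the array index), not by position in the plane.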
The two senses of `close', close in R^2 and close in the topology of the space of neurons, become similar. Mathematically speaking, the cleanest way to look at this is to think of the tangled grid of neurons as being the image of a continuous map, f_0, from the standard k-cube into R^n, or in practice from some finite approximation to such a k-cube consisting of a k-dimensional grid. Then a single point of the data set in R^n is presented to the system, and the map is modified by having the images of the points of the k-cube moved closer to the calling point, in such a way as to preserve the continuity of the map. Thus we have the whole data set acting on the continuous maps from the k-cube into R^n so as to make them fit the data set. The sequence of maps should converge to some satisfactory map eventually, if the algorithm is right.
Kohonen wrote of the `order' being preserved, although it could also be reversed by the process, if one takes it that the data set is also ordered. It is simpler to note that the mapping from the line of neurons to the line of data points is locally 1-1 or is an immersion in the jargon of the topologists. At least it would be if we had an infinite number of neurons which really lay along a line.
It is not difficult to generalise to more complicated objects than curves being fitted with a linear grid. If one has a set of data points in R^n, it may be that they form a random cluster in the space, or it may be that they form a lower dimensional manifold. For example, they could be a swarm of flies all of which are sitting on a large inflated balloon; in this case the dimension of the surface of the balloon is two, less than the original three in which the flies are free to move. A fly which chooses to move only on the surface of the balloon is giving up a whole dimension, just like Obi Wan Kenobi in the Star Wars epic.
The set of points on the surface of a spherical balloon constitutes a manifold, and the flies form a finite subset of this manifold. One can imagine linear submanifolds of R^3, or curved ones, as in the case of the balloon. If there are enough flies, it is easy for the eye to discern the lower dimensional structure, or at least for the eyes, since some parallax is necessary before it can be seen. A manifold of dimension k is a generalisation of the idea of a surface such as a balloon, which is technically described as a 2-manifold.
A k-manifold is said to be parametrised when there is some collection of k-dimensional disks which are mapped to it so as to cover it without folding, conceptually like sticking postage stamps over a spherical balloon so as to cover it. Such parametrisations are usually given by explicit maps, an example being the map that sends the open interval (-π, π) onto the unit circle by sending t to (cos(t), sin(t)). This covers every point of the circle except the point (-1,0), so we send another copy of (-π, π) to the circle by the map that sends t to (-cos(t), -sin(t)), which covers every point except (1,0) and has a lot of overlap with the first map. Each map separately is 1-1 and infinitely differentiable, and we have covered, or parametrised, the circle with a pair of one dimensional disks (open intervals). A sphere could likewise be parametrised with a couple of open 2-disks, and it is an easy exercise to find a couple of explicit maps from the interior of D^2 to R^3 which give a parametrisation of the 2-sphere.
More generally, then, if we have a set of data points in R^n, lying on a manifold of dimension m, and if we have a grid of model neurons arranged to lie in a cube of dimension q, the Kohonen algorithm may easily be modified to attract the grid to the data and to do as good a job of fitting the data as possible. If q < m then it will resemble the case of a line of neurons trying to fit itself onto a plane, and a good deal of folding will be necessary and the final fit less than adequate; if q > m then it will resemble the case of a square being attracted to a curve, and a good deal of buckling will be necessary; but if q = m then a relatively good fit may be obtained. The general property that the q-cube fits itself without local folding to a q-manifold may be shown to hold, although the term `order preserving' is inappropriate here. Experiments show that manifold structure can be extracted automatically by this means. See the references to `Dimension of the Speech Space' as a case in point, and `Kohonen's algorithm for the numerical parametrisation of manifolds' for a discussion of the essential properties of the Kohonen algorithm. The algorithm is accomplishing, for a finite point set which comes from a manifold, a sort of parametrisation of part of the manifold, given by a map of the q-disk (or q-cube, or grid) into the containing space.
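A sketch of the q = 2 case along the same lines (the sphere data stands in for the flies on the balloon; the rates and schedules are again only plausible guesses):

```python
import numpy as np

rng = np.random.default_rng(1)

# Data: points on the unit sphere in R^3 -- a 2-manifold in a 3-dimensional space.
v = rng.normal(size=(1000, 3))
data = v / np.linalg.norm(v, axis=1, keepdims=True)

# A q = 2 grid of neurons: each grid node (i, j) carries a point in R^3.
side = 10
grid_idx = np.array([(i, j) for i in range(side) for j in range(side)])  # grid coordinates
neurons = rng.uniform(-1.0, 1.0, size=(side * side, 3))                  # positions in R^3

def step(neurons, datum, rate, radius):
    """Kohonen update: the winner and its neighbours *on the grid* move towards the datum."""
    winner = np.argmin(np.linalg.norm(neurons - datum, axis=1))
    d = np.linalg.norm(grid_idx - grid_idx[winner], axis=1)  # distance in grid coordinates
    weight = np.exp(-(d / radius) ** 2)
    return neurons + rate * weight[:, None] * (datum - neurons)

# Shrink the learning rate and neighbourhood as training proceeds, so the grid
# first unfolds globally and then fits the sphere locally.
for t in range(20000):
    frac = t / 20000
    neurons = step(neurons, data[rng.integers(len(data))],
                   rate=0.2 * (1 - frac), radius=3.0 * (1 - frac) + 0.5)
```

The only change from the one-dimensional sketch is that `adjacent' is now measured in the two grid coordinates rather than along a line.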
We can see this as a sort of structured clustering: if there is a manifold structure instead of a gaussian cluster structure, this system will find it. But what if both occur? As soon as one starts to think about the possibilities for what sorts of shaped subsets of R^2 or R^3 can be made up out of finite point sets, one starts to boggle a bit at the prospect of using any one system. And R^n for higher n leaves the imagination behind fast.
The general device of finding a point (e.g. a zero of a function) by making the point an attractor of a dynamical system goes back to Newton and the Newton-Raphson method. In the present case, the attractor is a set of points rather than just one, and some aspects of the topological structure are being extracted. It is a method which has some charm, and is susceptible of a range of applications; one of the few cases where an idea from the study of Artificial Neural Nets has applications outside its original domain. Chaos theorists are often presented with finite point sets in R^n which purport to come from a strange attractor, and they try to extract its dimension (not in general an integer, since attractors don't have to be manifolds and usually aren't); so the study of point sets and their shapes is flavour of the month in Dynamical Systems theory as well as for us humble Pattern Recognisers.
The applications to Pattern Classification are more doubtful, however, and the differences between the piecewise affine neural nets and the Kohonen nets are large. But the basic idea has novelty and is not immediately obvious. There are lots of ways of extracting linear or affine structure in data, but when the data is non-linearly arranged in the space there are few methods available, and the Kohonen process is one of them.