The reflective reader will, perhaps, have been
turning to the not
so silly question of how he or she tells men from
women.
Or to put it another way: looking at the clusters
of points in Fig.1.2., suppose that instead of
labelling one set as X points for
males and the other as O points for females, we had
just drawn
unlabelled points as little black dots:
could a program have looked at the data and seen
that there are two
populations? It seems reasonable to
suppose that the reader, with somewhat different
sensory apparatus,
has some internal way of representing human beings
via neurons, or
in wetware as we say in the trade, and that this
representation shares with
the diagram
the capacity for coding resemblance
or similarity in terms of
proximity. Then the dimension may be a little
higher for you, dear reader,
but most of the problem survives.
There is, undeniably, a certain amount of overlap between the two clusters when we measure weight and height, and indeed there would be some overlap on any system of measurement. It is still the case, however (despite the lobbying of those who for various reasons prefer not to be assigned an unambiguous sex), that it is possible to find measurement processes which do lead to fairly well defined clusters: a count of X and Y chromosomes, for example. Given two such clusters, the existence of the categories more or less follows.
One of the reasons for being unhappy with the neural net model we have described is that it is crucially dependent on the classification being given by some external agent. It would be nice if we had a system which could actually learn the fact that women and men are distinguishable categories by simply noticing that the data form two clusters.
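The idea can be sketched in a few lines of code. The following is a minimal two-means clustering pass (one simple cluster-finding procedure, offered here only as an illustration, not as the method the text will develop): given unlabelled points, it discovers the two populations by itself. The (height, weight) values are invented for the example.

```python
# A sketch of unsupervised cluster finding: two-means clustering on
# hypothetical (height cm, weight kg) points, with no labels supplied.
# All data values are invented for illustration.

def two_means(points, iterations=20):
    """Split points into two clusters by the classic k-means rule, k=2."""
    # Seed the two means at the first and last points (arbitrary choice).
    m1, m2 = points[0], points[-1]
    for _ in range(iterations):
        c1, c2 = [], []
        for p in points:
            d1 = sum((a - b) ** 2 for a, b in zip(p, m1))
            d2 = sum((a - b) ** 2 for a, b in zip(p, m2))
            (c1 if d1 <= d2 else c2).append(p)
        # Move each mean to the centroid of the points assigned to it.
        m1 = tuple(sum(xs) / len(c1) for xs in zip(*c1))
        m2 = tuple(sum(xs) / len(c2) for xs in zip(*c2))
    return m1, m2, c1, c2

# Unlabelled "little black dots": five shorter/lighter, five taller/heavier.
data = [(155, 52), (158, 55), (160, 58), (162, 60), (157, 54),
        (175, 78), (178, 82), (180, 85), (183, 88), (176, 80)]
m1, m2, c1, c2 = two_means(data)
print(sorted([len(c1), len(c2)]))   # the two populations emerge: [5, 5]
```

No external agent told the program which dots were which; the two categories fell out of the geometry, which is exactly the ability we should like a learning system to have.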
It has to be assumed that at some point human beings learn to classify without any immediate feedback from an external agent. Of course, kicking a neuron when it is wrong about a classification and kicking a dog when it digs up the roses have much the same effect; the devastation is classified as `bad' in the mind of the dog, or at least the owner of the rose bush hopes so. It is not too far-fetched to imagine that there are some pain receptors which act on neurons responsible for classifying experiences as `good' and `bad' in a manner essentially similar to what happens in neural nets. But most learning is a more subtle matter than this; a sea anemone `learns' when the tide is coming in without getting a kick in the metaphorical pants. Mistaking a man for a woman or vice versa might be embarrassing, but it is hard to believe you learnt the difference between men and women by making many errors and then reducing the average embarrassment, which is how an artificial neuron of the classical type would do it.
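The `reduce the average embarrassment' scheme can be made concrete. Here is a sketch of an artificial neuron of the classical (perceptron) type: it gets a kick, that is, a small weight adjustment, after every wrong guess, and keeps being kicked until the labelled points are all classified correctly. The points and learning rate are invented for illustration.

```python
# Learning by error correction: a perceptron-style neuron nudges its
# weights after each mistake on labelled (+1 / -1) training points.
# Data values and learning rate are invented for illustration.

def train_perceptron(samples, epochs=50, rate=0.1):
    w = [0.0, 0.0]   # one weight per measurement
    b = 0.0          # bias term
    for _ in range(epochs):
        for x, target in samples:          # target is +1 or -1
            guess = 1 if w[0]*x[0] + w[1]*x[1] + b > 0 else -1
            if guess != target:            # a "kick" for each error
                w[0] += rate * target * x[0]
                w[1] += rate * target * x[1]
                b += rate * target
    return w, b

# Labelled points: -1 for one cluster, +1 for the other (invented values).
samples = [((-2.0, -1.5), -1), ((-1.0, -2.0), -1), ((-1.5, -1.0), -1),
           ((2.0, 1.5), 1), ((1.0, 2.0), 1), ((1.5, 1.0), 1)]
w, b = train_perceptron(samples)
errors = sum(1 for x, t in samples
             if (1 if w[0]*x[0] + w[1]*x[1] + b > 0 else -1) != t)
print(errors)   # 0 once training has converged
```

Note that every update is driven by an externally supplied label; this is precisely the dependence on an outside agent that the clustering approach avoids.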
Learning a value, a +1 or -1 for each point
in some set of points,
and then being asked to produce a rule or algorithm
for guessing the value at some new point, is most
usefully
thought of as fitting a function to a space when
we know its value
on a finite data set. The function in this case
can take only
the two values, +1 or -1, but this is not in principle
different
from drawing a smooth curve (or surface) through
a set of points.
The diagram Fig.1.11. makes it clear,
in one dimension, that we are just fitting a function
to data.
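As a sketch of this function-fitting view, the following fits a smooth curve f(x) = tanh(ax + b) through one-dimensional points whose values are +1 or -1, then uses the fitted curve to guess the value at a new point. The data points, the choice of tanh, and the step size are all invented for illustration.

```python
import math

# Classification as function fitting in one dimension (cf. Fig.1.11.):
# we know the value, +1 or -1, at a few points on the line, and fit a
# smooth curve f(x) = tanh(a*x + b) by gradient descent on squared error.
# Data and step size are invented for illustration.

data = [(-3.0, -1), (-2.0, -1), (-1.0, -1), (1.0, 1), (2.0, 1), (3.0, 1)]

a, b = 0.5, 0.0
rate = 0.05
for _ in range(2000):
    for x, y in data:
        f = math.tanh(a * x + b)
        grad = 2 * (f - y) * (1 - f * f)   # chain rule: (f-y)^2 then tanh'
        a -= rate * grad * x
        b -= rate * grad

# The fitted curve is a rule for guessing the value at any new point:
print(1 if math.tanh(a * 0.5 + b) > 0 else -1)   # guesses +1 at x = 0.5
```

The smooth curve interpolates between the known values, and thresholding it at zero recovers a binary classification at points we have never seen.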
This perspective can be applied to the use of nets in control theory applications, where they are used to learn functions which are not just binary valued.
So Supervised Learning is function fitting, while Unsupervised Learning is cluster finding. Both are important things to be able to do, and we shall be investigating them throughout this book.