Returning now to the principal thread of static Pattern Recognition, and armed with the righteousness that comes from having read up a little algebra and the attendant feeling of virtue, we contemplate a few uncomfortable facts.
The first is the practical difficulty attached to segmenting before measuring and classifying. It is common to explain to outsiders something of the difficulties of pattern classification and have them explain to you that standard function fitting methods will work. When you draw a `5' and do so with the horizontal stroke not connected, and ask how they propose to deal with this case, they shrug in irritation and assure you that these little details may be taken care of by a variety of means, none requiring intelligence. They hint that simple bottom-up procedures, involving minor extensions in the direction of existing line segments, will take care of the difficulties. They are wrong. Their eyes tell them it is perfectly in order to make a join in some cases but not in others, but close inspection reveals that they have already made the classification by this point. It is not done bottom-up at all; or if it is, there are elements, smaller than characters but bigger than pixels, which are the basis of recognising a `5'.
Segmentation is not, in short, something to be done independently of the classification process. But this seems to provide us with a chicken-and-egg situation. How do we decide when things belong together in order to feed them into a classifier, without classifying them? This is a very profitable question to think about, and it is left to the reader to do so.
The second is more prosaic and less abstract. Contemplate the simple diagram, of two data sets in the plane, shown in figure 8.1. I have labelled the two classes as noughts and crosses out of a fondness for the game known to the English Speaking World as Noughts and Crosses, and to the American Speaking World as Tic-Tac-Toe.
Now focus on the point marked by the dot of the question mark. Would you expect this to be a nought (O) or a cross (X)? Note that although there is no definite answer to this question, since I am the one who generated the data and I am capable of anything, most people would agree that the category they would be most inclined to expect is a cross, despite the fact that every method of pattern classification we have discussed would give the opposite answer. The same holds for figure 8.2, where most right-thinking people would expect the ? to be a cross, despite the fact that its closest neighbours are mostly noughts.
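The conflict between the local and the global view can be reproduced in a few lines. The following is a minimal sketch, not a reconstruction of figure 8.1: the 4-by-4 layout, the cluster spacing, the choice of missing cell and the use of a k-nearest-neighbour vote as the stand-in for "the methods discussed so far" are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def chessboard_label(i, j):
    """Label a cell by the alternating (chess-board) rule: an assumed layout."""
    return "X" if (i + j) % 2 == 0 else "O"


def make_data(size=4, missing=(1, 1), offsets=(-0.3, 0.0, 0.3)):
    """Build one small 3x3 cluster of points per cell, leaving one cell empty."""
    data = []
    for i in range(size):
        for j in range(size):
            if (i, j) == missing:
                continue  # this is where the question mark sits
            for dx in offsets:
                for dy in offsets:
                    data.append(((i + dx, j + dy), chessboard_label(i, j)))
    return data


def knn_predict(data, query, k=5):
    """Plain k-nearest-neighbour majority vote (a purely local method)."""
    nearest = sorted(data, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


data = make_data()
query = (1.0, 1.0)                      # centre of the empty cell

local_guess = knn_predict(data, query)  # what the neighbours say
pattern_guess = chessboard_label(1, 1)  # what the alternating pattern says

print("nearest neighbours say:", local_guess)
print("chess-board pattern says:", pattern_guess)
```

In this toy layout the four cells sharing an edge with the empty one are all noughts, so any small k votes for a nought, while the alternating rule calls for a cross. The sketch cheats, of course, by being told the rule; finding it from the points alone is the hard part.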
The fact that the automatic systems discussed so far would all give results different from those of the eye of a child merits a little thought. It suggests something essentially circumscribed about our methodology.
It is, of course, easy enough to devise a modification to the existing methods which will give the `right' answer. This is to miss the point. We aren't looking for a quick fix; we thought we had the business of pattern recognition reduced to finding clusters and suddenly we discover that life is much harder than that. So we need a fix that will work not just on the two figures shown, but on lots more which may be sprung on us any minute now. We need to ask ourselves what it is about the data sets that means our simple systems don't work.
Contemplate a very similar situation: the word `profes5or', where the `5' is read as an `s', despite the low-level data suggesting that `5' is what is there. In all cases, there is a low-level or neighbourhood expectation, and a higher level of structure to the data which is given precedence by the human eye. In the case of `profes5or', the level above the letter level is the word level. In the case of figure 8.1 the level above the local is the alternating or chess-board pattern to the data. In the case of figure 8.2 it is the spiral curves which are extrapolated by the eye. I claim that in all cases, it makes sense to say that there are two levels of structure and that the higher level is accorded precedence in the human decision as to which class should be assigned to a new point. I claim that just as there is a process of UpWriting strings of letters into words, there must be a way of UpWriting sets of points in the plane into something else, so that we have some basis for making decisions at a different level. I claim that the intuitive idea of levels of structure, which can be shown to make good algorithmic sense for natural language, can also be found to make sense in the case of more general, geometric data.
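The `profes5or' case can be sketched in the same spirit. What follows is an illustration of a higher level overruling a lower one, under loudly stated assumptions: the letter scores, the tiny lexicon and the product-of-scores rule are all invented for the example, and it should not be taken as the UpWrite procedure itself.

```python
def certain(ch):
    """Hypothetical letter-level output for an unambiguous character."""
    return {ch: 1.0}


# Assumed letter-level scores: every position is clear except the seventh,
# where the low-level evidence slightly favours '5' over 's'.
letter_scores = (
    [certain(c) for c in "profes"]
    + [{"5": 0.6, "s": 0.4}]
    + [certain(c) for c in "or"]
)

# A toy lexicon standing in for the word level (an assumption).
LEXICON = ["professor", "profession", "processor"]


def letter_level(scores):
    """Decide each position on its own: pick the best-scoring character."""
    return "".join(max(s, key=s.get) for s in scores)


def word_score(word, scores):
    """Product of the letter scores along a candidate word (0 if it cannot fit)."""
    if len(word) != len(scores):
        return 0.0
    total = 1.0
    for ch, s in zip(word, scores):
        total *= s.get(ch, 0.0)
    return total


def word_level(scores, lexicon):
    """Pick the lexicon word best supported by the letter scores, if any."""
    best = max(lexicon, key=lambda w: word_score(w, scores))
    return best if word_score(best, scores) > 0.0 else None


print(letter_level(letter_scores))         # profes5or
print(word_level(letter_scores, LEXICON))  # professor
```

The letter level, left to itself, reports the `5'; the word level accepts the weaker `s' because only that reading yields something it knows. The geometric analogue would need something to play the part of the lexicon for sets of points in the plane, which is exactly the open question.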