Next: Alternative Representations Up: Basic Concepts Previous: Dynamic Patterns

Structured Patterns

The possibility that instead of having a single point in $\mathbb{R}^n$ to classify we shall have a trajectory is only the tip of the iceberg. Temporal order is about the simplest structure that can be imposed on a set of points in $\mathbb{R}^n$, but it is far from being the only one.


 
Figure 1.12: Structured objects in an image

To see another possibility, contemplate the problem of recognising an image of a line drawing of a cube and distinguishing it from an image of a line drawing of a pyramid. The approach suggested so far would be to find some measurement operation on the image which would do the job. This is obviously possible. If we were to count edges in some way, that would solve the problem without even having to worry about which edges joined to which.
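As a minimal sketch of the edge-counting idea, suppose the line drawing has already been reduced to a list of edges, each a pair of vertex labels. That reduction is assumed here, not shown. The names `count_edges`, `classify`, `CUBE` and `PYRAMID` are hypothetical, the drawings are taken to be wireframes with every edge visible, and the pyramid is taken to be square-based.

```python
def count_edges(edges):
    """Count distinct undirected edges, ignoring which edge joins which."""
    # frozenset makes (0, 1) and (1, 0) count as the same edge.
    return len({frozenset(edge) for edge in edges})


# Hypothetical wireframe descriptions (assumption: every edge is visible).
CUBE = [
    (0, 1), (1, 2), (2, 3), (3, 0),   # bottom square
    (4, 5), (5, 6), (6, 7), (7, 4),   # top square
    (0, 4), (1, 5), (2, 6), (3, 7),   # uprights
]
PYRAMID = [
    (0, 1), (1, 2), (2, 3), (3, 0),   # square base
    (0, 4), (1, 4), (2, 4), (3, 4),   # edges up to the apex
]


def classify(edges):
    """Classify a drawing by its edge count alone."""
    n = count_edges(edges)
    if n == 12:
        return "cube"
    if n == 8:
        return "pyramid"
    return "unknown"


print(classify(CUBE), classify(PYRAMID))
```

The single number returned by `count_edges` is a one-dimensional measurement of the kind described above: it separates these two objects, but it was chosen by hand for exactly this pair.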

The trouble is, it requires the pattern recognising human to choose, for each kind of geometrical object, some measurement process specific to the objects to be recognised. This is currently how things are done; when somebody writes a program to recognise Chinese characters, he sits and thinks for a while about how to make some measurements on them so as to give some resulting point in $\mathbb{R}^n$ for each character, or if not a point in $\mathbb{R}^n$, some other kind of representation. Having done this for the kinds of objects he is interested in classifying, he then tries to automate the process of producing the point or other symbolic representation describing the original object, and then he sets about writing a program to classify the representations.

The process which chooses the representation is in the head of the programmer; the program does not make the decision for him. This makes a lot of pattern recognition rather slow; it provides employment for any number of hackers, of course, which is something hackers all think is a good thing, but it plainly isn't particularly satisfactory to the thinkers among us. It looks to be something which can be and ought to be automated; the approach would have to be concerned with extracting some information about how parts of the object are built up out of sub-parts, and suitably coding this information.

A related issue is the scene analysis problem, in which one image contains several subimages, each of which has to be recognised. In Optical Character Recognition (OCR) for instance, one is usually given a page with rather a lot of characters on it, and the first task is usually to segment the image into bits. This can be very difficult when the objects touch. Human beings cope with this without apparent effort: a photograph of your dear old Grandmother in front of the house does not usually stop you recognising both granny and the house.
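To see why touching objects cause trouble, here is a minimal sketch of about the crudest segmentation one might try: counting 4-connected blobs of ink in a binary image by flood fill. This technique is chosen purely for illustration and is not one the text prescribes; the function `count_components` and the two tiny test images are hypothetical.

```python
from collections import deque


def count_components(image):
    """Count 4-connected blobs of 1-pixels in a binary image (list of rows)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] and not seen[r][c]:
                # Found an unvisited ink pixel: flood-fill its whole blob.
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


# Two "characters" separated by a blank column.
SEPARATE = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]
# The same two characters joined by a single stray pixel.
TOUCHING = [
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
]

print(count_components(SEPARATE), count_components(TOUCHING))
```

One stray pixel is enough to merge the two blobs into one, so a segmenter that knows nothing about what the objects are supposed to look like has no way of cutting them apart again.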

One might hope that it is possible to say how some pixels aggregate to form lines or edges, some edges aggregate to form corners, some corners aggregate to form faces, and some faces aggregate to form a cube, or possibly a pyramid. Similarly, an image of an aeroplane is made up out of subimages of tail, wings and fuselage; an image of a face is made up out of a nose, eyes and a mouth.
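The last two levels of this hierarchy, corners aggregating into faces and faces into a solid, can be sketched as follows. Each face is written as a tuple of corner labels and each solid as a list of faces; the names `face_signature`, `CUBE_FACES` and `PYRAMID_FACES` are hypothetical, and the pyramid is assumed to be square-based. The face lists are typed in by hand here, which is precisely the step one would like the machine to work out for itself.

```python
def face_signature(faces):
    """Describe a solid by the number of corners on each face, sorted."""
    return tuple(sorted(len(face) for face in faces))


# Hypothetical hand-built descriptions: each face is a tuple of corner labels.
CUBE_FACES = [
    (0, 1, 2, 3), (4, 5, 6, 7),                               # bottom, top
    (0, 1, 5, 4), (1, 2, 6, 5), (2, 3, 7, 6), (3, 0, 4, 7),   # four sides
]
PYRAMID_FACES = [
    (0, 1, 2, 3),                                             # square base
    (0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4),               # triangular sides
]

print(face_signature(CUBE_FACES))
print(face_signature(PYRAMID_FACES))
```

Unlike a bare edge count, this description records something about how the parts are put together: six four-cornered faces for the cube, against one four-cornered and four three-cornered faces for the pyramid.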

The arrangement of the bits is crucial, and is easily learnt by quite small children. Counting the number of people grinning out from a photograph is easy for the small child. The hacker confronted with a problem like this usually counts eyes and divides by two, getting the wrong answer if there are grapes in the picture, or if Aunty Eth is hidden behind Uncle Bert except for the hat. His program can be thrown off by the shine on someone's spectacles. When it works, it has been developed until it contains quite a lot of information about what the programmer thinks faces are like. This can take a long time to get into the machine, and it can be wrong. It would be necessary to start all over again if you wanted to count lumps of gravel or blood cells. It is clear that human beings don't have to learn by being told everything in this way; they can figure out things for themselves. It would be nice if our programs could do the same, if only in a small way. This can in fact be done, by methods which may perhaps mimic, abstractly, the central nervous system, and I shall describe them in later chapters under the heading of Syntactic Pattern Recognition.


Mike Alder
9/19/1997