The chapter you have just read was a general survey of the ideas and issues in Pattern Recognition, and may have left you feeling a little insecure. After all, this is a Mathematics course in those parts of mathematics useful for Information Technology, and there was hardly any mathematics in the first chapter. Well, actually there was some mathematics, but it was said in English instead of algebra, and you may not have recognised it.
From now on we shall be dealing with more concrete objects and for the present we shall concentrate on image recognition. This is part of the plan to say something about how robots can be made to see the world.
In this chapter we shall consider various kinds of images such as
might be produced by a camera or scanner, and the methods which
may be employed to code them as points in $\mathbb{R}^n$ so as to
accomplish recognition of the objects in the images. The
underlying assumption is that a robot is looking at some
collection of objects and needs to make decisions based on
a pre-defined human classification of the types. The robot
has to analyse a set of pixels in order to make a
classification. In the next chapters I shall discuss
statistical and ANN methods of recognition of the resulting
vectors, but for now I attend to the matter of how to extract
vectors from static images. Some of the algorithms will be
described algebraically, reassuring you if you feel that the
medium is the message, but some will be given in English. Simple
algorithms can be taken from English into C without passing through
algebra on the way, and it is often easier to understand why they work.
Concentrating on images obtained from a video-camera or scanner may strike you as a bit limited, since there are other sources of information than cameras. On the other hand, the general problems are best approached through concrete instances, and much of the conceptual apparatus required for thinking about measuring images is capable of extension to thinking about measuring other things.
The process of performing an operation on an image to obtain a vector of numbers is often referred to as feature extraction. Each number in the vector is sometimes said to describe a `feature'; this somewhat mystical terminology might leave you wondering what a `feature' actually is, and the answer would appear to be that anything you can do to an image to get a number out is detecting a `feature'. In various parts of the literature, a `feature' may mean a dark pixel, a bright pixel, a lot of bright (or dark) pixels, an edge, or some combination of the above, as well as other quite different things. Some of this confusion will be resolved in the chapter on syntactic pattern recognition.
Because I don't know what a feature is and therefore feel shy about using the term, I shall simply talk about making measurements on an image (or indeed on any other object), and a suite of measurements on such an object produces a sequence of numbers. A measurement of an object is any operation performed on the object which yields a real number.
An integer or boolean value is treated as a particular case of a real number for these purposes, but we shall have a preference for real real numbers, expressed, of course, to some finite number of decimal places.
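To make the definition concrete, here is a minimal sketch of a suite of measurements applied to a small binary image, yielding a vector of real numbers. Everything in it is an assumption made for illustration: the function names, the representation of an image as a list of rows of 0s and 1s, and the four measurements themselves, which are not claimed to be good ones for recognition. It is written in Python merely for brevity.

```python
from typing import Callable, List

# Hypothetical representation: a binary image is a list of rows,
# each pixel being 0 (black) or 1 (white).
Image = List[List[int]]
Measurement = Callable[[Image], float]


def white_fraction(img: Image) -> float:
    """Proportion of pixels that are white."""
    total = sum(len(row) for row in img)
    return sum(sum(row) for row in img) / total


def aspect_ratio(img: Image) -> float:
    """Width of the image divided by its height."""
    return len(img[0]) / len(img)


def mean_black_row(img: Image) -> float:
    """Average row index of the black pixels (0.0 if there are none)."""
    rows = [r for r, row in enumerate(img) for p in row if p == 0]
    return sum(rows) / len(rows) if rows else 0.0


def mean_black_col(img: Image) -> float:
    """Average column index of the black pixels (0.0 if there are none)."""
    cols = [c for row in img for c, p in enumerate(row) if p == 0]
    return sum(cols) / len(cols) if cols else 0.0


def measure(img: Image, suite: List[Measurement]) -> List[float]:
    """Apply each measurement in turn; the result is a point in n-space."""
    return [float(m(img)) for m in suite]


# A crude letter 'L' drawn in black on a white ground.
glyph = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]

suite = [white_fraction, aspect_ratio, mean_black_row, mean_black_col]
vector = measure(glyph, suite)
print(vector)  # [0.5, 0.75, 2.0, 0.5]
```

Each function is a measurement in the sense just defined: an operation on the object which returns a real number. The suite of four therefore sends every image to a point in a four-dimensional space, and a different choice of suite would send the same image somewhere else.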
For most of this chapter, I shall deal with binary images, that is to say images where each pixel is either black (0) or white (1), because the recognition problems are sharpest here. Colour images or monochrome greyscale images are, of course, more common, but the recognition problems become overshadowed by the image processing problems, which belong in a different book. In the later sections I shall discuss the complexities introduced by having greyscale and colour images to recognise.
Since there are an awful lot of different things which can be found in an image, even a binary image, we approach the subject by stages, dealing with the complications progressively. I shall focus on a particular problem so as to concentrate the mind; I shall investigate the problem of Optical Character Recognition, so as to sharpen the issues to concrete cases. Virtually everything that applies to recognising characters applies to other types of binary image object, except that the other objects are usually harder to deal with. It might be useful to contemplate Chinese or Arabic characters occasionally, or a handful of nuts and bolts in silhouette, so as to avoid parochialism.
Quite a lot of what I shall describe here is treated in books on Image Processing, and parts of it in books on Machine Vision or Computer Vision, and there is some intersection with the material in the Master's course on Computer Vision.
This is not a book on Image Processing, but it will be necessary to outline the ideas and relevance of Image Processing material. My treatment will therefore be sketchy, and the bibliography should be used to fill out the details. Since many books have been devoted to what is covered in this chapter, you may take it that the treatment will be lacking in that depth which the intelligent reader quite properly craves; in exchange you have here the opportunity to get an overview of what the stuff can be used for. An honest attempt to do the Exercises at the end of the chapter, if necessary with the aid of some of the books referred to in the bibliography, will provide a crash introductory course in Image Processing. This is not a substitute for the Computer Vision course, but a complement to it.
The first chapter gave an overview from such an exalted height that the reader may have been left feeling a touch of vertigo. It is now necessary to take one's feet down, pour the drink into the cat's bowl, lace up one's running shoes and sober up. A certain amount of the nitty-gritty is waiting further down the alley with a sand-filled sock.