
Image segmentation: finding the objects

How can we tackle the problem of writing a program which will read the text of Fig.2.2? When I say `read the text', I mean of course that the input will be an image file, such as that from which Fig.2.2 was extracted, and the output will be a sequence of ASCII characters, possibly with some additional formatting information to say where the characters are on the page. This is the inverse of what a printer does: a printer takes an ASCII file and turns it into black marks on white paper, and a scanner together with a text reading system turns the marks back into the file.

This is a clear case of a Pattern Recognition problem of an eminently practical type. Typically image files are very big, whereas the text file from which an image of text was printed is very much smaller. The text file can also be read into your favourite word processor and the spelling checked, something which cannot be done with an array of pixels.

There are clearly pre-processing problems which must be solved in order to identify lines of text in the image, and to work out where the characters actually are. These are not negligible and something must be said about them.


 
Figure 2.3: Postcode digits 6512

The general problem of image segmentation, chopping an array of pixels up into different bits, is fraught with difficulty. It is easy to run two letters together in an image so that they overlap, and yet for the eye still to read them as two separate things. A program which kindly lumped the two together as one would not make automatic recognition any easier. The proposition that the objects are going to come neatly separated by white space around them is simply not true. The postcode in Fig.2.3 is handwritten and easy to read, but segmenting it automatically is best done in hindsight, once you have worked out what the characters are!
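To make the difficulty concrete, here is a minimal sketch of the naive approach: treat every connected blob of ink as one object. It is an illustration only, not a method proposed in the text. The function name `count_components`, the use of `#` for ink and `.` for paper, and the two tiny test images are all assumptions made for the example.

```python
from collections import deque

def count_components(rows):
    """Count 4-connected blobs of '#' pixels in a list of equal-length strings."""
    height, width = len(rows), len(rows[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if rows[r][c] != '#' or seen[r][c]:
                continue
            # An unvisited ink pixel starts a new blob; flood-fill the rest of it.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and rows[ny][nx] == '#' and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

# Two hypothetical glyphs with a clear gap between them.
separated = [
    "##..##",
    "#...#.",
    "##..##",
]
# The same two glyphs, with a stroke running across the gap.
touching = [
    "##..##",
    "#####.",
    "##..##",
]

print(count_components(separated))  # 2
print(count_components(touching))   # 1
```

A single stroke bridging the gap is enough to make the program report one object where the eye still sees two, which is exactly the lumping together complained of above.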

By choosing to deal with printed text of sufficiently high quality we can make our lives much simpler, but there has to be something intrinsically unsatisfactory about a method which cannot generalise to cases which human beings find easy.

With the cautionary note out of the way, and bearing in mind for later chapters that there has to be a better method, let us examine the standard way of dealing with the image of Fig.2.4, which is a somewhat cleaner and larger version of Fig.2.2. Anyone who can identify the source of the text is entitled to a prize, but already has it.


 
Figure 2.4: A bigger and better sample from the same book.

First we find the lines of text. Moving a long way from Fig.2.4 and squinting at it with eyes half closed, one sees more or less horizontal bands where the words blur together; these are our lines of text. One way of finding them is to `fuzz' the image by what are known as Mathematical Morphology techniques. The ideas of Mathematical Morphology are very simple (and usually made difficult by being expressed in formalism, a favourite method by which esoterrorists try to impress the innocent). We proceed to elucidate the easiest case; references to more comprehensive accounts may be found in the bibliography at the chapter's end.
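As a rough sketch of what `fuzzing' might look like in practice, the code below smears each row of a binary image sideways (a horizontal dilation, one of the basic morphological operations) and then reports the runs of rows that are mostly ink. This is one simple reading of the idea, not necessarily the procedure developed in the following section. The names `smear_rows` and `find_text_bands`, the smear half-width, the 50% fill threshold and the toy `page` are all assumptions chosen for illustration.

```python
def smear_rows(rows, half_width):
    """Horizontal dilation: a pixel becomes ink if any pixel within
    half_width of it on the same row is ink."""
    out = []
    for row in rows:
        width = len(row)
        smeared = []
        for x in range(width):
            lo, hi = max(0, x - half_width), min(width, x + half_width + 1)
            smeared.append('#' if '#' in row[lo:hi] else '.')
        out.append(''.join(smeared))
    return out

def find_text_bands(rows, min_fill=0.5):
    """Return inclusive (top, bottom) row indices of each run of rows
    whose fraction of ink pixels is at least min_fill."""
    bands, start = [], None
    for y, row in enumerate(rows):
        dense = row.count('#') / len(row) >= min_fill
        if dense and start is None:
            start = y
        elif not dense and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(rows) - 1))
    return bands

# A toy page: two lines of sparse marks separated by blank rows.
page = [
    "..........",
    "#.#..#.#.#",
    "#.#..#.#..",
    "..........",
    "..........",
    ".#.#..#.#.",
    "#..#..#..#",
    "..........",
]

bands = find_text_bands(smear_rows(page, half_width=1))
print(bands)  # [(1, 2), (5, 6)]
```

The smearing closes the gaps between letters and words so that each line of text becomes a nearly solid horizontal band, much as it does when one squints at the page from a distance.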



 
Mike Alder
9/19/1997