
Generalities

Let us start, then, by supposing the simplest sort of image: a binary array of pixels, say 256 by 256. Each pixel is either black (0) or white (1). Hardly any modern equipment actually produces such images, unless you count photocopiers, and I have my doubts about them. Video cameras and scanners usually produce larger arrays in many colours or grey levels. Monochrome (grey scale) cameras are getting hard to find, and scanners will soon go the same way. So the images I shall start with are rather simple. The image in Fig.1.14, reproduced here for your convenience, is an example of the kind of thing we might hope to meet.

Getting an image to this simple, binary state may take a lot of work. In some forms of image recognition, particularly Optical Character Recognition (OCR), the first operation performed on the image is called thresholding: it takes a monochrome image and rewrites all the light grey (above threshold) pixels as white, and all the dark grey (below threshold) pixels as black. How to choose the threshold judiciously is dealt with in image processing courses and will not be considered here. Although a little unphysical, since real life comes with grey levels, the binary image is not without practical importance.
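As a concrete illustration of thresholding, here is a minimal Python sketch. It assumes the monochrome image is stored as a list of rows of 8-bit grey values (0 = black, 255 = white) and that the threshold `t` has already been chosen; the function name `threshold` and the value 128 used below are illustrative choices, not taken from the text. Pixels exactly at the threshold are sent to black, which is an arbitrary convention.

```python
from typing import List

# Assumed representation: a list of rows, each row a list of grey values 0..255.
GreyImage = List[List[int]]
# Binary image using the convention above: 0 = black, 1 = white.
BinaryImage = List[List[int]]


def threshold(image: GreyImage, t: int) -> BinaryImage:
    """Rewrite each grey pixel as white (1) if it is lighter than t, else black (0).

    Pixels whose value equals t are treated as dark, i.e. become black.
    """
    return [[1 if pixel > t else 0 for pixel in row] for row in image]


if __name__ == "__main__":
    # A tiny 3x4 grey-level image with a dark stroke on a light background.
    grey = [
        [250, 240, 30, 245],
        [235, 20, 25, 230],
        [240, 15, 200, 250],
    ]
    # Display the result with '#' for black and '.' for white.
    for row in threshold(grey, 128):
        print("".join("#" if p == 0 else "." for p in row))
```

Choosing a single global threshold like this is the simplest possible scheme; how to pick a good value for real scans is exactly the image processing question set aside above.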


 
Figure 2.1: A handwritten word.

The result of digitising and thresholding a piece of handwriting is not unlike Fig.2.1, where the word /the/ was drawn by hand with a mouse using the Unix bitmap utility. It is reasonable to want to be able to read the characters /t/, /h/ and /e/ by machine. For handwriting of the quality shown, this is not too difficult. It is, however, far more difficult than reading printed text, and so we shall treat the printed case first.

In order to make the problem moderately realistic, we look at the output of a scanner, or at least a bit of such output, when applied to two pages of text taken from a book (Fig.2.2). The quality of the image is not, we regret to say, up to the quality of the book. Such images are, however, not uncommon, and it is worth noting a number of features.


 
Figure 2.2: A sample of scanned text from a book.

The problem of reading such text is therefore far from trivial, even if we stipulate somewhat higher quality reproduction than is present in Fig.2.2. When doing some serious reading of a document such as a newspaper, large and uncomfortable problems arise: working out which bits of text belong together, which bits of an image are text and which bits are the pin-up girl, and similar high-level segmentation issues. Any illusion that the problem of reading text automatically is easy should be abandoned now.


Mike Alder
9/19/1997