The segmentation of greyscale characters is rather more complex because we cannot just trace the white spaces between objects; the separating background will be textured. It makes sense to try to get a characterisation of the background texture and identify that. How this may be done will be outlined below.
Additional information may be available which can assist in segmentation, such as the number of characters and the alphabet. Life is simpler if we are looking for digits than for digits together with alphabetics, and if we know there are precisely four of them and how big they are, and possibly how separated they are, the problem of segmentation becomes easier. If I know that there is going to be a set of parallel texture bands of known width, orientation and number, distinguishable from the digits themselves at the texture level, then finding such separating bands may not be too hard a problem.
In general, the finding of edges in grey scale or colour images is a relatively complicated business; again this is properly part of an image processing course rather than a pattern recognition course, so I shall merely sketch the elements of the methods available.
Regarding the function which picks out a letter /E/ in Fig. 2.11, we see that the bright regions (for I assume that we have here a bright letter against a black background) are the tops of mountains, and the dark regions are the background. For a black letter on a white ground the signs are reversed, but no essential difference occurs. Now an edge, if it can be said to exist at all, is a region where the gradient of the two-dimensional function is steep. So we need to go around looking for the places where the gradient is high. This may be done by looking along a horizontal line and measuring the difference in pixel values for consecutive pixels. If this exceeds some threshold, you have a (vertical) edge at the pixel in the centre of the rapid change. Similarly one may look along vertical or diagonal lines. If you have found a possible edge by scanning left to right, some time may be saved by looking at adjacent pixels in the row underneath. Edges that have no extension in a direction transverse to the scan line are probably just noise. This method can be quite fast, and can be refined by making the threshold vary according to whether or not adjacent pixels in a preceding row or column have produced `micro' edges. This is a direct and obvious method which can work poorly on some images, but then, for each method there is an image it will perform badly on.
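Here, for concreteness, is a minimal sketch of the scanning procedure in Python with NumPy. The function name, the threshold parameter and the particular confirmation rule (requiring support from an adjacent row) are my own choices for illustration, not anything prescribed above.

```python
import numpy as np

def scanline_edges(image, threshold):
    # Difference between each pixel and its right-hand neighbour, row by row.
    img = image.astype(float)
    jumps = np.abs(img[:, 1:] - img[:, :-1])
    putative = jumps > threshold              # 'micro' edges: places where the jump is big
    # A putative edge with no support in the row above or below is probably noise.
    above = np.zeros_like(putative)
    below = np.zeros_like(putative)
    above[1:, :] = putative[:-1, :]
    below[:-1, :] = putative[1:, :]
    return putative & (above | below)         # confirmed (vertical) edges
```

Scanning down the columns instead of along the rows finds horizontal edges in exactly the same way.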
We can actually differentiate an image with respect to a direction, horizontal, vertical or either of the diagonals, by simply rewriting each pixel as the difference between consecutive values: if in a scanline we have pixel $x_n$ followed by pixel $x_{n+1}$, then in the differentiated image we put $y_n = x_{n+1} - x_n$, and the result is, approximately, the partial derivative of the function in the horizontal direction if $x_n$ is to the left of $x_{n+1}$. Similarly we could partially differentiate in the vertical or diagonal directions with the obvious modifications. When one of the partial derivative values exceeds some threshold, we have a putative edge. Real edges are putative edges that have some other putative edges adjacent to them, and the longer the chain of putative edges, the realer the edge becomes. This intuitively appealing idea can be formalised and made respectable.
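One crude way of making it respectable is sketched below, under my own assumptions: differentiate along the rows, threshold, and keep only connected runs of putative edges above some minimum length. The helper leans on scipy.ndimage for the connected-component labelling; the names and the minimum chain length are illustrative, not canonical.

```python
import numpy as np
from scipy import ndimage

def chained_edges(image, threshold, min_chain=3):
    img = image.astype(float)
    dx = img[:, 1:] - img[:, :-1]             # y_n = x_{n+1} - x_n along each row
    putative = np.abs(dx) > threshold         # putative edges
    labels, n = ndimage.label(putative)       # group adjacent putative edges into chains
    if n == 0:
        return putative
    sizes = ndimage.sum(putative, labels, index=np.arange(1, n + 1))
    keep = 1 + np.flatnonzero(sizes >= min_chain)
    return np.isin(labels, keep)              # only the longer chains survive
```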
Making the above system slightly more complicated, we can express it in the following terms: run a step function $g(n)$ along each row of the image, $x_n$, which for consistency ought to be written as $x(n)$. Let $g$ be defined by $g(0) = -1$, $g(-1) = 1$, and $g(n) = 0$ for $n \neq 0, -1$. Now define
$$
y(n) = \sum_{k=-\infty}^{\infty} x(k)\, g(n-k).
$$
Now those depressing infinities vanish, since $g$ is zero except at $0$ and $-1$, so
$$
y(n) = x(n+1) - x(n),
$$
which gives us the partial derivative in whichever direction $n$ is increasing. Or at least, a discrete approximation to it.
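A tiny worked example, with invented pixel values, may make the collapse of the sum clearer; the second computation checks the same thing with NumPy's convolution routine, which indexes the kernel from zero, hence the shift in the output.

```python
import numpy as np

x = np.array([10., 10., 12., 40., 42., 41.])   # one scanline, values invented

# Only two terms of the infinite sum survive, leaving y(n) = x(n+1) - x(n).
y = x[1:] - x[:-1]
print(y)                                        # [ 0.  2. 28.  2. -1.]  -- the 28 marks the edge

# The same values from NumPy's convolution routine: (1, -1) is g written out over
# its support, and the slice drops the terms where the kernel hangs off the scanline.
print(np.convolve(x, [1., -1.])[1:-1])          # [ 0.  2. 28.  2. -1.]
```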
The expression $\sum_{k} x(k)\, g(n-k)$ is said to be the convolution of $x$ with $g$, and so we have treated a one-dimensional convolution which outputs a directional derivative. It is convenient to think of this as taking a sort of generalised `window' over the function $x$, and applying a kind of weighted average operation to $x$, weighting by $g$, in order to get $y$. If, for example, we had $g(0) = 0.5$, $g(1) = 0.25 = g(-1)$, then we would be doing a smoothing operation to $x$. The essential idea is simple: we input a function $x$, and replace $x(t)$ by some linear combination of the values of $x$ in a neighbourhood of $t$. The set of coefficients in the linear combinations is specified by a function $g$. It is distinctly reminiscent of Morphology operations, but linear.
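As a quick illustration of the smoothing case, again with invented values: since this window is symmetric, NumPy's convolution (which flips the kernel) gives exactly the weighted-average behaviour described.

```python
import numpy as np

x = np.array([10., 10., 12., 40., 42., 41.])   # invented scanline values
g = np.array([0.25, 0.5, 0.25])                # g(-1), g(0), g(1) from the text

# The kernel is symmetric, so flipping it for the convolution changes nothing;
# mode='same' keeps the output aligned with the input scanline.
smoothed = np.convolve(x, g, mode='same')
print(smoothed)        # each value is a weighted average of a pixel and its neighbours
```

The end values come out a little low because pixels beyond the ends of the scanline are treated as zero; in practice one pads the data or renormalises the weights at the borders.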
In two dimensions, it is common to impose a three by three array of pixels on which g is defined; everywhere else it is supposed to take the value zero. In the nine squares, the window, g, can have any set of values whatever, although we often normalise to ensure that they add up to 1. Then we generate a new image by plonking the window down on the (p,q) pixel of the old image, summing over the neighbourhood of this pixel with the weights of g applied, and then taking the resulting value and assigning it to the (p,q) pixel of the filtered output image. Suitable choices of g can find edges in different directions, in essentially the same way as the example of partial differentiation. Three by three windows are on the small side, and larger windows are more common.
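The two-dimensional version can be written down almost verbatim; the sketch below is a deliberately naive loop, with no attempt at speed, and the particular edge-seeking window at the end is my own choice rather than one given in the text.

```python
import numpy as np

def window_filter(image, g):
    # Centre the 3x3 window g on pixel (p, q) of the old image, form the weighted
    # sum, and assign it to pixel (p, q) of the output.  Border pixels, where the
    # window would overhang the image, are simply left at zero here.
    rows, cols = image.shape
    out = np.zeros((rows, cols))
    for p in range(1, rows - 1):
        for q in range(1, cols - 1):
            out[p, q] = np.sum(image[p - 1:p + 2, q - 1:q + 2] * g)
    return out

# A window that responds to vertical edges: the weights sum to zero, so a
# perfectly flat region gives no response at all.
g_vertical = np.array([[-1., 0., 1.],
                       [-1., 0., 1.],
                       [-1., 0., 1.]])
```

Libraries such as scipy.ndimage provide the same operation ready-made, with proper treatment of the borders.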
Working through a few cases by hand beats the idea into one's skull very rapidly, and you should explore the exercises at the end of the chapter.
Using bigger windows, or to put it another way, taking a convolution with a function having a bigger support, can give improved results at a slightly higher cost in time. Image convolutions may be done in hardware on some image processing boards.
And that is your very brief introduction to convolution filters. For more details, see any good book on image processing; several are cited in the bibliography.