Next we can start at the left-hand end of each
line (of the original image, not the dilated
one!) and locate the first black pixel. This may
be the bottom half of a character of text in
the line above, the page number of the page, some
noise from the book spine picked up by an
out-of-focus scanner, or a bit of a thumb-print or soup
stain or worse. Since we have
estimates of the character height obtained from
the thickening, we can make an estimate of
where the bottom of the character should be if
we actually have the top left-hand pixel of a
character, and also have some ideas about where
the rest of it should be. These will be very
crude, because we might have an /o/, or a /b/
or a /p/ or a /q/, not to mention capitals or
punctuation. If we have confidence in our knowledge
of the font, we can look for vertical
(relative to the direction of the line as horizontal)
and horizontal separating lines between
which our character may be found.
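
The scan described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the image is taken to be a list of rows of 0/1 values with 1 meaning black, the line occupies rows row_start up to (but not including) row_end, and the names first_black_pixel and guess_box, together with the character height and width estimates passed in, are hypothetical and not taken from the text.

```python
def first_black_pixel(image, row_start, row_end):
    """Scan columns left to right, and rows top to bottom within each
    column, over rows [row_start, row_end). Return (row, col) of the
    first black pixel found, or None if the band is empty."""
    width = len(image[0]) if image else 0
    for col in range(width):
        for row in range(row_start, row_end):
            if image[row][col]:
                return row, col
    return None


def guess_box(pixel, char_height, char_width):
    """Crude guess at a character box as (top, left, bottom, right),
    assuming the pixel really is the top left-hand pixel of a character.
    Bottom and right are exclusive."""
    row, col = pixel
    return (row, col, row + char_height, col + char_width)


# A tiny hypothetical scan line, 5 rows by 6 columns.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
start = first_black_pixel(page, 0, 5)
box = guess_box(start, char_height=3, char_width=2)
```

The box is no better than the height and width estimates fed to it, and the pixel found may just as easily belong to noise or to a descender from the line above, so any such guess would need checking against what is actually inside the box.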
It is common practice in the case of the Roman
alphabet to fit slightly
slanted lines from South-South-West to North-North-East
because, as a careful examination of
italic text shows, there is not in general a vertical
separating line between well-spaced
characters in italic fonts. This can also happen
with well-spaced non-italic fonts, as when TeX
places the characters in a word such as
WAVE
and it should be noted that there may be no line
slanted like `/' separating the characters
either.
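
One way to test for a slanted separating line is sketched below. It is a minimal illustration, not a method given in the text: the image is again assumed to be a list of rows of 0/1 values with 1 meaning black, the slant is measured in columns moved to the right per row moved up, pixels off the edge of the image are treated as white, and the names line_is_clear and separators are hypothetical.

```python
def line_is_clear(image, row_start, row_end, base_col, slant):
    """True if the line through base_col on the bottom row of the band,
    leaning right by `slant` columns per row going up, meets no black
    pixel. slant = 0 gives a vertical line."""
    width = len(image[0])
    for row in range(row_start, row_end):
        height_above_base = (row_end - 1) - row
        col = base_col + int(round(slant * height_above_base))
        if 0 <= col < width and image[row][col]:
            return False
    return True


def separators(image, row_start, row_end, slant):
    """All base columns from which a line of the given slant is clear."""
    width = len(image[0])
    return [c for c in range(width)
            if line_is_clear(image, row_start, row_end, c, slant)]


# Two hypothetical strokes, each leaning like `/', too close together
# for any vertical line to pass between them.
strokes = [
    [0, 0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0],
]
vertical = separators(strokes, 0, 4, slant=0.0)
slanted = separators(strokes, 0, 4, slant=1.0)
```

Here the only clear vertical line is to the right of both strokes, while a slanted line starting at column 1 passes cleanly between them. For a pair such as the A and V of WAVE neither test need succeed, which is the point of the warning above.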
It should be plain that separating out characters
from each other and putting boxes around
them is not usually the simple procedure one might
have hoped for.
Mike Alder
9/19/1997