The earliest way of reading characters automatically was to look at each character through a family of masks, basically holes in a piece of metal, and see how much was visible in each case.
This is not essentially different from measuring intersections with scan lines, except that the mask holes don't have to be lines; they can be any shape. Nor is it very different in principle from Exercise 1.6.3, where we have weights of 1 and 0 on the arcs joining input cells to the neural unit. The older terminology also leads to terms like template matching, where you have a mask exactly the shape of the character you are trying to detect, and you measure, in effect, the distance from the template mask. To give an example, suppose
you wanted to detect a vertical bar in the
middle of a three by three grid of cells, say
a white bar on a black background. Then you could
arrange the weights so as to give a zero to every
cell of a three by three array of cells
where the actual cell value was the same as the
desired pattern, and a plus one to every cell
where there is a difference. The sum over all nine cells is then simply the Hamming distance between the two images: if it is zero they are identical, and if it is nine they are negatives of each other. Or you can change things round so that the score is plus one when they are the same and zero when they differ, which means you go for the highest score instead of the lowest. The difference between the two
systems is rather trivial. So masks and templates are just old fashioned language for measuring distances between points in $\mathbb{R}^n$, where $n$ is the number of pixels in the image and the entries in the vectors are usually just 0 or 1.
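To make the arithmetic concrete, here is a minimal sketch in Python (not from the text) of the three by three example, assuming white cells are coded 1 and black cells 0 and that the grid is stored as a flat list of nine values; the two scoring conventions described above differ only in whether mismatches or matches are counted.

    # Template: a white vertical bar in the middle column of a 3x3 grid
    # (white = 1, black = 0, grid flattened row by row).
    TEMPLATE = [0, 1, 0,
                0, 1, 0,
                0, 1, 0]

    def hamming_distance(image, template):
        """Number of cells where image and template differ: low score means a good match."""
        return sum(1 for a, b in zip(image, template) if a != b)

    def match_score(image, template):
        """Number of cells where image and template agree: high score means a good match."""
        return sum(1 for a, b in zip(image, template) if a == b)

    bar      = [0, 1, 0, 0, 1, 0, 0, 1, 0]   # identical to the template
    negative = [1, 0, 1, 1, 0, 1, 1, 0, 1]   # its negative
    print(hamming_distance(bar, TEMPLATE), match_score(bar, TEMPLATE))            # 0 9
    print(hamming_distance(negative, TEMPLATE), match_score(negative, TEMPLATE))  # 9 0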
It is worth being clear about this. When you measure the similarity between an image and a template, you are measuring a distance in some representation space. If you do this for several templates, you get a vector of distances. That gives a new representation in which your choice of the `right' interpretation is just the vector component which is smallest. This is the metric method discussed in chapter one.
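Written out the same way, here is a hedged sketch of that metric method; the templates below are illustrative, not taken from the text, and the chosen interpretation is simply the template at the smallest Hamming distance.

    def hamming_distance(image, template):
        return sum(1 for a, b in zip(image, template) if a != b)

    # Illustrative templates on a 3x3 grid (white = 1, black = 0, flattened row by row).
    TEMPLATES = {
        "vertical bar":   [0, 1, 0,
                           0, 1, 0,
                           0, 1, 0],
        "horizontal bar": [0, 0, 0,
                           1, 1, 1,
                           0, 0, 0],
        "cross":          [0, 1, 0,
                           1, 1, 1,
                           0, 1, 0],
    }

    def classify(image):
        """Return the best-matching label and the vector of distances to every template."""
        distances = {name: hamming_distance(image, t) for name, t in TEMPLATES.items()}
        return min(distances, key=distances.get), distances

    noisy_bar = [1, 1, 0,
                 0, 1, 0,   # a vertical bar with the top-left cell corrupted
                 0, 1, 0]
    print(classify(noisy_bar))
    # ('vertical bar', {'vertical bar': 1, 'horizontal bar': 5, 'cross': 3})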