In fig.8.4 we have a representation of a finite pixel set such as might be seen on the screen of a computer.
It is easy to see some structure in the image: it is far from being a random set of points inside the rectangle which is the computer screen.
Let us first choose to regard this as a finite set of points in a bounded region of ℝ². This representation allows us to treat resolution issues as a separate matter. From now on I use the terms pixel and point of the image interchangeably.
I propose to treat every point of the image as being a symbol, in an abstract sense which I shall clarify by example rather than by formal definitions. I want to apply to the image the same technique which chunks a stream of symbols (letters) into a new stream of higher level symbols (words) from a different alphabet. I shall thus have a chunking UpWrite process which will take some subset of the image and declare it to be a single chunk.
I proceed in a manner essentially similar to the chunking process for letters. I choose a resolution radius, k, corresponding to the k in a choice of k-grammar. I choose a pixel and set up a ball of radius k, which need not be an integer. I then take the set of all points of the image within the ball. This has to be turned into a local predictor of other points. There are many possibilities, but I shall simply compute low order moments for the set of points in the ball. For simplicity of exposition, I shall compute zeroth, first and second order moments of the set. This is equivalent to computing the number of pixels, the centroid of the set, and the covariance matrix of the set of points inside the ball. I may conveniently represent the quadratic form specified by the first and second order moments by drawing an ellipse, as in chapter four. This becomes a predictor of adjacent points, and also of adjacent ellipses, in an obvious way: we may expect points within some Mahalanobis distance of the centre, and we expect other balls to be placed on the image, so we expect other ellipses with centres about Mahalanobis distance two apart, most probably, therefore, along the major axis. I remove the point selected as my first centre, or even all the points in the ball, and repeat with another choice of ball. In this way I can rapidly reduce the diagram to the set of quadratic forms shown in fig.8.5.
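By way of illustration, here is a minimal sketch (mine, not from the text) of the local moment computation, assuming the pixels are given as an (N, 2) NumPy array of coordinates; the radius and the example data are arbitrary choices.

```python
import numpy as np

def local_moments(pixels, centre, k):
    """Zeroth, first and second order moments of the pixels within
    distance k of `centre`: a count, a centroid and a covariance matrix."""
    d = np.linalg.norm(pixels - centre, axis=1)
    ball = pixels[d <= k]
    count = len(ball)                      # zeroth order moment
    centroid = ball.mean(axis=0)           # two first order moments
    cov = np.cov(ball, rowvar=False)       # second order (central) moments
    return count, centroid, cov

# Example: a short, slightly noisy stroke of pixels.
t = np.linspace(0.0, 10.0, 50)
pts = np.c_[t, 0.3 * t] + 0.05 * np.random.randn(50, 2)
n, mu, S = local_moments(pts, centre=pts[25], k=2.0)
# The ellipse of Mahalanobis distance two about mu, with shape S, is the
# local predictor: its major axis points along the stroke.
```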
We have now got something which is analogous to the k-gram predictor, but which works in a wholly geometric setting. We use it to perform entropic chunking of the pixel set. More precisely, we `colour' a set of pixels inside one ellipse, say, purple. If the pixels in a neighbouring ellipse are reasonably well predicted, i.e. if they lie on the same line in roughly the expected place, we colour them purple as well. If not, we don't. We proceed in both possible directions, colouring pixels purple, until we come to an end where the behaviour is different. We then select another, uncoloured, pixel and another colour, vermilion say, and repeat. The effect of this on our triangle is clear enough: each side gets painted a different colour and we wind up with three colours. Each colour denotes a chunk of pixels.
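A rough sketch of the colouring pass follows; it grows one chunk by stepping the ball along the major axis of the current ellipse, in both directions, and stops when the direction of the covered pixels changes too sharply. The step size, the angle tolerance and the stopping rules are my own illustrative stand-ins for the prediction test, not prescriptions from the text.

```python
import numpy as np

def grow_chunk(pixels, start, k, angle_tol=0.5, max_steps=200):
    """Grow one chunk from pixel index `start`: step a ball of radius k along
    the major axis of the local ellipse, and stop when the direction changes
    sharply or nothing new is covered."""
    chunk = set()
    for sense in (+1.0, -1.0):                       # proceed in both directions
        centre = pixels[start].astype(float)
        prev_axis = None
        for _ in range(max_steps):
            d = np.linalg.norm(pixels - centre, axis=1)
            in_ball = np.flatnonzero(d <= k)
            if len(in_ball) < 3:                     # too few points to predict with
                break
            ball = pixels[in_ball]
            mu = ball.mean(axis=0)
            cov = np.cov(ball, rowvar=False)
            w, v = np.linalg.eigh(cov)
            axis = v[:, np.argmax(w)]                # major axis of the local ellipse
            if prev_axis is None:
                axis = sense * axis
            else:
                if axis @ prev_axis < 0:             # keep stepping the same way
                    axis = -axis
                if axis @ prev_axis < np.cos(angle_tol):
                    break                            # behaviour changed: end of chunk
                if not (set(in_ball) - chunk):
                    break                            # nothing new covered: ran off the end
            chunk.update(in_ball)
            prev_axis = axis
            centre = mu + 2.0 * np.sqrt(w.max()) * axis   # step along the major axis
    return sorted(chunk)
```

Applied repeatedly, choosing a fresh uncoloured start pixel each time, this colours each side of the triangle as a separate chunk.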
The final stage of the chunking UpWrite is to assign a symbol to each chunk from a new alphabet. Since we want geometric representations, we want a single vector which describes the set of points along an edge. Since we want it to work for other sets in other images, it must not be specific to line segments but must work for any shaped set of points. We choose, from many possible representations, to compute low order moments. For straight line segments, zeroth, first and second order suffice. So we simply compute three sets of moments, one for each side of the triangle, and since the original points are in ℝ², and there is one number for the zeroth order moment, two first order moments and three second order moments, we wind up at the UpWrite stage with three points in ℝ⁶ as a description of the triangle.
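To make the six numbers concrete, here is a small sketch (my own, with made-up triangle data) of the UpWrite of one chunk to a point in ℝ⁶: the pixel count, the two coordinates of the centroid, and the three distinct entries of the 2 × 2 covariance matrix.

```python
import numpy as np

def upwrite_chunk(chunk):
    """Map a set of points in R^2 (one chunk) to a single point in R^6."""
    chunk = np.asarray(chunk, dtype=float)
    n = len(chunk)                         # zeroth order moment
    mu = chunk.mean(axis=0)                # two first order moments
    cov = np.cov(chunk, rowvar=False)      # 2x2 symmetric: three distinct numbers
    return np.r_[n, mu, cov[0, 0], cov[0, 1], cov[1, 1]]

# The three sides of an (illustrative) triangle, sampled as pixels:
edges = [np.c_[np.linspace(0, 1, 30), np.zeros(30)],
         np.c_[np.linspace(0, 0.5, 30), np.linspace(0, 1, 30)],
         np.c_[np.linspace(1, 0.5, 30), np.linspace(0, 1, 30)]]
level1 = np.array([upwrite_chunk(e) for e in edges])   # three points in R^6
```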
We have now UpWritten a set of symbols (points in ℝ²) to a smaller set of symbols from a different alphabet (points in ℝ⁶) in a way which is analogous to the way in which we extract words from strings of letters. The situation can be described by the diagram of fig.8.7.
Having done it once, there is nothing to stop us doing it again. In this case a judicious choice of radius will include all three points, and there is only one chunk. The whole lot may be UpWritten to a set of moments. In this case, again the zeroth, first and second order moments suffice to specify the set of three points. We obtain one point in ℝ²⁸ describing the triangle, at the second level of UpWrite.
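The same recipe, applied one level up, might be sketched as follows; for points in a d-dimensional space the zeroth, first and second order moments give 1 + d + d(d+1)/2 numbers, which for d = 6 is 28. The arithmetic and the stand-in data are mine, following the construction just described.

```python
import numpy as np

def upwrite_set(points):
    """Map a finite set of points in R^d to their zeroth, first and second
    order moments: a single point in R^(1 + d + d(d+1)/2)."""
    points = np.asarray(points, dtype=float)
    n, d = points.shape
    mu = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    iu = np.triu_indices(d)                 # the distinct entries of the covariance
    return np.r_[n, mu, cov[iu]]

level1 = np.random.randn(3, 6)              # stand-in for the three R^6 edge points
level2 = upwrite_set(level1)                # one point in R^28
```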
Having done this with one triangle, we may do it with another which is smaller or bigger than the original one. Suppose we take a scaling parameter, r, and multiply the original image by r in order to get the scaled version. I shall suppose that there is no translation of the images. Let us suppose, in a spirit of charity, that r varies from about 0.5 to 1.5. Now provided that the resolution radius is big enough to enclose enough pixels to avoid total degeneracy in the covariance calculation, and sufficiently small not to enclose more than one edge at once, the same process will work to the first level. Each edge will be in a different location and will be scaled appropriately. At the second level of UpWrite, the centroid will be in the same position, but the covariance will go up as the square of the scaling term. Thus the different triangles will UpWrite to points lying along a smooth curve in the topmost space, ℝ²⁸ in this case.
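The claim about scaling can be checked numerically; the following throwaway sketch (my own) centres a point set so that scaling leaves the centroid alone, scales it by r, and confirms that the covariance is multiplied by r².

```python
import numpy as np

pts = np.random.randn(200, 2)                 # stand-in for a triangle's pixels
pts -= pts.mean(axis=0)                       # no translation: centroid at the origin
base_cov = np.cov(pts, rowvar=False)
for r in (0.5, 1.0, 1.5):
    scaled = r * pts
    print(r,
          np.allclose(scaled.mean(axis=0), 0.0),                       # centroid fixed
          np.allclose(np.cov(scaled, rowvar=False), r**2 * base_cov))  # covariance scales as r^2
```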
This set of `triangles' can also be described by moments (although moments of order greater than two might be expedient) to specify it as a point in yet another space. This point in the top level space corresponds to the language of fig.8.3. From it, we can generate the points of the triangle space. From each of these points we can DownWrite to a set of line segments, and from each of these to a set of pixels. The DownWrite is not generally unique, but DownWrites often aren't.
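The non-uniqueness is easy to see in a sketch: one possible DownWrite of a line segment's six-number description simply lays the right number of pixels uniformly along the major axis of the quadratic form. The function and its conventions below are mine, not the text's.

```python
import numpy as np

def downwrite_segment(v):
    """One (non-unique) DownWrite of a 6-vector (count, centroid, covariance
    entries) back to pixels: place the points uniformly along the major axis,
    with the length chosen so a uniform segment reproduces the given variance."""
    n = int(round(v[0]))
    mu = v[1:3]
    cov = np.array([[v[3], v[4]], [v[4], v[5]]])
    w, vec = np.linalg.eigh(cov)
    axis = vec[:, np.argmax(w)]            # major axis of the quadratic form
    length = np.sqrt(12.0 * w.max())       # variance of uniform on length L is L^2/12
    t = np.linspace(-0.5, 0.5, n)[:, None]
    return mu + t * length * axis

# Example: thirty pixels, centroid (0.5, 0), variance 1/12 along x -> a unit
# horizontal segment from (0, 0) to (1, 0).
pixels = downwrite_segment(np.array([30.0, 0.5, 0.0, 1 / 12, 0.0, 0.0]))
```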
It should be clear that we have all the capacity to prescribe equilateral triangles of the Fu program, and rather a lot more besides. If someone gives us a collection of triangles and a collection of squares, we will get two clusters in the space formerly containing only triangles as points. Providing we don't scale down to zero, the clusters will be disjoint. The clusters will be named as vectors at the top level, but because they are quite separate and no intermediate forms occur, they might just as well be described by symbols. Indeed, a symbol may now be interpreted as being a point in ℝⁿ, assigning an alphabet of them a metric structure.
If instead of having a triangle we had presented the system with a drawing of a cube, as in fig.8.8, we should have been obliged, in order to do a good job of capturing the higher order structure, to go to a higher level of UpWrite. And of course, it can go further, as the world of block objects shows.
It is possible to write a program, then, which can be fed as input a set of line segments at various orientations and locations. If we had given it a set of vertical short lines, and a set of horizontal long lines, disposed anywhere on the screen, the program would produce two clusters in ℝ⁶. Given a label for each cluster and a new line segment, we could use standard methods to decide to which family it belonged. Moreover, it is a simple extension to the program to have it presented with squares and triangles, so that having first found the edges, it then UpWrites each square to four points in ℝ⁶, each triangle likewise to three points in ℝ⁶, and then performs exactly the same trick again to obtain a cluster of square points and a cluster of triangle points. These might be taken to be in ℝ²⁸. This would give us a natural system for classifying triangles and squares. It is easy to see that the same process can be applied again to higher level objects such as cubes and pyramids, and even higher order objects built up out of boxes. And the only limitations on this process are computational.
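As a closing illustration, here is a small sketch of such a program's first level: each segment UpWrites to a point in ℝ⁶, the labelled segments give two cluster centroids, and a new segment is assigned to the nearer centroid. The nearest-centroid rule, the segment generator and all the numbers are my own stand-ins for the standard methods left open above.

```python
import numpy as np

def upwrite_segment(p0, p1, n=50):
    """UpWrite a straight segment, sampled as n pixels, to a point in R^6."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    pts = (1 - t) * np.asarray(p0, float) + t * np.asarray(p1, float)
    mu, cov = pts.mean(axis=0), np.cov(pts, rowvar=False)
    return np.r_[n, mu, cov[0, 0], cov[0, 1], cov[1, 1]]

rng = np.random.default_rng(0)
# Short vertical segments and long horizontal ones, placed anywhere on the screen.
verticals   = [upwrite_segment(c, c + [0.0, 1.0]) for c in rng.uniform(0, 10, (20, 2))]
horizontals = [upwrite_segment(c, c + [5.0, 0.0]) for c in rng.uniform(0, 10, (20, 2))]
centroids = {"short vertical":  np.mean(verticals,   axis=0),
             "long horizontal": np.mean(horizontals, axis=0)}

def classify(v):
    """Assign an UpWritten segment to the nearest cluster centroid."""
    return min(centroids, key=lambda label: np.linalg.norm(v - centroids[label]))

print(classify(upwrite_segment([2.0, 2.0], [2.0, 3.1])))   # a new short vertical line
```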