
Linear Images


 
Figure 8.4: A pixel set with some structure. [figure file: patrecfig8.4.ps]

In fig.8.4 we have a representation of a finite pixel set such as might be seen on the screen of a computer.

It is easy to see some structure in the image: it is far from being a random set of points inside the rectangle which is the computer screen.

Let us first choose to regard this as a finite set of points in a bounded region of $\mathbb{R}^2$. This representation allows us to treat resolution issues as a separate matter. From now on I use the terms pixel and point of the image interchangeably.

I propose to treat every point of the image as being a symbol, in an abstract sense which I shall clarify by example rather than by formal definitions. I want to apply to the image the same technique which chunks a stream of symbols (letters) into a new stream of higher level symbols (words) from a different alphabet. I shall thus have a chunking UpWrite process which will take some subset of the image and declare it to be a single chunk.

I proceed in a manner essentially similar to the chunking process for letters. I choose a resolution radius, k, corresponding to the k in a choice of kgrammar. I choose a pixel and set up a ball of radius k, which need not be an integer. I then take the set of all points of the image within the ball. This has to be turned into a local predictor of other points.

There are many possibilities, but I shall simply compute low order moments for the set of points in the ball. For simplicity of exposition, I shall compute zeroth, first and second order moments of the set. This is equivalent to computing the number of pixels, the centroid of the set, and the covariance matrix of the set of points inside the ball. I may conveniently represent the quadratic form specified by the first and second order moments by drawing an ellipse, as in chapter four.

This becomes a predictor of adjacent points, and also of adjacent ellipses, in an obvious way: we may expect points within some Mahalanobis distance of the centre, and we expect other balls to be placed on the image, so we expect other ellipses with centres about Mahalanobis distance two apart, most probably, therefore, along the major axis. I remove the point selected as my first centre, or even all the points in the ball, and repeat with another choice of ball. In this way I can rapidly reduce the diagram to the set of quadratic forms shown in fig.8.5.
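To make this concrete, here is a minimal sketch in Python (with numpy), assuming the image is held as an N by 2 array of pixel coordinates; the function names are illustrative rather than part of any particular implementation:

import numpy as np

def local_ellipse(points, centre, k):
    # Zeroth, first and second order moments of the image points lying
    # inside the ball of radius k about `centre`: the pixel count, the
    # centroid, and the covariance matrix (the quadratic form drawn as
    # an ellipse in the text).
    distances = np.linalg.norm(points - np.asarray(centre), axis=1)
    ball = points[distances <= k]
    count = len(ball)                      # zeroth order moment
    centroid = ball.mean(axis=0)           # first order moments
    cov = np.cov(ball, rowvar=False)       # second order (central) moments
    return count, centroid, cov

def major_axis(cov):
    # Unit vector along the major axis of the ellipse: the direction in
    # which the centre of the next ellipse is most probably to be found.
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, np.argmax(eigvals)]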


 
Figure 8.5: Prelude to Chunking. [figure file: patrecfig8.5.ps]

We have now got something which is analogous to the kgram predictor, but which works in a wholly geometric setting. We use it to perform entropic chunking of the pixel set. More precisely, we `colour' a set of pixels inside one ellipse, say, purple. If the pixels in a neighbouring ellipse are reasonably well predicted, i.e. if they are in the same line in roughly the expected place, we colour them purple as well. If not, we don't. We proceed in both possible directions, colouring pixels purple, until we come to an end where the behaviour is different. We then select another pixel, uncoloured, and another colour, vermillion say, and repeat. The effect of this on our triangle is clear enough: each side gets painted a different colour and we wind up with three colours. Each colour denotes a chunk of pixels.
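A sketch of the colouring step, again in Python and again only illustrative: an ellipse is taken to predict a neighbour if the neighbour's centre lies within a few Mahalanobis units of it (the tolerance of 3 below is an assumption, not a prescription), and chunks are grown until prediction fails.

import numpy as np

def mahalanobis(x, centre, cov):
    # Mahalanobis distance of the point x from the ellipse (centre, cov).
    diff = np.asarray(x) - centre
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

def colour_ellipses(ellipses, tol=3.0):
    # Greedy 'colouring': start from an uncoloured ellipse and keep adding
    # ellipses whose centres are reasonably well predicted (within tol) by
    # one already in the chunk; when growth stops, start a new colour.
    # `ellipses` is a list of (centroid, covariance) pairs.
    colour = [None] * len(ellipses)
    next_colour = 0
    for start in range(len(ellipses)):
        if colour[start] is not None:
            continue
        colour[start] = next_colour
        frontier = [start]
        while frontier:                        # grow in both directions
            i = frontier.pop()
            ci, covi = ellipses[i]
            for j, (cj, _) in enumerate(ellipses):
                if colour[j] is None and mahalanobis(cj, ci, covi) <= tol:
                    colour[j] = next_colour
                    frontier.append(j)
        next_colour += 1
    return colour

Each chunk of pixels is then the union of the pixels belonging to the ellipses of one colour.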


 
Figure 8.6: Chunks. [figure file: patrecfig8.6.ps]

The final stage of the chunking UpWrite is to assign a symbol to each chunk from a new alphabet. Since we want geometric representations, we want a single vector which describes the set of points along an edge. Since we want it to work for other sets in other images, it must not be specific to line segments but must work for a set of points of any shape. We choose, from many possible representations, to compute low order moments. For straight line segments, zeroth, first and second order suffice. So we simply compute three sets of moments, one for each side of the triangle. Since the original points are in $\mathbb{R}^2$, there is one number for the zeroth order moment, there are two first order moments and three second order moments, so we wind up at the UpWrite stage with three points in $\mathbb{R}^6$ as a description of the triangle.
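A sketch of this UpWrite step (Python again, illustrative only; using the count, the centroid and the upper triangle of the covariance as the moment vector is one convention among several):

import numpy as np

def moment_vector(points):
    # UpWrite a set of points in R^d to a single vector of low order
    # moments: 1 zeroth order moment (the count), d first order moments
    # (the centroid) and d(d+1)/2 distinct second order central moments
    # (the upper triangle of the covariance matrix).
    # For d = 2 this gives the point in R^6 described in the text.
    pts = np.asarray(points, dtype=float)
    n, d = pts.shape
    centroid = pts.mean(axis=0)
    cov = np.atleast_2d(np.cov(pts, rowvar=False))
    upper = cov[np.triu_indices(d)]
    return np.concatenate(([n], centroid, upper))

Applying this to each of the three coloured chunks gives the three points in $\mathbb{R}^6$.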

We have now UpWritten a set of symbols (points in $\mathbb{R}^2$) to a smaller set of symbols from a different alphabet (points in $\mathbb{R}^6$) in a way which is analogous to the way in which we extract words from strings of letters. The situation can be described by the diagram of fig.8.7:


 
Figure 8.7: A useful Analogy. [figure file: patrecfig8.7.ps]

Having done it once, there is nothing to stop us doing it again. In this case a judicious choice of radius will include all three points, and there is only one chunk. The whole lot may be UpWritten to a set of moments. In this case, again, the zeroth, first and second order moments suffice to specify the set of three points, which now lie in $\mathbb{R}^6$: one zeroth order moment, six first order moments and twenty-one second order moments give $1 + 6 + 21 = 28$ numbers. We obtain one point in $\mathbb{R}^{28}$ describing the triangle, at the second level of UpWrite.
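For instance, reusing the moment_vector sketch above on a synthetic triangle (the corner coordinates and the sampling of twenty pixels per side are arbitrary choices made only for illustration):

import numpy as np

# Each side of a synthetic triangle is sampled as a set of pixels and
# UpWritten to a point in R^6; the three R^6 points are then UpWritten
# again, giving a single point in R^28 = 1 + 6 + 6*7/2.
corners = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])
t = np.linspace(0.0, 1.0, 20)[:, None]
sides = [corners[i] + t * (corners[(i + 1) % 3] - corners[i]) for i in range(3)]

level1 = [moment_vector(side) for side in sides]    # three points in R^6
level2 = moment_vector(level1)                      # one point in R^28
assert level2.shape == (28,)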

Having done this with one triangle, we may do it with another which is smaller or bigger than the original one. Suppose we take a scaling parameter, r, and multiply the original image by r in order to get the scaled version. I shall suppose that there is no translation of the images. Let us suppose, in a spirit of charity, that r varies from about 0.5 to 1.5. Now provided that the resolution radius is big enough to enclose enough pixels to avoid total degeneracy in the covariance calculation[*], and sufficiently small not to enclose more than one edge at once, the same process will work to the first level. Each edge will be in a different location and will be scaled appropriately. At the second level of UpWrite, the centroid will be in the same position, but the covariance will go up as the square of the scaling term. Thus the different triangles will UpWrite to points lying along a smooth curve in the topmost space, $\mathbb{R}^{28}$ in this case.
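As a quick check of the covariance claim, if every point $x_i$ of a set is sent to $r\,x_i$ by the dilation, the first and second order moments transform as

\begin{align*}
  c = \frac{1}{n}\sum_i x_i \;&\mapsto\; \frac{1}{n}\sum_i r\,x_i = r\,c,\\
  \Sigma = \frac{1}{n}\sum_i (x_i - c)(x_i - c)^T \;&\mapsto\; r^2\,\Sigma,
\end{align*}

so the second order part of each UpWritten vector varies as $r^2$, which is the smooth dependence on $r$ referred to above. (This is only a sketch of the bookkeeping; for a sampled image the pixel counts along the edges would themselves vary with $r$.)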

This set of `triangles' can also be described by moments (although moments of order greater than two might be expedient) to specify it as a point in yet another space. This point in the top level space corresponds to the language of fig.8.3. From it, we can generate the points of the triangle space. From each of these points we can DownWrite to a set of line segments, and from each of these to a set of pixels. The DownWrite is not generally unique, but DownWrites often aren't.

It should be clear that we have all the capacity of the Fu program to prescribe equilateral triangles, and rather a lot more besides. If someone gives us a collection of triangles and a collection of squares, we will get two clusters in the space formerly containing only triangles as points. Providing we don't scale down to zero, the clusters will be disjoint. The clusters will be named as vectors at the top level, but because they are quite separate and no intermediate forms occur, they might just as well be described by symbols. Indeed, a symbol may now be interpreted as being a point in $\mathbb{R}^n$, which gives an alphabet of such symbols a metric structure.


 
Figure 8.8: An object requiring more levels for its decomposition. [figure file: patrecfig8.8.ps]

If instead of having a triangle we had presented the system with a drawing of a cube, as in fig.8.8, we should have been obliged, in order to do a good job of capturing the higher order structure, to go to a higher level of UpWrite. And of course, it can go further, as the world of block objects shows.


 
Figure 8.9: And one requiring even more. [figure file: patrecfig8.9.ps]

It is possible, then, to write a program which can be fed as input a set of line segments at various orientations and locations. If we had given it a set of short vertical lines and a set of long horizontal lines, disposed anywhere on the screen, the program would produce two clusters in $\mathbb{R}^6$. Given a label for each cluster and a new line segment, we could use standard methods to decide to which family it belonged. Moreover, it is a simple extension to the program to have it presented with squares and triangles, so that having first found the edges, it then UpWrites each square to four points in $\mathbb{R}^6$ and each triangle likewise to three points in $\mathbb{R}^6$, and then performs exactly the same trick again to obtain a cluster of square points and a cluster of triangle points. These might be taken to be in $\mathbb{R}^{28}$. This would give us a natural system for classifying triangles and squares. It is easy to see that the same process can be applied again to higher level objects such as cubes and pyramids, and even to higher order objects built up out of boxes. And the only limitations on this process are computational.
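One of the `standard methods' for the final assignment could be as simple as nearest-centroid classification in the top level space, sketched below (illustrative only; any standard classifier would serve):

import numpy as np

def classify(new_vector, labelled_clusters):
    # Assign an UpWritten object (e.g. a point in R^6 for a line segment,
    # or in R^28 for a triangle or square) to the cluster whose centroid
    # is nearest.  `labelled_clusters` maps a class name to a list of
    # UpWritten vectors for known examples of that class.
    centroids = {name: np.mean(vectors, axis=0)
                 for name, vectors in labelled_clusters.items()}
    return min(centroids,
               key=lambda name: np.linalg.norm(np.asarray(new_vector) - centroids[name]))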


Mike Alder
9/19/1997