
Invariance: Classifying Transformations

An important case of families of images occurs where there is essentially one object, which generates a set of images by the application of a Lie group of transformations, or some neighbourhood of the identity in such a group.

In order to focus ideas, I shall consider again the simple case of triangles which are all equilateral and which differ only in the scale. I shall suppose that we have a standard such triangle with the centroid at the origin, and that by taking a range of scalars between 0.5 and 1.5 I have shrunk or enlarged the basic triangle to give a set of images, one for each scalar.

If we UpWrite the triangles to points in $\mathbb{R}^{28}$ in the standard way, we see that the set will constitute an embedding of the interval $[0.5, 1.5]$ in $\mathbb{R}^{28}$. The line will be curved rather than straight, and in practice we will get a finite point set on or close to the idealised embedding.
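The scaling curve can be illustrated numerically. The following is a minimal sketch only: the `upwrite` function is a hypothetical stand-in that uses the 28 monomial moments of degree at most 6 of sampled boundary points, which is an assumption made here for illustration and not necessarily the standard UpWrite referred to above. The helper `polygon_points` is likewise an invented name.

```python
import numpy as np

def polygon_points(n_sides, scale, samples_per_edge=50):
    """Sample boundary points of a regular polygon centred at the origin."""
    angles = 2 * np.pi * np.arange(n_sides) / n_sides + np.pi / 2
    verts = scale * np.column_stack([np.cos(angles), np.sin(angles)])
    t = np.linspace(0.0, 1.0, samples_per_edge, endpoint=False)[:, None]
    edges = [a + t * (b - a) for a, b in zip(verts, np.roll(verts, -1, axis=0))]
    return np.vstack(edges)

def upwrite(points, max_degree=6):
    """Hypothetical UpWrite: mean of x^i y^j for i + j <= 6, giving 28 numbers."""
    x, y = points[:, 0], points[:, 1]
    return np.array([np.mean(x**i * y**j)
                     for i in range(max_degree + 1)
                     for j in range(max_degree + 1 - i)])

# One point in R^28 for each scalar in [0.5, 1.5].
scales = np.linspace(0.5, 1.5, 21)
curve = np.array([upwrite(polygon_points(3, s)) for s in scales])

# If the curve were a straight line, only one singular value of the
# centred point set would be significantly different from zero.
centred = curve - curve.mean(axis=0)
singular_values = np.linalg.svd(centred, compute_uv=False)
print(curve.shape)
print(singular_values[:4] / singular_values[0])
```

Under this choice of features a moment of degree d scales as the d-th power of the scalar, so the image of the interval bends in several directions at once, which is what the second and later singular values report.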

Doing the same thing with squares and hexagons will yield two more such curves. If the size of the standard object in all three cases is similar, then the curves will lie approximately on a space of `scaled planar objects'. Other objects, such as pentagons, heptagons, octagons and circles will also lie on such a space. This space is foliated by the curves representing the Lie group action. Given a new object, it is possible in principle to estimate the scaling curve upon which it lies, and to be able to recognise a scaled version of the object which has never been seen before. In other words, we may learn the transformation.

Less ambitious undertakings have been found to work quite satisfactorily: in an application involving the recognition of aircraft silhouettes, the scaling curve was found for each of two aircraft, and a hyperplane fitted to each curve. Scalings from $60\%$ to $140\%$ of the original size were used. A new silhouette of one of the aircraft was then presented to the system, and it was classified according to which of the two hyperplanes was closest. By this means it was found possible to obtain correct classification down to about $20\%$ of the original size, at which point resolution issues came into play. The system was capable of learning the space and performing creditable extrapolations in order to achieve a classification.
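The nearest-hyperplane classification described above can be sketched as follows. This is an illustration under stated assumptions, not the original aircraft system: regular polygons stand in for the silhouettes, `upwrite` is the same hypothetical moment-based feature map (28 monomial moments of degree at most 6), and the "hyperplane" is taken to be an affine subspace fitted by SVD whose dimension is chosen from the numerical rank of the training curve.

```python
import numpy as np

def polygon_points(n_sides, scale, samples_per_edge=50):
    """Sample boundary points of a regular polygon centred at the origin."""
    angles = 2 * np.pi * np.arange(n_sides) / n_sides + np.pi / 2
    verts = scale * np.column_stack([np.cos(angles), np.sin(angles)])
    t = np.linspace(0.0, 1.0, samples_per_edge, endpoint=False)[:, None]
    edges = [a + t * (b - a) for a, b in zip(verts, np.roll(verts, -1, axis=0))]
    return np.vstack(edges)

def upwrite(points, max_degree=6):
    """Hypothetical UpWrite: mean of x^i y^j for i + j <= 6, giving 28 numbers."""
    x, y = points[:, 0], points[:, 1]
    return np.array([np.mean(x**i * y**j)
                     for i in range(max_degree + 1)
                     for j in range(max_degree + 1 - i)])

def fit_affine_subspace(samples, rel_tol=1e-10):
    """Fit mean + span(basis) to the rows of `samples` using the SVD."""
    mean = samples.mean(axis=0)
    _, sv, vt = np.linalg.svd(samples - mean, full_matrices=False)
    rank = int(np.sum(sv > rel_tol * sv[0]))  # keep numerically nonzero directions
    return mean, vt[:rank]

def distance_to_subspace(point, subspace):
    """Euclidean distance from `point` to the fitted affine subspace."""
    mean, basis = subspace
    offset = point - mean
    residual = offset - basis.T @ (basis @ offset)
    return np.linalg.norm(residual)

# Train on scalings from 60% to 140%, as in the text.
train_scales = np.linspace(0.6, 1.4, 17)
classes = {"triangle": 3, "square": 4}  # stand-ins for the two aircraft
subspaces = {
    name: fit_affine_subspace(
        np.array([upwrite(polygon_points(n, s)) for s in train_scales]))
    for name, n in classes.items()
}

def classify(point):
    """Return the class whose fitted subspace is closest to `point`."""
    return min(subspaces, key=lambda name: distance_to_subspace(point, subspaces[name]))

# Present objects at 30% scale, well outside the training range.
print(classify(upwrite(polygon_points(3, 0.3))))
print(classify(upwrite(polygon_points(4, 0.3))))
```

With these idealised moment features each scaling curve lies exactly in a low-dimensional affine subspace, so extrapolation is easier here than it would be with real, finitely resolved silhouettes.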

Other Lie groups occurring naturally are the group of shifts or translations, and the group of rotations. In the plane the translations contribute two dimensions and the rotations one, so the Cartesian product of these spaces with a scaling space gives, for each planar object, a four-dimensional manifold embedded in the top level space.
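The dimension count can be checked numerically by differentiating the UpWritten image with respect to the four group parameters. The sketch below assumes the same hypothetical moment-based `upwrite` (28 monomial moments of degree at most 6); the functions `transform` and `orbit_jacobian` and the chosen parameter values are invented for illustration.

```python
import numpy as np

def polygon_points(n_sides, scale, samples_per_edge=50):
    """Sample boundary points of a regular polygon centred at the origin."""
    angles = 2 * np.pi * np.arange(n_sides) / n_sides + np.pi / 2
    verts = scale * np.column_stack([np.cos(angles), np.sin(angles)])
    t = np.linspace(0.0, 1.0, samples_per_edge, endpoint=False)[:, None]
    edges = [a + t * (b - a) for a, b in zip(verts, np.roll(verts, -1, axis=0))]
    return np.vstack(edges)

def upwrite(points, max_degree=6):
    """Hypothetical UpWrite: mean of x^i y^j for i + j <= 6, giving 28 numbers."""
    x, y = points[:, 0], points[:, 1]
    return np.array([np.mean(x**i * y**j)
                     for i in range(max_degree + 1)
                     for j in range(max_degree + 1 - i)])

def transform(points, params):
    """Apply shift (tx, ty), rotation theta and scaling to planar points."""
    tx, ty, theta, scale = params
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return scale * points @ rot.T + np.array([tx, ty])

def orbit_jacobian(points, params, h=1e-6):
    """Central-difference derivative of the UpWrite w.r.t. the four parameters."""
    params = np.asarray(params, dtype=float)
    columns = []
    for k in range(params.size):
        step = np.zeros_like(params)
        step[k] = h
        plus = upwrite(transform(points, params + step))
        minus = upwrite(transform(points, params - step))
        columns.append((plus - minus) / (2 * h))
    return np.column_stack(columns)

triangle = polygon_points(3, 1.0)
jac = orbit_jacobian(triangle, [0.3, -0.2, 0.4, 1.1])
sv = np.linalg.svd(jac, compute_uv=False)
orbit_dimension = int(np.sum(sv > 1e-6 * sv[0]))
print(orbit_dimension)
```

Four independent tangent directions at a generic parameter value are consistent with the orbit being locally four-dimensional there.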

It is of course also possible to use invariant moments, but this presupposes that we know the appropriate group. If we contemplate the general situation, we see that this may be infeasible. For example, an image of a bar code printed on a can of beans is deformed in a way which is rather harder to specify. Likewise, consider the transformations to a trajectory in the speech space involved in (a) breathing over the microphone, (b) enlarging the vocal tract, or (c) changing sex. In such cases we see that although we have group actions on the data, and although factoring out these group actions is highly desirable, an analytic description of the action is at least difficult and generally impossible to obtain.

A symmetric object will not generally give an embedding, but will give an immersion of the group. If, for example, we rotate a square, the result of UpWriting will not embed the circle group $SO(2)$ in the top level space, because the path traced out will recur after one quarter of a circuit. We get a circle mapped into the space, but we trace it out four times. Given noise, quantization and finite sampling, the result may be four very close lobes, each close to what is topologically a circle. This gives us a means of identifying symmetries in the object.
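The recurrence of the path can be used to read off the order of a rotational symmetry. The sketch below is noise-free and again assumes the hypothetical moment-based `upwrite` (28 monomial moments of degree at most 6); `symmetry_order` is an invented helper that counts how often the rotated object's UpWrite returns to its starting point in one full circuit.

```python
import numpy as np

def polygon_points(n_sides, scale, samples_per_edge=50):
    """Sample boundary points of a regular polygon centred at the origin."""
    angles = 2 * np.pi * np.arange(n_sides) / n_sides + np.pi / 2
    verts = scale * np.column_stack([np.cos(angles), np.sin(angles)])
    t = np.linspace(0.0, 1.0, samples_per_edge, endpoint=False)[:, None]
    edges = [a + t * (b - a) for a, b in zip(verts, np.roll(verts, -1, axis=0))]
    return np.vstack(edges)

def upwrite(points, max_degree=6):
    """Hypothetical UpWrite: mean of x^i y^j for i + j <= 6, giving 28 numbers."""
    x, y = points[:, 0], points[:, 1]
    return np.array([np.mean(x**i * y**j)
                     for i in range(max_degree + 1)
                     for j in range(max_degree + 1 - i)])

def rotate(points, theta):
    """Rotate planar points about the origin by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return points @ np.array([[c, -s], [s, c]]).T

def symmetry_order(points, n_angles=360, tol=1e-9):
    """Count visits of the rotation path to its starting point (theta = 0 included).

    Only symmetries whose angle falls on the sampling grid are detected.
    """
    thetas = np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)
    start = upwrite(points)
    path = np.array([upwrite(rotate(points, th)) for th in thetas])
    returns = np.linalg.norm(path - start, axis=1) < tol
    return int(np.sum(returns))

print(symmetry_order(polygon_points(4, 1.0)))  # square
print(symmetry_order(polygon_points(3, 1.0)))  # equilateral triangle
```

With noisy or quantized images the exact tolerance test would have to be replaced by a search for near-returns, corresponding to the close lobes mentioned above.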

If we consider three-dimensional rotations of an object such as an aircraft which is then projected onto a plane, as when we take a video image, we get, via the UpWrite process, a space which may not be a manifold, but which is likely to be one for almost all of the points. If the reader thinks of something like a circle with a diameter included, he will observe that the object is pretty much one-dimensional, in that for most points, a sufficiently small neighbourhood of the point will give a set which looks like an interval in $\mathbb{R}$. The two points where the diameter meets the circle are singular, but there are only two of them.

For most points of the UpWritten image, a small perturbation derived from a rotation in three dimensions will give another point which, if DownWritten to the image set, will give something qualitatively the same. Imagine an aeroplane or a cube `in general position'. For some views of the cube or the aeroplane, however, the image degenerates into something different from neighbouring views, as for example when the cube is seen along an axis orthogonal to a face and we see a square. There may be discontinuities in the UpWrite at such locations; nevertheless, the space is mainly a manifold, and it may be UpWritten as an entity in its own right.

When this is done with distinct classes of objects, we recover at this level separated objects which may conveniently have the symbol designating them replaced by an alphabetic term, since the topology has become trivial. Thus we see that the case of strings and symbols in the usual sense may be recovered as a degenerate case of the more general topological one.
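The circle-with-diameter picture can be made concrete with a local principal component analysis. This is a sketch under assumptions chosen for illustration: the point cloud, the neighbourhood radius of 0.1, and the 5% eigenvalue threshold are all arbitrary, and `local_dimension` is an invented helper that counts the significant directions of spread near a point.

```python
import numpy as np

def circle_with_diameter(points_per_unit=1000):
    """Sample the unit circle together with its horizontal diameter."""
    n_circle = int(2 * np.pi * points_per_unit)
    angles = np.linspace(0.0, 2 * np.pi, n_circle, endpoint=False)
    circle = np.column_stack([np.cos(angles), np.sin(angles)])
    xs = np.linspace(-1.0, 1.0, 2 * points_per_unit)
    diameter = np.column_stack([xs, np.zeros_like(xs)])
    return np.vstack([circle, diameter])

def local_dimension(cloud, centre, radius=0.1, rel_tol=0.05):
    """Count covariance eigenvalues of the neighbourhood above rel_tol * largest."""
    near = cloud[np.linalg.norm(cloud - np.asarray(centre), axis=1) < radius]
    eigvals = np.linalg.eigvalsh(np.cov(near.T))
    return int(np.sum(eigvals > rel_tol * eigvals.max()))

cloud = circle_with_diameter()
print(local_dimension(cloud, (0.0, 1.0)))  # generic point on the circle
print(local_dimension(cloud, (0.0, 0.0)))  # generic point on the diameter
print(local_dimension(cloud, (1.0, 0.0)))  # where the diameter meets the circle
```

A count of one indicates a neighbourhood that looks like an interval; a count above one flags a candidate singular point, such as the two junctions.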


Mike Alder
9/19/1997