
Precursors

Syntactic Pattern Recognition was inaugurated by King-Sun Fu, who wanted to extend syntactic ideas to images. His approach was as follows: first he would take his image and scrutinise it carefully for bits of subimage that, more or less, recurred. If, for example, he had handprinted alphabetics, he would find bits of curves and lines from which he could regenerate the image, or at least an approximation to it, by gluing them together in some way. He assigned a symbol to each of his image primitives, and constrained the relationships between the primitives. In the simplest case it was just an order relation, so the image decomposed into a string of primitives and hence was described by a symbol string. Rafael Gonzalez has also written about this approach, and Pavlidis has considered the related issue of structured objects, usually images.

The set of strings which you get from a family of images of the same type, for example handwritten words or Kanji characters, where repetitions of the same word or character generate generally different strings which are nevertheless classified together, constitutes a language in the formal sense, indeed a stochastic language.

Now we produce a grammar for this language, usually a stochastic grammar, so we have one grammar for each family of images. Finally, for a new object, we reduce it to a string of primitives and calculate the probability of the string under each grammar. The best bet wins. This is similar to straight statistical modelling of points in $\mathbb{R}^n$, but gives us a class of models for languages instead. It should be noted that this is almost exactly what is done in the standard VQ/HMM automatic Speech Recognition systems. The space is chopped up into lumps, and it is the lump through which the trajectory goes which determines the symbol. The symbol strings are then used to generate a hidden Markov model, a stochastic grammar.
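To make the recipe concrete, here is a minimal sketch in Python, under simplifying assumptions of my own: the "grammar" for each class is only a first-order Markov chain over the quantised symbols (the simplest kind of stochastic regular grammar, not a full HMM), and all the names, centroids and training strings are invented for illustration.

\begin{verbatim}
import math

def quantise(points, centroids):
    # Map each point of R^n to the index of its nearest centroid:
    # the "lump" a point falls in determines its symbol.
    def nearest(p):
        return min(range(len(centroids)),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(p, centroids[i])))
    return [nearest(p) for p in points]

def fit_markov_chain(strings, n_symbols, smooth=1.0):
    # Estimate transition probabilities from sample symbol strings,
    # with add-one style smoothing so no transition has probability zero.
    counts = [[smooth] * n_symbols for _ in range(n_symbols)]
    for s in strings:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    return [[c / sum(row) for c in row] for row in counts]

def log_likelihood(string, trans):
    # Log-probability of a string under a chain, uniform start symbol.
    ll = -math.log(len(trans))
    for a, b in zip(string, string[1:]):
        ll += math.log(trans[a][b])
    return ll

def classify(string, models):
    # The best bet wins: the class whose grammar makes the string likeliest.
    return max(models, key=lambda name: log_likelihood(string, models[name]))

# Two invented classes of strings over three quantised primitives.
models = {
    "class_A": fit_markov_chain([[0, 0, 1, 1, 2, 2], [0, 1, 2]], 3),
    "class_B": fit_markov_chain([[0, 2, 0, 2, 0, 2], [2, 0, 2]], 3),
}
print(classify([0, 0, 1, 2, 2], models))   # prints "class_A"
\end{verbatim}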

There are four problems associated with this plan:

First, the order relation may not suffice for a specification of the way the primitives are related to each other, and generally does not.

Second, the system for choosing the primitives is frequently fraught with problems and normally involves human intervention. It should, one feels, be taken care of automatically.

Third, the system of matching the actual signal to the appropriate primitive may be equally doubtful and must inevitably entail some degree of forcing a continuum of possible forms into a discrete set of allowed descriptors. It is reasonable to want this to emerge from the data rather than to be forced upon it in essentially arbitrary ways.

Fourth and finally, the business of inferring a grammar from a sample of strings is often difficult and sometimes impossible to do in any manner which does not feel arbitrary and insanitary.


  
Figure 8.3: Classical Syntactic Decomposition.

To bring home all of these at once, suppose we feed the system a set of images of equilateral triangles, each an integer scalar multiple of a smallest such triangle. We provide the system with primitives as shown in Fig. 8.3. I have left arrows off, but we can take it that line a goes uphill, line b downhill, and line c from right to left. Then the string for a triangle of side n units is $a^nb^nc^n$, which gives a context-sensitive language. If we constructed a hexagon by

\begin{displaymath}
a \bar{c} b \bar{a} c \bar{b}\end{displaymath}

where the bars denote the opposite direction, we can distinguish the objects which are hexagons from those which are triangles by examining the syntax of the strings: triangles don't contain $\bar{x}$ for any $x$. Other strings such as $a^nc^nb^n$ are different triangles.
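To see what the counting involves, here is a toy recogniser for the triangle strings, a sketch with conventions of my own: barred primitives are written as capitals, so the unit hexagon above becomes "aCbAcB". A Python function can count the three runs easily enough, but no finite-state device can, and $a^nb^nc^n$ is not even context-free; that is the sense in which the language is context-sensitive.

\begin{verbatim}
def is_triangle(s):
    # True iff s = a^n b^n c^n for some n >= 1.
    if any(ch in "ABC" for ch in s):   # a barred primitive: not a triangle
        return False
    n = len(s) // 3
    return n >= 1 and s == "a" * n + "b" * n + "c" * n

assert is_triangle("aaabbbccc")        # triangle of side 3
assert not is_triangle("aCbAcB")       # the unit hexagon
assert not is_triangle("aabbbcc")      # runs of unequal length
\end{verbatim}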

The system seems absurdly rigid, having no way of describing a triangle half as big again as the basic one. One could find ways around this, but what if the edges didn't quite meet? What if the lines were curved a little? It looks as though some sort of symbolic strait-jacket is being forced onto a geometric world. The strait-jacket approach leaves one wondering how one is going to decide to push cases into the formalism in general.

And finally, one might be a little unhappy about a system which had to do such careful counting in order to infer the language. Inferring regular languages is easy, but inferring context-sensitive languages does not sound like a good way to go.

If one contemplates this example and the fixes required to make it fly, one feels some sympathy for the basic insight: we need some mechanism for obtaining descriptions of at least some geometric objects in terms of levels of structure, just as we have with natural language. But it is less than clear that the program inaugurated by Fu is the way to go forward. It seems more natural to preserve the geometric structure of images for as long as possible, until the description at a symbol level occurs naturally.

Many other kinds of grammars have been devised, in particular graph-grammars for graphs and stochastic grammars for images (Markov Random Fields), but none has been free of flaws. Where they allowed inference, it did not lead to hierarchies of description which look the least like natural language structures. In what follows, I shall outline a theory which obtains such hierarchies in a more agreeable manner. In order to avoid excursions into category theory, I leave the details of the abstraction to papers mentioned in the bibliography, and shall instead concentrate on giving algorithms which perform the needful operations. I start with some very simple cases in order to focus the mind on the essentials.

