In effect, then, the system under discussion will automatically aggregate the data, hierarchically, into the sets which should become segments, as data accumulates. This disposes of the segmentation problem.
There were two simple data sets which all existing classification systems got wrong. It was asked whether we could find a generic `fix' which would do better than the naive clustering methods discussed so far under the heading of static pattern classification.
The answer is gratifyingly positive. Consider first the chess-board pattern. At the lowest level of aggregation, we find that neighbourhoods in a small ball correlate with other adjacent small balls, provided the diameter of each ball is small compared with the size of a square. The points are technically in $\mathbb{R}^3$, with the last component either +1 or -1 depending on the colour or type of point. The entropic chunking will thus collect all the points of a square together for the next level of UpWrite. The next level now represents the alternating pattern. In a suitable space of square regions, perhaps crudely represented by fourth order moments of a square in $\mathbb{R}^2$, we get alternating blocks of points in the base level. Taking any one such point and looking at a neighbourhood, we get, as we go across the image, very tight clustering: we keep getting a point which is a cross surrounded by points of the opposite category along ranks and files, and by points of the same type along the diagonals if the neighbourhood is slightly bigger. This structure is in effect learnt from other parts of the same data set. Using such a neighbourhood descriptor, the entire image is decomposed into iterates of this primitive. The system learns the process by which the image appears to have been generated, in the same way as the bag structure ABCDABCDABCD was learnt in the last chapter.
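The small-ball observation can be checked numerically. The following is a minimal sketch, not the system described in the text: it assumes a 4 x 4 board of unit squares (an arbitrary choice) and an invented helper `neighbour_agreement` measuring how often a point's neighbours within a given radius share its label.

```python
import numpy as np

rng = np.random.default_rng(0)

# Chess-board data: points in R^3, the last coordinate +1 or -1
# according to the colour of the unit square (4 x 4 board assumed).
xy = rng.uniform(0, 4, size=(2000, 2))
colour = np.where((np.floor(xy[:, 0]) + np.floor(xy[:, 1])) % 2 == 0, 1.0, -1.0)
pts = np.column_stack([xy, colour])

def neighbour_agreement(pts, radius):
    """Mean fraction of neighbours within `radius` that share a point's label."""
    agree = []
    for p in pts[:200]:                      # subsample for speed
        d = np.linalg.norm(pts[:, :2] - p[:2], axis=1)
        nb = pts[(d > 0) & (d < radius)]
        if len(nb):
            agree.append(np.mean(nb[:, 2] == p[2]))
    return float(np.mean(agree))

# Ball small compared with a square: neighbours almost all agree,
# so entropic chunking merges the points of one square first.
print(neighbour_agreement(pts, 0.1))
# Ball spanning several squares: agreement falls towards one half,
# which is the alternating structure seen at the next level up.
print(neighbour_agreement(pts, 2.0))
```

The two printed numbers make the scale-dependence concrete: near 1 at the small radius, near 1/2 at the large one.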
Classifying a point in a region is done by DownWriting from the higher level description, which fills the empty square with points that, although not actually present, are deduced from the higher level structure. Now we can determine the category of the unknown point by looking at its neighbours: not the actual neighbours, but the virtual neighbours generated by the DownWrite. This is analogous to identifying the word profes5or as being close to the word professor, which has been seen before, DownWriting the word to its spelling (a rather trivial operation), and then identifying the symbol between s and o.
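As a toy illustration of classification by virtual neighbours, suppose the top level has already learnt the rule that square $(i,j)$ is coloured $+1$ exactly when $i+j$ is even. The functions `downwrite_square` and `classify`, the sample count, and the radius below are all invented for this sketch; the point is only that the query's neighbours are generated from the learnt rule rather than read from the data.

```python
import numpy as np

# Hypothetical learnt rule: square (i, j) has colour +1 iff i + j is even.
def downwrite_square(i, j, n=50):
    """DownWrite: fill square (i, j) with n virtual points of the right colour."""
    rng = np.random.default_rng(1)
    xy = rng.uniform([i, j], [i + 1, j + 1], size=(n, 2))
    colour = 1.0 if (i + j) % 2 == 0 else -1.0
    return np.column_stack([xy, np.full(n, colour)])

def classify(point, radius=0.3):
    """Classify a point from its virtual neighbours, not from actual data."""
    i, j = int(point[0]), int(point[1])
    virtual = downwrite_square(i, j)         # neighbours deduced, not observed
    d = np.linalg.norm(virtual[:, :2] - point, axis=1)
    nb = virtual[d < radius]
    return 1.0 if nb[:, 2].mean() > 0 else -1.0

print(classify(np.array([2.5, 3.5])))   # square (2, 3): 2 + 3 odd, so -1.0
```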
We are in fact using the same machinery in both cases, just with different interpretations of the terms `symbol' and `concatenate'. This point requires some mathematical sophistication not required of the reader of this book.
In the same way, the double spiral data set is a set of points in $\mathbb{R}^3$, the first two coordinates telling you where the datum is and the last coordinate telling you whether it is a nought or a cross. The local structure is quite different from the case of the chess-board pattern, but the same machinery works. The set of noughts is locally described by a set of ellipses, and local groups of, say, three ellipses may be represented by giving their relative locations, or this information may be inferred from the canonical UpWrite via moments of the points in $\mathbb{R}^2$. The relationship will in fact change slightly, but predictably, as the data gets further from the centre. Extrapolation is therefore possible. This is done automatically by simply inferring the structure `rule' and DownWriting to get the distribution of points which the top level coding will generate. Again, the classification is accomplished by the same machinery: we look at the virtual neighbouring data obtained from the DownWrite.
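The ellipse description can be caricatured in a few lines. This sketch is an assumption-laden stand-in for the canonical UpWrite, not the book's algorithm: each spiral arm is chunked into consecutive groups (of 40 points, arbitrarily), each group is summarised by its first and second order moments, and a query is classified by the label of the nearest group mean. The parametrisation and the names `spiral_arm` and `chunk_moments` are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def spiral_arm(sign, n=400):
    """One arm of the double spiral as points in R^3: (x, y, +/-1)."""
    t = np.linspace(0.5, 3 * np.pi, n)
    x = sign * t * np.cos(t) + rng.normal(0, 0.1, n)
    y = sign * t * np.sin(t) + rng.normal(0, 0.1, n)
    return np.column_stack([x, y, np.full(n, float(sign))])

def chunk_moments(arm, size=40):
    """Crude UpWrite of an arm: (mean, covariance, label) for each local chunk."""
    return [(c[:, :2].mean(axis=0), np.cov(c[:, :2].T), c[0, 2])
            for c in np.split(arm, len(arm) // size)]

# The `ellipses' of the text: local moment summaries of both arms.
ellipses = chunk_moments(spiral_arm(+1)) + chunk_moments(spiral_arm(-1))

def classify(q):
    """Label of the nearest local ellipse centre to the query point q."""
    dists = [np.linalg.norm(q - mu) for mu, _, _ in ellipses]
    return ellipses[int(np.argmin(dists))][2]

print(classify(np.array([6.28, 0.0])))   # lies on the + arm near t = 2*pi
```

A fuller version would compare queries against the ellipse shapes (the covariances) rather than the centres alone, and would extrapolate the rule beyond the observed arcs as the text describes.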