The framework discussed so far,
however, has concentrated
on recognising things which just sit
there and wait to be recognised; but many things
change in time in
distinctive ways. As an example, if we record
the position and possibly
the pressure of a stylus on a pad, we can try
to work out what
characters are being written when the user writes
a memo to himself.
This gives us a trajectory in two (or, with the
pressure, three) dimensions to classify. Or we might
have an image of a butterfly and a bird captured
on videotape,
and wish to identify them, or, more pressingly,
two kinds of aeroplane
or missile to distinguish. In these cases, we
have trajectories in $\mathbb{R}^2$,
or possibly $\mathbb{R}^3$,
as the objects to be
recognised. A
similar situation occurs when we recognise speech,
or try to: the first
thing that is done is to take the time sequence
which gives the microphone
output as a function of time and to perform some
kind of
analysis of its component frequencies, either
by a hardware filter bank,
an FFT (Fast Fourier Transform) followed by some
binning so as to give a
software simulation of the hardware filterbank,
or relatively exotic methods such as Cepstral
Coefficients or
Linear Predictive Coding coefficients.
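The FFT-with-binning route can be sketched in a few lines. The following is a minimal illustration, assuming NumPy; the frame length and number of bands are invented for the example, and a real front end would add windowing, overlap and log compression:

```python
import numpy as np

def filterbank_trajectory(signal, frame_len=256, n_bands=16):
    """Turn a 1-D signal into a trajectory in R^n_bands: for each
    frame, take an FFT magnitude spectrum and sum it into n_bands
    bins, a crude software simulation of a hardware filterbank."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    trajectory = []
    for frame in frames:
        spectrum = np.abs(np.fft.rfft(frame))       # magnitude spectrum
        bands = np.array_split(spectrum, n_bands)   # chop into bands
        trajectory.append([b.sum() for b in bands]) # energy per band
    return np.array(trajectory)   # shape: (n_frames, n_bands)

# a toy "utterance": one second of a 440 Hz sine sampled at 8 kHz
t = np.arange(8000) / 8000.0
traj = filterbank_trajectory(np.sin(2 * np.pi * 440 * t))
```

Each row of `traj` is one point of the trajectory in $\mathbb{R}^{16}$; successive rows trace out the path through the space as the utterance proceeds.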
All of these transform the utterance into a trajectory
in some space $\mathbb{R}^n$,
for $n$ anywhere between
2 and 256. Distinguishing the word `yes' from
the word `no'
is then essentially similar to telling
butterflies from birds, or Boeings from baseballs,
on the basis of
their trajectory characteristics.
An even more primitive problem occurs when one
is given a string
of ASCII characters and has to assign provenance.
For example,
if I give you a large sample of Shakespearean
text and a sample of Marlowe's
writing, and then ask you to tell me which category
a piece written
by Bacon comes under, either or neither, then I
am asking for a
classification of sequences of symbols. One of
the standard methods of
doing Speech Recognition consists of chopping
up the space of speech
sounds into lumps (a process called vector
quantisation in the
official documents) and labelling each lump with
a symbol.
Then an utterance gets turned first into a trajectory
through the
space, and then into a sequence of symbols, as
we trace to see what
lump the trajectory is in at different times.
Then we try to
classify the symbol strings. This might seem,
to the naive,
a bizarre approach, but it might sound more impressive
if we
spoke of vector quantisation and Hidden Markov
Models. In this form, it is
more or less a staple of speech recognition, and
is coming into favour
in other forms of trajectory analysis.
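The quantisation step itself is simple once the lumps have been chosen. Here is a minimal sketch, assuming the codebook of lump centres is already given (in practice it would come from a clustering procedure such as k-means); the codebook and trajectory below are invented purely for illustration:

```python
import numpy as np

def vector_quantise(trajectory, codebook):
    """Map each point of a trajectory in R^d to the index of the
    nearest codebook vector, turning the trajectory into a
    sequence of symbols."""
    symbols = []
    for point in trajectory:
        dists = np.linalg.norm(codebook - point, axis=1)
        symbols.append(int(np.argmin(dists)))  # label of nearest lump
    return symbols

# a hypothetical codebook of four lumps in a 2-D space
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
traj = np.array([[0.1, 0.1], [0.9, 0.2], [0.8, 0.9]])
print(vector_quantise(traj, codebook))  # [0, 1, 3]
```

The resulting symbol string is what gets handed on to the sequence classifier, for instance a Hidden Markov Model trained on each word.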
The classification of trajectories, either in
$\mathbb{R}^n$ or in some
discrete alphabet space, will also therefore preoccupy
us at later
stages. Much work has been done on these in various
areas:
engineers wanting to clean up signals have developed
adaptive
filters which have to learn properties of the
signal as
the signal is transmitted, and statisticians and
physicists have
studied ways to clean up dirty pictures. Bayesian
methods of
updating models as data is acquired look very
like skeletal models for
learning, and we shall be interested in the extent
to which we can use
these ideas, because learning and adaptation are
very much things that
brains do, and are part of getting better
at
recognising and classifying and, in the case of
trajectories, predicting.
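The Bayesian updating idea, in its most skeletal form, can be sketched as follows. The `model' here is a Beta distribution over a coin's bias, revised one observation at a time; this is the standard beta-binomial example, chosen for brevity rather than taken from the speech or trajectory problems above:

```python
def update(alpha, beta, observation):
    """One Bayesian update step for a Beta(alpha, beta) prior on a
    coin's probability of heads; observation is 1 (heads) or 0
    (tails). Conjugacy makes the posterior another Beta."""
    return alpha + observation, beta + (1 - observation)

alpha, beta = 1.0, 1.0            # Beta(1, 1): a uniform prior
for obs in [1, 1, 0, 1, 1]:       # a small stream of coin flips
    alpha, beta = update(alpha, beta, obs)

posterior_mean = alpha / (alpha + beta)   # current estimate of P(heads)
print(posterior_mean)  # 5/7, about 0.714
```

The point is the shape of the loop: the model is revised as each datum arrives, which is exactly the skeleton of a learning system.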