Suppose you have a few hundred vectors of two categories, each vector of dimension 12. It may be possible to look at the column of numbers and tell by eye which is which. If the points are well separated this can sometimes be done. This primitive and basic eyeballing of the data is well worth trying. Sometimes it can show that you have done something daft, such as having a zero vector in both categories, which has been known to be produced by a program bug.
If the number of categories is bigger than two,
or the data less well separated, looking at the
columns of numbers may be uninformative. But just
as a photograph of, say, a tree is a projection
of a three dimensional object onto two dimensions,
so it is possible, indeed easy, to project from
down to
for any n. And to get from
any bounded piece of
to the computer screen
is just shrinking or stretching and shifting.
Any view of a piece of
will leave somethings
hidden behind others and leave it unclear as
to the relative distances, but doing any rotation
about almost any axis will show us the relative
spacings and reveal what was hidden. If we can
rotate
and look at the resulting data
from different perspectives, we can sometimes see some
useful structure in the data.
Suppose we have two orthogonal vectors
in
of length 1. Then if
is a
point in
and
is the usual dot
product, the projection of
onto the plane
spanned
by
is
. So by taking the two numbers
we map the point
into
. It is then straightforward
to plot these points on the computer screen, and one of
the programs on disk does this for you.
Rotations in
are most easily thought about
as composites of rotations in a choice of plane.
The three dimensional case is rather special,
since in choosing a plane to rotate in
we are also choosing an orthogonal line which is fixed.
In
, we can leave a two dimensional
subspace fixed, we rotate about a plane. So it
is simpler to stop thinking in terms of axes
of rotation and to concentrate on the plane that
is moved into itself. It is easiest to take these
to be defined by some choice of two different axes.
We therefore obtain a basic rotation in
by choosing distinct i and j between 1
and n. We write down the identity n by n matrix,
and change the locations (i,i), (i,j),(j,i)
and (j,j). In the (i,i),(j,j) locations we
write
, for some choice of
,in location (i,j) we write
and
in location (j,i) we write
for
the same choice of
. The new matrix rotates
by an angle
in the (i,j) plane.
By making any choices of i,j and
, we
get different rotations. By composing one after
another we can get any special orthogonal matrix.
Initialising the vectors
and
to being,
say, the first two elements in the standard basis
for
, and then applying a sequence of rotations,
we can view the points of the data set from
a variety of angles. We regard our two `window
vectors' as the things to be operated on rather
than the data points, as this is a lot quicker.
There are some obvious modifications to the above algorithm, such as scaling (zooming in or out) and shifts in the space. After a large number of rotations, it may happen that the window vectors get deformed out of true: they may no longer be orthonormal as a result of the accumulation of rounding errors in the matrix multiplications. It is fairly easy to renormalise the window vectors if this happens.
There are several advantages to inspecting the data by this method; outliers may be found (and sometimes recognised as errors), and the decision as to whether to use Statistical or Neural Net methods or alternatives may be taken in some cases. It has been found, for example, that some data appear to consist of one gaussian for each category of point under every explored projection, and in this case it is tempting to automate the visual inspection: make random choices of rotation, for each projection test to see if the distribution is acceptably gaussian, using any of the standard statistical methods, and repeat. If it is gaussian from enough directions, where `enough' depends on n, we might feel that it would be sensible to use a gaussian distribution to model the data. If it looks very much like a collection of convex sets bounded by hyperplanes, one might use the classical neural nets.
On the disk may be found some images of vowels
obtained by extracting from the NIST data base
samples of many speakers saying `Ah' and `Oo'.
Each sound was sampled and a Fast Fourier Transform
applied to obtain the power spectrum. The results
were binned into 12 different frequency bands,
spaced more or less logarithmically along the
frequency axis, and the resulting twelve numbers
used to describe the sound. An utterance was thus
turned into a trajectory in
, although
rather a short one in this case. The collection
of such trajectories is fitted reasonably well
by a twelve dimensional gaussian distribution, a different
one for each vowel sound. Classifying vowels
by these means may be accomplished by the methods
of the next section but one. A projection program
which allows you to look at the data is also available
on the disk.
There is a program called fview written by Gareth Lee which allows projection of views and rotations of higher dimensional data. The program is available by anonymous FTP from ``ciips.ee.uwa.edu.au'' (I/P number 130.95.72.69), is located in directory ``pub/speech/tools/fview'', and is called ``fview1.0.tar.Z''. It takes up about 2.5Mbytes. It should run straight-out-of-the-box on SparcStations and does not require XView to be installed. It will need to be compiled to run on other machines and will need XView. The same site contains papers and data concerned with Pattern Recognition in various sub-directories.
After fview had been written, the UNIX utility XGOBI became public and is more useful for some kinds of data. It is a very nice piece of work, and is warmly recommended to everyone who works with data in dimensions higher than two.
I cannot stress too much the fundamental silliness of trying to solve a problem of pattern recognition using a package that saves anybody from ever looking at the data or thinking about how it was obtained or generated. Any method of examining the data by eye is a help. Trying to get out of the business of getting your hands dirty is to embrace intellectual death, you turn into one of those arty types who can talk about everything but can't actually do anything. This is even worse than being one of the hearty, unintellectual types who can remember the recipes but has no idea why they work or how anybody devised them.