next up previous contents
Next: Computing PDFs: Gaussians Up: Decisions: Statistical methods Previous: Decisions: Statistical methods

The view into $\mathbb{R}^n$

Suppose you have a few hundred vectors of two categories, each vector of dimension 12. It may be possible to look at the columns of numbers and tell by eye which is which. If the points are well separated this can sometimes be done. This primitive and basic eyeballing of the data is well worth trying. Sometimes it can show that you have done something daft, such as having a zero vector in both categories, which has been known to be produced by a program bug.

If the number of categories is bigger than two, or the data less well separated, looking at the columns of numbers may be uninformative. But just as a photograph of, say, a tree is a projection of a three dimensional object onto two dimensions, so it is possible, indeed easy, to project from $\mathbb{R}^n$ down to $\mathbb{R}^2$ for any $n$. And to get from any bounded piece of $\mathbb{R}^2$ to the computer screen is just a matter of shrinking or stretching and shifting. Any single view of a piece of $\mathbb{R}^3$ will leave some things hidden behind others and leave the relative distances unclear, but a rotation about almost any axis will show us the relative spacings and reveal what was hidden. If we can rotate $\mathbb{R}^n$ and look at the resulting data from different perspectives, we can sometimes see some useful structure in the data.

Suppose we have two orthogonal vectors $\mathbf{u}, \mathbf{v}$ in $\mathbb{R}^n$ of length 1. Then if $\mathbf{x}$ is a point in $\mathbb{R}^n$ and $\cdot$ is the usual dot product, the projection of $\mathbf{x}$ onto the plane spanned by $\mathbf{u}, \mathbf{v}$ is $(\mathbf{x}\cdot\mathbf{u})\,\mathbf{u} + (\mathbf{x}\cdot\mathbf{v})\,\mathbf{v}$. So by taking the two numbers $\mathbf{x}\cdot\mathbf{u}$, $\mathbf{x}\cdot\mathbf{v}$ we map the point $\mathbf{x}$ into $\mathbb{R}^2$. It is then straightforward to plot these points on the computer screen, and one of the programs on disk does this for you.
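The projection is just two dot products per data point. As a minimal sketch in Python with NumPy (this is not the program on the disk; the name `project_to_plane` and the example vectors are my own):

```python
import numpy as np

def project_to_plane(points, u, v):
    """Project n-dimensional points onto the plane spanned by the
    orthonormal window vectors u and v, returning the pair of
    dot products (x.u, x.v) for each point as 2-D coordinates.
    `points` is an (m, n) array; u and v are length-n unit vectors."""
    points = np.asarray(points, dtype=float)
    return np.column_stack((points @ u, points @ v))

# Example: project a point in R^12 onto the first two coordinate axes.
n = 12
u = np.zeros(n); u[0] = 1.0
v = np.zeros(n); v[1] = 1.0
x = np.arange(n, dtype=float)
print(project_to_plane([x], u, v))  # -> [[0. 1.]]
```

Mapping the resulting bounded patch of $\mathbb{R}^2$ to screen pixels is then only a scale and a shift.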

Rotations in $\mathbb{R}^n$ are most easily thought of as composites of rotations, each in some chosen plane. The three dimensional case is rather special, since in choosing a plane to rotate in $\mathbb{R}^3$ we are also choosing an orthogonal line which is fixed. In $\mathbb{R}^4$, by contrast, a rotation in a plane leaves a two dimensional subspace fixed: we rotate about a plane, not a line. So it is simpler to stop thinking in terms of axes of rotation and to concentrate on the plane that is moved into itself. It is easiest to take these planes to be defined by some choice of two different coordinate axes.

We therefore obtain a basic rotation in $\mathbb{R}^n$ by choosing distinct $i$ and $j$ between 1 and $n$. We write down the identity $n$ by $n$ matrix and change the entries at locations $(i,i)$, $(i,j)$, $(j,i)$ and $(j,j)$. In the $(i,i)$ and $(j,j)$ locations we write $\cos(\theta)$, for some choice of $\theta$; in location $(i,j)$ we write $\sin(\theta)$, and in location $(j,i)$ we write $-\sin(\theta)$ for the same choice of $\theta$. The new matrix rotates $\mathbb{R}^n$ by an angle $\theta$ in the $(i,j)$ plane.
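This construction can be sketched directly (the function name `plane_rotation` is my own, and the indices are 0-based as is usual in code, rather than the 1-based convention of the text):

```python
import numpy as np

def plane_rotation(n, i, j, theta):
    """Return the n-by-n matrix rotating R^n by theta in the (i, j)
    plane: the identity, with cos(theta) at (i, i) and (j, j),
    sin(theta) at (i, j), and -sin(theta) at (j, i)."""
    R = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    R[i, i] = c
    R[j, j] = c
    R[i, j] = s
    R[j, i] = -s
    return R

# A rotation of R^4 by 30 degrees in the (0, 2) plane.
R = plane_rotation(4, 0, 2, np.pi / 6)
# R is orthogonal with determinant 1: a special orthogonal matrix.
```

Such matrices are sometimes called Givens rotations; fixing all but two coordinates makes each one cheap to apply.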

Different choices of $i$, $j$ and $\theta$ give different rotations, and by composing them one after another we can obtain any special orthogonal matrix.

Initialising the vectors $\mathbf{u}$ and $\mathbf{v}$ to be, say, the first two elements of the standard basis for $\mathbb{R}^n$, and then applying a sequence of rotations, we can view the points of the data set from a variety of angles. We regard our two `window vectors' as the things to be operated on rather than the data points, as this is a lot quicker: we rotate two vectors instead of the whole data set.

There are some obvious modifications to the above algorithm, such as scaling (zooming in or out) and shifts in the space. After a large number of rotations, it may happen that the window vectors get deformed out of true: they may no longer be orthonormal as a result of the accumulation of rounding errors in the matrix multiplications. It is fairly easy to renormalise the window vectors if this happens.
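Renormalising amounts to one step of Gram-Schmidt: normalise the first window vector, subtract its component from the second, and normalise that too. A sketch, assuming NumPy (the function name is my own):

```python
import numpy as np

def renormalise(u, v):
    """Restore the window vectors to an orthonormal pair after
    rounding errors have accumulated: normalise u, remove its
    component from v (one Gram-Schmidt step), then normalise v."""
    u = u / np.linalg.norm(u)
    v = v - (v @ u) * u
    v = v / np.linalg.norm(v)
    return u, v

# Window vectors in R^3, slightly deformed out of true:
u = np.array([1.0, 1e-4, 0.0])
v = np.array([1e-4, 1.0, 1e-4])
u, v = renormalise(u, v)
```

Running this occasionally, say every few hundred rotations, keeps the projection honest at negligible cost.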

There are several advantages to inspecting the data by this method: outliers may be found (and sometimes recognised as errors), and the decision as to whether to use Statistical or Neural Net methods or alternatives may be taken in some cases. It has been found, for example, that some data appear to consist of one gaussian for each category of point under every explored projection, and in this case it is tempting to automate the visual inspection: make random choices of rotation; for each projection, test whether the distribution is acceptably gaussian, using any of the standard statistical methods; and repeat. If it looks gaussian from enough directions, where `enough' depends on $n$, we might feel that it would be sensible to use a gaussian distribution to model the data. If it looks very much like a collection of convex sets bounded by hyperplanes, one might use the classical neural nets.
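The automated inspection might be sketched as follows. This is only an illustration of the idea: the normality check here is a crude skewness-and-kurtosis comparison, where real work would use a standard test such as D'Agostino's, and all names and thresholds are my own inventions.

```python
import numpy as np

def looks_gaussian_1d(x, tol=0.5):
    """Crude 1-D normality check: a gaussian sample has skewness
    near 0 and excess kurtosis near 0. A proper statistical test
    (e.g. D'Agostino-Pearson) should replace this in practice."""
    x = (x - x.mean()) / x.std()
    skew = np.mean(x ** 3)
    excess_kurt = np.mean(x ** 4) - 3.0
    return abs(skew) < tol and abs(excess_kurt) < tol

def gaussian_from_many_directions(data, trials=50, rng=None):
    """Project the data onto random unit directions and report the
    fraction of projections that pass the crude gaussian check."""
    rng = np.random.default_rng(rng)
    m, n = data.shape
    passes = 0
    for _ in range(trials):
        u = rng.standard_normal(n)
        u /= np.linalg.norm(u)
        if looks_gaussian_1d(data @ u):
            passes += 1
    return passes / trials

# A genuinely gaussian cloud in R^5 should pass from most directions.
rng = np.random.default_rng(0)
gauss = rng.standard_normal((2000, 5))
```

How large a passing fraction counts as `enough' depends on $n$ and on how much one trusts the test.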

On the disk may be found some images of vowels obtained by extracting from the NIST data base samples of many speakers saying `Ah' and `Oo'. Each sound was sampled and a Fast Fourier Transform applied to obtain the power spectrum. The results were binned into 12 different frequency bands, spaced more or less logarithmically along the frequency axis, and the resulting twelve numbers used to describe the sound. An utterance was thus turned into a trajectory in $\mathbb{R}^{12}$, although rather a short one in this case. The collection of such trajectories is fitted reasonably well by a twelve dimensional gaussian distribution, a different one for each vowel sound. Classifying vowels by these means may be accomplished by the methods of the next section but one. A projection program which allows you to look at the data is also available on the disk.
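The front end described here can be sketched roughly as below. The sample rate, frame length and band edges are illustrative guesses of mine, not the values used to prepare the data on the disk:

```python
import numpy as np

def twelve_band_spectrum(frame, sample_rate=16000, n_bands=12):
    """Turn one frame of sampled speech into twelve numbers: take
    the FFT, form the power spectrum, and sum it into roughly
    logarithmically spaced frequency bands."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    nyquist = sample_rate / 2
    # Log-spaced band edges from ~100 Hz up to the Nyquist frequency
    # (an assumed spacing, for illustration only).
    edges = np.geomspace(100.0, nyquist, n_bands + 1)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    bands = [power[(freqs >= lo) & (freqs < hi)].sum()
             for lo, hi in zip(edges[:-1], edges[1:])]
    return np.array(bands)

# A 512-sample frame of a synthetic 440 Hz tone gives one point in R^12.
t = np.arange(512) / 16000
vec = twelve_band_spectrum(np.sin(2 * np.pi * 440 * t))
```

Applying this to successive frames of an utterance yields the short trajectory in $\mathbb{R}^{12}$ described above.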

There is a program called fview written by Gareth Lee which allows projection of views and rotations of higher dimensional data. The program is available by anonymous FTP from ``ciips.ee.uwa.edu.au'' (IP number 130.95.72.69), is located in directory ``pub/speech/tools/fview'', and is called ``fview1.0.tar.Z''. It takes up about 2.5Mbytes. It should run straight out of the box on SparcStations and does not require XView to be installed. It will need to be compiled to run on other machines and will need XView. The same site contains papers and data concerned with Pattern Recognition in various sub-directories.

After fview had been written, the UNIX utility XGOBI became public and is more useful for some kinds of data. It is a very nice piece of work, and is warmly recommended to everyone who works with data in dimensions higher than two.

I cannot stress too much the fundamental silliness of trying to solve a problem of pattern recognition using a package that saves anybody from ever looking at the data or thinking about how it was obtained or generated. Any method of examining the data by eye is a help. Trying to get out of the business of getting your hands dirty is to embrace intellectual death: you turn into one of those arty types who can talk about everything but can't actually do anything. This is even worse than being one of the hearty, unintellectual types who can remember the recipes but have no idea why they work or how anybody devised them.


Mike Alder
9/19/1997