
Decisions: Statistical methods

In the last chapter I took a somewhat jaundiced look at the raw ideas of modern statistics and probability theory; I can only hope that the thoughtful reader is still with us and has not abandoned the entire enterprise in disgust.

In that chapter, I drank deep of philosophical matters to an extent which can drive many an engineer bananas. Some people like that kind of thing, but it doesn't go with the hard-headed attitude of those who want to get on with programming their robot to distinguish between eggshells and the best china. The trouble is that it is possible to sell a kind of intellectual snake oil to such engineers, as the existence of all those books on Fuzzy Set Theory makes clear, so those engineers should start getting more sceptical. Looking critically at the relatively honest and only faintly crackpot ideas of the probabilists is a good start. What looks like a promising avenue of enquiry to some can look to others like the ravings of a glue-sniffing, born-again fruit bat. While, in the fullness of time, truth will out and either your programs will work or they won't, it can take a long time to get to this happy state, and some critical thought about what kind of methods merit consideration can save a lot of effort.

In other words, I am only mildly apologetic. If that.

In this chapter I shall give some statistical methods for coping with the situation where some measurements have been made, the data consists of some collection of points in $\mathbb{R}^n$, each point labelled as belonging to one of some finite set of categories, and the problem is to decide to which category a new, as yet unlabelled, point belongs. If the attempt looks to be a desperate one, fraught with risk and uncertainty, and if the justifications offered seem to be shonky and unconvincing, we can only say with Bruce Bairnsfather's famous character, `if you know a better 'ole, go to it.' The reader who still has his feet up and is getting slightly sloshed would do well to sit up straight and drink some coffee.

First, an important principle in any branch of Science or Engineering: eyeball the data. Look at it, run it through your fingers, get a feel for it. Learn to know its little vagaries. In these days of instant packages which allow you to get out results using methods you don't understand, implemented by people you don't know, on equipment you haven't tested, it is very easy to produce totally mindless garbage, orders of magnitude faster and stupider than at any time in history. Have no part of this. Mindless stupidity is always popular with the mindlessly stupid, but it doesn't cut the mustard. So, I repeat, EYEBALL THE DATA! In the case of data in dimension 12, say, this is not as easy as in dimension 2, where you can plot O's and X's as I did in chapter one. But given the powers of the computer it is not much harder to project the data from dimension $n$ onto the screen, and to spin it and look at it from several angles. So we start off with a painless bit of linear algebra.
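For the impatient, the idea of projecting and spinning can be sketched in a few lines of Python. This is only an illustration under assumptions of my own choosing, not the method developed in the following sections: the function names (`random_frame`, `spin`, `project`) are hypothetical, the viewing plane is picked at random, and the "spin" is a rotation in a single pair of coordinate axes. The projection itself is nothing more than two dot products per point.

```python
import math
import random

def dot(u, v):
    """Ordinary inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def normalise(v):
    """Scale v to unit length."""
    length = math.sqrt(dot(v, v))
    return [a / length for a in v]

def random_frame(n, rng):
    """Pick two orthonormal vectors in R^n (Gram-Schmidt on Gaussian draws).

    They span the plane that plays the role of the screen.
    """
    u = normalise([rng.gauss(0.0, 1.0) for _ in range(n)])
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    c = dot(w, u)
    v = normalise([b - c * a for a, b in zip(u, w)])  # remove the u-component
    return u, v

def spin(u, v, i, j, angle):
    """Rotate both screen vectors by `angle` radians in the (i, j) axis plane.

    A rotation keeps u and v orthonormal, so the view changes
    without distorting distances on the screen.
    """
    c, s = math.cos(angle), math.sin(angle)

    def rotate(x):
        y = list(x)
        y[i] = c * x[i] - s * x[j]
        y[j] = s * x[i] + c * x[j]
        return y

    return rotate(u), rotate(v)

def project(points, u, v):
    """Screen coordinates of each point: its components along u and v."""
    return [(dot(p, u), dot(p, v)) for p in points]

if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up data: twenty points in dimension 12.
    data = [[rng.gauss(0.0, 1.0) for _ in range(12)] for _ in range(20)]
    u, v = random_frame(12, rng)
    for step in range(3):
        u, v = spin(u, v, 0, 1, 0.2)      # turn the view a little
        print(project(data, u, v)[:2])    # first two screen points
```

Feeding the pairs returned by `project` to any plotting routine, and calling `spin` repeatedly with different axis pairs, gives a crude way of looking at the data from several angles.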



 
Mike Alder
9/19/1997