
Bayesian Decision

Suppose now that we have, for each category of data point, modelled the set of points in that category by some suitable pdf, a gaussian or a mixture of gaussians being suitable in a good many practical cases. We have, in the vernacular, trained the model on the data, and now wish to use it on some new data, either to test the model and see how reliable it is, or because this is our preferred method of making an informed guess at the category of a new datum. I shall explain the ideas behind the best known decision methods.
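To make the 'train, then decide' step concrete, here is a minimal sketch in Python; the training data, the two measurements per point and the names guys, gals and x_new are all invented for illustration, with one gaussian fitted per category as suggested above.

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
# Invented training data: 100 points per category, two measurements each.
guys = rng.normal(loc=[70.0, 180.0], scale=[4.0, 10.0], size=(100, 2))
gals = rng.normal(loc=[64.0, 140.0], scale=[4.0, 10.0], size=(100, 2))

# "Training" a single gaussian per category just means estimating a mean and covariance.
g_m = multivariate_normal(mean=guys.mean(axis=0), cov=np.cov(guys, rowvar=False))
g_f = multivariate_normal(mean=gals.mean(axis=0), cov=np.cov(gals, rowvar=False))

# Each trained model assigns a likelihood to a new datum x.
x_new = np.array([67.0, 160.0])
print(g_m.pdf(x_new))   # plays the role of p(x|m)
print(g_f.pdf(x_new))   # plays the role of p(x|f)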

Recall from the last chapter, where we explained Bayes' Theorem and its use in decision, that we considered the case where $g_m(x)$ and $g_f(x)$ were the likelihoods produced by two competing models, one gaussian $g_m$ for the guys and another $g_f$ for the gals. We decided, you may recall, to regard these two numbers as the conditional probability of having got x given model m and the conditional probability of having got x given model f, respectively. We wrote this, with some reservations, as p(x|m) and p(x|f) respectively. These were interpreted by the sanguine, and us, as measuring the probability that the datum x will be observed given that model m (respectively, f) is appropriate. Now what we want is p(m|x) and p(f|x), the probabilities that the models are true given the observation x. By applying Bayes' Theorem in a spirit of untrammelled optimism, we deduced that

\begin{displaymath}
p(m\vert x) = \frac{p(x\vert m)\,p(m)}{p(x)}
\end{displaymath}

with a similar expression for p(f|x).
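In code, with likelihood and prior values that are simply placeholders (in practice the likelihoods would come from trained pdfs such as those in the sketch above), the formula amounts to:

# Bayes' theorem for two competing models; all numbers here are invented.
lik_m, lik_f = 0.012, 0.020             # p(x|m), p(x|f)
p_m, p_f = 0.5, 0.5                     # prior probabilities of the two models
evidence = lik_m * p_m + lik_f * p_f    # p(x), summing over the two models
post_m = lik_m * p_m / evidence         # p(m|x)
post_f = lik_f * p_f / evidence         # p(f|x)
print(post_m, post_f, post_m + post_f)  # the two posteriors sum to one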

Consider again the situation of the first problem at the end of chapter one: you will doubtless recall, yet again if you read the last chapter, the half naked ladies and possibly the trees from which they were to be distinguished. Now it can be, and was, argued that in the absence of any data from an actual image, we would rate the probability of the image being of a tree as 0.9 and of it being of a naked lady as 0.1, on the grounds that these are the relative numbers of images of each kind. Bayesians refer to these as the prior probabilities of the events. So in the above formulae we could put p(m) and p(f) in as numbers if we had some similar sort of information about the likelihoods of the two models. This leaves only p(x) as a number which is a little hard to assign. Happily, it occurs in both expressions, and if we take the ratio of the two, it cancels out. So we have:

\begin{displaymath}
\frac{p(m\vert x)}{p(f\vert x)} = \frac{p(x\vert m)\,p(m)}{p(x\vert f)\,p(f)}
\end{displaymath}

and the right hand side, known as the likelihood ratio, is computable. Well, sort of.
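As a purely numerical illustration, with the two likelihood values invented for the sake of the arithmetic, take the tree/lady priors of 0.9 and 0.1 from above and suppose the trained pdfs give p(x|tree) = 0.002 and p(x|lady) = 0.03 at the observed x. Then

\begin{displaymath}
\frac{p(tree\vert x)}{p(lady\vert x)} = \frac{0.002 \times 0.9}{0.03 \times 0.1} = \frac{0.0018}{0.0030} = 0.6
\end{displaymath}

so the posterior odds favour the lady despite the nine to one prior in favour of trees, and the troublesome p(x) was never needed.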

We concluded in the last chapter that if you are prepared to buy this, you have a justification for the rule of thumb of always choosing the bigger value, at least in the case where p(m) and p(f) are equal. In this case, p(m|x) is proportional to p(x|m) and p(f|x) is proportional to p(x|f), with the same constant of proportionality, so choosing whichever model gives the bigger answer is the Bayes Optimal Solution. More generally than the crude rule of thumb, if, before we know where the datum actually lies, it is ten times as likely to be m as f which is responsible for it, then we can build in a bias of ten to one in favour of m by demanding that the likelihood ratio p(x|f)/p(x|m) be greater than ten to one before we opt for f as the more likely solution.
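A minimal sketch of this biased rule, with placeholder likelihood values and a made-up helper name bayes_decide:

def bayes_decide(lik_m, lik_f, prior_m=10.0, prior_f=1.0):
    # Choose the category with the larger prior-weighted likelihood.
    # The priors need not be normalised; only their ratio matters.
    return "m" if lik_m * prior_m >= lik_f * prior_f else "f"

print(bayes_decide(lik_m=0.02, lik_f=0.15))   # f only 7.5 times as likely: still choose m
print(bayes_decide(lik_m=0.02, lik_f=0.25))   # f is 12.5 times as likely: now choose f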



Mike Alder
9/19/1997