Recall from the last chapter, where we explained Bayes' Theorem and its use in decision making, that we considered the case where gm(x) and gf(x) were the likelihoods produced by two competing models, one Gaussian gm for the guys and another gf for the gals. We decided, you may recall, to regard these two numbers as the conditional probability of having got x given model m and the conditional probability of having got x given model f, respectively. We wrote this, with some reservations, as p(x|m) and p(x|f) respectively. These were interpreted by the sanguine, and by us, as measuring the probability that the datum x will be observed given that model m (respectively, f) is appropriate. Now what we want is p(m|x) and p(f|x), the probabilities that the models are true given the observation x. By applying Bayes' Theorem in a spirit of untrammelled optimism, we deduced that
\[
p(m|x) = \frac{p(x|m)\,p(m)}{p(x)}, \qquad p(f|x) = \frac{p(x|f)\,p(f)}{p(x)}
\]
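To see the formulae in action, here is a minimal sketch in Python; the two Gaussian models, their means and standard deviations, the observed datum x and the equal priors are all made-up numbers for illustration, not anything fixed by the chapter.

\begin{verbatim}
import math

def gaussian_pdf(x, mu, sigma):
    # Density of a univariate Gaussian at x.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) \
           / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical models: say, heights in cm for the guys (m) and gals (f).
mu_m, sigma_m = 178.0, 7.0
mu_f, sigma_f = 165.0, 6.5
p_m, p_f = 0.5, 0.5          # prior probabilities, here taken equal

x = 170.0                    # the observed datum

# Bayes' Theorem: p(m|x) = p(x|m) p(m) / p(x), with p(x) normalising.
like_m = gaussian_pdf(x, mu_m, sigma_m)
like_f = gaussian_pdf(x, mu_f, sigma_f)
p_x = like_m * p_m + like_f * p_f

print("p(m|x) =", like_m * p_m / p_x)
print("p(f|x) =", like_f * p_f / p_x)
\end{verbatim}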
In the situation of the first problem at the end of chapter one, you will doubtless recall, yet again if you read the last chapter, the half-naked ladies and possibly the trees from which they were to be distinguished. Now it can be, and was, argued that in the absence of any data from an actual image, we would rate the probability of the image being of a tree as 0.9 and of its being a naked lady as 0.1, on the grounds that these are the relative proportions of the two kinds of image. Bayesians refer to these as the prior probabilities of the events. So in the above formulae we could put p(m) and p(f) in as numbers if we had some similar sort of information about the likelihoods of the two models. This leaves only p(x) as a number which is a little hard to assign. Happily, it occurs in both expressions, and if we look at the ratio of the two posterior probabilities, it cancels out. So we have:
\[
\frac{p(m|x)}{p(f|x)} = \frac{p(x|m)\,p(m)}{p(x|f)\,p(f)}
\]
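Since p(x) cancels, the posterior ratio can be computed without ever knowing p(x). A small sketch, using the 0.9 and 0.1 priors of the tree/lady problem; the two likelihood values for the observed image x are invented numbers, purely for illustration.

\begin{verbatim}
# Priors from the tree/lady problem; the likelihoods are made up.
p_x_given_tree, p_x_given_lady = 0.002, 0.030
p_tree, p_lady = 0.9, 0.1

# p(x) cancels from the ratio of posteriors, so it never appears:
ratio = (p_x_given_tree * p_tree) / (p_x_given_lady * p_lady)
print("p(tree|x) / p(lady|x) =", ratio)   # 0.0018 / 0.0030 = 0.6
\end{verbatim}

Here the likelihood ratio in favour of the lady (fifteen to one) beats the nine-to-one prior odds for the tree, so the posterior comes down, narrowly, on the side of the lady.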
We concluded in the last chapter that if you are prepared to buy this, you have a justification for the rule of thumb of always choosing the bigger value, at least in the case where p(m) and p(f) are equal. In this case, p(m|x) is proportional to p(x|m) and p(f|x) is proportional to p(x|f), with the same constant of proportionality, so choosing whichever model gives the bigger answer is the Bayes Optimal Solution. More generally than the crude rule of thumb, if, before we know anything about where the datum actually lies, it is ten times as likely to be m as f which is responsible for it, then we can build in a bias of ten to one in favour of m by demanding that the likelihood ratio p(x|f)/p(x|m) be greater than ten to one before we opt for f as the more likely solution.
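As a sketch of that biased rule, again in Python with invented likelihood numbers: a prior ratio of ten to one for m means f must produce a likelihood ratio better than ten to one to win.

\begin{verbatim}
def choose(like_m, like_f, prior_m=10.0, prior_f=1.0):
    # Bayes optimal choice with priors in the ratio prior_m : prior_f.
    # We opt for f only when the likelihood ratio p(x|f)/p(x|m)
    # overcomes the ten-to-one handicap in favour of m.
    return "f" if like_f * prior_f > like_m * prior_m else "m"

print(choose(0.02, 0.15))   # ratio 7.5 to one: not enough, m wins
print(choose(0.02, 0.25))   # ratio 12.5 to one: f wins
\end{verbatim}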