
Cost Functions

In the argument for choosing between models m and f using Bayes' theorem, we had prior probabilities p(m) and p(f), and we could calculate p(x|m) and p(x|f). There is an extension to this idea which tries to choose not just the most likely model, but the model which will be the least expensive. If we decide to opt for m and turn out to be wrong, there is, on this model, some cost associated with the error; likewise if we turn out to be wrong in choosing f. Presumably, being right incurs no cost. If it does, it might be prudent to refuse to make a decision at all.

Suppose C(m) is the cost of choosing m and being wrong, and C(f) is, similarly, the cost of wrongly choosing f. Suppose we are about to make a decision for a particular x, assigning it to m or f.

If we choose m and we are right, the cost is zero; if we choose m and we are wrong, then it is really an f, and the fraction of times it is an f is p(f|x). So a strategy of always choosing m on observing x would have an expected loss of:

\begin{displaymath}
{\cal E}(m,f) = p(f\vert x) C(m) \end{displaymath}

Similarly, the strategy of always choosing f after having observed x would have a total expected cost of:

\begin{displaymath}
{\cal E}(f,m) = p(m\vert x) C(f) \end{displaymath}

In order to minimise the total cost, at each decision point we simply choose the option with the smaller of these two numbers. Making the Bayesian substitution to obtain quantities we can calculate, the rule is to pick the smaller of

\begin{displaymath}
\frac{C(m) p(x\vert f)p(f)}{p(x)}\ , \ \ \frac{C(f) 
p(x\vert m) p(m)}{p(x)} \end{displaymath}

i.e. we get the Bayes Classifier Rule:

Choose m iff $ \frac{p(x\vert m)}{p(x\vert f)} > \frac{C(m)p(f)}{C(f)p(m)} $.
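As a concrete illustration, the rule above can be sketched in a few lines of Python. The Gaussian class-conditional densities, the particular priors, and the costs used here are hypothetical choices for the sake of the example, not part of the original argument:

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mu, sigma):
    """Hypothetical class-conditional density p(x|class), assumed Gaussian."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def choose_m(x, p_m, p_f, cost_m, cost_f, density_m, density_f):
    """Bayes Classifier Rule with costs:
    choose m iff p(x|m)/p(x|f) > C(m) p(f) / (C(f) p(m))."""
    likelihood_ratio = density_m(x) / density_f(x)
    threshold = (cost_m * p_f) / (cost_f * p_m)
    return likelihood_ratio > threshold

# Hypothetical example: m ~ N(0,1), f ~ N(2,1), equal priors, but wrongly
# choosing m costs three times as much as wrongly choosing f.
dm = lambda x: gaussian_pdf(x, 0.0, 1.0)
df = lambda x: gaussian_pdf(x, 2.0, 1.0)
print(choose_m(0.5, 0.5, 0.5, 3.0, 1.0, dm, df))
# With these costs the rule prefers f at x = 0.5, even though m is the
# more likely model there.
```

Note how the cost asymmetry shifts the decision threshold away from the simple maximum-likelihood choice.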

In the event of there being more than two categories, some minor modifications have to be made, which will be left as an exercise for the diligent student.
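For the curious, one natural multi-category version compares, for each class k, the expected cost C(k) p(not k|x) of choosing k, and picks the class for which this is smallest; with two classes it reduces to the rule above. A sketch under hypothetical class names, densities, and costs:

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mu, sigma):
    """Hypothetical class-conditional density p(x|k), assumed Gaussian."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def min_cost_class(x, classes):
    """classes: list of (name, prior p(k), density p(x|k), cost C(k)).
    Expected cost of choosing k is C(k) * p(not k | x); since p(x) is a
    common denominator, we compare C(k) * (sum_j p(x|j)p(j) - p(x|k)p(k))."""
    evidence = sum(prior * density(x) for _, prior, density, _ in classes)
    best_name, best_cost = None, float("inf")
    for name, prior, density, cost in classes:
        expected = cost * (evidence - prior * density(x))
        if expected < best_cost:
            best_name, best_cost = name, expected
    return best_name

# Three hypothetical classes with equal priors and equal costs:
classes = [
    ("m", 1 / 3, lambda x: gaussian_pdf(x, 0.0, 1.0), 1.0),
    ("f", 1 / 3, lambda x: gaussian_pdf(x, 2.0, 1.0), 1.0),
    ("g", 1 / 3, lambda x: gaussian_pdf(x, 4.0, 1.0), 1.0),
]
print(min_cost_class(0.1, classes))  # with equal costs, the nearest mean wins
```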

The above analysis is rather simple-minded; for a more sophisticated discussion see the paper On the Derivation of the Minimax Criterion in Binary Decision Theory by Pyati and Joseph, as well as the following note by Professor H.V. Poor in the IEEE Information Theory Society Newsletter, Vol. 43, No. 3, September 1993.


Mike Alder
9/19/1997