Suppose C(m) is the cost of choosing m and being wrong, and C(f) is the corresponding cost of wrongly choosing f. Suppose we are about to make a decision for a particular x, assigning it to m or f.
If we choose m and we are right, the cost is zero; if we choose m and we are wrong, then it is really an f, and the fraction of times it is an f is p(f|x). So a strategy of always choosing m on observing x would have an expected loss of:
$$C(m)\,p(f|x)$$
Similarly, the strategy of always choosing f after having observed x would have a total expected cost of:
$$C(f)\,p(m|x)$$
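For instance, with invented numbers: if $C(m) = 2$, $C(f) = 1$ and $p(f|x) = 0.3$ (so $p(m|x) = 0.7$), always choosing m costs $2 \times 0.3 = 0.6$ on average, while always choosing f costs $1 \times 0.7 = 0.7$, so m is the cheaper choice for this x.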
In order to minimise the total cost, at each decision point we simply choose the smaller of these two numbers. Making the Bayesian substitution $p(f|x) = p(x|f)\,p(f)/p(x)$ (and likewise for m) to obtain things we can more or less calculate, we get that the rule is to pick the smaller of
$$C(m)\,\frac{p(x|f)\,p(f)}{p(x)} \quad\text{and}\quad C(f)\,\frac{p(x|m)\,p(m)}{p(x)}$$
Since the factor p(x) is common to both, it does not affect which is smaller and may be dropped; i.e. we get the Bayes Classifier Rule:
Choose m iff
$$C(m)\,p(x|f)\,p(f) < C(f)\,p(x|m)\,p(m).$$
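As a concrete illustration, here is a minimal sketch of the rule in Python. The Gaussian class-conditional densities, and the particular means, standard deviations, priors and costs, are all invented for the example; only the comparison in `classify` comes from the rule above.

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

# Invented class-conditional models p(x|m), p(x|f) for an observation x,
# together with invented priors p(m), p(f) and costs C(m), C(f).
p_x_given_m = lambda x: gaussian_pdf(x, 178.0, 7.0)
p_x_given_f = lambda x: gaussian_pdf(x, 165.0, 6.5)
p_m, p_f = 0.5, 0.5
C_m, C_f = 1.0, 1.0   # cost of wrongly choosing m, resp. wrongly choosing f

def classify(x):
    """Choose m iff C(m) p(x|f) p(f) < C(f) p(x|m) p(m)."""
    loss_m = C_m * p_x_given_f(x) * p_f   # expected loss of choosing m (common 1/p(x) dropped)
    loss_f = C_f * p_x_given_m(x) * p_m   # expected loss of choosing f (common 1/p(x) dropped)
    return 'm' if loss_m < loss_f else 'f'

print(classify(180.0))  # -> m
print(classify(160.0))  # -> f
```

Raising C(m) relative to C(f) shifts the decision boundary towards m, so that m is chosen only when the evidence for it is correspondingly stronger.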
In the event of there being more than two categories, some minor modifications have to be made, which will be left as an exercise for the diligent student.
The above analysis is rather simple-minded; for a more sophisticated discussion see the paper *On the Derivation of the Minimax Criterion in Binary Decision Theory* by Pyati and Joseph, as well as the following note by Professor H.V. Poor in the *IEEE Information Theory Society Newsletter*, Vol. 43, No. 3, September 1993.