
Maximum Likelihood Models

Given a probabilistic model and a set of data, we can calculate the likelihood that the model `produced' each output value and indeed the likelihood that it produced the whole lot in independent applications of a somewhat mythical generating process. Given two such models, we can find out which is more likely to have produced the actual data. For example, if we threw a coin 10 times and got 8 Heads (H) and 2 Tails (T) in some particular order, we might try the family of models

\begin{displaymath}
\{\, p(H) = q : q \in [0,1] \,\}
\end{displaymath}

and decide that the p(H) = 0.5 model is not as likely to have produced the outcome as the model with p(H) = 0.7. The computation is rather simple: the p(H) = 0.5 model has p(T) = 0.5 also, and so the probability of the actual data is $(0.5)^8 (0.5)^2$. This is, of course, the same as for any other sequence of outcomes, although there are many different ways of getting 8 Heads and 2 Tails. The probability of having generated the same sequence if p(H) = 0.7 is $(0.7)^8 (0.3)^2$, and some messing with a calculator gives 0.0009766 for the former and 0.005188 for the latter. So the second model assigns more than five times the probability of the first to obtaining 8 Heads and 2 Tails, in any order, since the number of orderings multiplies both probabilities by the same factor.
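For those who would rather let a machine do the messing with a calculator, here is a minimal Python sketch of the same computation (the function name sequence_likelihood is my own invention, not anything from the text):

\begin{verbatim}
def sequence_likelihood(q, heads, tails):
    # Probability of one particular sequence containing the given
    # numbers of Heads and Tails, under the model p(H) = q.
    return q ** heads * (1 - q) ** tails

L_half  = sequence_likelihood(0.5, 8, 2)   # 0.0009765625
L_seven = sequence_likelihood(0.7, 8, 2)   # 0.0051883209
print(L_seven / L_half)                    # about 5.3
\end{verbatim}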

In fact it is rather likely to cross your mind that the most attractive model is the one with p(H) = 0.8. It is not hard to prove that this particular model assigns a bigger probability to that output than any other model from that family. I shall call it the Maximum Likelihood Model. Note that in this case we have a nice little continuous space of models, each one specified by a number between 0 and 1. I shall refer to this as a one-parameter family of models. More generally, it is often the case that we have one model for each point in a manifold or manifold with boundary, and such families of models are referred to as parametric models. In some cases statisticians find it necessary to deal with a set of models which cannot be, or at least has not yet been, finitely parametrised, and such sets of models are called non-parametric.
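For the record, the proof is a short calculus exercise: write down the probability of the data as a function of q and maximise it.

\begin{displaymath}
L(q) = q^{8}(1-q)^{2}, \qquad
\frac{d}{dq}\log L(q) = \frac{8}{q} - \frac{2}{1-q} = 0
\quad \Longrightarrow \quad q = \frac{8}{10} = 0.8
\end{displaymath}

The same calculation with k Heads in n throws gives q = k/n, which is reassuring.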

This invites critical comment, based on the observation that life provides only finite amounts of data, and the conclusion that one cannot then hope to distinguish between too many models; but I am trying to be good.

For any parametric family of models, that is to say one where we have a manifold each point of which is a model, and for any fixed data set, we can compute the likelihood of each model having produced that data. Thus we have a map from the cartesian product of the manifold and the set of possible data sets into the Reals. In many cases, for a fixed data set, this function has a unique maximum: there is a unique model for which the likelihood of the fixed data set is greatest. There is a strong intuitive feeling that induces many people to show a particular fondness for the Maximum Likelihood Model, and there are routine ways of computing it for several families of models. For example, given the family of gaussians on ${\mathbb R}^n$ parametrised by their centres and covariance matrices, computing the centroid and the covariance of the data gives the maximum likelihood model.
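By way of illustration, here is a minimal NumPy sketch of that last recipe (the data and the variable names are my own; note that the maximum likelihood covariance divides by the number of points n, rather than by n-1 as the usual unbiased estimate does):

\begin{verbatim}
import numpy as np

# Rows are data points in R^2.
data = np.array([[1.0, 2.0], [2.0, 0.5], [0.5, 1.5], [1.5, 1.0]])

centre = data.mean(axis=0)                   # centroid: ML estimate of the mean
cov = np.cov(data, rowvar=False, bias=True)  # bias=True divides by n: ML covariance

print(centre)
print(cov)
\end{verbatim}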



 