In fact it is
rather likely to cross your mind that the most
attractive model is the one with p(H) = 0.8.
It is not hard to prove that this particular model
assigns a higher probability to that output than
any other model from that family.
I shall call it the Maximum Likelihood
Model.
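A quick numerical check may be persuasive. This is my own sketch, not anything from the text: I am assuming the data was ten tosses of which eight came up heads, which is what the value p(H) = 0.8 suggests. We tabulate the likelihood of that output under each model in the family and look for the peak:

import numpy as np

# Hypothetical data: ten tosses, eight heads (an assumption;
# the actual sequence of tosses is not specified here).
n_tosses, n_heads = 10, 8

# One model per value of p in [0, 1]: the probability of heads.
p_grid = np.linspace(0.0, 1.0, 1001)

# Likelihood of the observed output under each model. The order of
# the tosses is fixed, so no binomial coefficient is needed; it
# would not move the maximum in any case.
likelihood = p_grid**n_heads * (1.0 - p_grid)**(n_tosses - n_heads)

best = p_grid[np.argmax(likelihood)]
print(f"maximum likelihood estimate of p(H): {best:.3f}")  # -> 0.800

Of course no search is really needed: differentiating p^8 (1-p)^2 and setting the derivative to zero puts the maximum at exactly 8/10.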
Note that in this case we have a nice little
continuous space of models, each one specified
by a number between 0 and 1. I shall refer to
this as a one parameter family of models. More
generally,
it is often the case that we have one model for
each point in a manifold or manifold with boundary,
and such families of models are referred to as
parametric models. In some cases statisticians
find it necessary to deal with a set of models
which cannot be, or at least has not yet been,
finitely parametrised, and such sets of models
are called non-parametric.
This invites the critical comment that life provides only a finite amount of data, from which one cannot hope to distinguish between too many models; but I am trying to be good.
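For concreteness, the coin family above can be written down explicitly (the notation here is mine, not anything fixed by the text):

\[
\mathcal{F} \;=\; \{\, P_\theta : \theta \in [0,1] \,\},
\qquad
P_\theta(H) = \theta, \quad P_\theta(T) = 1 - \theta,
\]

so the parameter space is the interval $[0,1]$, a one dimensional manifold with boundary, and the Maximum Likelihood Model above is $P_{0.8}$.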
For any parametric family of models, that is to
say one where we have a manifold each point of
which is a model, and for any fixed data set, we can
compute a likelihood of each model having
produced that data. Thus we have a map from the
cartesian product of the manifold and the possible
data sets into the Reals. In many cases, for a
fixed data set, this function has a unique maximum:
there is a unique model under which the likelihood
of the data is greatest. There is a strong intuitive
feeling that induces many people to show a
particular fondness for the Maximum Likelihood
model, and there are routine ways of computing
it for several families of models. For example, given
a family of gaussians on R^n, parametrised
by the centres and the covariance matrices, computing
the centroid and the covariance for the data
gives the maximum likelihood model.
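Here is a sketch of that recipe, with made-up data; the numerical comparison at the end is my own illustration, checking that the centroid and the 1/N covariance of the data do better than nearby models in the family:

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
# Made-up two dimensional data with some correlation.
X = rng.normal(size=(500, 2)) @ np.array([[1.0, 0.4], [0.0, 0.8]])

# Maximum likelihood gaussian: the centroid of the data and the
# covariance of the data, in its 1/N (not 1/(N-1)) form.
mu_hat = X.mean(axis=0)
sigma_hat = np.cov(X, rowvar=False, bias=True)  # bias=True gives 1/N

def log_likelihood(mu, sigma):
    return multivariate_normal(mean=mu, cov=sigma).logpdf(X).sum()

ll_mle = log_likelihood(mu_hat, sigma_hat)
# Any other model in the family should assign the data a lower
# likelihood; try a shifted centre and an inflated covariance.
ll_shifted = log_likelihood(mu_hat + 0.1, sigma_hat)
ll_scaled = log_likelihood(mu_hat, 1.2 * sigma_hat)
print(ll_mle > ll_shifted and ll_mle > ll_scaled)  # True

Note the 1/N normalisation: it is the 1/N covariance, not the 1/(N-1) "unbiased" one, that maximises the likelihood.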