
The Akaike Information Criterion

The Akaike Information Criterion (AIC) is another attempt to tackle the problem of working out how much to penalise the parameters of the model. It proceeds by making some rather strong assumptions about the family of models, and by assuming that the data is in fact generated by some member of the family, which is a hierarchically ordered set of manifolds, each manifold embedded at height zero in its successor. By investigating the way in which the maximum likelihood model overestimates the expected log-likelihood (by fitting itself to noise, just as you see patterns in the fire), Akaike arrived at the rule:

(AIC) Maximise the function (Log-Likelihood(Data) - Number of parameters)

This is easy to compute for gaussian mixture models, and it omits all reference to the question of the precision of the data or the parameters. All we have to do is use EM to get the maximum likelihood model (but see below), compute the Log-Likelihood, and subtract the number of parameters. Do this for a reasonable range of numbers of gaussians and pick the maximum.
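As a concrete illustration, here is a minimal sketch in Python of this procedure, using scikit-learn's GaussianMixture to do the EM fitting; the library and the toy data are my own choices for the illustration, not part of the original text. It applies the rule in exactly the maximise form given above.

    # AIC-based selection of the number of gaussians, in the book's form:
    # maximise  LogLikelihood(Data) - (number of parameters).
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    # Toy data: two gaussian clusters in the plane.
    X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
                   rng.normal(4.0, 1.0, (200, 2))])

    def n_params(k, d):
        # Free parameters of a k-component mixture in d dimensions with
        # full covariances: means, covariance entries, mixing weights.
        return k * d + k * d * (d + 1) // 2 + (k - 1)

    best_k, best_score = None, -np.inf
    for k in range(1, 7):                     # a reasonable range of k
        gm = GaussianMixture(n_components=k, random_state=0).fit(X)
        log_lik = gm.score(X) * len(X)        # score() is the mean log-likelihood
        score = log_lik - n_params(k, X.shape[1])
        print(f"k={k}: log-lik={log_lik:.1f}, penalised={score:.1f}")
        if score > best_score:
            best_k, best_score = k, score

    print("AIC picks k =", best_k)

(scikit-learn also provides an aic method on fitted mixtures, which computes the equivalent minimise form, 2 times the number of parameters minus 2 times the log-likelihood.)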

It will be seen that the only differences between the AIC and the Stochastic Complexity formula are the base of the logarithms (natural logs for the AIC) and the treatment of the precision of the parameters, since the precision, dimension and number of the data are constant. Generally the Stochastic Complexity criterion penalises the number of parameters more heavily than the AIC does. The AIC is only appropriate when there is a fairly large data set.
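To see why, compare the sizes of the two penalties as the data set grows. The sketch below assumes the Stochastic Complexity penalty grows like (p/2) log n nats for p parameters and n data points, which is the usual form of that term; the figure of 11 parameters is just an illustrative choice (two gaussians in the plane), not taken from the text.

    # Rough comparison of the two penalties: the AIC's flat penalty of
    # p nats against an MDL-style penalty of (p/2) * log(n) nats.
    import math

    p = 11                            # e.g. two gaussians in the plane
    for n in (10, 100, 1000, 10000):
        sc_penalty = 0.5 * p * math.log(n)
        print(f"n={n:>5}: AIC penalty = {p}, SC penalty = {sc_penalty:.1f}")

    # The SC penalty exceeds the AIC's once (1/2) log(n) > 1, that is
    # n > e^2, about 7.4, so for any realistic data set the Stochastic
    # Complexity criterion penalises parameters more heavily.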

In addition to the AIC, the BIC due to Schwarz, together with Bayesian arguments, may also be used to compute penalties for having too many parameters, but it is a matter of opinion which of these, if any, is appropriate for any particular problem. See the paper by Speed and Yu, in the Bibliography for the last chapter, for a contemporary examination of the issues in a special case.

I shall return to the matter of compression and optimal modelling when we come to Neural Net models, which are notorious for representing the data in spaces of enormous dimension and consequently for having large numbers of parameters.

