
The EM algorithm for Gaussian Mixture Modelling

1. Initialise the k gaussians so that each of the ${\bf V}_i$ is the identity matrix, each of the $w_i$ is $\frac{1}{k}$, and the k centres are taken to be some small random step in ${\mathbb R}^n$ away from a randomly selected point of the data set ${\cal X}$.

2. For each point $ {\bf x}\in {\cal X} $, and for each i between 1 and k, compute the weighted likelihood $ w_i g_{[{\bf m}_i,{\bf V}_i]} ({\bf x}) $ using the last estimate of the parameters $w_i, {\bf m}_i,
{\bf V}_i$. For each point ${\bf x}$, let ${\cal L}_i({\bf x})$ be the likelihood assigned to it by the ith member of the mixture:

\begin{displaymath}
{\cal L}_i({\bf x}) = w_i g_{[{\bf m}_i,{\bf V}_i]} ({\bf x}) \end{displaymath}

and let $S_x$ be the sum

\begin{displaymath}
S_x = \sum_{i= 1}^k w_i g_{[{\bf m}_i,{\bf V}_i]} ({\bf x}) 
 = \sum_{i=1}^k {\cal L}_i({\bf x}) \end{displaymath}

and let

\begin{displaymath}
P_i({\bf x}) = \frac{{\cal L}_i({\bf x})}{S_x} \end{displaymath}

We think of the point ${\bf x}$ as ready to be split up between the k gaussians according to the k fractions

\begin{displaymath}
\frac{w_i g_{[{\bf m}_i,{\bf V}_i]} ({\bf x})}{S_x} \end{displaymath}

Since these fractions are just the $P_i({\bf x})$, and they sum to 1 for each point, we have that $ \sum_{{\bf x}\in {\cal X}} \sum_{i=1}^k P_i({\bf x}) = \vert{\cal X}\vert$, the cardinality of ${\cal X}$.

3. Now re-estimate the $w_i$ as

\begin{displaymath}
w'_i = \frac{\sum_{{\bf x}\in {\cal X}} P_i({\bf x})}{\vert{\cal 
X}\vert} \end{displaymath}

and the centres ${\bf m}'_i$ as the weighted centroids

\begin{displaymath}
{\bf m}'_i = \frac{\sum_{{\bf x}\in {\cal X}} P_i({\bf x}) 
{\bf x}}{\sum_{{\bf x}\in {\cal X}} P_i({\bf x})} \end{displaymath}

and finally, the ith covariance matrix, ${\bf V}'_i$ is re-estimated as

\begin{displaymath}
{\bf V}'_i = \frac{\sum_{{\bf x}\in {\cal X}} P_i({\bf x}) \left[ ({\bf x}- {\bf m}'_i)({\bf x}- {\bf m}'_i)^T \right]}{\sum_{{\bf x}\in {\cal X}} P_i({\bf x})} \end{displaymath}

4. Run through steps 2 and 3 until there is no significant change in any of the numbers being estimated, then stop.
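To make the four steps concrete, here is a minimal sketch in Python with NumPy. The names em_gmm, n_iter, tol, step and seed are illustrative rather than anything fixed by the text, and the stopping test uses the change in total log-likelihood as a convenient stand-in for `no significant change in any of the numbers being estimated'. It is a sketch of the procedure, not a robust implementation; in particular it does nothing about covariance matrices becoming singular.

\begin{verbatim}
import numpy as np

def em_gmm(X, k, n_iter=200, tol=1e-6, step=0.1, seed=None):
    """EM for a k-component gaussian mixture, following steps 1-4 above.

    X is an (N, n) array whose rows are the points of the data set;
    w, m, V and P mirror the w_i, m_i, V_i and P_i(x) of the text.
    """
    rng = np.random.default_rng(seed)
    N, n = X.shape

    # Step 1: identity covariances, equal weights, and centres a small
    # random step away from randomly chosen data points.
    w = np.full(k, 1.0 / k)
    m = X[rng.choice(N, k, replace=False)] + step * rng.standard_normal((k, n))
    V = np.array([np.eye(n) for _ in range(k)])

    def gauss(mean, cov):
        # The density g_[m,V](x) evaluated at every row of X.
        d = X - mean
        q = np.einsum('ij,jk,ik->i', d, np.linalg.inv(cov), d)
        return np.exp(-0.5 * q) / np.sqrt((2 * np.pi) ** n * np.linalg.det(cov))

    prev_ll = -np.inf
    for _ in range(n_iter):
        # Step 2: weighted likelihoods L_i(x), sums S_x, fractions P_i(x).
        L = np.column_stack([w[i] * gauss(m[i], V[i]) for i in range(k)])
        S = L.sum(axis=1, keepdims=True)
        P = L / S

        # Step 3: re-estimate weights, centres and covariances.
        Nk = P.sum(axis=0)                      # sum over x of P_i(x)
        w = Nk / N
        m = (P.T @ X) / Nk[:, None]
        for i in range(k):
            d = X - m[i]
            V[i] = (P[:, i, None] * d).T @ d / Nk[i]

        # Step 4: stop when the fit has (essentially) stopped changing;
        # here the total log-likelihood is used as the yardstick.
        ll = np.log(S).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll

    return w, m, V
\end{verbatim}

A call such as em_gmm(X, k) on data like that of Fig. 4.1, with the correct k, is an easy way to reproduce both the successes and the failures discussed below.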

It may be shown that in principle this converges to a (local) maximum likelihood mixture model for the data set ${\cal X}$ with k gaussians in the mixture. In practice it is found that the initialisation is rather critical and one does not always get good results. It is easy to experiment with data such as that of Fig. 4.1 in two dimensions, where the data comes from a known number of gaussians, and to discover that very unsatisfactory `solutions' are found by the above process, as Fig. 4.3 shows.


 
Figure 4.3: A `solution' found by the EM algorithm with poor initialisation.

The problem of suitably initialising the EM algorithm will be discussed later, as will the issue of finding the `right' value of k, the number of gaussians. In the chapter on Quadratic Neural Nets I shall describe a wholly dynamical approach to the business of modelling point sets, which has affinities with mixture modelling.

Inspecting the data by projection can help in suggesting a suitable number of gaussians to employ in a mixture model, and can also suggest reasonable initialisations of the centres.
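One simple projection to try, sketched below under the assumption that the data sit in a NumPy array X, is onto the two leading principal directions of the sample covariance; the name project_2d is merely illustrative, and principal components are of course only one of many possible projections.

\begin{verbatim}
import numpy as np

def project_2d(X):
    """Project the data onto its two leading principal directions."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    top2 = vecs[:, np.argsort(vals)[::-1][:2]]   # two largest eigenvalues
    return Xc @ top2

# Y = project_2d(X); scatter-plot Y to guess k and to pick rough centres.
\end{verbatim}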

It is worth noting that the estimates of the eigenvalues obtained from diagonalising the covariance matrix are biased in an interesting way: the true ellipsoid is likely to be shorter and fatter than the estimate obtained from a data set of limited size. In other words the larger eigenvalues are overestimated and the smaller ones underestimated. One can accommodate this in various ways which are left to the imagination and enterprise of the reader.
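A small simulation illustrates the bias: sample repeatedly from a spherical gaussian, whose true covariance is the identity and whose true eigenvalues are therefore all 1, and average the sorted eigenvalues of the sample covariance matrix. The particular dimension, sample size and number of repetitions below are arbitrary choices made only for illustration.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, N, trials = 10, 30, 2000          # dimension, sample size, repetitions

avg = np.zeros(n)
for _ in range(trials):
    X = rng.standard_normal((N, n))  # sample from a gaussian with V = I,
                                     # so every true eigenvalue is 1
    avg += np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
avg /= trials

print(avg)   # the first entries come out well above 1, the last well below
\end{verbatim}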


Mike Alder
9/19/1997