
Example

Suppose we have data consisting of the 18 points:

\begin{displaymath}
\{ -1.2,\, -1.1,\, -1.1,\, -1.0,\, -1.0,\, -1.0,\, -0.9,\, -0.9,\, -0.8,\,
0.8,\, 0.9,\, 0.9,\, 1.0,\, 1.0,\, 1.0,\, 1.1,\, 1.1,\, 1.2 \}
\end{displaymath}

I decide to model this with two gaussians, or maybe just one, on the interval from -2 to 2. I decide that I only care about the data to an accuracy of one part in 64, which is to say 6 bits. Rather than normalise, I shall assume that all data must lie in the interval from -2 to 2. My first model, the maximum likelihood fit with two gaussians, has equal weights and one centre at -1 with a variance of 0.015, hence a standard deviation of about 0.1225. The same applies to the second gaussian, except that its centre is at +1. With this value for $\sigma$ (to 6 bit precision) the model for f is

\begin{displaymath}
\frac{1}{2} \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x+1)^2}{2\sigma^2}}
+ \frac{1}{2} \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x-1)^2}{2\sigma^2}}
\end{displaymath}
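If you want to check the numbers that follow, here is a minimal Python sketch of this density (the function name f and the test point are mine, purely for illustration):

import math

sigma = math.sqrt(0.015)   # about 0.1225, as above

def f(x):
    # equal-weight mixture of two gaussians centred at -1 and +1
    g = lambda m: math.exp(-(x - m)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
    return 0.5 * g(-1.0) + 0.5 * g(1.0)

print(f(-1.0))   # about 1.6287; the gaussian at +1 contributes almost nothing here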

The likelihoods may be approximated (by ignoring the contribution from the other gaussian) as:

data point    likelihood    log-likelihood

  -1.2          0.4293        -0.845567
  -1.1          1.16699        0.154433
  -1.1          1.16699        0.154433
  -1.0          1.628675       0.487767
  -1.0          1.628675       0.487767
  -1.0          1.628675       0.487767
  -0.9          1.16699        0.154433
  -0.9          1.16699        0.154433
  -0.8          0.4293        -0.845567
   0.8          0.4293        -0.845567
   0.9          1.16699        0.154433
   0.9          1.16699        0.154433
   1.0          1.628675       0.487767
   1.0          1.628675       0.487767
   1.0          1.628675       0.487767
   1.1          1.16699        0.154433
   1.1          1.16699        0.154433
   1.2          0.4293        -0.845567

This gives the sum of the log-likelihoods as 0.78 nats, which tells us that the model is pretty bad. I have taken natural logarithms here; converting from nats to bits gives 1.1 bits.
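The table and the sum can be reproduced in a few lines of Python (this uses the full mixture rather than the one-gaussian approximation; at these points the neglected term is vanishingly small, so the totals agree):

import math

data = [-1.2, -1.1, -1.1, -1.0, -1.0, -1.0, -0.9, -0.9, -0.8,
         0.8,  0.9,  0.9,  1.0,  1.0,  1.0,  1.1,  1.1,  1.2]
sigma = math.sqrt(0.015)

def f(x):
    g = lambda m: math.exp(-(x - m)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
    return 0.5 * g(-1.0) + 0.5 * g(1.0)

nats = sum(math.log(f(x)) for x in data)
print(nats)                 # about 0.78 nats
print(nats / math.log(2))   # about 1.1 bits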

The bit cost is therefore a total of

(18)(6) - 1.1 + (2)(6)(3)

bits, that is 142.9: 6 bits for each of the 18 data points, less the model's log-likelihood of 1.1 bits, plus 6 bits for each of the three parameters of each of the two gaussians.

The corresponding calculation for one gaussian has $ \sigma \approx 1.06$, and all the likelihoods are around 0.22, so the sum of the log-likelihoods is about -27.25 nats. This works out at around a 39.3 bit disadvantage, which should tell you that a bad model is worse than the uniform model, which is to say worse than just sending the data. The bit cost of using a single gaussian centred on the origin therefore works out at

(18)(6) + 39.3 + (1)(6)(3)

that is, 165.3 bits. This is bigger, so it is better to use two gaussians than one to model the data; but it is better still to just send the data, at a cost of 108 bits. You should be prepared to believe that more data, fitting the model more convincingly, would give a genuine saving. Even I think I could believe that, although, of course, I can't be sure I could.
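The whole comparison fits in a few lines (assuming, as the expressions above suggest, three parameters per gaussian at 6 bits each):

n, bits = 18, 6
raw       = n * bits                   # 108: just send the data
two_gauss = raw - 1.1 + 2 * 3 * bits   # 142.9
one_gauss = raw + 39.3 + 1 * 3 * bits  # 165.3
print(raw, two_gauss, one_gauss)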

Better check my arithmetic; it has been known to go wrong in the past.

