
Other Metrics

Suppose we have fitted a gaussian to some data but have some difficulty in believing that the distribution is truly gaussian. We may nevertheless feel that in fitting a quadratic form to the data we have said at least a little about how the data is distributed. For example, if we had the data of fig. 1.4 and a gaussian for each category of point, we might believe that the fit told us something about the right metric to use. Calculating likelihoods requires assumptions about how the density of the data falls off with distance, and we may be uncommitted to the gaussian model, yet still feel that it is telling us something about the choice of units and the way in which distances ought to be measured.

Given a gaussian distribution containing the quadratic form

\begin{displaymath}
({\bf x}-{\bf m})^T {\bf V}^{-1}({\bf x}-{\bf m})
\end{displaymath}

we may use this to calculate a number for any ${\bf x}$. If the result is 1, we have said that the distance of ${\bf x}$ from ${\bf m}$ is one standard deviation, generalising the one dimensional sense. In general, the square root of the result for any ${\bf x}$ is called the Mahalanobis distance of ${\bf x}$ from ${\bf m}$ relative to the form ${\bf V}^{-1}$. Put another way, the Mahalanobis distance is just the distance measured in standard deviations.
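As a concrete illustration, here is a minimal sketch in Python (assuming numpy is available; the mean ${\bf m}$ and form ${\bf V}$ below are invented for the purpose):

\begin{verbatim}
import numpy as np

def mahalanobis(x, m, V):
    # Square root of (x-m)^T V^{-1} (x-m); solving V d' = d avoids
    # forming the inverse of V explicitly.
    d = x - m
    return np.sqrt(d @ np.linalg.solve(V, d))

m = np.array([0.0, 0.0])
V = np.array([[4.0, 0.0],
              [0.0, 1.0]])   # standard deviations 2 along x, 1 along y
print(mahalanobis(np.array([2.0, 0.0]), m, V))  # 1.0: one standard deviation
print(mahalanobis(np.array([0.0, 2.0]), m, V))  # 2.0: two standard deviations
\end{verbatim}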

A Riemannian metric on the space ${\bf R}^n$ is specified by assigning a positive definite, symmetric, quadratic form to every point of the space. If you have a curve in the space, you can take a set of points along it and compute the Mahalanobis distance of each point from its predecessor, relative to the form at the predecessor, then add them up. Doing this with more and more points gives, in the limit, the length of the curve in the given metric. To compute the distance between two points in the given metric, take all curves joining them and find the length of the shortest.
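A sketch of this polygonal approximation, assuming a hypothetical field G(p) which assigns a quadratic form to each point p (the particular choice of G here is invented for illustration):

\begin{verbatim}
import numpy as np

def G(p):
    # The quadratic form assigned to the point p: here the identity,
    # scaled up away from the origin (an invented choice).
    return (1.0 + p @ p) * np.eye(len(p))

def curve_length(points):
    # Sum the Mahalanobis-style lengths of successive chords, each
    # measured in the form at the chord's starting point.
    total = 0.0
    for a, b in zip(points[:-1], points[1:]):
        d = b - a
        total += np.sqrt(d @ G(a) @ d)
    return total

# Finer and finer subdivisions of the straight line from (0,0) to (1,1)
# converge to the length of that line in the given metric:
for n in (10, 100, 1000):
    t = np.linspace(0.0, 1.0, n + 1)
    print(n, curve_length(np.column_stack([t, t])))
\end{verbatim}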

The assignment of a quadratic form to each point of a space constitutes an example of a tensor field, and classical physics is full of them. General Relativity would be impossible without them.

We are not going to obtain a Riemannian metric from a finite data set without a good deal of interpolation, but there are occasions when this might be done. In the case of fig. 1.4, for example, we can argue that there are grounds for using the same quadratic form everywhere, which is equivalent to squashing the space along the X-axis until the ellipses fitting the data turn into circles, and then using the ordinary Euclidean distance.
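The squashing can be done explicitly. In the sketch below, with a single invented covariance ${\bf V}$ elongated along the X-axis like the ellipses of fig. 1.4, transforming by the inverse of the Cholesky factor of ${\bf V}$ turns the fitted ellipses into circles, after which the ordinary Euclidean distance agrees with the Mahalanobis distance:

\begin{verbatim}
import numpy as np

V = np.array([[4.0, 0.0],
              [0.0, 1.0]])       # elongated along the X-axis
L = np.linalg.cholesky(V)        # V = L L^T

def whiten(x):
    # Squash the space: in the new coordinates the ellipses are circles.
    return np.linalg.solve(L, x)

x = np.array([3.0, 1.0])
m = np.array([1.0, 1.0])
d = x - m
mahal  = np.sqrt(d @ np.linalg.solve(V, d))
euclid = np.linalg.norm(whiten(x) - whiten(m))
print(mahal, euclid)             # the two numbers agree
\end{verbatim}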

It is generally quicker to compute Mahalanobis distances (if not to say them) than likelihoods, since we save an exponentiation, and simply computing the Mahalanobis distance to each of the centres and choosing the smallest may be defended as (a) quick, (b) using a more defensible metric and (c) making fewer compromising assumptions. If the determinants of the different forms are all the same, this gives the same answers as the likelihoods anyway. And if they do not differ by very much, we can argue that we are kidding ourselves if we pretend they are known accurately, so why not go for the quick and dirty method?
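A minimal sketch of the quick and dirty rule, with two invented centres and forms, assigning a point to the centre of smallest Mahalanobis distance:

\begin{verbatim}
import numpy as np

# Invented means and covariances for two categories.
means = [np.array([0.0, 0.0]), np.array([4.0, 0.0])]
forms = [np.array([[1.0, 0.0], [0.0, 1.0]]),
         np.array([[2.0, 0.0], [0.0, 0.5]])]

def classify(x):
    # Choose the centre of smallest Mahalanobis distance.
    dists = [np.sqrt((x - m) @ np.linalg.solve(V, x - m))
             for m, V in zip(means, forms)]
    return int(np.argmin(dists))

print(classify(np.array([1.0, 0.0])))   # 0: nearer the first centre
print(classify(np.array([3.0, 0.0])))   # 1: nearer the second centre
\end{verbatim}

Note that this rule ignores the determinant correction and any prior probabilities; it coincides with the likelihood decision exactly when the determinants (and priors) are equal.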

It is not uncommon to take the logarithms of the likelihoods, which for a gaussian amounts to working with minus half the squared Mahalanobis distance plus a correction term involving the determinant; since the logarithm is a monotone function, this gives precisely the same answers as the likelihoods when we are simply choosing the source of greater likelihood. The same simplification may be introduced in the case of gaussian mixture models.
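Explicitly, for a gaussian in $n$ dimensions the log-likelihood is

\begin{displaymath}
\log p({\bf x}) = -\frac{1}{2}({\bf x}-{\bf m})^T {\bf V}^{-1}({\bf x}-{\bf m})
- \frac{1}{2}\log \det {\bf V} - \frac{n}{2}\log 2\pi
\end{displaymath}

so comparing log-likelihoods is just comparing squared Mahalanobis distances, each corrected by the logarithm of the determinant of its form; the final term is the same for every category and may be dropped.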


Mike Alder
9/19/1997