
Adaptive Filters, Kalman Filters

This brief discussion of adaptive filtering is intended to place the general ideas of what they do and how they do it in some sort of context relevant to pattern recognition. The details of how to do the sums are easily obtained from the sources given in the bibliography, should that look to be a good idea.

The last subsection should have made it clear that computing the values of an optimal MA filter was essentially the same idea as fitting a straight line through a set of points. After all, if I give you data consisting of $\{ (x_1,y_1),(x_2,y_2),(x_3,y_3)\}$ and ask for the line $y = mx + c$ that fits the data best, then you follow Gauss (by about two hundred years; my how time flies when you're having fun) and simply take the sum of squares:

$(y_1 - (mx_1 + c))^2 + (y_2 - (mx_2 + c))^2 + (y_3 - (mx_3 + c))^2$

and choose $m$ and $c$ so as to make this a minimum. This sum of squares we regard as the squared distance from the point $\left( \begin{array}{c} y_1 \\ y_2 \\ y_3 \end{array} \right)$ to the plane of points of the form $m \left( \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \right) + c \left( \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right)$. This is a minimum at a point ${\bf u}$ on the plane when the line joining ${\bf y} = \left( \begin{array}{c} y_1 \\ y_2 \\ y_3 \end{array} \right)$ to the point ${\bf u}$ is orthogonal to the plane and hence to its basis vectors. This is precisely what we did with a filter having two coefficients. Gauss was confronted with the problem of continuously updating his estimate; he had the batch mode least squares worked out by 1795, and devised an online least squares system a quarter of a century later. He also did rather a lot of other things in the same period.
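To make the orthogonality condition concrete, here is a minimal Python sketch (the data points are invented for illustration): it solves the normal equations for $m$ and $c$ and then checks that the residual vector is orthogonal to the two basis vectors of the plane, $(x_1,x_2,x_3)$ and $(1,1,1)$.

```python
# Least squares fit of y = m*x + c to a few points, via the normal
# equations, followed by a check that the residual y - (m*x + c) is
# orthogonal to the basis vectors of the plane (hence the minimum).

def fit_line(xs, ys):
    """Return (m, c) minimising sum_i (y_i - (m*x_i + c))^2."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations: [sxx sx; sx n] [m; c] = [sxy; sy]
    det = n * sxx - sx * sx
    m = (n * sxy - sx * sy) / det
    c = (sy - m * sx) / n
    return m, c

xs = [0.0, 1.0, 2.0]
ys = [1.1, 2.9, 5.0]        # invented data, roughly y = 2x + 1
m, c = fit_line(xs, ys)

# Residuals should be orthogonal to both basis vectors of the plane.
res = [y - (m * x + c) for x, y in zip(xs, ys)]
dot_x = sum(r * x for r, x in zip(res, xs))
dot_1 = sum(res)
print(m, c, dot_x, dot_1)
```

Both dot products come out to (numerically) zero, which is exactly the geometric statement in the text: the foot of the perpendicular from ${\bf y}$ to the plane gives the best fit.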

If I give you a stream of $(x_i,y_i)$ and leave you to find the best fit straight line to the data, you can wait until the stream stops and then start working, or you can take a block of some number of points, do the fit for the block, and then work out how to pool your results with those for the next block of data. A block of data can of course consist of just one datum. If you have reason to suspect that the best fit line may be changing in time, you may want to discount old data somewhat. When the model does not change its parameters in time we say it is stationary, a definition which raises as many questions as it answers; a recursive least squares best fit algorithm which can discount the remote past can track a non-stationary process.
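The recursive least squares (RLS) update can be sketched in a few lines; this is an illustrative implementation with invented data, not something from the text. The forgetting factor $\lambda$ does the discounting: $\lambda = 1$ gives ordinary least squares over all the data seen so far, while $\lambda < 1$ weights recent data more heavily so a drifting line can be tracked.

```python
# Recursive least squares for y = m*x + c, one (x, y) pair at a time.
# theta = [m, c]; phi = [x, 1]; P is a 2x2 matrix tracking uncertainty.
# lam is the forgetting factor: 1.0 = batch least squares done
# sequentially, lam < 1 discounts the remote past.

def rls_step(theta, P, x, y, lam=1.0):
    phi = [x, 1.0]
    Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
            P[1][0] * phi[0] + P[1][1] * phi[1]]
    denom = lam + phi[0] * Pphi[0] + phi[1] * Pphi[1]
    K = [Pphi[0] / denom, Pphi[1] / denom]          # gain vector
    err = y - (phi[0] * theta[0] + phi[1] * theta[1])
    theta = [theta[0] + K[0] * err, theta[1] + K[1] * err]
    # P <- (P - K phi^T P) / lam
    P = [[(P[0][0] - K[0] * Pphi[0]) / lam, (P[0][1] - K[0] * Pphi[1]) / lam],
         [(P[1][0] - K[1] * Pphi[0]) / lam, (P[1][1] - K[1] * Pphi[1]) / lam]]
    return theta, P

theta = [0.0, 0.0]                   # initial guess for (m, c)
P = [[1e6, 0.0], [0.0, 1e6]]         # large P: the guess is not trusted
for x in range(5):                   # exact data from y = 2x + 1
    theta, P = rls_step(theta, P, float(x), 2.0 * x + 1.0)
print(theta)
```

After a handful of points the estimate sits essentially on $(m, c) = (2, 1)$; pooling a new datum with the old fit costs a fixed small amount of arithmetic, with no need to store the whole stream.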

The various methods of doing adaptive filtering are, in effect, doing just this. The details can be found in some of the books given in the references.

In Kalman filtering the process is taken to a higher level. In the simplest case, the equation $y = mx + c$ is generalised so that $y$, $x$ and $c$ are vectors and $m$ is a matrix, and $c$ is often taken to be Gaussian white noise that has been passed through an MA filter to autocorrelate the noise. It remains true, however, that except for the complication of an inner product other than the old dot product to describe the varying degree of credibility of past data and the autocorrelations of the noise with time, what we have is basically a least squares best fit problem done sequentially rather than in batch mode. The choice of a modified least squares can either be justified by placing restrictions on the model class (which are never in practice confirmed to be correct), or not justified at all and regarded as a simple and workable heuristic (as Gauss did) for getting an answer of some sort, one which is justified post hoc and traded in for a better method when one turns up. Statisticians seem to be tempted by the first approach; they feel the comforts of philosophy keenly. Engineers, as is well known, incline to be brutal, insensitive oafs[*] who can't distinguish clearly between rationales and rationalisations and don't trust either.
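The sequential least squares character is easiest to see in the most degenerate case, a one-dimensional sketch with invented numbers rather than the general vector form: estimating a constant level from noisy readings. With no process noise, the Kalman update is exactly a recursively computed weighted mean of the data.

```python
# Scalar Kalman filter estimating a constant level from noisy readings.
# State model: x_{k+1} = x_k (no process noise), measurement
# z_k = x_k + noise of variance R. In this degenerate case the filter
# reduces to sequential least squares: a recursively updated mean.

def kalman_step(est, P, z, R):
    K = P / (P + R)            # gain: how much to trust the new reading
    est = est + K * (z - est)  # correct the estimate by the innovation
    P = (1.0 - K) * P          # reduced uncertainty after the update
    return est, P

est, P, R = 0.0, 1e6, 0.25     # vague prior; measurement variance 0.25
true_level = 5.0
# Deterministic stand-in for the noise, so the example is reproducible:
readings = [true_level + 0.5 * (-1) ** k for k in range(10)]
for z in readings:
    est, P = kalman_step(est, P, z, R)
print(est, P)
```

The estimate converges on the true level 5.0 and the variance $P$ shrinks like $R/n$, just as the batch least squares answer would; the full Kalman filter adds the vector state, the matrix $m$, and the autocorrelated noise, but not a new idea.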

Kalman filtering is of considerable interest partly because the vector $c$ above, instead of being a noise vector to be got rid of, may also be interpreted as a control function to be used to force the system from one state to another, so there is a theory of control which is dual to the theory of filters. This halves the amount of work you have to put in to understand a given amount of material, which is a good thing.


Mike Alder
9/19/1997