
Wiener Filters

To take the simplest example of filtering, suppose $\{u = u(n): n \in \mathbb{Z}\}$ is an input series and $\{v = v(n): n \in \mathbb{Z}\}$ is an output series. I want to convolve u with some finite filter so that the result looks like v. Let us suppose that I have some finite amount of history of the past of u, say for $j \in [1 \ldots n]$. Then I want to choose some coefficients $b_0, b_1, \ldots, b_k$ so that

\begin{displaymath}
\sum_{j \in [1 \ldots n]} (v_j - b_0 u_j - b_1 u_{j-1} - \cdots - b_k u_{j-k})^2
\end{displaymath}

is a minimum.
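In numerical terms this is just a linear least squares problem. The following is a minimal sketch in Python with NumPy; the toy signals, the choice of k, and the array names are illustrative assumptions, not anything from the text. The columns of the matrix U are shifted copies of u, and the coefficients are chosen to minimise the sum of squares above.

\begin{verbatim}
# Sketch: choose b_0..b_k minimising sum_j (v_j - b_0 u_j - ... - b_k u_{j-k})^2.
# The toy signals and the filter used to generate v are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
u = rng.standard_normal(n)                     # input series u(1..n)
true_b = np.array([0.5, -0.3, 0.2, 0.1])       # filter we hope to recover
v = np.convolve(u, true_b)[:n]                 # output series v

# Use rows j = k..n-1 so every past value u_{j-i} that is needed exists;
# this is the "wastage at the ends" mentioned later in the text.
U = np.column_stack([u[k - i : n - i] for i in range(k + 1)])
b, *_ = np.linalg.lstsq(U, v[k:], rcond=None)  # minimises the sum of squares
print(b)                                       # close to true_b
\end{verbatim}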

If I take k to be 1, for purposes of exposition, then I am trying to minimise

\begin{displaymath}
\sum_{j \in [1 \ldots n]}( v_j - b_0 u_j - 
b_1 u_{j-1})^2 \end{displaymath}

The expression is best seen as the squared distance in $\mathbb{R}^n$ from the point
$\left(\begin{array}{c} v_1 \\ v_2 \\ \vdots \\ v_n \end{array}\right)$
to the plane of points
$b_0 \left(\begin{array}{c} u_1 \\ u_2 \\ \vdots \\ u_n \end{array}\right)
+ b_1 \left(\begin{array}{c} u_0 \\ u_1 \\ \vdots \\ u_{n-1} \end{array}\right)$
determined by all possible choices of $b_0$ and $b_1$. In general, with coefficients $b_0, \ldots, b_k$, it will be a $(k+1)$-dimensional subspace determined by all possible choices of the $b_i$.

The problem of finding the closest point on a plane P to a given point v in $\mathbb{R}^n$ but not on the plane is a simple piece of linear algebra. We observe that if the closest point to v on P is called q, then the line joining the point v to the point q is orthogonal to the plane. Hence it is orthogonal to the vectors which span the plane. This follows immediately from Pythagoras' theorem, although it is called rather pompously the `orthogonality principle' in many textbooks. If you draw a picture for $n = 3$ you should have no particular difficulty with this: see Fig. 6.3.


  
Figure 6.3: Closest point to a plane through the origin: line v q orthogonal to plane.
\begin{figure}
\vspace{8cm}
\special {psfile=patrecfig6.4.ps}\end{figure}
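As a quick numerical check of the orthogonality principle (a sketch only; the randomly generated vectors and all names are assumptions made for illustration), one can project a point onto the plane spanned by two vectors and confirm that the residual is orthogonal to both spanning vectors:

\begin{verbatim}
# Sketch: closest point q on the plane spanned by a1 and a2 to a point v,
# found by solving the 2x2 Gram system; the residual v - q is orthogonal
# to both spanning vectors (Pythagoras / the orthogonality principle).
import numpy as np

rng = np.random.default_rng(1)
a1, a2, v = rng.standard_normal((3, 5))        # two spanning vectors and a point in R^5
A = np.column_stack([a1, a2])
b = np.linalg.solve(A.T @ A, A.T @ v)          # best b_0, b_1
q = A @ b                                      # closest point on the plane
print(np.allclose([(v - q) @ a1, (v - q) @ a2], 0.0))   # True
\end{verbatim}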

For orthogonal vectors the dot product is zero, so we obtain the two equations

\begin{displaymath}
\left( b_0 \left(\begin{array}{c} u_1 \\ u_2 \\ \vdots \\ u_n \end{array}\right)
+ b_1 \left(\begin{array}{c} u_0 \\ u_1 \\ \vdots \\ u_{n-1} \end{array}\right)
- \left(\begin{array}{c} v_1 \\ v_2 \\ \vdots \\ v_n \end{array}\right) \right)
\bullet
\left(\begin{array}{c} u_1 \\ u_2 \\ \vdots \\ u_n \end{array}\right) = 0
\end{displaymath}

and

\begin{displaymath}
\left( b_0 \left(\begin{array}{c} u_1 \\ u_2 \\ \vdots \\ u_n \end{array}\right)
+ b_1 \left(\begin{array}{c} u_0 \\ u_1 \\ \vdots \\ u_{n-1} \end{array}\right)
- \left(\begin{array}{c} v_1 \\ v_2 \\ \vdots \\ v_n \end{array}\right) \right)
\bullet
\left(\begin{array}{c} u_0 \\ u_1 \\ \vdots \\ u_{n-1} \end{array}\right) = 0
\end{displaymath}

If instead of having a filter with two coefficients we had used the $k+1$ coefficients $b_0, \ldots, b_k$, we should have had the same orthogonality conditions (still by Pythagoras' theorem!), but this time there would have been $k+1$ of them. Here $\bullet$ denotes the dot product in $\mathbb{R}^n$.

These $k+1$ equations are called the normal equations, and solving them gives the point q. They are simply $k+1$ linear equations in the $k+1$ unknowns $b_0, \ldots, b_k$. We leave it to the reader to decide when there is a unique solution.
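In matrix terms the normal equations say that $U^T U \, {\bf b} = U^T {\bf v}$, where the columns of $U$ are the shifted copies of u. A minimal sketch (same assumed toy data as the earlier sketch, with the same trimming at the ends) that forms and solves them directly and checks that the answer agrees with the least squares solution:

\begin{verbatim}
# Sketch: form and solve the k+1 normal equations U^T U b = U^T v directly
# (toy data as in the earlier sketch; everything here is illustrative).
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
u = rng.standard_normal(n)
v = np.convolve(u, [0.5, -0.3, 0.2, 0.1])[:n]
U = np.column_stack([u[k - i : n - i] for i in range(k + 1)])

G = U.T @ U                                    # dot products of the shifted copies of u
c = U.T @ v[k:]                                # dot products of those copies with v
b = np.linalg.solve(G, c)                      # the k+1 normal equations

b_lstsq, *_ = np.linalg.lstsq(U, v[k:], rcond=None)
print(np.allclose(b, b_lstsq))                 # True: the same minimiser
\end{verbatim}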

In the filtering case, the equations are also known as the Yule-Walker equations, and the corresponding equations in the continuous case are called the Wiener-Hopf equations. The solution gives a so-called optimal filter, but it should be noted that we are simply trying to get as close as we can to a known point in $\mathbb{R}^n$ while staying in a linear subspace.

An obvious practical objection to the process as described above is that the filter pays as much attention to the current values as it does to those from the remote past. In other words, we have chosen a particular (Euclidean) metric on the space $\mathbb{R}^n$, given by

\begin{displaymath}
\left( d \left( \left(\begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_n \end{array}\right),
\left(\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array}\right) \right) \right)^2
= (x_n - y_n)^2 + (x_{n-1} - y_{n-1})^2 + \cdots + (x_1 - y_1)^2
\end{displaymath}

where it might be more realistic to discount the past by weighting it less. This can be achieved by changing the metric to:

\begin{displaymath}
\left( d \left( \left(\begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_n \end{array}\right),
\left(\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array}\right) \right) \right)^2
= (x_n - y_n)^2 + \frac{1}{2}(x_{n-1} - y_{n-1})^2 + \cdots + \frac{1}{2^{n-1}}(x_1 - y_1)^2
\end{displaymath}

This can be done by defining a new inner product. The choice of $\frac{1}{2}$ raised to increasing powers can be varied to taste. A new inner product can be conveniently represented by a symmetric, positive definite matrix. If ${\bf x}$ and ${\bf y}$ are vectors and $A$ is such a matrix, the new inner product will be ${\bf x}^T A {\bf y}$, and the good old `dot' product is simply the special case where $A$ is the identity matrix. The above equations may look slightly different in this form.
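A sketch of the weighted version: with inner product ${\bf x}^T A {\bf y}$ and $A$ symmetric positive definite, the normal equations become $U^T A U \, {\bf b} = U^T A {\bf v}$. The diagonal matrix of powers of $\frac{1}{2}$ follows the metric above; the toy data and all names are illustrative assumptions.

\begin{verbatim}
# Sketch: the same filter fit, but in the weighted inner product x^T A y with
# A diagonal and the weights 1, 1/2, 1/4, ... going back in time, so the
# remote past counts for less.  Toy data and names are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
u = rng.standard_normal(n)
v = np.convolve(u, [0.5, -0.3, 0.2, 0.1])[:n]
U = np.column_stack([u[k - i : n - i] for i in range(k + 1)])

m = n - k                                      # number of usable rows
w = 0.5 ** np.arange(m - 1, -1, -1)            # most recent row gets weight 1
A = np.diag(w)                                 # symmetric, positive definite
b_w = np.linalg.solve(U.T @ A @ U, U.T @ A @ v[k:])
print(b_w)                                     # again close to [0.5, -0.3, 0.2, 0.1]
\end{verbatim}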

It is common to rewrite the $(k+1)$ normal equations as follows, writing ${\bf u}_{n-i}$ for the vector of values of u shifted back by i places (so ${\bf u}_n$ has entries $u_1, \ldots, u_n$ and ${\bf u}_{n-i}$ has entries $u_{1-i}, \ldots, u_{n-i}$) and ${\bf v}$ for the vector with entries $v_1, \ldots, v_n$:

\begin{displaymath}
\forall j \in [0 \ldots k], \quad
(b_0 {\bf u}_n + b_1 {\bf u}_{n-1} + \cdots + b_k {\bf u}_{n-k}) \bullet {\bf u}_{n-j}
= {\bf v} \bullet {\bf u}_{n-j}
\end{displaymath}

or alternatively as:

\begin{displaymath}
\forall j \in [0 \ldots k], \quad \sum_{i=0}^k b_i \, ({\bf u}_{n-i} \bullet {\bf u}_{n-j}) = {\bf v} \bullet {\bf u}_{n-j}
\end{displaymath}

If you reflect on what n ought to be in practice, you can see that it should be as long a sample as you have from starting time to finishing time, except for the small problem of shifting outside the region for which we have data: a certain amount of wastage will occur at the ends of a finite sample. Note that we do not in fact need the time series ${\bf v}$ itself; we need only its dot products with the different shifts of the time series ${\bf u}$.

A slightly different formulation can be obtained by dividing both sides of the above equations by n; in this case we can define the Auto-Correlation Function (ACF) of a time series u as the function which assigns to an integer j the average or expected value of $u(n)u(n-j)$.

Similarly, we can define the Cross-Correlation Function (CCF) of v and u as the function which assigns to an integer j the average or expected value of $v(n)u(n-j)$. If the time series u is stationary then

\begin{displaymath}
\frac{1}{n} \, ({\bf u}_{n-i} \bullet {\bf u}_{n-j}) = r_{11}(i-j)
\end{displaymath}

where $r_{11}(t)$ is the ACF of u at t. Similarly

\begin{displaymath}
\frac{1}{n} \, {\bf v} \bullet {\bf u}_{n-j} = r_{12}(j)
\end{displaymath}

where $r_{12}(t)$ is the CCF of v and u. Whereupon the Normal-Yule-Walker-Wiener-Hopf equations become:

\begin{displaymath}
\forall j \in [0 \ldots k],\ \sum_{i=0}^k b_i \, r_{11}(i-j) = r_{12}(j)
\end{displaymath}

with the obvious changes made for the case when everything is continuous.
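A sketch of this last form in the discrete case. The sample estimates of $r_{11}$ and $r_{12}$, the toy data, and the use of $r_{11}(|i-j|)$ (relying on the symmetry of the ACF) are all illustrative assumptions:

\begin{verbatim}
# Sketch: solve the Yule-Walker form using sample estimates of the ACF r11
# and the CCF r12.  Toy data; u is white, so r11 is nearly the identity.
import numpy as np

rng = np.random.default_rng(0)
n, k = 2000, 3
u = rng.standard_normal(n)
v = np.convolve(u, [0.5, -0.3, 0.2, 0.1])[:n]

def r11(t):                                    # sample ACF of u at lag t >= 0
    return u[t:] @ u[:n - t] / n

def r12(t):                                    # sample CCF of v with u at lag t >= 0
    return v[t:] @ u[:n - t] / n

# r11 is symmetric, so r11(i - j) is taken as r11(|i - j|).
R = np.array([[r11(abs(i - j)) for i in range(k + 1)] for j in range(k + 1)])
b = np.linalg.solve(R, np.array([r12(j) for j in range(k + 1)]))
print(b)                                       # approximately [0.5, -0.3, 0.2, 0.1]
\end{verbatim}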

The expected value of the CCF of two functions may be zero; if v is a signal and u is random noise unrelated to it, then this should be the case, and if u is uncorrelated white Gaussian noise with zero mean, then the ACF of u (the CCF of u with itself) ought to be zero except at lag 0, where it should be the variance. This is the sort of thing which is dead obvious, but when said in algebra can leave you wondering whether it is profound or not.

Not.
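A quick numerical sanity check of the remark (a sketch; the sample size and noise variance are arbitrary assumptions):

\begin{verbatim}
# Sketch: for zero-mean white Gaussian noise, the sample ACF is roughly the
# variance at lag 0 and roughly zero at other lags.
import numpy as np

rng = np.random.default_rng(0)
n = 100000
u = rng.normal(0.0, 2.0, n)                    # white noise with variance 4
acf = lambda t: u[t:] @ u[:n - t] / n
print(acf(0))                                  # about 4.0
print(acf(5), acf(20))                         # both near 0
\end{verbatim}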

If we write the time series as a time series of vectors, obtained by storing consecutive windows of n values and moving the window along so as to turn the time series of real numbers into a time series of vectors in $\mathbb{R}^n$, as described earlier, then we can compute the autocorrelation matrix of a piece of the time series regarded as a set of vectors. The autocorrelation matrix is the covariance matrix with a shift, because the mean is not subtracted from the values when you compute the autocorrelation matrix. It is easy to verify that the matrix equation relating the covariance matrix to the autocorrelation matrix is

\begin{displaymath}
Cov(Data) = AutCor(Data) - {\frac{N}{N-1}} 
MM^T\end{displaymath}

where Data is the matrix formed by listing the N vectors in $\mathbb{R}^n$ to give an $N \times n$ matrix, M is the $n \times 1$ matrix of the mean of the data vectors, $M^T$ is its transpose, and Cov(Data) and AutCor(Data) are self-explanatory.
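A sketch checking this identity numerically. The convention assumed here, which makes the $\frac{N}{N-1}$ factor come out as stated, is that both Cov and AutCor are normalised by $N-1$; the toy data are an assumption:

\begin{verbatim}
# Sketch: verify Cov(Data) = AutCor(Data) - (N/(N-1)) M M^T, with both
# matrices normalised by N-1 (an assumed convention consistent with the text).
import numpy as np

rng = np.random.default_rng(0)
N, n = 50, 4
Data = rng.standard_normal((N, n))             # N vectors in R^n, listed as rows
M = Data.mean(axis=0)                          # the mean vector
AutCor = Data.T @ Data / (N - 1)               # mean not subtracted
Cov = np.cov(Data, rowvar=False)               # mean subtracted, 1/(N-1) normalisation
print(np.allclose(Cov, AutCor - (N / (N - 1)) * np.outer(M, M)))   # True
\end{verbatim}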

