To take the simplest example of filtering, suppose $u = (u_j)$ is an input series and $v = (v_j)$ is an output series. I want to convolve $u$ with some finite filter so that the result looks like $v$. Let us suppose that I have some finite amount of history of the past of $u$, say the values $u_j$ for $j$ from $1-k$ up to $n$. Then I want to choose some coefficients $b_0, b_1, \ldots, b_k$ so that
\begin{displaymath}
\sum_{j \in [1 \ldots n]} (v_j - b_0 u_j - b_1 u_{j-1} - \cdots - b_k u_{j-k})^2
\end{displaymath}
is as small as possible.
If I take k to be 1, for purposes of exposition, then I am trying to minimise
\begin{displaymath}
\sum_{j \in [1 \ldots n]} (v_j - b_0 u_j - b_1 u_{j-1})^2
\end{displaymath}
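As a concrete illustration of the $k=1$ objective, here is a minimal numerical sketch (the function name `sse_k1` and the toy data are my own inventions, and `numpy` is assumed to be available):

```python
import numpy as np

def sse_k1(b0, b1, u, v):
    """Sum of squared residuals sum_j (v_j - b0*u_j - b1*u_{j-1})**2,
    with j running over the indices where u_{j-1} is available."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    resid = v[1:] - b0 * u[1:] - b1 * u[:-1]
    return float(np.sum(resid ** 2))

# Toy data: v is roughly u filtered by (0.5, 0.3), plus a little noise.
rng = np.random.default_rng(0)
u = rng.standard_normal(200)
v = 0.5 * u + 0.3 * np.concatenate(([0.0], u[:-1])) + 0.01 * rng.standard_normal(200)
print(sse_k1(0.5, 0.3, u, v))   # small
print(sse_k1(1.0, 0.0, u, v))   # much larger
```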
The expression is best seen as the square of a distance in $\mathbb{R}^n$ from the point $\mathbf{v} = (v_1, \ldots, v_n)$ to the plane of points $b_0 \mathbf{u}_n + b_1 \mathbf{u}_{n-1}$ determined by all possible choices of $b_0$ and $b_1$, where $\mathbf{u}_{n-i}$ denotes the shifted vector $(u_{1-i}, u_{2-i}, \ldots, u_{n-i})$. In general, with coefficients $b_0, \ldots, b_k$, it will be a subspace of dimension at most $k+1$ determined by all possible choices of the $b_i$.
The problem of finding the closest point on a plane $P$ to a given point $\mathbf{v}$ in $\mathbb{R}^n$ but not on the plane is
a simple piece of linear algebra. We observe
that if
the closest point to v on P is called q,
then the line joining the point v to the point
q is orthogonal to the plane. Hence it is orthogonal
to the vectors which span the plane. This
follows immediately from Pythagoras' theorem,
although it is called rather pompously the
`orthogonality principle' in many text books.
If you draw a picture for n=3 you should have
no
particular difficulty with this: see Fig. 6.3.
For orthogonal vectors the dot product is zero, so we obtain the two equations
\begin{displaymath}
(\mathbf{v} - b_0 \mathbf{u}_n - b_1 \mathbf{u}_{n-1}) \bullet \mathbf{u}_n = 0
\end{displaymath}
\begin{displaymath}
(\mathbf{v} - b_0 \mathbf{u}_n - b_1 \mathbf{u}_{n-1}) \bullet \mathbf{u}_{n-1} = 0
\end{displaymath}
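As a small sketch of these two conditions in code (a continuation of the earlier sketch; the helper name `fit_k1` is my own invention): expanding the dot products gives a $2 \times 2$ linear system for $b_0$ and $b_1$.

```python
import numpy as np

def fit_k1(u, v):
    """Solve the two orthogonality conditions for a two-coefficient filter.
    u_n is u itself, u_{n-1} is u shifted back by one step."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    un  = u[1:]        # (u_1, ..., u_n), restricted to where the shift exists
    un1 = u[:-1]       # (u_0, ..., u_{n-1})
    vv  = v[1:]
    G = np.array([[un @ un,  un1 @ un],
                  [un @ un1, un1 @ un1]])
    rhs = np.array([vv @ un, vv @ un1])
    b0, b1 = np.linalg.solve(G, rhs)
    return b0, b1
```

With the toy data above, `fit_k1(u, v)` should return something close to (0.5, 0.3).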
If instead of having a filter with two coefficients we had used $k+1$ coefficients $b_0, \ldots, b_k$, we should have had the same orthogonality conditions (still by Pythagoras' Theorem!) but this time there would have been $k+1$ of them. Here $\bullet$ denotes the dot product in $\mathbb{R}^n$.
The $k+1$ equations are called the normal equations and solving them gives the point $q$. They are simply $k+1$ linear equations in $k+1$ unknowns. We leave it to the reader to decide when there is a unique solution.
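For general $k$ the same computation can be set up numerically along the following lines (a sketch under my own naming; `numpy.linalg.lstsq` is used here because it finds precisely the closest point of the subspace spanned by the shifted copies of $u$):

```python
import numpy as np

def closest_point_filter(u, v, k):
    """Coefficients b_0..b_k minimising ||v - sum_i b_i u_{n-i}||, i.e. the
    projection of v onto the subspace spanned by the shifted copies of u."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # Column i is u shifted back by i steps; rows are restricted to the
    # region where every shift is defined.
    U = np.column_stack([u[k - i : len(u) - i] for i in range(k + 1)])
    b, *_ = np.linalg.lstsq(U, v[k:], rcond=None)
    return b
```

Solving the normal equations $U^T U\, b = U^T v$ directly with `numpy.linalg.solve` would give the same answer whenever the shifted vectors are linearly independent.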
In the filtering case the equations are also known as the Yule-Walker equations, and the corresponding equations in the continuous case are called the Wiener-Hopf equations. The solution gives a so-called optimal filter, but it should be noted that we are simply trying to get as close as we can to a known point in $\mathbb{R}^n$ while staying in a linear subspace.
An obvious practical objection to the process as described above is that the filter pays as much attention to the remote past values as it does to the current ones. In other words we have chosen a particular (Euclidean) metric on the space $\mathbb{R}^n$, given by
\begin{displaymath}
d(\mathbf{x},\mathbf{y})^2 = \sum_{j=1}^n (x_j - y_j)^2,
\end{displaymath}
which weights every time instant equally, whereas it would be more natural to discount errors made in the remote past, say by weighting the term for time $j$ by $\lambda^{n-j}$ for some $0 < \lambda < 1$. This can be done by defining a new inner product. The choice of $\lambda$ raised to increasing powers can be varied to taste. A new inner product can be conveniently represented by a symmetric, positive definite matrix: if $\mathbf{x}$ and $\mathbf{y}$ are vectors and $A$ is such a matrix, the new inner product will be $\langle \mathbf{x}, \mathbf{y} \rangle_A = \mathbf{x}^T A \mathbf{y}$, and the good old `dot' product is simply the special case where $A$ is the identity matrix. The above equations may look slightly different in this form.
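As a sketch of this weighted version (my own construction, taking $A$ to be the diagonal matrix with entries $\lambda^{n-j}$, and reusing the hypothetical layout of the earlier sketches):

```python
import numpy as np

def fit_filter_weighted(u, v, k, lam=0.98):
    """Coefficients minimising sum_j lam**(n-j) * (v_j - sum_i b_i u_{j-i})**2,
    i.e. least squares in the inner product <x, y> = x.T @ A @ y with
    A = diag(lam**(n-j))."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    U = np.column_stack([u[k - i : len(u) - i] for i in range(k + 1)])
    vv = v[k:]
    m = len(vv)
    w = lam ** np.arange(m - 1, -1, -1)   # lam**(n-j): recent terms weigh most
    G = U.T @ (w[:, None] * U)            # U^T A U without forming A explicitly
    rhs = U.T @ (w * vv)                  # U^T A v
    return np.linalg.solve(G, rhs)
```

Setting `lam=1.0` recovers the unweighted filter.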
It is common to rewrite the $(k+1)$ normal equations as follows:
\begin{displaymath}
\forall j \in [0 \cdots k],\quad \sum_{i=0}^k b_i\, (\mathbf{u}_{n-i} \bullet \mathbf{u}_{n-j}) = \mathbf{v} \bullet \mathbf{u}_{n-j}
\end{displaymath}
If you reflect on what $n$ ought to be in practice, you can see that it should be as long a sample as you have from starting time to finishing time, except for the small problem of shifting outside the region for which we have data: a certain amount of wastage will occur at the ends of a finite sample. Note that we do not in fact need the time series $v$ itself; we need only its dot products with different shifts of the time series $u$.
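To make the last point concrete, here is a sketch (the helper name `shift_dot_products` is hypothetical) which computes nothing but the dot products that actually appear in the normal equations; the first $k$ samples of the record are the wastage where the shifts run out of data:

```python
import numpy as np

def shift_dot_products(u, v, k):
    """Return the matrix of dot products u_{n-i} . u_{n-j} and the vector of
    dot products v . u_{n-j}, over the overlap where every shift is defined
    (the first k samples are discarded)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    shifts = [u[k - i : len(u) - i] for i in range(k + 1)]   # the vectors u_{n-i}
    vv = v[k:]
    gram = np.array([[si @ sj for sj in shifts] for si in shifts])
    cross = np.array([vv @ s for s in shifts])
    return gram, cross

# The normal equations then read gram @ b = cross, so
#   gram, cross = shift_dot_products(u, v, k); b = np.linalg.solve(gram, cross)
```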
A slightly different formulation can be obtained by dividing both sides of the above equations by $n$; in this case we can define the Auto-Correlation Function (ACF) for a time series $u$ as the function which assigns to an integer $j$ the average or expected value of $u(n)u(n-j)$.
We can define the Cross-Correlation Function (CCF) for $u$ and $v$ to be the average or expected value of $v(n)u(n-j)$. If the time series $u$ is stationary then, writing
\begin{displaymath}
r_{11}(i,j) = E[\,u(n-i)\,u(n-j)\,], \qquad r_{12}(j) = E[\,v(n)\,u(n-j)\,],
\end{displaymath}
(so that by stationarity $r_{11}(i,j)$ depends only on the lag $i-j$), the normal equations become
\begin{displaymath}
\forall j \in [0 \cdots k],\ \sum_{i=0}^k b_i\, r_{11}(i,j) = r_{12}(j)
\end{displaymath}
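A sketch of this correlation form in code (the name `fit_by_correlations` and the simple lag-average estimators are my own choices, assuming stationarity so that $r_{11}$ depends only on the lag):

```python
import numpy as np

def fit_by_correlations(u, v, k):
    """Estimate r11 and r12 as lag averages and solve
    sum_i b_i r11(i - j) = r12(j) for j = 0..k."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    n = len(u)

    def r11(tau):                 # average of u(t) * u(t - tau)
        tau = abs(tau)
        return float(u[tau:] @ u[:n - tau]) / (n - tau)

    def r12(j):                   # average of v(t) * u(t - j)
        return float(v[j:] @ u[:n - j]) / (n - j)

    R = np.array([[r11(i - j) for i in range(k + 1)] for j in range(k + 1)])
    rhs = np.array([r12(j) for j in range(k + 1)])
    return np.linalg.solve(R, rhs)
```

Note that `R` is symmetric and constant along diagonals (Toeplitz), which specialised solvers exploit; the plain solve above is enough for small $k$.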
The expected value of the CCF of two functions may be zero; if $v$ is a signal and $u$ is random noise, then this should be the case, and if $u$ is uncorrelated white Gaussian noise with zero mean, then the ACF of $u$ (the CCF of $u$ with itself) ought to be zero except at 0, where it should be the variance. This is the sort of thing which is dead obvious, but when said in algebra can leave you wondering whether it is profound or not.
Not.
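For the sceptical, a quick numerical check of the white-noise claim (purely illustrative, with made-up parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.normal(0.0, 2.0, size=100_000)   # zero-mean white Gaussian noise, variance 4

def acf(u, j):
    """Average of u(t) * u(t - j)."""
    n = len(u)
    return float(u[j:] @ u[:n - j]) / (n - j)

print([round(acf(u, j), 3) for j in range(5)])
# roughly [4.0, 0.0, 0.0, 0.0, 0.0]: the variance at lag 0, near zero elsewhere
```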
If we write the time series as a time series of vectors, obtained by storing consecutive windows of $n$ values and moving the window along so as to turn the time series of real numbers into a time series of vectors in $\mathbb{R}^n$, as described earlier, then we can compute the autocorrelation matrix of a piece of the time series regarded as a set of vectors. The autocorrelation matrix is the covariance matrix with a shift, because the mean is not subtracted from the values when you compute the autocorrelation matrix. It is easy to verify that the matrix equation relating the covariance matrix $C$ to the autocorrelation matrix $R$ is
\begin{displaymath}
R = C + \boldsymbol{\mu}\,\boldsymbol{\mu}^T
\end{displaymath}
where $\boldsymbol{\mu}$ is the mean of the set of vectors.
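A short numerical verification of this relation, under my own choice of window construction (sliding windows of length $n$ treated as a set of vectors):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(5000) + 0.7      # a time series with nonzero mean
n = 4                                    # window length
W = np.array([x[t : t + n] for t in range(len(x) - n + 1)])   # rows are windows

mu = W.mean(axis=0)
R = (W.T @ W) / len(W)                   # autocorrelation matrix: mean not removed
C = (W - mu).T @ (W - mu) / len(W)       # covariance matrix: mean removed

print(np.allclose(R, C + np.outer(mu, mu)))   # True: R = C + mu mu^T
```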