

Now suppose some noise is added to this time series: a zero mean gaussian with variance about 0.1 generates a random number which is added to each term of the original series. This means that iterating from the given (noise free) initial state will produce four clusters in the space of delay vectors.

Experiment by modifying the example so as to (a) increase the period, (b) take longer time delay vectors, (c) change the noise so that it is MA filtered gaussian white noise. Investigate the kinds of point set which you get and decide how you might filter the noise out.
Experiment by taking some matrix map, some initial point, and some noise process. Iterate the map plus the noise and examine the point set you get. If you choose low dimensions you can easily project the sets onto the screen of a computer and gloat over them. What common sense suggestions do you have for filtering out the noise?
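One way this experiment might be set up, with an arbitrary contracting rotation as the matrix map and gaussian noise of an assumed scale:

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical 2x2 matrix map: a rotation contracted by a factor 0.9.
theta = 0.5
A = 0.9 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])

x = np.array([1.0, 0.0])
orbit = []
for _ in range(2000):
    x = A @ x + rng.normal(0.0, 0.05, size=2)   # iterate the map plus noise
    orbit.append(x.copy())
orbit = np.array(orbit)

# In two dimensions the set can be projected straight onto the screen,
# e.g. with plt.scatter(orbit[:, 0], orbit[:, 1]).
# A common sense filter: average each co-ordinate over a short window,
# so the noise averages out while the map's structure survives.
smoothed = np.convolve(orbit[:, 0], np.ones(5) / 5, mode="valid")
```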


and the Crosses by the data points


First view the data. This should lead you to suspect that using one gaussian for each category would not be a good idea, and you should also rule out the use of a single unit Perceptron. You then have to choose between the hypothesis that the data shows signs of a group action of transformations which might be learnable, and the hypothesis that it doesn't. To test this, take each data set, choose a radius of about 0.4, select points, and for each selected point compute the covariance matrix of all the points within that radius.
You test to see if the results indicate a dimension of one. (Actually, one is the only dimension worth trying for in the circumstances, so you could give this a miss.) Having found that this is the case, you decide to try to fit a polynomial function to each category.
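The local covariance test might be sketched as follows; the curve, noise level, and sample size here are hypothetical stand-ins for the Noughts and Crosses data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: points scattered thinly about a curve in the plane.
t = rng.uniform(-1.0, 1.0, size=500)
data = np.stack([t, t**2], axis=1) + rng.normal(0.0, 0.02, size=(500, 2))

def local_eigenvalues(data, centre, radius):
    """Eigenvalues (descending) of the covariance of the points
    lying within `radius` of `centre`."""
    near = data[np.linalg.norm(data - centre, axis=1) < radius]
    cov = np.cov(near.T)
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

eigs = local_eigenvalues(data, data[0], 0.4)
# One dominant eigenvalue at this resolution suggests local dimension one.
ratio = eigs[0] / eigs[1]
```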
Compute a low order polynomial to fit each of
the categories. This is your first pass at an
attempt at obtaining a transform under which
the category is invariant. Proceed as follows:
linearise the points of one category by taking
the polynomial for the Noughts, y = f(x),
and sending each point (x, y)
to the point (x, y - f(x)).
Now forget about the first co-ordinate and simply
store each point by its residue. The results
should make pattern recognition rather simpler.
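The fit-and-take-residues step could look like this in outline, on invented data scattered about a quadratic:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical Noughts data scattered about a low order polynomial.
x = rng.uniform(-1.0, 1.0, size=200)
y = 0.5 * x**2 - 0.2 * x + rng.normal(0.0, 0.05, size=200)

# Fit a low order polynomial y = f(x) to the category.
coeffs = np.polyfit(x, y, deg=2)
f = np.poly1d(coeffs)

# Send each point (x, y) to (x, y - f(x)), then forget the first
# co-ordinate and keep only the residue.
residues = y - f(x)
# If the fit is good, the residues are small and structureless.
```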
Test the legitimacy of using the same polynomial
on
both data sets by computing the variance of the
residues y-f(x) and seeing if the variance
is
different for each category, or by curve fitting
the residue and testing to see how non-linear
it is.
If the two functions f_O and f_X for the two
categories had been very different, what might
you have
tried?
Now can you get back again to recover the embeddings
given the data points? If you can, describe what
happens in the space of embeddings as you go from
one category to another. Is there any reason
for
trying to linearise the space so as to be able
to factor out the transformation? After all,
it would be
simplest to decide what category a datum is in
by measuring its distance from the best fitting
manifold; why should you go to any more trouble
than this? (Hint: There may be more problems
in the
same space coming down the pipeline!)
You might consider the low dimensional case where
one category has y = x^2 as a fitting manifold
and
the other has y = x^4 + x^2 + 0.3 as its. Find
a non-linear transform of the plane
which allows
you to
factor out one component in order to get rid of
the transformation group action.
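For this low dimensional case, one candidate transform (an illustrative assumption, not the text's answer) subtracts x^2 from the second co-ordinate, flattening the first manifold onto the x-axis:

```python
import numpy as np

# Hypothetical transform: (x, y) -> y - x**2 sends the y = x**2
# manifold to zero, so the first co-ordinate can be factored out.
def transform(p):
    x, y = p
    return y - x**2

xs = np.linspace(-1.0, 1.0, 50)
noughts = np.array([transform((x, x**2)) for x in xs])
crosses = np.array([transform((x, x**4 + x**2 + 0.3)) for x in xs])
# The noughts all map to 0; the crosses map to x**4 + 0.3, which is
# bounded below by 0.3, so a single threshold on the remaining
# component separates the categories.
```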
Where else might you find points of
type Square?
You take a set of measurements of different speakers
saying the vowel /AA/, as in English Cat and
Bag,
and discover, after binning them into 12 filterbank
values, that the sounds occupy a region of the
space
which is approximated by a single gaussian
distribution, and has dimension about six, i.e.
the first six eigenvalues are reasonably large, and the last six are all pretty much the same, small, and can be interpreted as noise. You repeat for seven other vowel sounds and discover that the same is essentially true for each of them, that the centres of the gaussians occupy a plane in the space, and that the gaussians are more or less shifts of each other in this plane. The principal axes of the gaussians all stand out of the plane at an angle of about 30°, like six dimensional whales leaping out of a flat sea, all about half out and all pointing in the same direction. An impressive sight.
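The eigenvalue reasoning can be checked on synthetic data; the variances below are invented stand-ins for the filterbank measurements:

```python
import numpy as np

rng = np.random.default_rng(4)

# A 12-dimensional gaussian whose first six axes have large variance
# and whose last six share a small, equal "noise" variance.
big, small = 2.0, 0.1
scales = np.array([big] * 6 + [small] * 6)
data = rng.normal(0.0, 1.0, size=(5000, 12)) * scales

eigvals = np.sort(np.linalg.eigvalsh(np.cov(data.T)))[::-1]

# Count the eigenvalues that stand well clear of the noise floor.
noise_floor = eigvals[-1]
dim = int(np.sum(eigvals > 10 * noise_floor))
# dim should come out at about six: the effective dimension of the data.
```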
You entertain the hypothesis that the distributions are telling you that there is a six dimensional space of transformations which can be performed on a vowel utterance in order to encode information about how loudly it is spoken, the age and sex of the speaker, and so on.