
Linear Predictive Coding or ARMA modelling

For reasons which are not altogether convincing, it is not uncommon to model speech as if it were produced by IIR filtering a glottal pulse input (for voiced sounds) or white noise (for unvoiced sounds). Then, since we know what is supposed to have gone in and we know what came out, we can calculate the coefficients which give the best fit to the output over some period of time. As the vocal tract changes, these coefficients are also supposed to change in time, but relatively slowly. So we change a fast-varying quasi-periodic time series into a vector-valued time series, or a bit of one, which I have called the trajectory of an utterance. The argument for Autoregressive modelling suggested above hints at a relationship with the Fourier Transform, which emerges with more clarity after some algebra.

This approach is called Linear Predictive Coding in the Speech Recognition literature.
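To make the coefficient-fitting step concrete, here is a minimal sketch in Python. It uses the usual autocorrelation (least-squares) formulation of the normal equations, solved by the standard Levinson-Durbin recursion; the function names, frame sizes and model order below are illustrative choices on my part, not anything fixed by the method.

    import numpy as np

    def lpc_coefficients(frame, order):
        """Fit coefficients a[1..order] so that u(n) is approximated by
        sum_k a[k] * u(n-k), by solving the autocorrelation normal
        equations with the Levinson-Durbin recursion."""
        # Autocorrelation of the frame at lags 0..order.
        r = np.array([np.dot(frame[:len(frame) - k], frame[k:])
                      for k in range(order + 1)])
        a = np.zeros(order)            # predictor coefficients
        err = r[0]                     # prediction error power
        for i in range(order):
            # Reflection coefficient for this stage.
            acc = r[i + 1] - np.dot(a[:i], r[i:0:-1])
            k = acc / err
            # Update the lower-order coefficients, then append the new one.
            a_prev = a[:i].copy()
            a[:i] = a_prev - k * a_prev[::-1]
            a[i] = k
            err *= (1.0 - k * k)
        return a, err

    def lpc_trajectory(signal, frame_len=256, hop=128, order=12):
        """Turn an utterance into a slowly varying vector-valued time
        series: one coefficient vector per short, overlapping frame."""
        frames = [signal[i:i + frame_len] * np.hamming(frame_len)
                  for i in range(0, len(signal) - frame_len, hop)]
        return np.array([lpc_coefficients(f, order)[0] for f in frames])

On this view, lpc_trajectory replaces a few thousand samples per second by a dozen or so slowly varying numbers per frame, which is the vector-valued time series, the trajectory, of the text.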

ARMA modelling, with its assumptions, implausible or not, is done extensively. A variant is to take a time series and 'difference' it by deriving the time series of consecutive differences, v(n) = u(n) - u(n-1). This may be repeated several times. Having modelled the differenced time series, one can get back a model for the original time series, given some data on initial conditions. This is known as ARIMA modelling, with the I short for Integration.
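A minimal sketch of the differencing and the inverse 'integration' step, assuming we keep one initial value per round of differencing (the helper names are mine):

    import numpy as np

    def difference(u, d=1):
        """Apply v(n) = u(n) - u(n-1), repeated d times."""
        v = np.asarray(u, dtype=float)
        for _ in range(d):
            v = v[1:] - v[:-1]
        return v

    def integrate(v, initial_values):
        """Undo d rounds of differencing. initial_values[0] is the first
        value of the original series, initial_values[1] the first value
        of the once-differenced series, and so on."""
        u = np.asarray(v, dtype=float)
        for u0 in reversed(initial_values):
            u = np.concatenate(([u0], u0 + np.cumsum(u)))
        return u

    u = np.array([1.0, 3.0, 6.0, 10.0, 15.0])
    v = difference(u, d=2)               # second differences: [1, 1, 1]
    # To invert we need the first value at each level:
    # u[0] = 1 and the first difference u[1] - u[0] = 2.
    assert np.allclose(integrate(v, [1.0, 2.0]), u)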

The modelling of a stationary time series by supposing it arises from filtering a white noise input is a staple of filtering theory. The method is perhaps surprising to the innocent, who are inclined to want to know why this rather unlikely class of models is taken seriously. Would you expect, say, the stock exchange price of pork belly futures to be a linear combination of its past values added to some white noise which has been autocorrelated? The model proposes that there is a random driving process which has short-term autocorrelations of a linear sort, and that the observed series arises from this driving process by more autocorrelations, that is, dependencies on its own past. Would you believe it for pork bellies? For pork belly futures? Electroencephalograms? As a model of what is happening to determine prices or anything much else, it seems to fall short of Newtonian dynamics, but do you have a better idea? Much modelling of a statistical sort is done the way it is simply because nobody has a better idea. This approach, because it entails linear combinations of things, can be written out concisely in matrix formulation, and matrix operations can be computed and understood, more or less, by engineers. So something can be done, if not always the right thing. Which beats scratching your head until you get splinters.
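To see what the matrix formulation buys us, here is a toy sketch, fitting just the autoregressive part by least squares on synthetic data (the coefficients and series length are made up for illustration): each row of the design matrix holds the p past values, and one library call solves the normal equations.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthesise an AR(2) series: u(n) = 0.75 u(n-1) - 0.5 u(n-2) + e(n).
    true_a = np.array([0.75, -0.5])
    e = rng.standard_normal(2000)
    u = np.zeros(2000)
    for n in range(2, 2000):
        u[n] = true_a[0] * u[n - 1] + true_a[1] * u[n - 2] + e[n]

    # Matrix formulation: row n of X holds [u(n-1), ..., u(n-p)], and we
    # solve min ||X a - y||^2 for the coefficient vector a.
    p = 2
    X = np.column_stack([u[p - k - 1:len(u) - k - 1] for k in range(p)])
    y = u[p:]
    a_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(a_hat)   # close to [0.75, -0.5] on this synthetic data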

Once the reader understands that this is desperation city, and that things are done this way because they can be rather than because there is a solid rationale, he or she may feel much more cheerful about things. For speech, there is a theory which regards the vocal tract as a sequence of resonators made up out of something deformable, and which can, in consequence, present some sort of justification for Linear Predictive Coding. In general, the innocent beginner finds an extraordinary emphasis on linear models throughout physics, engineering and statistics, and may believe that this is because life is generally linear. It is actually because we know how to do the sums in these cases. Sometimes, it more or less works.


Mike Alder
9/19/1997