
Probabilistic Models as Data Compression Schemes

Many of the above problems evaporate if one regards probabilistic models in a different light. Instead of seeing them as competitors of Classical mathematical models, we may regard them simply as data compression schemes. I can, and shall, regard a Gaussian pdf as simply a compressed way of describing a set of points in $\mathbb{R}^n$. This follows Rissanen's approach; see the references for details. The approach is relatively new, but very appealing to those of us who can almost believe in information-theoretic ideas.
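To make the compression reading concrete, here is a small sketch in Python of what I mean; it is an illustration of mine, not Rissanen's actual machinery. It fits a Gaussian to a one-dimensional cloud of points and counts, in the Shannon sense, the number of bits the fitted density spends on the data, with a flat density over the same range as the baseline. The synthetic data, the uniform baseline, and the neglect of the cost of coding $\mu$ and $\sigma$ themselves are all simplifying assumptions made purely for illustration.

    import numpy as np

    # Sketch: a Gaussian pdf as a compression scheme. The code length of a
    # datum x under a density p is -log2 p(x) bits, up to a constant fixed
    # by the precision to which x is recorded, common to both models below.
    rng = np.random.default_rng(0)
    data = rng.normal(loc=5.0, scale=2.0, size=1000)   # hypothetical data set

    # Fit the Gaussian by maximum likelihood.
    mu, sigma = data.mean(), data.std()

    # Bits spent by the fitted Gaussian (ignoring the small additional cost
    # of transmitting mu and sigma themselves).
    log_pdf = -0.5 * np.log(2 * np.pi * sigma**2) - (data - mu)**2 / (2 * sigma**2)
    gaussian_bits = -np.sum(log_pdf) / np.log(2)

    # Baseline: a flat density over the observed range of the data.
    uniform_bits = len(data) * np.log2(data.max() - data.min())

    print(f"Gaussian density: {gaussian_bits:.0f} bits")
    print(f"Uniform density:  {uniform_bits:.0f} bits")

The Gaussian spends noticeably fewer bits on this cloud than the flat density does, and in exactly that sense it is the better description of the points.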

As has been pointed out by Penttila, another Finn, the Babylonians kept careful records of eclipses and other astronomical events. They also kept careful records of city fires and floods and famines. The records eventually allowed them to predict eclipses, but not fires, floods or famines. Thus they only got good results on the things they didn't actually care about. What is done in practice is to follow the example of the Babylonians: you try hard to use the data to make predictions by any method you can devise. If you can't find any pattern at all, you say to yourself `Sod it, the bloody stuff is random'. This is essentially a statement about your intellectual limitations. If I am much cleverer than you, I might be able to figure out a pattern you cannot. What the moron describes as `bad luck', the smarter guy identifies as incompetence. One man's random variable is another's causal system, as Persi Diaconis can demonstrate with a coin.

This leads one to take a class of models and ask how much information each member of the class can extract from the given data set. We may distinguish between a class of models which we can write down, and for which we are able to do the sums that answer the question of how much a given model compresses the data, and a class of models which might be implemented in the neural tissue of a particular human being, concerning which, it must be admitted, little can be said. For each class of models we can imagine a robot implementing that class, and using the data to select a model or subset of models from the class, possibly by maximising the amount of compression of the data. Presumably, human beings, jelly-fish and other elements of creation are also busily trying to compress the data of life's experience into some family of models. It so happens we can say very little about these machines because we didn't build them; about the robots we can say a little. And a robot of a given class will say that some data is random precisely when that robot is unable to effect any significant amount of compression with the family of models at its disposal. A toy version of such a robot is sketched below.
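The following toy in Python is my own construction, the crudest robot of this kind I could think of, and is offered only as a sketch: its family of models is the Bernoulli family, its baseline is the literal one-bit-per-symbol code, and the $\frac{1}{2}\log_2 n$ charge for the parameter is the usual rough two-part MDL cost. It declares a binary string random, relative to its class, exactly when the best Bernoulli model buys it nothing.

    import numpy as np

    # A toy robot whose model class is the Bernoulli family: it calls a
    # binary string random precisely when no member of the family codes the
    # string in fewer bits than writing it out literally.

    def bernoulli_code_length(bits):
        n, k = len(bits), int(np.sum(bits))
        p = k / n                                  # maximum likelihood estimate
        if p in (0.0, 1.0):
            data_cost = 0.0
        else:
            data_cost = -(k * np.log2(p) + (n - k) * np.log2(1 - p))
        return data_cost + 0.5 * np.log2(n)        # rough charge for coding p

    def looks_random(bits):
        return bernoulli_code_length(bits) >= len(bits)   # baseline: 1 bit/symbol

    rng = np.random.default_rng(1)
    fair   = rng.integers(0, 2, size=1000)             # p = 0.5: nothing to exploit
    biased = (rng.random(1000) < 0.9).astype(int)      # p = 0.9: plenty to exploit

    print(looks_random(fair))     # almost surely True: this robot gives up
    print(looks_random(biased))   # False: the Bernoulli family compresses it

A smarter robot, with a richer family of models at its disposal, might well compress strings that this one pronounces random; the verdict is relative to the class.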

Once one has abandoned attempts to determine whether anything is `realio-trulio' random and observed that randomness is relative to some system trying to extract information, much clarity is gained. The belief that there is some absolute sense of randomness is one of those curious bits of metaphysics to which certain kinds of impractical, philosophical temperaments are vulnerable, but it makes no particular sense. The definitions of randomness in terms of description length initiated by Solomonoff, Kolmogorov and Chaitin give no practical decision procedure for declaring something to be random; they only allow you, sometimes, to observe that it isn't. I shall expand upon this later. See the writings of Rissanen for the words of the prophet on these matters.
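A small illustration, again mine and nothing to do with the prophet's actual constructions: an off-the-shelf compressor such as Python's zlib can sometimes demonstrate that a string is not random by the simple expedient of compressing it, but its failure to compress another string, here the output of a short hashing loop, demonstrates nothing at all.

    import hashlib
    import zlib

    patterned = b"0123456789" * 1000     # plainly regular: 10,000 bytes

    # The output of an iterated hash: produced by a tiny program, yet it
    # presents no pattern that zlib's Lempel-Ziv machinery can exploit.
    block, chunks = b"seed", []
    for _ in range(400):
        block = hashlib.sha256(block).digest()
        chunks.append(block)
    hidden = b"".join(chunks)            # 12,800 bytes

    for name, s in [("patterned", patterned), ("hidden", hidden)]:
        ratio = len(zlib.compress(s, 9)) / len(s)
        print(f"{name}: compressed to {ratio:.2f} of its original size")

zlib certifies that the first string is not random, since it squeezes it down to a small fraction of its length; it gets nowhere with the second, even though the second was generated by a program a few lines long, so incompressibility under one particular scheme is no proof of randomness.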



 
Mike Alder
9/19/1997