Many of the above problems evaporate if one regards
probabilistic models in a different light.
Instead of seeing them as competitors of Classical
mathematical models, we may regard them as
simply data compression schemes. I can, and shall,
regard a gaussian pdf as simply a
compressed way of describing a set of points in
$\mathbb{R}^n$. This follows Rissanen's approach; see
the references for details. This approach is relatively
new, but very appealing to those of us who can
almost believe in information-theoretic ideas.
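To make the compression view concrete, here is a rough sketch in Python of the bookkeeping involved: a gaussian fitted to a cloud of points, charged for its parameters and for the quantised data, against a baseline that simply records each coordinate to the same precision. The two-part accounting, the half-$\log n$ bits per parameter, the function names and the precision parameter are my own crude choices for illustration, not Rissanen's exact formulation.

\begin{verbatim}
import numpy as np

def gaussian_code_length(X, precision=0.01):
    # Approximate two-part code length, in bits, for the rows of X
    # (points in R^d quantised to cells of side `precision') when they
    # are described by a single fitted gaussian.
    n, d = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-9 * np.eye(d)   # regularise
    diff = X - mu
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    nll_nats = 0.5 * (quad + logdet + d * np.log(2 * np.pi)).sum()
    data_bits = nll_nats / np.log(2) + n * d * np.log2(1.0 / precision)
    # Crude charge for transmitting the parameters themselves:
    # d mean entries plus d(d+1)/2 covariance entries,
    # at about half of log2(n) bits per real parameter.
    param_bits = (d + d * (d + 1) / 2) * 0.5 * np.log2(n)
    return data_bits + param_bits

def uniform_code_length(X, precision=0.01):
    # Baseline: send each coordinate as uniform over its observed range.
    n = len(X)
    ranges = X.max(axis=0) - X.min(axis=0) + precision
    return n * np.log2(ranges / precision).sum()

rng = np.random.default_rng(0)
t = rng.normal(size=1000)
X = np.column_stack([t, t + 0.01 * rng.normal(size=1000)])
print(gaussian_code_length(X), uniform_code_length(X))
\end{verbatim}

The strongly correlated cloud in the example is described far more cheaply by the gaussian, which captures the correlation, than by the coordinate-by-coordinate baseline, which cannot.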
As has been pointed out by Penttila,
another Finn, the Babylonians kept careful records
of eclipses and other astronomical
events. They also kept careful records of city
fires and floods and famines. The records eventually
allowed them to predict eclipses,
but not to predict fires, floods or famines.
Thus they only got good results on the things
they didn't actually care about. What is done
in practice is to follow the example of the Babylonians:
you try hard to use the data to make
predictions by any method you can devise. If you
can't find any pattern at all, you say to yourself
`Sod it, the bloody stuff is random'. This is
essentially a statement about your intellectual
limitations. If I am
much cleverer than you, I might be able to figure
out a pattern you cannot. What the moron
describes as `bad luck', the smarter guy identifies
as incompetence. One man's random
variable is another's causal system, as Persi
Diaconis can demonstrate with a coin.
This leads to one taking a class of models and asking how much information each member of the class can extract from the given data set. We may distinguish between a class of models which we can write down, and for which we are able to do the sums to answer the question of how much a given model compresses the data, and a class of models which might be implemented in the neural tissue of a particular human being, concerning which, it must be admitted, little can be said. For each class of models we can imagine a robot implementing that class and using the data to select a model or subset of models from the class, possibly by maximising the amount of compression of the data. Presumably, human beings, jelly-fish and other elements of creation are also busily trying to compress the data of life's experience into some family of models. It so happens we can say very little about these machines because we didn't build them; the robots we can say a little about. And a robot of a given class will say that some data is random precisely when that robot is unable to effect any significant amount of compression with the family of models at its disposal.
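As a toy illustration of that last sentence, here is a sketch of such a robot, equipped with a deliberately small class of models of binary strings: an independent Bernoulli source and an order-1 Markov source, each charged a crude half-$\log n$ bits per fitted parameter. The model class, the function names and the parameter charge are my own illustrative assumptions; the point is only the verdict at the end.

\begin{verbatim}
import math
from collections import Counter

def raw_bits(s):
    # Cost of sending the binary string literally: one bit per symbol.
    return float(len(s))

def bernoulli_bits(s):
    # Two-part code: one fitted probability (charged ~0.5*log2(n) bits)
    # plus the Shannon code length of the string under that probability.
    n, k = len(s), s.count('1')
    p = k / n
    data = 0.0 if p in (0.0, 1.0) else \
        -(k * math.log2(p) + (n - k) * math.log2(1 - p))
    return 0.5 * math.log2(n) + data

def markov1_bits(s):
    # Two-part code with an order-1 Markov model (two fitted transition
    # probabilities), which can exploit serial structure Bernoulli misses.
    n = len(s)
    trans = Counter(zip(s, s[1:]))
    data = 1.0                      # first symbol sent raw
    for prev in '01':
        total = trans[(prev, '0')] + trans[(prev, '1')]
        for nxt in '01':
            c = trans[(prev, nxt)]
            if c:
                data -= c * math.log2(c / total)
    return 2 * 0.5 * math.log2(n) + data

def verdict(s, models):
    # A 'robot' with a class of models: the string is random, relative
    # to this robot, precisely when no model beats sending it raw.
    name, best = min(((m, f(s)) for m, f in models.items()),
                     key=lambda t: t[1])
    if best >= raw_bits(s):
        return 'random relative to this model class'
    return '%s compresses %d bits down to about %d' % (name, raw_bits(s), best)

models = {'bernoulli': bernoulli_bits, 'order-1 markov': markov1_bits}
print(verdict('01' * 200, models))     # compressed by the markov model
print(verdict('0011' * 100, models))   # declared random by this robot
\end{verbatim}

The second test string, the block 0011 repeated, is declared random by this robot even though a robot carrying order-2 Markov models in its repertoire would compress it without difficulty; the verdict is a confession by the robot about its own limitations, not a property of the string.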
Once one has abandoned attempts to determine whether anything is `realio-trulio' random and observed that randomness is relative to some system trying to extract information, much clarity is gained. The belief that there is some absolute sense of randomness is one of those curious bits of metaphysics to which certain kinds of impractical, philosophical temperaments are vulnerable, but it makes no particular sense. The definitions of randomness in terms of description length which were initiated by Solomonoff, Kolmogorov and Chaitin give no practical decision procedure for declaring something to be random; they only allow you, sometimes, to observe that it isn't. I shall expand upon this later. See the writings of Rissanen for the words of the prophet on these matters.