
The Origins of Probability: random variables

Probability Theory originated in attempts to answer the practical questions of whether it is better to discard one card to fill a straight or two to fill a flush, and similar questions regarding dice. No branch of Applied Mathematics has better credentials as a practical and necessary part of a gentleman's education. If the theory worked, the customer made a buck, and if it didn't he tried to get his losses back from the theoretician; seldom has the motivation for honest theorising been so pointed.

The problem referred to arises in draw poker, where you are one of some number of players, say four for definiteness, who have each been dealt five cards. You look at your hand and see, let us suppose, $\heartsuit\, 4, 5, 6;\ \clubsuit\, 7;\ \spadesuit\, Q$. You are allowed to discard one or two cards and get replacements off the top of what is left of the pack. The aim is, as all but the youngest are aware, to get a `good' hand. Two possibilities immediately suggest themselves: to discard the Queen of Spades and hope to fill the straight by picking either a 3 or an 8; or to discard the club and the spade and draw two cards in the hope of filling the flush by getting two more hearts.

The argument proceeds as follows: there are 47 cards not in your hand, and you know nothing about them beyond the plausible assumption that they come from a standard pack and are in a random order, whatever that means. In that case the 10 remaining hearts are somewhere in those 47 cards. So there is a probability of 10/47 that the next card on the stack is a heart, and of 9/46 that the card after that is also a heart, giving a probability of $(10/47)(9/46) = 90/2162$, approximately 0.0416, of filling the flush. If you aim for the straight, there are eight possible cards which can do the job, since a 3 or an 8 of any suit will do, and there are 47 cards left, so the probability of the top card being acceptable is 8/47, approximately 0.1702. In rough terms, it's about a one in six chance that you fill the straight, and only about one in twenty-four that you fill the flush.
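The arithmetic is easy enough to check by machine. Here is a minimal sketch in Python (the language, the names, and the structure are purely illustrative, not anything canonical) which computes both probabilities exactly and then confirms them by drawing from the 47 unseen cards a large number of times:

\begin{verbatim}
from fractions import Fraction
import random

# Exact arithmetic: 10 of the 47 unseen cards are hearts,
# and 8 of them (the 3s and the 8s) would fill the straight.
p_flush    = Fraction(10, 47) * Fraction(9, 46)
p_straight = Fraction(8, 47)
print(p_flush, float(p_flush))        # 45/1081, about 0.0416
print(p_straight, float(p_straight)) # 8/47,    about 0.1702

# Monte Carlo confirmation, modelling the unseen cards directly.
unseen = ['heart'] * 10 + ['other'] * 37
fills  = ['fill'] * 8  + ['other'] * 39
N = 100_000
flush_hits    = sum(random.sample(unseen, 2) == ['heart', 'heart']
                    for _ in range(N))
straight_hits = sum(random.choice(fills) == 'fill' for _ in range(N))
print(flush_hits / N, straight_hits / N)
\end{verbatim}

The simulated frequencies settle down close to the exact values, which is the empirical content of the calculation.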

Which suggests you should go for the straight, but a flush beats a straight, so maybe you should go for the flush and improve your chances of winning more money. Who is to tell?

Now the analysis is very standard but contains a lot of assumptions, such as that the deck is not rigged and the other players will not shoot you under the table if you win. As everybody who has played poker knows, there may be more useful information in the momentary smug look on the face opposite than in the above calculation. The assumptions are not usually much thought about, which is why, perhaps, probabilists are not noticeably richer than the rest of us.

The great Australian contribution to gambling is the game of two-up, which consists of throwing two coins in the air. If they land with the same face showing, two heads or two tails, then one party wins, otherwise he doesn't. While they are in the air, an attempt is made to affect the outcome by shouting such advice as `Come in Spinner' at the coins. Contrary to what one might expect, this works. But only about half the time.

Now there is nothing much random about the next two cards on the pack: either they are both hearts or they aren't. Likewise it is rather strange that throwing coins in the air should have anything random about it, since Newton's laws apply and the initial state determines the final state completely. The fact is, we don't have much useful information about the initial state, and the dynamical calculations on bouncing are somewhat beyond most players. Persi Diaconis, a statistician who started academic life somewhat non-canonically (much to the joy of his students and most of his colleagues), can reliably toss a coin so as to produce whatever outcome he wants. Score one for Isaac and Persi. But most of us will take it that initial states which are, to us, indistinguishable, can and do produce different outcomes.


 
Figure 3.1: A random variable. [figure not reproduced]

Now suppose we take the space of initial positions, orientations, momenta and angular momenta for the coin and assign, in accordance with Newton's laws, a colour to each point in this space: black if the coin comes down heads and white if it comes down tails. Then the twelve dimensional space of initial states is divided up into a sort of checker-board of alternate black and white regions, separated by rather thin bits corresponding to the cases where the coin finishes on its edge. We cannot say much about the actual initial state, because the regions of black and white are so small that we see only a grey blur; but symmetry arguments lead us to the belief that there is as much black as there is white. The hypervolume or measure of the black points is equal to the measure of the white points. If we take the total measure to be 1, for convenience only, then we can assign the measure of the Heads outcome as being pretty much 1/2, assuming, in the face of the evidence, that the coin is truly symmetric. We summarise this by saying that if the coin is fair then the probability of getting heads is 0.5. And what this means is that the measure of all the regions in which the coin starts its throw, and which lead by the inexorable action of natural law to its coming up Heads, is one half.
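The checkerboard picture can be made concrete with a toy model. What follows is only a sketch, not Newtonian rigour: one launch speed, one spin rate, no bounce, no air resistance, and the parameter ranges are invented for illustration. The coin goes up with speed $v$, spins at $\omega$ radians per second, and lands Heads precisely when it completes an even number of half-turns. The dynamics are entirely deterministic, yet blurring the initial state over a region we cannot resolve gives very nearly half heads:

\begin{verbatim}
import math, random

G = 9.8  # gravitational acceleration, m/s^2

def toss(v, omega):
    # Deterministic: the outcome is fixed by the initial state.
    t = 2.0 * v / G                      # time of flight
    half_turns = int(omega * t / math.pi)
    return 'H' if half_turns % 2 == 0 else 'T'

# The 'grey blur': a broad region of initial states
# which are, to us, indistinguishable.
N = 100_000
heads = sum(toss(random.uniform(2.0, 4.0),
                 random.uniform(150.0, 250.0)) == 'H'
            for _ in range(N))
print(heads / N)   # very close to 0.5
\end{verbatim}

Shrink the blur far enough, as Diaconis can with practice, and the outcome stops being fifty-fifty; the randomness lives in our ignorance of the initial state, not in the coin.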

Intuitively, in the case of a finite set, the measure of the set is proportional to the number of elements in it; in the case of a subset of the real line, the measure of a set is the length of the set; in two dimensions it is the set's area, in three its volume. It is possible to abstract a collection of properties that all these things have in common, and then we list these properties and call them the axioms for a measure space. Then anything having these properties is an example, including the ones we started with. This is exactly like taking out the common properties of $\mathbb{R}^n$ and calling them the axioms for a Vector Space; it is standard Mathematical practice and allows us to focus our minds on the essentials. We thereupon forget about the dimension of any space of initial states, and indeed almost all properties of it, except the fact that it has some collection of subsets each of which has a measure, and that it has total measure 1. The unit square in the plane is simply a particular example of such a space.
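For the record, the abstracted properties are the standard ones (stated here in the usual Kolmogorov form, which the text presupposes rather than spells out): a measure $\mu$ assigns to each measurable subset $A$ of the space $\Omega$ a number $\mu(A)$ such that

\[
\mu(A) \;\ge\; 0, \qquad \mu(\Omega) \;=\; 1, \qquad
\mu\left(\bigcup_{i=1}^{\infty} A_i\right) \;=\; \sum_{i=1}^{\infty} \mu(A_i)
\]

for any sequence $A_1, A_2, \dots$ of pairwise disjoint measurable sets. (For a general measure the total mass need not be 1; normalising it to 1 is precisely what makes $\mu$ a probability measure.)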

It is thinking along these lines that leads us to the idea of a random variable (rv).

We imagine a map from some space (of initial conditions?) to a space of outcomes, and we suppose that nothing is known about the domain of the map except the measure of the set which produces each type of outcome. At least, this is what we assume it means if there are only finitely many outcomes; if there is a continuum of outcomes, as when someone throws a dart at a board, we suppose that there is a measure for any measurable set in which the dart might land. By putting sensible conditions on what a measure ought to be, making it behave in the same way as area, volume, et cetera, and similar conditions on how maps ought to treat measures, we come out with a definition of a random variable as a map from some inscrutable space having a measure and nothing much else, into $\mathbb{R}^n$ for some $n$, sometimes with restrictions such as taking values only in $\{0,1\}$ with $n = 1$. Now we can define an event as a set of possible outcomes, and the probability of an event $A$, $p(A)$, to be the measure of all those points in the domain of the rv which lead to us getting an outcome in $A$. This is why probability behaves mysteriously like the area of sets, and why those little diagrams of intersecting circles, which figure in the elementary texts on Probability Theory, actually work.
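In symbols (this is merely the preceding paragraph restated, with $\Omega$ standing for the inscrutable space and $\mu$ for its measure): a random variable is a map $X : \Omega \rightarrow \mathbb{R}^n$, and for an event $A \subseteq \mathbb{R}^n$

\[
p(A) \;=\; \mu\big(X^{-1}(A)\big) \;=\; \mu\{\,\omega \in \Omega \;:\; X(\omega) \in A\,\}.
\]

Disjoint events have disjoint preimages, so $p$ inherits the additivity of $\mu$; that is the precise sense in which probability behaves like area.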

In short we imagine that any random variable is much like throwing a die or a dart. This imagery keeps probabilists going long after the logic has run dry.

Similarly, your ignorance of the state of the stack of cards from which you will be dealt the next one or two you ask for is, we shall assume, close to total. You deem all the possible arrangements of the cards left in the stack equally likely, by virtue of your innocent faith in the shuffling process to which the pack was subjected. Obtaining the given `probabilities' by the arithmetic above is then a trivial combinatorial argument which most people find convincing, both as providing some relative degree of belief in the two outcomes, and as an estimate of what might happen if you were in this situation a large number of times and kept count of the results.

Experiments have been done on packs of cards to see if the results are what the arguments predict. Some anomalies have been reported, but mainly by Psychic investigators with an axe to grind. It is within the capacity of all of us to toss a coin a hundred times. (Persi Diaconis has done it much more often!) I suggest you try the experiment and see how persuasive you find the argument in the light of the experimental data.
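If tossing a real coin a hundred times is too tedious, the experiment is quickly simulated, though note the circularity: the sketch below (Python again, with the usual caveat that its pseudo-random generator is itself deterministic) assumes the very fairness the physical experiment is meant to test, so treat it as a picture of what the argument predicts rather than as evidence for it.

\begin{verbatim}
import random

# One hundred simulated tosses of a fair coin.
tosses = [random.choice('HT') for _ in range(100)]
print(''.join(tosses))
print(tosses.count('H'), 'heads out of 100')
\end{verbatim}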

