
Exercises

These little exercises are of the simplest possible sort, and involve mostly tossing coins. They should therefore be easy for anyone who has even the smallest knowledge of probability theory. Try them on your friendly neighbourhood probabilist if you can't do them yourself.
1.
You are asked by a friend to show him the coins you have in your pocket or purse, and you find two ten cent coins. He selects one of them, and then proceeds to toss it ten times, obtaining eight heads and two tails in some order. He then offers to bet on the results of tossing the coin again. Feeling that the data is too restricted to allow you to estimate odds sensibly, you toss the other coin one thousand times (rather quickly) and keep count of the results. You get 503 heads and 497 tails, but you note that if you partition the throws into groups of ten, the variation between groups is considerable and accords well with the predictions of a binomial model with p(H) = 0.5. Present a careful argument to determine what odds you would offer your friend in a bet for a dollar on the next throw of his coin coming up heads. Repeat for the case where the bet is one hundred dollars. Would it make a difference if the friend were called Persi? If you were allowed to be the one to toss the coin? Could the order of the results make a difference, for example if the two tails came first? Last? What about the possibility that the number of throws was chosen not because ten is a nice round number, but because your friend, getting tired, threw two tails at the end and decided that two bad throws meant it was time to stop?
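
As a point of reference, and not the careful argument the exercise asks for: if you ignore the second coin entirely, assume a binomial model for your friend's coin, and put a uniform prior on $p = P(H)$, then the posterior after eight heads and two tails is proportional to $p^8(1-p)^2$, and the predictive probability of a head on the eleventh throw is $(8+1)/(10+2) = 3/4$ (Laplace's rule of succession), which corresponds to fair odds of three to one on heads. The exercise is asking, among other things, whether the thousand throws of the other coin, the identity of the thrower and the stopping rule should move you away from this number, and by how much.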

2.
The same situation as in the last exercise occurs, but this time you have one ten cent coin and one twenty cent coin. Your friend takes the ten and you do your experiment with the twenty. Do you feel that this makes no substantive difference to the process of throwing your coin one thousand times and applying the result to expectations about his coin? Would there be some measure of relevance which you were using implicitly? How about if there were one ten cent coin and (a) a saucer, (b) a ten dollar bill, or (c) a small piece of green cheese in your pocket? Would you feel that the relevance of tossing the ten dollar bill in the air is so small as to be useless, while a twenty cent coin might give you some information? If so, explain how to program a robot to calculate the relevance. What would you do with the green cheese?[*]

3.
Suppose a Martian or some visitor from a distant planet, or maybe an intelligent octopus, were watching you and your friend, and suppose the kibitzer had never seen a coin tossed before, or anything remotely like it. Would it, do you feel, be inclined to think that tossing the second coin, or possibly the ten dollar bill, might be a good idea? Could it conceivably examine the dynamics of the processes and make any inferences as to relevance? If any of the parties present were a subjective Bayesian, would he have any explanation of where the kibitzer got his, her or its prior from, before it did the experiment with a different coin?

4.
You are given 2000 points in the unit interval to a precision of 10 bits. There is thus quite a good chance[*] that several of the points will coincide, up to the given precision. Suppose they have in fact been generated by a uniform distribution. Can one expect, on average, to do rather better than sending 20,000 bits? How might one do this, and ought one to do it?
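
One line of attack, offered only as a sketch and assuming that the order in which the points are listed carries no information you care about: with 10 bits of precision there are only $2^{10} = 1024$ distinct values, so the data may be described by giving the count of points taking each value. The number of such multisets of size 2000 is $\binom{2000+1023}{1023}$, whose base two logarithm is a little under 2800, so roughly 2,800 bits suffice rather than 20,000. If the order does matter, a uniform source already costs 10 bits per point and no compression is to be expected on average; whether one ought to throw the order away is part of what the exercise is asking.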

5.
You work to one bit precision on the unit interval and you have three data points, $\{0,0,1\}$. You wish to find a good model for the data. The maximum likelihood model says that it's twice as likely to be a 0 as a 1. You note, however, that with three data points the only possibilities are three 0s, two 0s and a 1, two 1s and a 0, and three 1s. Since you wish to be able to get good predictions on the next 1000 points, it seems absurd to restrict yourself to one of a class of four models, the only ones maximum likelihood could provide you with, if you assume a binomial model family to start from and have only three data points. You therefore use Bayesian methods and assume a uniform prior, on the grounds that it contains the least prejudice about the way things are. Try the calculation and see how much good it does you when you take the MAP estimate. Are there any grounds for a prior prejudice in favour of a binomial model with equal probabilities for the two outcomes? If the data came from tossing a coin and assigning a 0 to heads and a 1 to tails, would you alter your answer to the last part? If the bits represented the polarisation of a signal from outer space, would you feel differently about things? Do you think learning a bit of Physics might help in the last case? Just how ignorant are you?
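
As a check on your own calculation, and nothing more: with a uniform prior on $p = P(0)$ and the data $\{0,0,1\}$, the posterior is proportional to $p^2(1-p)$, which has its mode at $p = 2/3$. The MAP estimate therefore coincides exactly with the maximum likelihood estimate, so the uniform prior has, in that sense, bought you nothing; the posterior mean, $3/5$, is a different number, and thinking about why is part of the exercise.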

6.
Suppose that in the last example the data comes in sequentially after the first three points arrive in a lump, and that we proceed by updating our estimate of the posterior distribution as each new point arrives. This procedure gives a sequence of points in the space of posterior distributions, a space which bears a passing resemblance to the unit interval. If the data were in fact generated by a binomial process with probability of getting a 0 equal to p, there ought to be some grounds for thinking that the sequence will almost surely converge to p. By formalising what the terms mean, prove that this is indeed the case.
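
A simulation is no substitute for the proof, but it shows what is supposed to be proved. The following minimal Python sketch (the true value p = 2/3, the uniform Beta(1,1) prior and the sample sizes printed are assumptions for illustration only) updates the posterior one point at a time and prints the posterior mean, which is the obvious point of the unit interval to track.

import random

random.seed(0)
p_true = 2.0 / 3.0            # assumed probability of getting a 0
a, b = 1.0, 1.0               # Beta(1,1), i.e. a uniform prior on p

for n in range(1, 10001):
    x = 0 if random.random() < p_true else 1
    if x == 0:
        a += 1.0              # one more 0 observed
    else:
        b += 1.0              # one more 1 observed
    if n in (10, 100, 1000, 10000):
        print(n, a / (a + b)) # posterior mean after n points

The posterior mean after n points is (number of 0s + 1)/(n + 2), so the strong law of large numbers is doing most of the work; formalising the convergence of the whole posterior, rather than just its mean, is the substance of the exercise.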

7.
With the assumptions of the last question, how many points are needed in order to justify the use of the appropriate binomial model as a compression device?
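
One way to make "justify" precise, offered as a hedge rather than the intended answer: sending n bits raw costs n bits, while coding them with a binomial model of parameter p costs about $nH(p)$ bits for the data, where $H(p) = -p\log_2 p - (1-p)\log_2(1-p)$, plus roughly $\frac{1}{2}\log_2 n$ bits to state the parameter to the precision the data warrant. On that accounting the model pays for itself once $n(1 - H(p)) > \frac{1}{2}\log_2 n$, which for p in the region of 2/3 happens after a few dozen points.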

8.
If the sequence in fact consists of (0,0,1) repeated indefinitely, how many points are required before the repetition model beats the binomial model? How many would you need?
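
Again only a rough yardstick, not the answer: the binomial model with $P(0) = 2/3$ spends about $H(1/3) \approx 0.918$ bits per symbol on data of this sort, plus the cost of stating its parameter, while a model that says "repeat (0,0,1)" costs a more or less fixed number of bits to describe and essentially nothing per symbol thereafter. On a straight code length comparison the repetition model therefore wins within a few dozen symbols; how many repetitions it would take to convince you is a different kind of question.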

9.
Suppose that you have a gaussian mixture model of some data, where the data was in fact generated by simulating k gaussians in the plane. Are there circumstances in which you would expect a really effective method for penalising too many coefficients to (a) underestimate or (b) overestimate k? In other words, could underestimating (overestimating) k ever be the sensible thing to do?

