These little exercises are of the simplest possible
sort, and involve mostly tossing coins. They
should therefore be easy for anyone who has even
the smallest knowledge of probability theory.
Try them on your friendly neighbourhood probabilist
if you can't do them yourself.
- 1.
- You are asked by a friend to show him the
coins you have in your pocket or purse and you
find two ten cent coins. He selects one of them,
and then proceeds to toss it ten times, obtaining
eight heads and two tails in some order. He then
offers to bet on the results of tossing the coin
again.
Feeling that the data is too restricted to allow
you to estimate odds sensibly, you toss the other
coin one thousand times (rather quickly) and keep
count of the results. You get 503 Heads and 497
Tails, but you note that the variation, if you
partition them into groups of ten, is considerable
and accords well with the predictions of a binomial
model with p(H) = 0.5 (a simulation sketch of this
grouping check is given after the exercises). Present
a careful argument to determine what odds you would
offer your friend in a bet for a dollar on the next
throw of his coin coming up heads. Repeat for the
case where the bet is one hundred dollars. Would it
make a difference if the friend were called Persi?
If you were allowed to be the one to toss the
coin?
Could the order of the results make a difference,
for example if the two tails came first? Last?
What about the possibility that the number of
throws was chosen not because ten is a nice round
number,
but because your friend, having just thrown two
tails and getting tired, thought that two
bad throws meant it was time to stop?
- 2.
- The same situation as in the last exercise
occurs, but this time you have one ten cent coin
and one twenty cent coin. Your friend takes the ten
and you do your experiment with the twenty. Do
you feel that this makes no substantive difference to the
process of throwing your coin one thousand times
and applying the result to expectations about
his coin? Would there be some measure of relevance
which you were using implicitly? How about if there
were one ten cent coin and (a) a saucer, (b) a
ten dollar bill or (c) a small piece of green cheese in
your pocket? Would you feel that the relevance
of tossing the
ten dollar bill in the air is so small as to be useless,
but that a twenty cent coin might give you some
information? If so, explain how to program a robot
to calculate the relevance. What would you do
with the green cheese?
- 3.
Suppose a Martian or some visitor from a
distant planet, or maybe an intelligent octopus,
were to be watching you and your friend, and suppose
the kibitzer had never seen a coin tossed before
or anything remotely like it. Would it, do you think,
be inclined to feel that tossing the second coin
or possibly the ten dollar bill might be a good idea?
Could it conceivably examine the dynamics of
the processes and make any inferences as to relevance?
If any of the parties present were a subjective
Bayesian, would he have any explanation of
where the kibitzer got his, her or its prior
from before it did the experiment with a different
coin?
- 4.
- You are given 2000 points in the unit interval
to a precision of 10 bits. Since there are only
1024 distinct values at this precision, several
of the points are bound to coincide.
Suppose they have in
fact been generated by a uniform distribution;
can one expect, on average, to do rather better
than sending 20,000 bits; how might one do this,
and ought one to do it? (A rough bit count is
sketched after the exercises.)
- 5.
- You work to one bit precision on the unit
interval and you have three data points: two 0s
and a 1. You wish to find a good model for the data. The
maximum likelihood model says that it's twice
as likely to be a 0 as a 1. You note however that with
three data points the only possibilities are
three 0s, two 0s and a 1, two 1s and a 0, and
three 1s. Since you wish to be able to get good
predictions on the next 1000 points, it seems
absurd to restrict yourself to one of a class
of four models, the only ones maximum likelihood
could provide you with, if you assume a
binomial model family to start from and have only three
data points. You therefore use Bayesian methods and
assume a uniform prior on the grounds that it contains
least prejudice about the way things are. Try
the calculation and see how much good it does you
when you take the MAP estimate (a worked version is
sketched after the exercises). Are there any
grounds for a prior prejudice in favour of a binomial model
with equal probabilities for the two outcomes?
If the data came from tossing a coin and assigning a
0 to Heads and a 1 to Tails, would you alter
your answer to the last part? If the bits represented
polarisation of a signal from outer space, would
you feel differently about things? Do you think learning
a bit of Physics might help in the last case?
Just how ignorant are you?
- 6.
- Suppose that in the last example, the data
comes in sequentially after the first three arrive
in
a lump, and that we proceed by updating our estimate
of the posterior distribution as each new point
arrives. This
procedure amounts to getting a sequence of points
in the space of posterior distributions, which
bears
a passing resemblance to the unit interval. If
the data were in fact generated by a binomial
process
with probability of getting 0 equal to p, there
ought to be some grounds for thinking that the
sequence will almost surely converge to this value.
By formalising what the terms mean, prove that
this is indeed the case (a simulation illustrating
the convergence is sketched after the exercises).
- 7.
- With the assumptions of the last question,
how many points are needed in order to justify
the use
of the appropriate binomial model as a compression
device?
- 8.
- If the sequence in fact consists of the
block (0,0,1) repeated
indefinitely, how many points are required before
the repetition model beats the binomial model
(a crude code-length comparison is sketched after
the exercises)? How many would you need?
- 9.
- Suppose that you have a Gaussian mixture
model of some data, where the data was in fact
generated by
simulating k Gaussians in the plane. Are there
circumstances in which you would expect a really
effective method for penalising too many coefficients
to (a) underestimate or (b) overestimate k? In
other words, could underestimating (overestimating)
k ever be the sensible thing to do?
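A few rough numerical sketches follow; each is a minimal Python illustration
under explicitly stated assumptions, not a solution to the exercise it refers to.

For Exercise 1, a sketch of the grouping-into-tens check, assuming a fair coin
and letting a pseudo-random number generator stand in for the physical toss:

    import random
    from collections import Counter
    from math import comb

    random.seed(0)                                         # fixed seed, purely for repeatability
    tosses = [random.random() < 0.5 for _ in range(1000)]  # True stands for Heads
    print("Heads:", sum(tosses), "Tails:", 1000 - sum(tosses))

    # Partition the 1000 tosses into 100 groups of ten and count heads per group.
    groups = [sum(tosses[i:i + 10]) for i in range(0, 1000, 10)]
    observed = Counter(groups)

    # Compare the observed group counts with the Binomial(10, 0.5) prediction.
    for k in range(11):
        expected = 100 * comb(10, k) * 0.5 ** 10
        print(f"{k:2d} heads: observed {observed.get(k, 0):3d} groups, expected {expected:5.1f}")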
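For Exercise 4, a back-of-envelope bit count, assuming the order of the 2000
points carries no information, so that only the multiset of rounded values
needs to be transmitted; if the order does matter, a genuinely uniform source
cannot be compressed below 20,000 bits on average.

    from math import comb

    n, values = 2000, 2 ** 10   # 2000 points at 10-bit precision: 1024 possible values
    raw_bits = n * 10           # sending every point verbatim costs 20,000 bits

    # Number of multisets of size n drawn from `values` distinct symbols.
    multisets = comb(n + values - 1, values - 1)
    multiset_bits = multisets.bit_length()   # within one bit of log2(multisets)

    print("raw encoding:     ", raw_bits, "bits")
    print("multiset encoding:", multiset_bits, "bits (order of the points discarded)")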
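For Exercise 5, a worked version of the calculation, assuming the three data
points are two 0s and a 1, a binomial model with parameter P(0), and a uniform
(Beta(1,1)) prior on that parameter:

    zeros, ones = 2, 1                    # assumed data: two 0s and a 1

    ml = zeros / (zeros + ones)           # maximum likelihood estimate of P(0)

    a, b = 1 + zeros, 1 + ones            # uniform prior: the posterior is Beta(a, b)
    map_estimate = (a - 1) / (a + b - 2)  # mode of the Beta(a, b) posterior
    predictive = a / (a + b)              # posterior probability that the next point is a 0

    print(f"ML  estimate of P(0): {ml:.3f}")
    print(f"MAP estimate of P(0): {map_estimate:.3f}")
    print(f"posterior P(next=0):  {predictive:.3f}")

The MAP estimate under a uniform prior comes out equal to the maximum likelihood
estimate; the interesting comparison is then with the posterior predictive probability.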
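For Exercise 6, a simulation which illustrates, but of course does not prove,
the convergence; the true probability of a 0 is set to an arbitrary assumed value:

    import random

    random.seed(1)
    p = 0.3                               # assumed true probability of a 0
    a, b = 1.0, 1.0                       # uniform Beta(1, 1) prior on P(0)

    for n in range(1, 10001):
        if random.random() < p:           # a 0 arrives
            a += 1
        else:                             # a 1 arrives
            b += 1
        if n in (10, 100, 1000, 10000):
            mean = a / (a + b)
            sd = (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5
            print(f"after {n:5d} points: posterior mean {mean:.3f}, spread {sd:.3f}")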
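For Exercise 8, a crude two-part-code comparison. The accounting is an assumption
made purely for illustration: the binomial model is charged n*H(1/3) bits for the
data plus half a log for stating its fitted parameter, while the repetition model
is charged a flat ten bits for describing the block (0,0,1); the precise constants
matter little, only the linear-versus-constant growth does.

    from math import log2

    # Per-symbol cost of the fitted binomial model for 0,0,1,0,0,1,...
    H = -(2 / 3) * log2(2 / 3) - (1 / 3) * log2(1 / 3)   # about 0.918 bits

    def binomial_cost(n):
        # assumed accounting: data cost plus half a log for the parameter
        return n * H + 0.5 * log2(n)

    REPETITION_COST = 10.0   # assumed flat cost of describing the repeating block

    for n in range(3, 300, 3):
        if binomial_cost(n) > REPETITION_COST:
            print(f"under this accounting the repetition model wins from n = {n} onwards")
            break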