Tasks

(i) Do a classical t-test using R (this should be very easy). Write down the p-value, and write down the notation for what this probability is (i.e., it's the probability of what conditional on what)¹.
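The R call for this would be `t.test(x, mu = mu0)`. As a minimal sketch of the same calculation, here is a Python analogue using `scipy.stats.ttest_1samp`. The data vector `x` and null value `mu0` below are fabricated placeholders; substitute the assignment's actual data.

```python
# Sketch of task (i): one-sample t-test of H0: mu = mu0.
# x and mu0 are PLACEHOLDERS, not the assignment's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=200.0, scale=50.0, size=20)  # placeholder data
mu0 = 180.0                                     # placeholder null value

t_stat, p_value = stats.ttest_1samp(x, popmean=mu0)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

The p-value reported here is the two-sided tail probability of the t statistic computed assuming H0 is true, which is the conditional probability the task asks you to write in notation.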
(ii) Do Bayesian model comparison of H0 vs. H1. You can compute marginal likelihoods using either Nested Sampling or another method if you prefer. For H1, use a Uniform(-1000000, 1000000) prior for µ. Calculate the posterior odds ratio assuming a prior odds ratio of 1, and write down the probability notation for what this quantity is.
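Since µ is the only free parameter here, direct numerical integration is a simple alternative to Nested Sampling. The sketch below assumes a normal likelihood with known σ; `x`, `mu0`, and `sigma` are placeholders standing in for the assignment's actual data.

```python
# Sketch of task (ii): marginal likelihoods by 1-D quadrature.
# Z0 = likelihood at the fixed mu0; Z1 = integral of
# likelihood * Uniform(-1e6, 1e6) prior density over mu.
# All data values are PLACEHOLDERS.
import numpy as np
from scipy import stats
from scipy.integrate import quad

rng = np.random.default_rng(0)
sigma = 50.0
x = rng.normal(loc=200.0, scale=sigma, size=20)
mu0 = 180.0

def log_like(mu):
    return np.sum(stats.norm.logpdf(x, loc=mu, scale=sigma))

# H0 has no free parameter: its marginal likelihood is the
# likelihood evaluated at mu0.
Z0 = np.exp(log_like(mu0))

# H1: mu ~ Uniform(-1e6, 1e6). The likelihood is negligible far
# from the sample mean, so integrate over a window around it
# rather than the full prior range.
a, b = -1e6, 1e6
prior_density = 1.0 / (b - a)
xbar = np.mean(x)
width = 10 * sigma / np.sqrt(len(x))
Z1, _ = quad(lambda mu: np.exp(log_like(mu)) * prior_density,
             xbar - width, xbar + width)

# With prior odds of 1, the posterior odds equal the Bayes factor.
bayes_factor = Z0 / Z1
print(f"Posterior odds (H0 vs H1): {bayes_factor:.3g}")
```

Note how the 1/(b - a) prior density penalises H1: almost all of the enormous prior range contributes nothing to the integral, which is the mechanism behind part (iii).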
(iii) The apparent contradiction between the results of (i) and (ii) is sometimes called the Jeffreys-Lindley paradox. Explain, in words and/or diagrams, why the conclusion of part (ii) is actually valid for a reasoner whose prior beliefs are really described by the Uniform(-1000000, 1000000) prior.
(iv) Harold Jeffreys had a suggestion for problems like this. The parameter σ already sets a scale for the problem ("we're dealing with stuff around 100–300"), so conditional on σ, we can use σ to give us a suitable non-extreme prior for µ. He suggested using a Cauchy distribution because of its heavy tails. Re-do part (ii) using µ ∼ Cauchy(0, σ) as the prior for µ given σ. Hint: For R coding purposes, a Cauchy distribution is a t distribution with df=1. Hint 2: Use κ = µ/σ as a parameter instead of µ, and then let µ ← κσ.
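Both hints can be combined in one integral: if κ = µ/σ, then µ ∼ Cauchy(0, σ) is equivalent to κ following a standard Cauchy, i.e. a t distribution with df = 1. A minimal sketch, again treating σ as known and using the same placeholder data as before:

```python
# Sketch of task (iv): marginal likelihood under the Jeffreys-style
# prior mu | sigma ~ Cauchy(0, sigma), via kappa = mu / sigma so that
# kappa ~ t(df=1) and mu = kappa * sigma. Data are PLACEHOLDERS.
import numpy as np
from scipy import stats
from scipy.integrate import quad

rng = np.random.default_rng(0)
sigma = 50.0
x = rng.normal(loc=200.0, scale=sigma, size=20)
mu0 = 180.0

def log_like(mu):
    return np.sum(stats.norm.logpdf(x, loc=mu, scale=sigma))

Z0 = np.exp(log_like(mu0))

# Integrate over kappa: standard-Cauchy prior density times the
# likelihood evaluated at mu = kappa * sigma.
def integrand(kappa):
    return np.exp(log_like(kappa * sigma)) * stats.t.pdf(kappa, df=1)

# Centre the integration window where the likelihood has its mass.
k_hat = np.mean(x) / sigma
w = 10 / np.sqrt(len(x))
Z1, _ = quad(integrand, k_hat - w, k_hat + w)

print(f"Posterior odds (H0 vs H1, Cauchy prior): {Z0 / Z1:.3g}")
```

Because the Cauchy prior concentrates its mass on the scale set by σ instead of spreading it over ±1000000, the penalty on H1 is far milder than in part (ii).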
¹ While a Bayesian can write down what calculation a frequentist has done, they still wouldn't agree on what it represents. For a frequentist, the p-value is an experimentally verifiable relative frequency. For a Bayesian, it's the degree of belief in a proposition in this particular case, imagining that you knew H0 to be true. If the experimentally verifiable relative frequency exists, the Bayesian could talk about it too, but should give it another symbol, maybe f for frequency instead of p for probability. Even though the maths relating different fs to each other and the maths relating different ps to each other would be the same [i.e., the laws of probability theory], they would remain conceptually distinct things.