Preamble
Let’s sum up what we’ve learnt during the previous session: null-hypothesis significance testing rests on assuming
- either a null hypothesis \(H_0\), positing an “uninteresting” situation (eg, both treatment and placebo are equally effective)
- or an alternative hypothesis \(H_1\) (sometimes denoted \(H_a\)), claiming something interesting is going on (eg, treatment is more effective than placebo)
In order to make a statistical sort of reductio ad absurdum argument, \(H_0\) is assumed. (This is convenient: assuming equally effective treatments, one can write \(\mu_A = \mu_B\), which is much more specific and easier to work with than \(\mu_A \neq \mu_B\).) Assuming \(H_0\), a \(p\) value (a probability between 0 and 1) can be derived that summarises how compatible the observations are with \(H_0\).
- A small \(p\) value (say, less than \(0.01\)) means that the observations look decidedly odd should we assume the null: the result is said to be significant and the null is rejected
- Anything else means the observations are consistent with the null: we “fail” to reject the null hypothesis (so, reluctantly, we have to stick with it)
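As a minimal illustration of this decision rule, here is a sketch in R using made-up data: two groups are drawn from the same distribution, so \(H_0\) (\(\mu_A = \mu_B\)) is true by construction (the group sizes, means and seed are arbitrary choices for the example).

```r
# Hypothetical example: both groups come from the *same* distribution,
# so H0 (mu_A = mu_B) holds by construction.
set.seed(1)
a <- rnorm(30, mean = 10, sd = 2)  # "placebo" measurements
b <- rnorm(30, mean = 10, sd = 2)  # "treatment" measurements

p <- t.test(a, b)$p.value  # p value computed under H0

# The decision rule described above:
if (p < 0.01) {
  message("significant: reject H0")
} else {
  message("fail to reject H0")
}
```

Since \(H_0\) is true here, a small \(p\) value would be a false positive; with a \(0.01\) threshold that happens about 1% of the time.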
Now we’ve seen that \(H_0\) tends to generate uniformly distributed \(p\) values.
We’ve seen that under an alternative hypothesis, the distribution of \(p\) tends to be skewed towards 0, more or less strongly depending on the alternative. There is a class of distributions called “beta (\(\beta\)) distributions” that can be used to qualitatively produce such a distribution. They take two parameters, \(\alpha\) and \(\beta\) (“shape1” and “shape2” in R parlance).
With \(\alpha = 0.7\) and \(\beta = 1\), we get a distribution reminiscent of a weak alternative hypothesis.
With \(\alpha = 0.2\) and \(\beta = 1\), we get a more skewed distribution reminiscent of a strong alternative hypothesis.
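The difference between these three regimes can be sketched by sampling from each distribution and looking at the fraction of \(p\) values below \(0.05\) (the sample size and seed here are arbitrary; for a Beta\((\alpha, 1)\) distribution the fraction below \(x\) is \(x^\alpha\)):

```r
set.seed(42)
n <- 1e5
p_null   <- runif(n)                            # H0: uniform on [0, 1]
p_weak   <- rbeta(n, shape1 = 0.7, shape2 = 1)  # weak alternative
p_strong <- rbeta(n, shape1 = 0.2, shape2 = 1)  # strong alternative

# Fraction of p values below 0.05 in each scenario:
mean(p_null < 0.05)    # ~0.05
mean(p_strong < 0.05)  # ~0.55 (= 0.05^0.2): strong skew towards 0
mean(p_weak < 0.05)    # ~0.12 (= 0.05^0.7): mild skew towards 0
```

The stronger the alternative, the more mass the distribution piles up near 0, and the more often a test at a given threshold comes out significant.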
Problem
Part 1
Let’s consider a more complex and truer-to-life situation where we carry out a thousand statistical tests and obtain 1000 \(p\) values from 900 null hypotheses and 100 alternative hypotheses. Assume that the alternative hypotheses are weak and that the corresponding \(p\) values are distributed according to a beta distribution Beta\((\alpha=0.7, \beta=1)\).
- The \(p\) values for the null can be obtained using `runif()`
- The \(p\) values for \(H_1\) can be obtained using `rbeta()`
Here’s what we want to know:
If the significance threshold is chosen to be \(0.05\), how many true positives, false positives, true negatives and false negatives are there?
Adjusting the nominal \(p\) values using `p.adjust(p, method="bonferroni")`, how many true positives, false positives, true negatives and false negatives are there choosing a \(0.05\) threshold?
Computing the \(q\) values (the expected fraction of false positives among the positives) from the nominal \(p\) values using `p.adjust(p, method="fdr")`, how many true positives, false positives, true negatives and false negatives are there choosing a \(0.25\) threshold?
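One possible way to set up this simulation is sketched below (the seed, the `truth` labels and the `confusion` helper are illustrative choices, not the only way to do it):

```r
# 900 tests where H0 is true, 100 where a weak alternative Beta(0.7, 1) is true
set.seed(123)
m0 <- 900; m1 <- 100
p <- c(runif(m0),                              # p values under H0
       rbeta(m1, shape1 = 0.7, shape2 = 1))    # p values under H1
truth <- rep(c("null", "alt"), c(m0, m1))

# Count TP / FP / TN / FN at a given significance threshold
confusion <- function(p, truth, threshold) {
  positive <- p < threshold
  c(TP = sum(positive & truth == "alt"),
    FP = sum(positive & truth == "null"),
    TN = sum(!positive & truth == "null"),
    FN = sum(!positive & truth == "alt"))
}

confusion(p, truth, 0.05)                                   # nominal p values
confusion(p.adjust(p, method = "bonferroni"), truth, 0.05)  # Bonferroni
confusion(p.adjust(p, method = "fdr"), truth, 0.25)         # q values
```

The exact counts vary from run to run, but the pattern should be stable: the nominal threshold yields roughly \(900 \times 0.05 = 45\) false positives, Bonferroni all but eliminates them at the cost of many false negatives, and the FDR approach sits in between.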
Part 2
Redo this assuming that the alternative hypotheses are strong and that the corresponding \(p\) values are distributed according to a beta distribution Beta\((\alpha=0.2, \beta=1)\).
Addendum
There are lots of caveats with the NHST approach:
- What’s a small \(p\) value anyway: is \(0.06\) small enough?
- The threshold creates a situation where things lack nuance: \(p = 0.04\) hurray! significant and interesting… \(p = 0.055\) d’oh! not significant and uninteresting
- We could reject the null hypothesis based on a small \(p\) value although the problem could lie with some minor-looking assumptions of the null rather than the null as a whole
- We could reject the null because the observations are difficult to reconcile with the null, but we haven’t even considered how reconcilable they are with the alternative hypothesis
- We could fail to reject the null because the observations are somewhat in line with what \(H_0\) would predict, even though many alternative hypotheses can produce data compatible with the null (there’s another practical session on this, looking at statistical power)