The objective of this module is to continue our discussion of classical hypothesis testing from a frequentist statistics approach.
Classical or frequentist hypothesis testing (a.k.a. parametric statistics) involves formally stating a claim - the null hypothesis - which is then followed by statistical evaluation of the null versus an alternative hypothesis. The null hypothesis is the baseline claim, the one assumed to be true. The alternative hypothesis is the conjecture that we are actually testing.
We need some kind of statistical evidence to reject the null hypothesis in favor of an alternative hypothesis. This evidence is, in classical frequentist approaches, some measure of how unexpected it would be for the sample to have been drawn from a given null distribution.
\(H_0\) = null hypothesis = a sample statistic shows no deviation from what is expected or neutral
\(H_A\) = alternative hypothesis = a sample statistic deviates more than expected by chance from what is expected or neutral
We can test several different comparisons between \(H_0\) and \(H_A\).
\(H_A\) > \(H_0\), which constitutes an "upper" one-tailed test (i.e., our sample statistic is greater than that expected under the null)
\(H_A\) < \(H_0\), which constitutes a "lower" one-tailed test (i.e., our sample statistic is less than that expected under the null)
\(H_A\) ≠ \(H_0\), which constitutes a "two-tailed" test (i.e., our sample statistic is different, perhaps greater or perhaps less, than that expected under the null; see the sketch after this list for how each case maps onto a tail probability)
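To make the tail directions concrete, here is a minimal sketch in Python using `scipy.stats`; the standard normal null distribution and the value z = 1.8 are illustrative assumptions, not values from this module.

```python
from scipy.stats import norm

# Suppose our standardized test statistic, under an assumed standard normal
# null distribution, comes out to z = 1.8 (an illustrative value)
z = 1.8

# Upper one-tailed test: P(Z >= z) under the null
p_upper = norm.sf(z)            # survival function = 1 - cdf

# Lower one-tailed test: P(Z <= z) under the null
p_lower = norm.cdf(z)

# Two-tailed test: probability of a statistic at least as far from 0
# in either direction, i.e., P(|Z| >= |z|)
p_two = 2 * norm.sf(abs(z))

print(p_upper, p_lower, p_two)  # ~0.036, ~0.964, ~0.072
```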
There are then four possible outcomes to our statistical decision:
| What is True | What We Decide | Result |
|---|---|---|
| \(H_0\) | \(H_0\) | Correctly ‘accept’ the null |
| \(H_0\) | \(H_A\) | Falsely reject the null (Type I error) |
| \(H_A\) | \(H_0\) | Falsely ‘accept’ the null (Type II error) |
| \(H_A\) | \(H_A\) | Correctly reject the null |
In classical frequentist (a.k.a. parametric) inference, we perform hypothesis testing by trying to minimize our probability of Type I error… we set a high bar for falsely rejecting the null (e.g., for incorrectly finding an innocent person guilty). By setting a high bar for falsely rejecting the null, we necessarily make it easier to falsely ‘accept’ (i.e., fail to reject) the null (e.g., to conclude that a guilty person is innocent).
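This tradeoff is easy to see in a quick simulation. The sketch below (Python; the sample size, seed, and number of replicates are arbitrary choices for illustration, not part of this module) generates data for which the null hypothesis is true, runs a one-sample t test each time, and counts how often we falsely reject at \(\alpha\) = 0.05; the observed Type I error rate should hover near 0.05, the level we chose to control.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)  # seed chosen arbitrarily for reproducibility
alpha = 0.05
n, reps = 30, 10_000

# Simulate 'reps' samples for which the null hypothesis (true mean = 0) is TRUE,
# run a one-sample t test against mu = 0 each time, and count false rejections
false_rejections = 0
for _ in range(reps):
    x = rng.normal(loc=0, scale=1, size=n)       # data generated under H0
    t_stat, p_value = stats.ttest_1samp(x, popmean=0)
    if p_value < alpha:
        false_rejections += 1                    # a Type I error

# The observed Type I error rate should be close to alpha (~0.05)
print(false_rejections / reps)
```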
To do any statistical test, we typically calculate a test statistic based on our data, which we compare to some appropriate standardized sampling distribution to yield a p value.
The p value = the probability of obtaining, by chance, a test statistic as extreme as or more extreme than the one we calculated, assuming the null hypothesis is true.
The test statistic summarizes the “location” of our data relative to an expected distribution based on our null model. Its particular value is determined both by the difference between the original sample statistic and the expected null value (e.g., the difference between the mean of our sample and the expected population mean) and by the standard error of the sample statistic. How small the p value is depends on how far the test statistic falls from zero and on the shape of the null sampling distribution. The p value is effectively the area under the sampling distribution associated with test statistic values as extreme as or more extreme than the one we observed.
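For a one-sample test about a mean, for example, the test statistic takes the familiar standardized form (shown here as a general illustration rather than as this module's specific example):

\[ T = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]

where \(\bar{x}\) is the sample mean, \(\mu_0\) is the mean expected under the null, \(s\) is the sample standard deviation, and \(n\) is the sample size, so the denominator is simply the standard error of the mean.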
We compare the p value associated with our test statistic to some significance level, \(\alpha\), typically 0.05 or 0.01, to determine whether we reject or fail to reject the null. If p < \(\alpha\), we decide that there is sufficient statistical evidence for rejecting \(H_0\).
How do we calculate the p value?
Specify a test statistic (e.g., the mean)
Specify our null distribution
Calculate the tail probability, i.e., the probability of obtaining a statistic (e.g., a mean) as or more extreme than the one observed, assuming that null distribution (these steps are sketched in code below)
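Putting those three steps together, here is a minimal sketch in Python; the data vector and the null value \(\mu_0\) = 10 are made up purely for illustration. It standardizes the sample mean into a t statistic, uses a t distribution with \(n - 1\) degrees of freedom as the null sampling distribution, and converts the tail area into a two-tailed p value that can be compared to \(\alpha\).

```python
import numpy as np
from scipy.stats import t

# Hypothetical data and null value, used purely for illustration
x = np.array([9.2, 10.1, 10.8, 9.7, 11.3, 10.4, 9.9, 10.6])
mu0 = 10                                  # expected mean under H0

# Step 1: the sample statistic, standardized into a test statistic
n = len(x)
xbar = x.mean()
se = x.std(ddof=1) / np.sqrt(n)           # standard error of the mean
t_stat = (xbar - mu0) / se

# Step 2: the null distribution -- here, a t distribution with n - 1 df
df = n - 1

# Step 3: the tail probability (two-tailed p value)
p_value = 2 * t.sf(abs(t_stat), df)

alpha = 0.05
print(t_stat, p_value, p_value < alpha)   # reject H0 only if p < alpha
```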
Let’s do an example where we try to evaluate whether the mean of a single set of observations is significantly different from what is expected under a null hypothesis… i.e., this is a ONE-SAMPLE test.