Suppose your buddy claims they shoot 90% from the free throw line. Since we are all statisticians, we a) don’t believe them, and b) insist upon testing their claim empirically. So we collect some data. They step up and start shooting. At what point do we reject their claim?
Implicitly, we used a logical framework to evaluate our buddy’s claim. Let’s unpack that framework and give it a name.
1. Establish two hypotheses
2. Collect data
3. Analyze the data
4. Make a conclusion based on your analysis
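The four steps above can be sketched with the free-throw claim. This is only an illustration: the shot counts (14 made out of 20) are hypothetical, the one-sided alternative \(p < 0.90\) and \(\alpha = 0.05\) are assumptions, and the helper `binom_lower_tail` is a name made up for this sketch.

```python
from math import comb

def binom_lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Step 1: hypotheses. H0: p = 0.90 (the claim) versus H1: p < 0.90.
p_claimed = 0.90
alpha = 0.05  # assumed significance level

# Step 2: collect data (hypothetical numbers, not from the notes).
shots, made = 20, 14

# Step 3: analyze. How likely is a result this poor if the claim were true?
p_value = binom_lower_tail(made, shots, p_claimed)

# Step 4: conclude by comparing the p-value to alpha.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p-value = {p_value:.4f}: {decision}")
```

Changing `shots` and `made` shows how many misses it takes before the claim stops looking plausible.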
Suppose Giggle wants to test their browser Chime. They sample 1014 randomly selected websites and record a simple outcome for each: the website either displayed properly on Chime or it did not. They found that 984 websites displayed correctly. Test whether Chime displays 99% of webpages correctly, and compare your conclusion to a confidence interval. Choose \(\alpha = 0.05\).
Hypothesis Test:
\[H_0: p = .99 \quad \text{ versus } \quad H_1: p \ne .99\]

    One Sample t-test
    data:  x
    t = -3.679, df = 1013, p-value = 0.0002465
    alternative hypothesis: true mean is not equal to 0.99
    99 percent confidence interval:
     0.9566753 0.9841531
    sample estimates:
    mean of x 
    0.9704142

We call this framework hypothesis testing. Let’s rephrase hypothesis testing into the language of statistics.
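The t statistic in the output above can be checked by hand from the counts in the example (984 correct out of 1014). The following is a rough Python sketch, not the code that produced the output. It assumes the data were coded as 0/1 values, and it approximates the t distribution with a standard normal, which is reasonable for 1013 degrees of freedom but means the p-value and interval will differ slightly from the printed ones. The interval here uses confidence level \(1 - \alpha = 0.95\) to match the stated \(\alpha\).

```python
from math import sqrt
from statistics import NormalDist

n, correct = 1014, 984      # sample size and successes from the example
p0, alpha = 0.99, 0.05      # null value and significance level

# Treat each website as a 0/1 observation, so the sample mean is the proportion.
x_bar = correct / n
# Sample variance of 0/1 data, with the usual n - 1 denominator.
s2 = n / (n - 1) * x_bar * (1 - x_bar)
se = sqrt(s2 / n)

t_stat = (x_bar - p0) / se

# Assumption: with df = 1013 the t distribution is nearly standard normal,
# so the two-sided p-value is approximated with the normal CDF.
z = NormalDist()
p_value = 2 * z.cdf(-abs(t_stat))

# Approximate (1 - alpha) confidence interval, again using a normal quantile.
margin = z.inv_cdf(1 - alpha / 2) * se
ci = (x_bar - margin, x_bar + margin)

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.3f}, approx p-value = {p_value:.6f}, {decision}")
print(f"approx {1 - alpha:.0%} CI: ({ci[0]:.4f}, {ci[1]:.4f})")
print("0.99 inside CI:", ci[0] <= p0 <= ci[1])
```

The test and the interval should tell the same story: when the two-sided test rejects \(H_0\) at level \(\alpha\), the hypothesized value 0.99 falls outside the \(1 - \alpha\) interval.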
The null and alternative hypotheses generally follow some conventions.
- \(H_0\) and \(H_1\) are statements about population parameters.
- \(H_0\) declares the parameter of interest to be equal to some value.
- \(H_1\) declares the same parameter of interest to be less than, greater than, or not equal to the value stated in the null hypothesis \(H_0\). The researcher chooses one of these alternatives before collecting data, let alone conducting the test.
Note
It’s too easy to think you proved something when the p-value \(< \alpha\); however, statistics rarely proves anything. At best, statistics, via p-values, provides evidence against a specific conclusion, namely \(H_0\).
Note
Because the p-value is a probability, there are two sides to it. The other side results in an error in decision making.
For an alternative hypothesis of \(\ne\), namely \(H_1: \mu \ne 0\),
We set the largest probability with which we are willing to reach an incorrect conclusion. We call this value the level of significance, and give it the symbol \(\alpha\).
Level of significance. The largest probability of incorrectly rejecting \(H_0\) when in fact \(H_0\) is true.
We evaluate hypotheses by comparing the p-value to the significance level, \(\alpha\). If our sample statistic (observed data) is so unusual with respect to the null hypothesis that it casts doubt on the validity of \(H_0\), that is, if the p-value is less than \(\alpha\), then we have some evidence against \(H_0\) (though this does not confirm \(H_1\)).
Hypothesis testing never confirms anything; it only provides some evidence against the null hypothesis.
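The comparison of the p-value to \(\alpha\) can be written as a small rule. This is a minimal sketch; the function name `decide` and the wording of the returned messages are made up for illustration, and the default \(\alpha = 0.05\) is an assumption.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value to the significance level and phrase the decision."""
    if not (0 <= p_value <= 1 and 0 < alpha < 1):
        raise ValueError("p_value must be in [0, 1] and alpha in (0, 1)")
    if p_value < alpha:
        return "reject H0: the data provide evidence against H0"
    return "fail to reject H0: the data do not provide enough evidence against H0"

print(decide(0.0002465))   # p-value from the Chime output
print(decide(0.20))        # hypothetical p-value
```

Neither message says that \(H_0\) or \(H_1\) has been proven; both speak only about evidence against \(H_0\).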
When
Note
Despite the overly cautious words above (some, evidence, probably), the world of statistics continues to use the strong phrases “reject” and “fail to reject”.
Consider again Darwin’s finch data set. Formally test whether the mean beak height is equal to 9 mm versus not equal to 9 mm, at \(\alpha = 0.05\).
Given \(H_0: \mu = 9\) and \(H_1: \mu \ne 9\), we reject the null hypothesis because the p-value \(2.2 \times 10^{-16} < \alpha = 0.05\). There is sufficient evidence to say that the true population mean beak height of finches from the Galápagos Islands is not equal to 9 mm.
We conclude a formal hypothesis test by making a decision between \(H_0\) and \(H_1\). But did we decide correctly?
We essentially made a choice, but our choice could be correct or incorrect.
| | fail to reject \(H_0\) | reject \(H_0\) |
|---|---|---|
| \(H_0\) true | correct | type 1 error |
| \(H_1\) true | type 2 error | correct |
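The two error types in the table can be made concrete with a small simulation. The sketch below is illustrative only. For simplicity it assumes a two-sided z-test with known standard deviation rather than a t-test, and the sample size, true means, number of simulations, and the function name `rejection_rate` are all hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=25, alpha=0.05,
                   n_sims=2000, seed=1):
    """Share of simulated two-sided z-tests of H0: mu = mu0 that reject H0."""
    rng = random.Random(seed)
    z = NormalDist()
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z_stat = (fmean(sample) - mu0) / (sigma / sqrt(n))
        p_value = 2 * z.cdf(-abs(z_stat))
        rejections += p_value < alpha
    return rejections / n_sims

# H0 true: every rejection is a type 1 error, so the rate should be near alpha.
type1_rate = rejection_rate(true_mean=0.0)

# H1 true (hypothetical true mean of 0.5): every non-rejection is a type 2 error.
type2_rate = 1 - rejection_rate(true_mean=0.5)

print(f"type 1 error rate ~ {type1_rate:.3f}")
print(f"type 2 error rate ~ {type2_rate:.3f}")
```

Lowering `alpha` in this sketch reduces the type 1 error rate but raises the type 2 error rate, which is the trade-off behind the courtroom question.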
OS4 Example 5.26. How could we reduce the Type 2 Error rate in US courts? What influence would this have on the Type 1 Error rate?
| | found innocent | found guilty |
|---|---|---|
| didn’t commit crime | correct | type 1 error |
| did commit crime | type 2 error | correct |
Confidence intervals can be used to conclude hypothesis tests if
Return to the finch beak height example. What if we tested \(H_0: \mu = 13\) versus \(H_1: \mu \ne 13\)? Would we reject or fail to reject \(H_0\)?
Hypothesis testing