Point estimates are random variables. Random variables follow shapes, called distributions. Therefore, point estimates follow distributions (and have shapes).
The Central Limit Theorem says, “If our sample size is large enough, the sample mean will be approximately Normally distributed.”
From the CLT, we can approximate confidence intervals
From the CLT, we can approximate area in tails (p-values)
The two sample \(t\)-test compares the means of two groups with the hypotheses
\[ \begin{align*} H_0: \quad & \mu_1 = \mu_2 \\ H_A: \quad & \mu_1 \ne \mu_2. \\ \end{align*} \]
R’s function t.test() did all the hard work for us.
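A minimal sketch of that call. The data here are simulated as a stand-in, since the original data set for this example is not shown; by default t.test() runs the Welch two-sample test.

```r
# Simulated stand-in data (an assumption, not the course data)
set.seed(1)
group1 <- rnorm(30, mean = 5, sd = 2)
group2 <- rnorm(30, mean = 6, sd = 2)

# Welch two-sample t-test of H0: mu_1 = mu_2 against H_A: mu_1 != mu_2
t.test(group1, group2)
```

The printed output includes the test statistic, degrees of freedom, p-value, and a confidence interval for the difference in means.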
If there were three or more groups, the two sample \(t\)-test would not work. We could force the test on the data by comparing two groups at a time, but this inflates the Type I error rate. We thus require a new statistical method, analysis of variance (ANOVA).
The ANOVA hypothesis test for \(k\) groups is
\[ \begin{align*} H_0: \quad & \mu_1 = \mu_2 = \mu_3 = \ldots = \mu_k \\ H_A: \quad & \text{at least one mean is different.} \end{align*} \]
Note
ANOVA tests equality of means across groups, despite its name.
With ANOVA you can compare the means by groups for many different data sets.
Let’s visualize what is going on with ANOVA. It starts with box plots by groups – draw more pictures on board.
Analysis of variance tells us about means by groups, despite its name. Large variation amongst the groups relative to small variation within the groups indicates different population means.
What do you think of these means – think of variation within and amongst groups?
ANOVA calculates one fraction based on two numbers, variation amongst (between) groups and variation within groups. These two numbers are generally referred to as mean square values.
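In symbols, with \(k\) groups, \(n\) total observations, group sizes \(n_i\), group means \(\bar{x}_i\), group standard deviations \(s_i\), and grand mean \(\bar{x}\), a standard formulation of these two mean squares and their ratio is

\[
\begin{align*}
MSG &= \frac{1}{k-1} \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2 \\
MSE &= \frac{1}{n-k} \sum_{i=1}^{k} (n_i - 1) s_i^2 \\
F &= \frac{MSG}{MSE}.
\end{align*}
\]

Under \(H_0\), the statistic \(F\) follows an \(F\)-distribution with \(k-1\) and \(n-k\) degrees of freedom; large values of \(F\) (variation amongst groups large relative to variation within groups) give evidence against \(H_0\).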
Are baseball players paid on average differently by position?
\[ \begin{align*} H_0: \quad & \mu_{catcher} = \mu_{dh} = \mu_{first} = \ldots = \mu_{third} \\ H_A: \quad & \text{at least one mean salary is different.} \end{align*} \]
with \(\alpha = 0.05\).
Load the data and make a plot.
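A hedged sketch of this step. The data frame name (mlb) and its columns (salary, position) are assumptions, and a small simulated data set stands in for the real salaries so the chunk runs end to end.

```r
# Simulated stand-in for the baseball salary data (names are assumptions)
set.seed(42)
mlb <- data.frame(
  position = rep(c("catcher", "dh", "first"), each = 20),
  salary   = c(rnorm(20, mean = 2), rnorm(20, mean = 5), rnorm(20, mean = 3))
)

# Box plots of salary by position, one box per group
boxplot(salary ~ position, data = mlb, ylab = "salary (millions)")
```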
The R code to run ANOVA
The tilde \(\sim\) is read as “predict the left hand side with the right hand side.” Hence, we read salary \(\sim\) position as “predict salary by different levels of position.”
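A sketch of the aov() call, again using simulated stand-in data and assumed names (mlb, salary, position):

```r
# Simulated stand-in for the baseball salary data (names are assumptions)
set.seed(42)
mlb <- data.frame(
  position = rep(c("catcher", "dh", "first"), each = 20),
  salary   = c(rnorm(20, mean = 2), rnorm(20, mean = 5), rnorm(20, mean = 3))
)

# Fit the one-way ANOVA model: predict salary by position
fit <- aov(salary ~ position, data = mlb)
summary(fit)  # ANOVA table: df, sums of squares, mean squares, F, p-value
```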
The degrees of freedom, F-statistic, and p-value are the most important pieces of information to extract from an ANOVA table.
Because the p-value \(= 10^{-4} < \alpha = 0.05\), we reject the null in favor of the alternative. There is sufficient evidence to claim that the population mean salary of baseball players varies by position.
Say you’ve got some data and you want to test equality of the means. What to do? ANOVA!
What not to do? Immediately compare all pairwise combinations of the \(k = 9\) groups. That would result in 36 tests, and an increase in your Type I Error rate.
Suppose you chose a level of significance of \(\alpha = 0.05\). What’s the probability of observing at least one significant result just by chance?
Assuming the tests are independent, this is just a binomial distribution, \(X \sim\) Binomial\((36, 0.05)\), and we want to know \(P(X \ge 1)\).
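In R, \(P(X \ge 1) = 1 - P(X = 0)\) is one line:

```r
# P(at least one significant result by chance) = 1 - P(none significant)
1 - pbinom(0, size = 36, prob = 0.05)  # equivalently, 1 - 0.95^36
```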
[1] 0.8422208

Many solutions exist. A simple one, in R, is Tukey’s honest significant difference (HSD). Only use this if you have first run ANOVA and rejected \(H_0\).
Consider brain weights of the Families of the order Carnivora. Let’s compare mean brain weights by Family.
Since some groups don’t have enough data, let’s remove them.
\[ \begin{align*} H_0: \quad & \mu_c = \mu_f = \mu_h = \mu_m = \mu_u \\ H_A: \quad & \text{at least one mean is different.} \end{align*} \]
with \(\alpha = 0.01\).
Because the p-value is tiny, we reject \(H_0\) in favor of the alternative. There is sufficient evidence to say that at least one mean is different from the rest.
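A hedged sketch of the Tukey HSD follow-up. The data frame and column names (brains, family, brainwt) are assumptions, with simulated brain weights standing in for the Carnivora data so the chunk runs.

```r
# Simulated stand-in for the Carnivora brain-weight data (names are assumptions)
set.seed(7)
brains <- data.frame(
  family  = rep(c("Canidae", "Felidae", "Ursidae"), each = 15),
  brainwt = c(rnorm(15, 100, 10), rnorm(15, 60, 10), rnorm(15, 300, 30))
)

# First run ANOVA; apply Tukey's HSD only after rejecting H0
fit <- aov(brainwt ~ family, data = brains)
summary(fit)

# All pairwise differences in means, with family-wise adjusted confidence intervals
TukeyHSD(fit)
```

Each row of the TukeyHSD output is one pairwise comparison; intervals that exclude 0 identify which group means differ.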