So far, we have used the t-distribution for confidence intervals or hypothesis tests of one mean (proportions are means).
Paired data come in naturally linked pairs, such as two measurements taken on the same unit.
Two sets of observations are paired if each observation in one set has a special correspondence or connection with exactly one observation in the other data set.
Tell me whether or not these data are paired.
If the data are paired, the difference within each pair, \(X_{i,diff} = X_{i,a} - X_{i,b}\), has a direct and interpretable meaning both in English and in statistics. Therefore
\[\bar{X}_{diff} \quad \text{ and } \quad s_{\bar{X}_{diff}}\]
are simply the sample mean and standard error of a single new random variable, the differences, and the one-sample t methods apply to them unchanged.
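To make this concrete, here is a minimal R sketch showing that a paired analysis is a one-sample analysis of the differences. The vectors `x_a` and `x_b` hold made-up illustrative values, not data from these notes; `t.test`, `mean`, and `sd` are base R.

```r
# Hypothetical paired measurements (illustrative values only)
x_a <- c(12.1, 15.3, 9.8, 20.4, 17.6, 11.2)
x_b <- c(10.9, 14.8, 10.1, 18.2, 16.0, 10.5)

# The differences form a single new variable
x_diff <- x_a - x_b

# Sample mean and standard error of the differences
xbar_diff <- mean(x_diff)
se_diff   <- sd(x_diff) / sqrt(length(x_diff))

# A paired t-test is the one-sample t-test applied to the differences
paired     <- t.test(x_a, x_b, paired = TRUE)
one_sample <- t.test(x_diff)

# Both calls give the same statistic and the same confidence interval
all.equal(unname(paired$statistic), unname(one_sample$statistic))
all.equal(as.numeric(paired$conf.int), as.numeric(one_sample$conf.int))
```

The t statistic in either call is just `xbar_diff / se_diff`, which is why nothing new is needed beyond the one-sample tools.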
Are textbooks actually cheaper online? Compare the price of textbooks at the University of California, Los Angeles’ (UCLA’s) bookstore and prices at Amazon.com. Seventy-three UCLA courses were randomly sampled in Spring 2010.
Plot the data!
Calculate and interpret a 95% confidence interval of the difference in Amazon.com versus UCLA’s book prices.
    One Sample t-test
data:  d
t = -7.6488, df = 72, p-value = 6.928e-11
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -16.087652  -9.435636
sample estimates:
mean of x 
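-12.76164
We are \(95\)% confident that the true mean difference in price between Amazon.com and UCLA’s books is between -16.09 and -9.44 dollars. Assuming the difference is taken as Amazon.com minus UCLA, as the wording of the question suggests, Amazon.com is cheaper by about 9.44 to 16.09 dollars per book on average.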
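The interval can be rebuilt by hand from the numbers in the output, which shows where it comes from. The sketch below uses only base R (`qt`) and the reported mean, t statistic, and degrees of freedom; the variable names are my own, and the result should agree with the reported interval up to rounding.

```r
# Values copied from the t-test output above
xbar_diff <- -12.76164   # sample mean of the differences
t_stat    <- -7.6488     # reported t statistic (for H0: mu_diff = 0)
df        <- 72          # degrees of freedom, n - 1 with n = 73

# Since t = xbar_diff / SE, the standard error can be recovered
se_diff <- xbar_diff / t_stat

# Critical value for a 95% interval (2.5% in each tail)
t_star <- qt(0.975, df)

# Point estimate plus or minus the margin of error
ci <- xbar_diff + c(-1, 1) * t_star * se_diff
ci
```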
Set up, evaluate, and conclude in context a hypothesis test at \(\alpha = 0.05\).
The natural hypotheses are
\[ H_0: \mu_{diff} = 0 \text{ versus } H_1: \mu_{diff} \ne 0. \]
    One Sample t-test
data:  d
t = -7.6488, df = 72, p-value = 6.928e-11
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -16.087652  -9.435636
sample estimates:
mean of x 
-12.76164
Because p-value \(< 0.0001 < \alpha = 0.05\), we reject \(H_0\). There is sufficient evidence to claim that the mean prices of books at Amazon.com and UCLA’s bookstore differ.
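As a check on the decision, the two-sided p-value can be recomputed from the reported t statistic and degrees of freedom. This is a sketch using base R's `pt`; the variable names are illustrative.

```r
# Values copied from the t-test output above
t_stat <- -7.6488
df     <- 72
alpha  <- 0.05

# Two-sided p-value: probability in both tails beyond |t|
p_value <- 2 * pt(-abs(t_stat), df)
p_value

# Reject H0 when the p-value falls below alpha
reject_h0 <- p_value < alpha
reject_h0
```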
Two sample t-tests estimate the difference between two population means from two independent samples of data. We estimate \(\mu_a - \mu_b\) with the point estimator \(\bar{X}_a - \bar{X}_b\).
Confidence intervals for two sample t-tests follow the same pattern as before,
\[ (\bar{X}_a - \bar{X}_b) \pm t^*_{df} \cdot s_{\bar{X}_a - \bar{X}_b}. \]
Test statistics for two-sample t-tests follow the same pattern as before, and p-values are computed in exactly the same way.
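As a sketch of what that pattern looks like, assume the unpooled (Welch) standard error, which is what R's `t.test` uses by default, and write \(s_a, s_b\) and \(n_a, n_b\) for the sample standard deviations and sample sizes (notation introduced here for illustration). Then, for \(H_0: \mu_a - \mu_b = 0\),
\[ s_{\bar{X}_a - \bar{X}_b} = \sqrt{\frac{s_a^2}{n_a} + \frac{s_b^2}{n_b}}, \qquad T = \frac{(\bar{X}_a - \bar{X}_b) - 0}{s_{\bar{X}_a - \bar{X}_b}}. \]
The statistic \(T\) is compared to a t-distribution whose degrees of freedom come from the Welch-Satterthwaite approximation, so the reported df is generally not a whole number.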
Consider the data set ape::carnivora. Calculate a \(98\)% confidence interval for the difference in mean longevity between the two SuperFamilies Caniformia and Feliformia.
A \(98\)% CI, difference in longevity by Caniformia and Feliformia.
    Welch Two Sample t-test
data:  LY by SuperFamily
t = 1.0243, df = 37.394, p-value = 0.3123
alternative hypothesis: true difference in means between group Caniformia and group Feliformia is not equal to 0
98 percent confidence interval:
 -28.13845  69.13511
sample estimates:
mean in group Caniformia mean in group Feliformia 
                192.4583                 171.9600
We are \(98\)% confident that the population difference in mean longevity between the SuperFamilies Caniformia and Feliformia is between -28.1 and 69.1 months.
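The same by-hand reconstruction works for the two-sample interval. The sketch below recovers the standard error from the reported t statistic and uses base R's `qt` for the critical value; the variable names are my own. The final block reruns the analysis directly, assuming the `ape` package is installed and that the output above came from the formula `LY ~ SuperFamily`, as its `data:` line suggests.

```r
# Values copied from the Welch t-test output above
mean_c <- 192.4583   # mean longevity, Caniformia
mean_f <- 171.9600   # mean longevity, Feliformia
t_stat <- 1.0243     # reported t statistic
df     <- 37.394     # Welch degrees of freedom (need not be an integer)

# Point estimate of mu_C - mu_F and its standard error (SE = estimate / t)
diff_hat <- mean_c - mean_f
se_diff  <- diff_hat / t_stat

# A 98% interval leaves 1% in each tail
t_star <- qt(0.99, df)
ci <- diff_hat + c(-1, 1) * t_star * se_diff
ci

# Optional: rerun the full analysis if ape is available
if (requireNamespace("ape", quietly = TRUE)) {
  data(carnivora, package = "ape")
  print(t.test(LY ~ SuperFamily, data = carnivora, conf.level = 0.98))
}
```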
Set up, evaluate, and conclude in context a hypothesis test at \(\alpha = 0.02\).
The natural hypotheses are
\[ H_0: \mu_C = \mu_F \text{ versus } H_1: \mu_C \ne \mu_F \]
and \(\alpha = 0.02\).
Because p-value \(= 0.31 > \alpha = 0.02\), we fail to reject \(H_0\). There is insufficient evidence to claim that mean longevity differs between Caniformia and Feliformia.
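This conclusion matches the \(98\)% confidence interval: a two-sided test at level \(\alpha\) fails to reject \(H_0\) exactly when the \((1 - \alpha)\) interval contains 0. A small sketch of that check, using base R's `pt` and the numbers reported in the Welch output (variable names are illustrative):

```r
# Values copied from the Welch t-test output
t_stat <- 1.0243
df     <- 37.394
alpha  <- 0.02
ci     <- c(-28.13845, 69.13511)   # reported 98% confidence interval

# Two-sided p-value from the t statistic
p_value <- 2 * pt(-abs(t_stat), df)
p_value

# The test decision and the interval tell the same story
fail_to_reject <- p_value > alpha
zero_in_ci     <- ci[1] < 0 && 0 < ci[2]
fail_to_reject == zero_in_ci
```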
Overall things have stayed pretty much the same: confidence intervals, hypothesis tests, and interpretations. Now we have new types of data we can work with.