MATH 456 Practice Exam 01, Spring 26

Consider a data set about finches (little birds) from the Galápagos Islands. We’ll consider the variables beakwidth (measured in millimeters, mm), taillength (also in mm), and island.

          Df   Sum Sq Mean Sq F value Pr(>F)
island     -  14.6734  7.3367  1.0632 0.3513
Residuals 65 448.5518  6.9008       -      -


Call:
lm(formula = taillength ~ island, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.4667 -1.4769  0.3333  1.5026  4.4923 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)         44.0077     0.5152  85.421   <2e-16 ***
islandsancristobal   0.4590     0.7218   0.636    0.527    
islandsantacruz     -0.7744     0.8517  -0.909    0.367    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.627 on 65 degrees of freedom
Multiple R-squared:  0.03168,   Adjusted R-squared:  0.001882 
F-statistic: 1.063 on 2 and 65 DF,  p-value: 0.3513

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = fita)

$island
                             diff       lwr       upr     p adj
sancristobal-floreana   0.4589744 -1.272307 2.1902559 0.8009421
santacruz-floreana     -0.7743590 -2.817310 1.2685920 0.6365176
santacruz-sancristobal -1.2333333 -3.262396 0.7957293 0.3178053
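
For readers who want to produce output of the same form, here is a minimal sketch on simulated data. The data frame sim, the group sizes, and the mean and standard deviation are all assumptions chosen for illustration; they are not the finch data, so the numbers will not match the output above.

```r
# Simulated stand-in for the finch data (hypothetical values, not the real measurements).
set.seed(456)
sim <- data.frame(
    island = factor(rep(c("floreana", "sancristobal", "santacruz"),
                        times = c(26, 27, 15))),
    taillength = rnorm(68, mean = 44, sd = 2.6)
)

# One-way ANOVA fitted as a linear model with island as the only predictor.
fit_sim <- lm(taillength ~ island, data = sim)

# ANOVA table with columns Df, Sum Sq, Mean Sq, F value, Pr(>F).
anova(fit_sim)

# Coefficient table; the first factor level (floreana) is the baseline.
summary(fit_sim)

# Tukey's HSD for all pairwise differences between island means.
tukey_sim <- TukeyHSD(aov(taillength ~ island, data = sim))
tukey_sim
```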

Write three sentences about three different statistics from the box plot above. Each sentence must be about a different island.
Write the null and alternative hypotheses for ANOVA based on the box plot above. Specify a level of significance.
Compare the p-value to the level of significance and make a conclusion.
Interpret your conclusion in context of the data.
What should the degrees of freedom for island be for the ANOVA above?
Reproduce the F statistic (labeled F value in the ANOVA table above).
Write down the fitted regression equation for the model above.
Using Tukey’s HSD, write down appropriate null and alternative hypotheses for one comparison. Make a conclusion by quoting an appropriate p-value and then interpret your conclusion in context of the data.

predict(fita, newdata = data.frame(island = "floreana"))

       1 
44.00769

Still using the ANOVA model from above, named fita, interpret the prediction in context of the data.
Explain why all finches from the island Floreana have the same prediction for this model.


Call:
lm(formula = taillength ~ beakwidth, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.0270 -0.7880 -0.0112  1.3757  3.8055 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  31.5885     1.4682  21.515  < 2e-16 ***
beakwidth     1.2217     0.1426   8.564 2.59e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.823 on 66 degrees of freedom
Multiple R-squared:  0.5264,    Adjusted R-squared:  0.5192 
F-statistic: 73.35 on 1 and 66 DF,  p-value: 2.594e-12

                   5 %      95 %
(Intercept) 29.1392236 34.037862
beakwidth    0.9837041  1.459652
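
A table with columns labeled 5 % and 95 % is what confint() returns when asked for a 90% interval. The sketch below shows this on simulated data; the data frame sim, the beakwidth range, and the coefficients used to generate it are assumptions for illustration only, so the numbers will differ from the output above.

```r
# Simulated stand-in for the finch data (hypothetical values, not the real measurements).
set.seed(456)
sim <- data.frame(beakwidth = runif(68, min = 8, max = 12))
sim$taillength <- 31.6 + 1.2 * sim$beakwidth + rnorm(68, sd = 1.8)

# Simple linear regression of taillength on beakwidth.
fit_sim <- lm(taillength ~ beakwidth, data = sim)

# 90% confidence intervals for the intercept and slope;
# level = 0.90 puts 5% in each tail, hence the column labels.
ci90 <- confint(fit_sim, level = 0.90)
ci90
```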

Write the null and alternative hypotheses for the hypothesis test of the intercept. Specify a level of significance.
Make the appropriate conclusion from the hypothesis test above. Quote the appropriate p-value.
Interpret your conclusion in context of the data.
Write down the fitted regression equation for the model above.
Interpret adjusted \(R^2\) in context of the data.
Provide two reasons \(R^2\) is worse than adjusted \(R^2\).
Interpret in context of the data a 90% confidence interval for the intercept.
Does the intercept make sense in context of the data? Why or why not?

predict(fitl, newdata=data.frame(beakwidth=5), interval="confidence")

       fit      lwr      upr
1 37.69693 36.15838 39.23548

Interpret in context of the data a 95% confidence interval for a prediction of the mean taillength based on the output above.
Without doing any calculations, write the regression equation for the prediction above.
Name two ways to narrow a confidence interval for a mean taillength.
Here is R code for a function that would be passed to optim, R’s built-in optimization function, in order to calculate linear regression coefficients. Explain what each line of code is doing.

ll_lm <- function(theta, data) {
    x <- data$x
    y <- data$y
    yhat <- theta[1] + theta[2] * x
    r <- y - yhat
    return(sum( r ^ 2 ))
}
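
As a usage sketch, the function above could be handed to optim as follows. The simulated data frame sim, the starting values c(0, 0), and the choice of method = "BFGS" are assumptions made for illustration; the comparison with lm() is only a sanity check.

```r
# Simulated data with columns x and y (hypothetical; true intercept 2, slope 3).
set.seed(1)
sim <- data.frame(x = runif(50, min = 0, max = 10))
sim$y <- 2 + 3 * sim$x + rnorm(50)

# The function from above, repeated so this block runs on its own.
ll_lm <- function(theta, data) {
    x <- data$x
    y <- data$y
    yhat <- theta[1] + theta[2] * x
    r <- y - yhat
    return(sum( r ^ 2 ))
}

# optim() minimizes fn starting from par; the extra argument data = sim
# is passed through to ll_lm on every evaluation.
fit_optim <- optim(par = c(0, 0), fn = ll_lm, data = sim, method = "BFGS")

# The minimizer should be close to the least squares coefficients from lm().
fit_lm <- lm(y ~ x, data = sim)
print(fit_optim$par)
print(coef(fit_lm))
```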