Consider a data set about finches (little birds) from the Galapagos islands. We’ll consider the variables beakwidth measured in millimeters (mm), taillength also in mm, and island.
Df Sum Sq Mean Sq F value Pr(>F)
island - 14.6734 7.3367 1.0632 0.3513
Residuals 65 448.5518 6.9008 - -
Call:
lm(formula = taillength ~ island, data = df)
Residuals:
Min 1Q Median 3Q Max
-9.4667 -1.4769 0.3333 1.5026 4.4923
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 44.0077 0.5152 85.421 <2e-16 ***
islandsancristobal 0.4590 0.7218 0.636 0.527
islandsantacruz -0.7744 0.8517 -0.909 0.367
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.627 on 65 degrees of freedom
Multiple R-squared: 0.03168, Adjusted R-squared: 0.001882
F-statistic: 1.063 on 2 and 65 DF, p-value: 0.3513
Write three sentences about three different statistics from the box plot above. Each sentence must be about a different island.
Write the null and alternative hypotheses for ANOVA based on the box plot above. Specify a level of significance.
Compare the p-value to the level of significance and make a conclusion.
Interpret your conclusion in context of the data.
What should the degrees of freedom for island be for the ANOVA above?
Reproduce the F statistic (named F.value in the output above).
Write down the fitted regression equation for the model above.
Using Tukey’s HSD, write down appropriate null and alternative hypotheses for one comparison. Make a conclusion by quoting an appropriate p-value and then interpret your conclusion in context of the data.
Interpret in context of the data a 95% confidence interval for a prediction of the mean taillength based on the output above.
Without doing any calculations, write the regression equation for the prediction above.
Name two ways to make more narrow the width of a confidence interval for a mean taillength?
Here is pseudo-code for a function that would be passed to optim, R’s built in optimization function, in order to calculate linear regression coefficients. Explain what each line of code is doing.
ll_lm <-function(theta, data) { x <- data$x y <- data$y yhat <- theta[1] + theta[2] * x r <- y - yhatreturn(sum( r ^2 ))}
Source Code
---title: "MATH 456 Practice Exam 01, Spring 26"format: htmleditor: source---```{r, echo=FALSE}library(ggplot2)suppressMessages(library(dplyr))df <-read.csv("https://raw.githubusercontent.com/roualdes/data/refs/heads/master/finches.csv")```Consider a data set about finches (little birds) from the Galapagos islands. We'll consider the variables `beakwidth` measured in millimeters (mm), `taillength` also in mm, and `island`.```{r}#| echo: false#| fig-width: 3#| fig-height: 2#| fig-align: center#| warning: falseggplot(df, aes(island, taillength)) +geom_boxplot() +theme_minimal()fita <-lm(taillength ~ island, data = df)dfa <-as.data.frame(lapply(as.data.frame(anova(fita)), \(x) if(is.numeric(x)) { round(x, 4)} else {x}))colnames(dfa) <-c("Df", "Sum Sq", "Mean Sq", "F value", "Pr(>F)")rownames(dfa) <-c("island", "Residuals")dfa$`F value`[2] ="-"dfa$`Pr(>F)`[2] ="-"dfa$`Df`[1] ="-"print(dfa)summary(fita)TukeyHSD(aov(fita))```1. Write three sentences about three different statistics from the box plot above. Each sentence must be about a different island.2. Write the null and alternative hypotheses for ANOVA based on the box plot above. Specify a level of significance.3. Compare the p-value to the level of significance and make a conclusion.4. Interpret your conclusion in context of the data.5. What should the degrees of freedom for `island` be for the ANOVA above?6. Reproduce the F statistic (named F.value in the output above).7. Write down the fitted regression equation for the model above.8. Using Tukey's HSD, write down appropriate null and alternative hypotheses for one comparison. Make a conclusion by quoting an appropriate p-value and then interpret your conclusion in context of the data.```{r}predict(fita, newdata =data.frame(island ="floreana"))```9. Still using the ANOVA model from above, named `fita`, interpret the prediction in context of the data.10. Explain why all finches from the island Floreana have the same prediction for this model.```{r}#| echo: false#| fig-width: 3#| fig-height: 2#| fig-align: center#| warning: falseggplot(df, aes(beakwidth, taillength)) +geom_point() +theme_minimal()fitl <-lm(taillength ~ beakwidth, data = df)summary(fitl)confint(fitl, level =0.9)```11. Write the null and alternative hypotheses for the hypothesis test of the intercept. Specify a level of significance.12. Make the appropriate conclusion from the hypothesis test above. Quote the appropriate p-value.13. Interpret your conclusion in context of the data.14. Write down the fitted regression equation for the model above.15. Interpret adjusted $R^2$ in context of the data.16. Provide two reasons $R^2$ is worse than adjusted $R^2$?17. Interpret in context of the data a 90% confidence interval for the intercept.18. Does the intercept make sense in context of the data. Why or why not?```{r}predict(fitl, newdata=data.frame(beakwidth=5), interval="confidence")```19. Interpret in context of the data a 95% confidence interval for a prediction of the mean `taillength` based on the output above.20. Without doing any calculations, write the regression equation for the prediction above.21. Name two ways to make more narrow the width of a confidence interval for a mean `taillength`?22. Here is pseudo-code for a function that would be passed to`optim`, `R`'s built in optimization function, in order to calculate linear regression coefficients. Explain what each line of code is doing.```{r}ll_lm <-function(theta, data) { x <- data$x y <- data$y yhat <- theta[1] + theta[2] * x r <- y - yhatreturn(sum( r ^2 ))}```