MATH 315 Homework 10

Due 2025-11-19 by 11:59pm

  1. Load the penguins dataset. Our goal is to predict body_mass_g (grams) from both species and bill_length_mm (millimeters). Use a significance level of 0.05.

    a. Perform step 1 of any regression analysis using ggplot2: make a scatter plot of body_mass_g (y-axis) against bill_length_mm (x-axis), color the points by species, and draw the linear regression lines over the plot.

    b. Write the code to fit a linear regression model that predicts body_mass_g from both species and bill_length_mm and gives each level of species its own intercept and slope.
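    One way to see what "unique intercepts and slopes" means is to write the model with indicator variables and species-by-bill-length interaction terms. This is a sketch that assumes treatment (dummy) coding with Adelie as the baseline level, which is R's default when Adelie is the first level of the species factor. The symbols x, C, G, beta_0 through beta_5, and mu are notation introduced here for illustration.

    ```latex
    % x = bill_length_mm; C = 1 for Chinstrap (else 0); G = 1 for Gentoo (else 0)
    % Assumes Adelie is the baseline (reference) level of species
    \text{body\_mass\_g} = \beta_0 + \beta_1 x + \beta_2 C + \beta_3 G
        + \beta_4 (C \cdot x) + \beta_5 (G \cdot x) + \varepsilon

    % Mean body mass implied for each species (\mu = expected body_mass_g):
    \begin{aligned}
    \text{Adelie } (C = 0,\ G = 0)&:\quad \mu = \beta_0 + \beta_1 x \\
    \text{Chinstrap } (C = 1,\ G = 0)&:\quad \mu = (\beta_0 + \beta_2) + (\beta_1 + \beta_4)\,x \\
    \text{Gentoo } (C = 0,\ G = 1)&:\quad \mu = (\beta_0 + \beta_3) + (\beta_1 + \beta_5)\,x
    \end{aligned}
    ```

    In this notation, beta_2 and beta_3 are the intercept offsets and beta_4 and beta_5 are the slope offsets, each measured relative to Adelie.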

    c. Make a data frame that stores the standardized residuals and fitted values from this model.
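    For reference, the standardized residual for observation i is usually defined as the raw residual divided by its estimated standard deviation; this is what R's rstandard() returns for a model fit with lm(). The symbols below are notation introduced here.

    ```latex
    % e_i = y_i - \hat{y}_i is the raw residual for observation i
    % \hat{\sigma} is the residual standard error of the fitted model
    % h_{ii} is the leverage of observation i (i-th diagonal entry of the hat matrix)
    r_i = \frac{e_i}{\hat{\sigma}\sqrt{1 - h_{ii}}}
    ```

    Because every r_i is on a common scale, a common rule of thumb treats |r_i| > 2 as worth a closer look and |r_i| > 3 as a likely outlier.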

    d. Make a ggplot2 scatter plot of the standardized residuals (y-axis) against the fitted values (x-axis). Which assumptions of linear regression does this plot help us check? Do the assumptions seem reasonably met? Why or why not?

    e. Make a ggplot2 histogram of the standardized residuals. What assumption of linear regression does this help us check? Does the assumption seem reasonably met? Why or why not?

    f. Are there any potential outliers that we need to be concerned with? Explain.

    In a full analysis, if the assumptions of linear regression were not satisfactorily met, you would adjust the model and try again.

    g. Use adjusted R-squared to determine whether including species as an explanatory variable improves the overall model fit compared with a model that uses bill_length_mm alone. Report both adjusted R-squared values to justify your conclusion.
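    Adjusted R-squared suits this comparison because ordinary R-squared never decreases when terms are added, while the adjusted version penalizes each extra term. With n observations and p explanatory terms (excluding the intercept), the standard formula is:

    ```latex
    % n = number of observations used in the fit
    % p = number of explanatory terms, not counting the intercept
    R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
    ```

    Assuming treatment coding, the model with bill_length_mm alone has p = 1, while the model with separate intercepts and slopes for three species has p = 5 (one bill-length slope, two species indicators, and two interaction terms). Adjusted R-squared therefore rises only if species explains enough additional variation to offset the four extra terms.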

    h. Calculate the unique intercepts for each level of species.

    i. Interpret the intercept for one level of species in the context of the data. Does this intercept value make logical sense?

    j. Calculate the unique slopes for each level of species.

    k. Interpret the slope for one level of species in the context of the data. Be specific about which level of species this slope refers to.

    l. Using the p-values for the slope and intercept offsets, what can you say about the differences in body_mass_g among the species?