MATH 315 Homework 11

Due 2025-12-08 by 11:59pm

Load the dataset about penguins.
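
One possible way to load the data is sketched below. It assumes that "the dataset about penguins" is the `penguins` data frame from the palmerpenguins package, which is a guess based on the column names `body_mass_g` and `sex` used in this assignment. If the course distributes the data another way, use that instead.

```r
# Assumption: the course data are palmerpenguins::penguins,
# whose columns include body_mass_g and sex.
if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  penguins <- palmerpenguins::penguins
  str(penguins)                    # column names and types
  print(colSums(is.na(penguins)))  # count of missing values per column
} else {
  message("Install the package first: install.packages(\"palmerpenguins\")")
}
```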

  1. Goal: use multiple linear regression to predict body weight using any combination of the other variables you want (except body_mass_g).

    a. Perform step 1 of any regression analysis, using ggplot2. Your plot does not have to match your final model exactly, but it should account for one of the (potentially many) numerical explanatory variables and one of the (potentially many) categorical explanatory variables.

    b. Fit a multiple linear regression model using whichever explanatory variables you want (other than body_mass_g). You should include at least two numeric variables. If you are feeling competitive, your goal is to find an adjusted R^2 greater than 0.85.

    c. Interpret the adjusted R^2 of your model in context of the data.

    d. Interpret an intercept in context of the data. Be specific.

    e. Does the interpreted intercept make sense? Why or why not?

    f. Interpret a slope in context of the data. Be specific.

    g. Calculate a prediction of body weight from your model.

    h. Interpret the prediction in context of the data. Be specific.
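
The mechanics behind parts b, c, and g can be sketched on a different dataset. The example below uses R's built-in `mtcars` data rather than the penguins, so the variable names (`mpg`, `wt`, `hp`, `cyl`) and the object names (`cars`, `fit_lm`, `new_car`) are placeholders, not a suggested model for this assignment.

```r
# Illustration on the built-in mtcars data, NOT the penguins data.
cars <- mtcars
cars$cyl <- factor(cars$cyl)  # treat the number of cylinders as categorical

# Two numeric explanatory variables (wt, hp) and one categorical (cyl).
fit_lm <- lm(mpg ~ wt + hp + cyl, data = cars)

# Adjusted R^2 is stored in the model summary.
adj_r2 <- summary(fit_lm)$adj.r.squared

# Prediction for one hypothetical car: 3000 lbs (wt = 3), 110 hp, 6 cylinders.
new_car <- data.frame(wt = 3, hp = 110,
                      cyl = factor("6", levels = levels(cars$cyl)))
pred <- predict(fit_lm, newdata = new_car)

# The same prediction by hand: intercept + slopes times the chosen values.
# Coefficient order is (Intercept), wt, hp, cyl6, cyl8.
by_hand <- sum(coef(fit_lm) * c(1, 3, 110, 1, 0))
```

Computing the prediction by hand is a useful check that you know which coefficient is the intercept, which are slopes, and which reference level the categorical variable uses.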

  2. Goal: use logistic regression to predict the penguin sex using any combination of the other variables you want (other than sex).

    a. Create a new column in the dataset named sexb that stores 1 for females and 0 otherwise.

    b. Perform step 1 of any regression analysis, using ggplot2. Try to use color and geom_jitter(width = w, height = h) where w and h are some values you choose to make the plot look better.

    c. Fit logistic regression using whichever explanatory variables you want (other than sex and sexb) to predict the variable sexb. You should include at least two numeric variables.

    d. Insert the following code into a new code chunk. Then call this function on the variable you created from glm, e.g. if you defined fit <- glm(...), then in a new code chunk call aR2(fit).

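    # Adjusted R^2-style measure for a fitted logistic regression (a glm object).
    aR2 <- function(fitl) {
        llf <- logLik(fitl)                    # log-likelihood of the fitted model
        llnull <- logLik(update(fitl, . ~ 1))  # log-likelihood of the intercept-only model
        lr <- as.numeric(llf - llnull)         # gain in log-likelihood over the intercept-only model
        y <- fitl$y                            # observed 0/1 responses
        n <- length(y)                         # number of observations used in the fit
        p <- mean(y)                           # proportion of 1s
        m <- 3 * n * p * (1 - p)               # scaling term based on n and the balance of 0s and 1s
        k <- length(fitl$coefficients)         # number of coefficients, including the intercept
        return(1 - exp(-(lr - k) / m))
    }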
    

    e. Interpret the number you got from part d as you would an adjusted R^2, in context of this model.

    There's some disagreement on how best to calculate something like adjusted R^2 for logistic regression. After some searching around, I found a recommended calculation from someone I trust, so I wrote the recommendation up as the function above.

    f. Calculate and interpret, in context of the data, a slope for one of your numeric variables.

    g. Calculate and interpret, in context of the data, two predicted probabilities from your model. You should have one probability greater than 0.5, thus (probably) predicting a female, and one probability less than 0.5, thus (probably) predicting a male.
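
The logistic regression steps in problem 2 (building a 0/1 column, fitting with glm, reading a slope, and predicting probabilities) can be sketched on the built-in `mtcars` data. This is an illustration only: the column `trans`, the indicator `manualb`, and the chosen values of `wt` and `hp` are made up for the example and are not part of the penguins data.

```r
# Illustration on the built-in mtcars data, NOT the penguins data.
cars <- mtcars
cars$trans <- ifelse(cars$am == 1, "manual", "automatic")  # a text-valued category

# Binary indicator: 1 for manual, 0 otherwise (the same idea as sexb).
cars$manualb <- as.integer(cars$trans == "manual")

# Logistic regression with two numeric explanatory variables.
fit_glm <- glm(manualb ~ wt + hp, data = cars, family = binomial)

# A slope is on the log-odds scale; exponentiating gives the multiplicative
# change in the odds for a one-unit increase in wt, holding hp fixed.
odds_ratio_wt <- exp(coef(fit_glm)["wt"])

# Predicted probabilities for two hypothetical cars (light vs heavy, same hp).
new_cars <- data.frame(wt = c(2, 4), hp = c(100, 100))
probs <- predict(fit_glm, newdata = new_cars, type = "response")

# The first probability by hand: inverse logit of the linear predictor.
# Coefficient order is (Intercept), wt, hp.
by_hand <- plogis(sum(coef(fit_glm) * c(1, 2, 100)))
```

Note that `type = "response"` is what returns probabilities. Without it, `predict()` on a glm object returns values on the log-odds scale.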