MATH 456 Homework 07
Due 2026-04-02 by 11:59pm
Download the following dataset into your Homework 07 repository: penguins. Here's the metadata.
Please push this dataset along with your qmd file for Homework 07. I don't need the output file or any of the output file's dependencies.
- Use our function `ll_linreg` and `optim` to find the estimated coefficients for predicting body mass with unique intercepts for each sex and species and a shared slope across bill length.
- Use our functions `ll_logistic`, `ll_logistic_grad`, and `optim` to find the estimated coefficients for predicting the probability that a penguin's body mass exceeds the mean body mass of all penguins with unique "intercepts" across species and a shared "slope" across bill length.
  I understand the language unique/shared intercepts and/or slopes does not fit logistic regression well. Nonetheless, it does help describe how to include explanatory variables in the R `lm`/`glm` formulae for all regression methods (linear, logistic, Poisson, beta, etc.). For this reason, the world sticks with it even though it's technically inaccurate.
- For the rest of this assignment, you should use `glm` to fit logistic regression.
a. Fit a logistic regression model that predicts the probability of a penguin being male with unique "intercepts" for species and shared slopes across bill length and body mass.
b. Calculate the means of body mass and bill length and store them in variables.
c. Although the values in the "Estimate" column of the summary output of a `glm` model are hard to interpret, the signs and magnitudes of the estimates generally make sense. Use `predict` to help you figure out which species are more likely to be male given mean values of the numeric predictors. Describe your findings.
d. Use `predict` to help you figure out which values of the numeric predictors correspond to higher probabilities that a penguin is male. Fix body mass at its mean, and change the value of bill length. Then fix bill length at its mean, and change the value of body mass. Describe your findings.
e. Calculate and interpret a prediction for an Adelie penguin with mean body mass and mean bill length.
f. Calculate and interpret a prediction for a Gentoo penguin with mean body mass and mean bill length.
g. Calculate and interpret two slopes: the change in probability that an Adelie penguin is male given a one-unit change in body mass. Choose values for your calculations such that one slope is (relatively) large and one is (relatively) small.
h. Calculate and interpret two slopes: the change in probability that a Chinstrap penguin is male given a one-unit change in bill length. Choose values for your calculations such that one slope is (relatively) large and one is (relatively) small.
i. There exists an $R^2$-like thing for logistic regression, if you try hard enough. If you are interested, see here. The edit from December 03, 2025 was based on an email I sent him :) Bragging aside, what simpler criteria might help you evaluate the goodness of a logistic regression model? We've learned we can't simply rely on the p-values associated with each coefficient. Could we use predicted probabilities to guess when a response variable will be 1 or 0? What could we compare such predictions to?
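
Parts c, d, g, and h all rely on calling `predict` on a fitted `glm` with a hand-built `newdata`. The sketch below shows those mechanics on simulated data, not the penguins dataset. The column names `y`, `group`, `x1`, and `x2` and the helper `prob_change` are hypothetical stand-ins, so adapt them to the names in the metadata.

```r
# Simulated stand-in data; all names here are hypothetical, not the homework's.
set.seed(456)
n <- 300
group <- factor(sample(c("A", "B", "C"), n, replace = TRUE))
x1 <- rnorm(n, mean = 45, sd = 5)
x2 <- rnorm(n, mean = 4200, sd = 800)

# A made-up "true" model, used only to generate a 0/1 response.
shift <- c(A = 0, B = -1, C = -3)
eta <- -20 + shift[as.character(group)] + 0.2 * x1 + 0.003 * x2
y <- rbinom(n, size = 1, prob = plogis(eta))
sim <- data.frame(y, group, x1, x2)

# Unique "intercepts" per group, shared "slopes" across x1 and x2.
fit <- glm(y ~ group + x1 + x2, data = sim, family = binomial)

# One row per group, with the numeric predictors held at their sample means.
at_means <- data.frame(
  group = factor(levels(sim$group), levels = levels(sim$group)),
  x1 = mean(sim$x1),
  x2 = mean(sim$x2)
)
p_means <- predict(fit, newdata = at_means, type = "response")
names(p_means) <- levels(sim$group)
print(p_means)

# Change in predicted probability for a one-unit increase in x1,
# for group "A" with x2 held at its mean. The result depends on
# where along x1 the change is evaluated.
prob_change <- function(x1_value) {
  nd <- data.frame(
    group = factor("A", levels = levels(sim$group)),
    x1 = c(x1_value, x1_value + 1),
    x2 = mean(sim$x2)
  )
  p <- predict(fit, newdata = nd, type = "response")
  unname(p[2] - p[1])
}

print(prob_change(mean(sim$x1)))       # evaluated at the mean of x1
print(prob_change(mean(sim$x1) + 20))  # evaluated far from the mean of x1
```

Note that `type = "response"` returns predicted probabilities, while the default `type = "link"` returns values on the log-odds scale. Because the logistic curve flattens toward 0 and 1, the same one-unit change gives different probability changes at different starting values.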