MATH 314 Homework 09
Due 2025-12-08 by 11:59pm
Load the dataset about penguins.
-
Goal: use multiple linear regression to predict body weight using any combination of the other variables you want (except `body_mass_g`).
a. Make a plot that at least vaguely represents your model. Your plot does not have to match your final model exactly, but it should account for one of the (potentially many) numerical explanatory variables and one of the (potentially many) categorical explanatory variables.
b. Fit multiple linear regression using whichever explanatory variables you want (other than `body_mass_g`). You should include at least two numeric variables. If you are feeling competitive, your goal is to find an adjusted $R^2$ greater than 0.85.
c. Calculate a prediction for body weight from your model.
d. Interpret the prediction in context of the data.
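For parts b and c, it can help to see what adjusted $R^2$ and a prediction are made of by computing them by hand. The sketch below is only an illustration under stated assumptions: the data are synthetic (not the penguins dataset), and the variable names `flipper`, `bill`, and `is_gentoo` are hypothetical stand-ins for two numeric variables and one categorical indicator. On a fitted statsmodels OLS result, the same quantity is reported as `rsquared_adj`.

```python
import numpy as np

rng = np.random.default_rng(314)
n = 200

# Hypothetical explanatory variables (synthetic, NOT the real penguins data).
flipper = rng.uniform(170, 230, size=n)   # numeric, in mm
bill = rng.uniform(32, 60, size=n)        # numeric, in mm
is_gentoo = rng.integers(0, 2, size=n)    # categorical, coded as a 0/1 indicator

# Synthetic response: body mass in grams, with made-up coefficients plus noise.
mass = -4000 + 40 * flipper + 5 * bill + 500 * is_gentoo + rng.normal(0, 300, size=n)

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones(n), flipper, bill, is_gentoo])
beta, *_ = np.linalg.lstsq(X, mass, rcond=None)

# R^2 compares residual variation to total variation in the response.
resid = mass - X @ beta
ss_res = np.sum(resid**2)
ss_tot = np.sum((mass - mass.mean())**2)
r2 = 1 - ss_res / ss_tot

# Adjusted R^2 penalizes R^2 for the number of explanatory columns k.
k = X.shape[1] - 1  # excludes the intercept
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Part c: a prediction is the fitted equation evaluated at chosen values,
# here a non-Gentoo penguin with a 200 mm flipper and a 45 mm bill.
new_penguin = np.array([1.0, 200.0, 45.0, 0.0])
pred = new_penguin @ beta

print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
print(f"predicted body mass = {pred:.0f} g")
```

Because adjusted $R^2$ only increases when a new variable improves the fit by more than the penalty costs, it is a fairer way than plain $R^2$ to compare models with different numbers of explanatory variables.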
-
Goal: use logistic regression to predict the penguin sex using any combination of the other variables you want (other than `sex`).
a. Create a new column in the dataset named `sexb` that stores 1 for females and 0 otherwise.
b. Make a plot that vaguely represents your model. Try to use color and uniform random noise along either or both of the x- and y-axes to make the plot look better.
c. Fit logistic regression using whichever explanatory variables you want (other than `sex` and `sexb`) to predict the variable `sexb`. You should include at least two numeric variables. Your goal is to find a model that attains the largest value for adjusted $R^2$, where the function below calculates adjusted $R^2$ for you. Insert the following code into a new code cell. Then call this function on the variable you created from `glm`, e.g. if you defined `fit = smf.glm(...).fit()`, then in a new code chunk call `aR2(fit)`.

    import numpy as np

    def aR2(fit):
        # Log-likelihood gain of the fitted model over the intercept-only model
        lr = fit.llf - fit.llnull
        n = fit.nobs               # number of observations
        p = np.mean(fit._endog)    # proportion of 1s in the response
        m = 3 * n * p * (1 - p)    # scaling term based on the response balance
        k = np.size(fit.params)    # number of estimated coefficients (the penalty)
        return 1 - np.exp(-(lr - k) / m)

There's some disagreement on how best to calculate something like adjusted $R^2$ for logistic regression. After some searching around I found a recommended calculation from someone I trust, so I wrote the recommendation up as the function above.
d. Calculate and interpret, in context of the data, a slope for one of your numeric variables.
e. Calculate and interpret, in context of the data, two predicted probabilities from your model. You should have one probability greater than 0.5, thus (probably) predicting a female, and one probability less than 0.5, thus (probably) predicting a male.
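The pieces of this question that are easiest to get wrong (the 0/1 column, the jitter, and turning coefficients into probabilities) can be sketched on toy inputs. Everything below is an assumption made for illustration: the tiny data frame is made up, the label `"female"` may be spelled differently in your copy of the dataset, and the coefficients `b0` and `b_mass` are hypothetical numbers rather than output from any fitted model.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the penguins data (note the missing sex).
df = pd.DataFrame({
    "sex": ["female", "male", "female", None, "male"],
    "body_mass_g": [3400.0, 4100.0, 3600.0, 3900.0, 4500.0],
})

# Part a: 1 for females, 0 otherwise. Rows with a missing sex also become 0,
# so you may prefer to drop those rows before fitting.
df["sexb"] = (df["sex"] == "female").astype(int)

# Part b: uniform random noise ("jitter") spreads out points that would
# otherwise stack on exactly y = 0 or y = 1 in a scatter plot.
rng = np.random.default_rng(314)
df["sexb_jittered"] = df["sexb"] + rng.uniform(-0.05, 0.05, size=len(df))

# Parts d and e, using HYPOTHETICAL coefficients on the log-odds scale.
b0, b_mass = 20.0, -0.005   # intercept, slope per gram of body mass

# A slope is a change in log-odds; exponentiating gives an odds ratio.
# Here: the multiplicative change in the odds of female per extra 100 g.
odds_ratio_per_100g = np.exp(100 * b_mass)

def inv_logit(eta):
    """Convert a log-odds value into a probability."""
    return 1 / (1 + np.exp(-eta))

p_light = inv_logit(b0 + b_mass * 3400)   # lighter penguin
p_heavy = inv_logit(b0 + b_mass * 4500)   # heavier penguin

print(df)
print(odds_ratio_per_100g, p_light, p_heavy)
```

With a fitted statsmodels model you would normally get probabilities from the model's own `predict` method; the hand calculation is shown only to make clear what a slope and a predicted probability mean.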