MATH 314 Homework 13

Due 2026-05-05 by 11:59pm

Download the following dataset into your Homework 13 repository: breast cancer. Here's the associated metadata.

Please push this dataset along with your .ipynb file for Homework 13.

  1. Fit a logistic regression model to predict whether or not a tumor is malignant (diagnosis == "M"). If you are up for a challenge: I was able to find a model with an accuracy/true positive rate of 93.5%, but the metadata reports a model (which I could not recreate) that achieved 97.5% accuracy. If you are not up for a challenge, pick a single numerical explanatory variable and use it.

  2. Plot the model along one numerical explanatory variable.

  3. Make two predictions: one for which the predicted probability of a malignant tumor is high and one for which it is low. Did you have to extrapolate to make these predictions? In either case, what does that tell you about the quality of the numerical predictor?

  4. Calculate a "slope" relative to the mean of your numerical predictor.

  5. Use our bootstrap function to calculate a 95% confidence interval for the slope you calculated in 4.
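One common way to read the "slope" in item 4, offered here as an assumption rather than the required definition, is the instantaneous rate of change of the predicted probability, evaluated at the mean of the predictor. For a one-predictor logistic model this follows from the chain rule, since the derivative of the logistic function is p(1 - p):

```latex
p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}, \qquad
\frac{dp}{dx} = \beta_1\, p(x)\,\bigl(1 - p(x)\bigr), \qquad
\left.\frac{dp}{dx}\right|_{x = \bar{x}} = \beta_1\, p(\bar{x})\,\bigl(1 - p(\bar{x})\bigr).
```

Because p(1 - p) is at most 1/4, this slope can never exceed beta_1 / 4 in absolute value. If the course instead defines the slope as a finite change, for example p(x̄ + 1) - p(x̄), the numbers will differ slightly.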
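The sketch below walks through items 1 and 3 to 5 for a single numerical predictor. It rests on several assumptions. The data are synthetic stand-ins, not the breast cancer file, and `radius` only imitates a numeric column such as `radius_mean`, whose name in the real file is assumed. The model is fit by Newton's method (iteratively reweighted least squares) in plain NumPy rather than with whatever library the course uses. `bootstrap_ci` is a hypothetical percentile-bootstrap stand-in, not "our bootstrap function" from class. The slope at the mean is taken to be beta_1 * p(x̄) * (1 - p(x̄)).

```python
import numpy as np


def fit_logistic(x, y, n_iter=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton's method (IRLS)."""
    X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept column
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # current fitted probabilities
        W = p * (1.0 - p)                        # Bernoulli variances
        grad = X.T @ (y - p)                     # score: gradient of the log-likelihood
        info = X.T @ (X * W[:, None])            # Fisher information (negative Hessian)
        beta = beta + np.linalg.solve(info, grad)  # Newton step
    return beta


def predict_prob(beta, x):
    """Predicted probability of a malignant tumor at predictor value(s) x."""
    return 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * np.asarray(x))))


def slope_at_mean(x, y):
    """dp/dx at the mean of x, i.e. b1 * p(xbar) * (1 - p(xbar))."""
    beta = fit_logistic(x, y)
    p_bar = predict_prob(beta, np.mean(x))
    return beta[1] * p_bar * (1.0 - p_bar)


def bootstrap_ci(x, y, stat, n_boot=1000, level=0.95, seed=0):
    """Hypothetical percentile bootstrap: resample rows with replacement, recompute stat."""
    rng = np.random.default_rng(seed)
    n = len(x)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)         # row indices for one bootstrap resample
        stats[b] = stat(x[idx], y[idx])
    alpha = 1.0 - level
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])


# --- Synthetic stand-in data (NOT the real dataset) ---
rng = np.random.default_rng(314)
n = 300
radius = rng.normal(14.0, 3.5, size=n)
true_p = 1.0 / (1.0 + np.exp(-(-15.0 + 1.0 * radius)))
diagnosis = np.where(rng.random(n) < true_p, "M", "B")
y = (diagnosis == "M").astype(float)             # 1 = malignant, 0 = benign

beta = fit_logistic(radius, y)                   # item 1
grid = np.linspace(radius.min(), radius.max(), 200)
curve = predict_prob(beta, grid)                 # item 2: plot curve vs grid over the 0/1 data
p_low, p_high = predict_prob(beta, [8.0, 22.0])  # item 3
s_hat = slope_at_mean(radius, y)                 # item 4
lo, hi = bootstrap_ci(radius, y, slope_at_mean)  # item 5

print(f"b0={beta[0]:.3f}, b1={beta[1]:.3f}")
print(f"P(M | x=8)={p_low:.3f}, P(M | x=22)={p_high:.3f}")
print(f"observed x range: [{radius.min():.1f}, {radius.max():.1f}]")  # check for extrapolation
print(f"slope at mean={s_hat:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Comparing the two prediction points with the printed range of observed x values is one way to answer the extrapolation question in item 3. Newton's method is used here only because it keeps the sketch dependency-free. It can fail to converge when one class is perfectly separated by the predictor, which a library fit would typically warn about.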