MATH 314 Practice Exam 02/Final

Exam 02/Final on Wednesday 2026-05-13 from 2:00 to 3:50 pm in Holt 291

I tried really hard to make exactly 5 things wrong in each code chunk of each of the problems below, but counting is hard.

Please cross out the wrong character(s) and, after the comment marker #, write an ordered, comma-separated list of replacement characters. To add characters, draw a line through the empty space at the point where they belong. To delete characters without replacement, use underscores (e.g. ____) as empty replacements for the characters you cross out.

If you don't know the correct Python syntax for the replacement characters you want, make something not unreasonable up.

You will not receive credit if you cross out an entire line, even if your fix is correct.

You should assume the following import statements, data loading, and helper function precede each code chunk in each question.

import plotnine as pn
import numpy as np
import pandas as pd
import patsy as pt
import statsmodels.api as sm
import scipy.stats as spicy
from scipy.optimize import minimize
url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2020/2020-07-07/coffee_ratings.csv"
df = pd.read_csv(url)

def bootstrap(arr, T, R = 1_000):
    N = np.shape(arr)[0]
    Ts = np.zeros(R)
    rng = np.random.default_rng()
    for r in range(R):
        idx = rng.integers(N, size = N)
        if type(arr) is np.ndarray:
            Ts[r] = T(arr[idx])
        else:
            Ts[r] = T(arr.iloc[idx])
    return Ts
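
As a quick illustration of how the bootstrap helper above might be called, here is a minimal sketch that builds an 87% percentile interval for a sample mean. The data are synthetic stand-ins rather than the coffee dataset, the percentile method is one common choice among several, and the helper is restated (with the same resampling logic) only so the snippet runs on its own.

```python
import numpy as np

def bootstrap(arr, T, R=1_000):
    # Same resampling scheme as the helper above: draw N rows with replacement, R times.
    N = np.shape(arr)[0]
    Ts = np.zeros(R)
    rng = np.random.default_rng()
    for r in range(R):
        idx = rng.integers(N, size=N)
        if isinstance(arr, np.ndarray):
            Ts[r] = T(arr[idx])
        else:
            Ts[r] = T(arr.iloc[idx])
    return Ts

# Synthetic stand-in data (an assumption, not the coffee dataset).
rng = np.random.default_rng(314)
x = rng.normal(loc=80.0, scale=3.0, size=200)

# Bootstrap the sample mean, then take percentiles leaving 6.5% in each tail.
level = 0.87
alpha = 1 - level
means = bootstrap(x, np.mean, R=2_000)
lower, upper = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
print(f"87% percentile interval for the mean: ({lower:.2f}, {upper:.2f})")
```

The statistic T can be any function that maps a resampled array or data frame to a single number, so the same pattern applies when T refits a model and returns one coefficient.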

  1. Using the dataset df, filter the dataset such that the variable total_cup_points has only values greater than 40 and the variable altitude_mean_meters has only values less than 2000. Remove any rows that contain NaNs in either of these two columns, ignoring NaNs in all other columns. Plot these variables using plotnine, making sure that total_cup_points is the response variable, and color the points by species.

  2. Fit a linear regression to predict total_cup_points using a shared slope on aftertaste and unique intercepts and slopes on altitude_mean_meters by species. The variable species has values Arabica and Robusta. Predict an Arabica coffee's total cup points at the mean aftertaste and mean altitude_mean_meters computed from Arabica coffees only.

  3. Use the function bootstrap above to calculate an 87% confidence interval for the slope on altitude_mean_meters, for the species Robusta and a median aftertaste, of the linear regression model in 2.

  4. Fit a logistic regression to predict whether or not a coffee is the species Arabica. You first have to create an appropriate response variable. Use an interaction term between aftertaste and altitude_mean_meters. Predict the probability that a coffee is a Robusta given mean values for both aftertaste and altitude_mean_meters.

  5. Write a function that can be used with SciPy's minimize() to fit a linear regression model with an arbitrary number of predictors. Use this function, together with minimize(), to fit a linear regression model predicting total_cup_points with unique intercepts by species and a shared slope on aftertaste. Use the Python library patsy to obtain the design matrices appropriate for this model and then use minimize() to estimate the coefficients.
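
The general pattern behind problem 5 can be sketched as follows: a loss function that takes a coefficient vector plus a design matrix and response, and returns the sum of squared residuals, which minimize() then drives down. This is a minimal sketch under stated assumptions, not the exam's intended solution: the data are synthetic, the names sse, group, x, and y are hypothetical, and the design matrix is built by hand here, whereas on the exam patsy's dmatrices would supply it.

```python
import numpy as np
from scipy.optimize import minimize

def sse(beta, X, y):
    # Sum of squared residuals; works for any number of columns in X.
    resid = y - X @ beta
    return np.sum(resid ** 2)

# Synthetic stand-in data (an assumption): two groups and one numeric predictor.
rng = np.random.default_rng(314)
n = 300
group = rng.integers(2, size=n)
x = rng.normal(size=n)
y = 1.0 + 2.0 * group + 0.5 * x + rng.normal(scale=0.1, size=n)

# Hand-built design matrix: intercept, group indicator, shared slope on x.
X = np.column_stack([np.ones(n), group, x])

# Start from zeros and let minimize() search over the coefficients.
beta0 = np.zeros(X.shape[1])
fit = minimize(sse, beta0, args=(X, y))

# Closed-form least squares, used only as a sanity check on the optimizer.
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print("minimize():", fit.x)
print("lstsq():   ", beta_ls)
```

Because the loss only ever sees X and y, the same function handles any number of predictors; only the length of the starting vector changes with the number of columns.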