## Let's try to predict average hospital infection risk using all the
## variables save infection_risk as explanatory variables. We'll use
## the data set https://roualdes.us/data/hospital.csv
## Calculate correlations amongst all the appropriate explanatory
## variables. Pick at least one variable to throw out and explain why
## it is reasonable to do so.
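## A minimal sketch of the correlation step. It assumes the CSV has
## columns named stay, age, xray, beds, and nurses (the names used in
## the prediction question) and that region is a categorical code, so
## it is left out of the correlation matrix.

```r
# Load the data from the URL given in the prompt.
hospital <- read.csv("https://roualdes.us/data/hospital.csv")

# Assumption: these are the numerical explanatory variables; region is
# treated as categorical and infection_risk is the response.
numeric_x <- hospital[, c("stay", "age", "xray", "beds", "nurses")]

# Pairwise Pearson correlations, rounded for readability.
cors <- round(cor(numeric_x, use = "complete.obs"), 2)
print(cors)
```

## A pair with a correlation near 1 or -1 carries largely redundant
## information, which is one common reason to drop one of the two.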
## Use lm() to fit a multiple linear regression with the remaining
## explanatory variables, fitting a separate intercept and a separate
## slope for each region across the numerical explanatory variable stay.
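## One way the fit might look. Dropping beds is purely an illustrative
## assumption; substitute whichever variable your correlations suggest.
## The column names are assumed to match those in the prompt.

```r
hospital <- read.csv("https://roualdes.us/data/hospital.csv")

# Treat region as categorical so lm() creates indicator variables.
hospital$region <- factor(hospital$region)

# stay * region expands to stay + region + stay:region, giving each
# region its own intercept shift and its own slope on stay.
# Assumption: beds is the variable dropped (illustration only).
fit <- lm(infection_risk ~ stay * region + age + xray + nurses,
          data = hospital)
print(summary(fit))
```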
## If you were to drop any explanatory variable(s) from the model,
## which would you drop first and why?
## Check the assumptions of your linear model.
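## A sketch of two common diagnostic plots, assuming the illustrative
## model with beds dropped: residuals against fitted values (linearity
## and constant variance) and a normal Q-Q plot of the residuals.

```r
hospital <- read.csv("https://roualdes.us/data/hospital.csv")
hospital$region <- factor(hospital$region)

# Assumption: beds is the variable dropped (illustration only).
fit <- lm(infection_risk ~ stay * region + age + xray + nurses,
          data = hospital)

par(mfrow = c(1, 2))

# Residuals vs fitted: look for curvature or a funnel shape.
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points far from the line suggest non-normal residuals.
qqnorm(resid(fit))
qqline(resid(fit))
```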
## Each coefficient is being tested with a default hypothesis test.
## Write out one example of this test in symbols.
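## As a sketch, assuming $\beta_j$ denotes the population coefficient
## for the $j$th term in the model (notation not given in the prompt),
## the default test reported by summary() for an lm() fit is
## $$H_0: \beta_j = 0 \quad \text{versus} \quad H_1: \beta_j \neq 0,$$
## with test statistic $t = \hat{\beta}_j / \mathrm{SE}(\hat{\beta}_j)$,
## compared to a $t$ distribution on the residual degrees of freedom.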
## Why are only three of the four levels of region output? Interpret
## the coefficient estimate of region 1 in the context of these data.
## Are there any regions for which an increase in a patient's stay
## does significantly increase the infection_risk? Explain.
## Interpret one of the statistically significant slopes in the
## context of these data.
## Justify the calculation of the p-value you used above using the
## appropriate p* function. You may quote without justification the
## Estimate and Std. Error columns from your linear regression output.
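## A sketch of the p-value calculation with pt(), using the stay row
## as an example. The model (beds dropped) is an illustrative
## assumption; any row of the coefficient table works the same way.

```r
hospital <- read.csv("https://roualdes.us/data/hospital.csv")
hospital$region <- factor(hospital$region)

# Assumption: beds is the variable dropped (illustration only).
fit <- lm(infection_risk ~ stay * region + age + xray + nurses,
          data = hospital)

coefs <- coef(summary(fit))
est <- coefs["stay", "Estimate"]
se <- coefs["stay", "Std. Error"]

# Test statistic under H0: beta = 0.
t_stat <- est / se

# Two-sided p-value from the t distribution on the residual df.
p_val <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

# Compare with the value lm() reports.
print(c(by_hand = p_val, reported = coefs["stay", "Pr(>|t|)"]))
```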
## Interpret one of the not statistically significant slopes in the
## context of these data. What does this tell us about this
## variable's ability to predict infection_risk?
## Justify the calculation of the p-value you used above using the
## appropriate p* function. You may quote without justification the
## Estimate and Std. Error columns from your linear regression output.
## Interpret the adjusted $R^2$ value in the context of these data.
## Why is the $R^2$ value larger than the adjusted $R^2$?
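## A sketch showing how adjusted $R^2$ is obtained from $R^2$ by
## penalizing for the number of estimated coefficients. The model
## (beds dropped) is an illustrative assumption.

```r
hospital <- read.csv("https://roualdes.us/data/hospital.csv")
hospital$region <- factor(hospital$region)

# Assumption: beds is the variable dropped (illustration only).
fit <- lm(infection_risk ~ stay * region + age + xray + nurses,
          data = hospital)

s <- summary(fit)
n <- nobs(fit)

# Adjusted R^2 rescales the unexplained fraction (1 - R^2) by
# (n - 1) / (residual degrees of freedom), a factor that is at least 1.
adj_by_hand <- 1 - (1 - s$r.squared) * (n - 1) / df.residual(fit)

print(c(r2 = s$r.squared, adj_r2 = s$adj.r.squared,
        adj_by_hand = adj_by_hand))
```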
## Calculate confidence intervals for a slope and interpret it in
## context.
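## A sketch of a 95% confidence interval for the stay slope, computed
## with confint() and again by hand with qt(). The model (beds
## dropped) and the choice of the stay row are illustrative assumptions.

```r
hospital <- read.csv("https://roualdes.us/data/hospital.csv")
hospital$region <- factor(hospital$region)

# Assumption: beds is the variable dropped (illustration only).
fit <- lm(infection_risk ~ stay * region + age + xray + nurses,
          data = hospital)

# Built-in interval for every coefficient; pull out the stay row.
ci <- confint(fit, level = 0.95)
print(ci["stay", ])

# By hand: estimate +/- t critical value * standard error.
coefs <- coef(summary(fit))
t_crit <- qt(0.975, df = df.residual(fit))
ci_by_hand <- coefs["stay", "Estimate"] +
  c(-1, 1) * t_crit * coefs["stay", "Std. Error"]
print(ci_by_hand)
```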
## Predict the value for infection_risk for the following values of
## the explanatory variables:
## stay age xray beds region nurses
## 8.34 56.9 74 107 3 54
## Try using vectors c(...), *, and sum().
## Try using a data.frame data.frame(stay=8.34, age=56.9, ..., nurses=54) and
## predict.lm; you can get confidence intervals from this.
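## A sketch of both approaches. It assumes the illustrative model
## with beds dropped, and that region is coded 1 to 4 so that lm()
## names the indicator coefficients region2, region3, and region4
## with region 1 as the baseline.

```r
hospital <- read.csv("https://roualdes.us/data/hospital.csv")
hospital$region <- factor(hospital$region)

# Assumption: beds is the variable dropped (illustration only).
fit <- lm(infection_risk ~ stay * region + age + xray + nurses,
          data = hospital)

# Approach 1: vectors, *, and sum(). Region 3 switches on the region3
# indicator and the stay:region3 interaction; the others are 0.
b <- coef(fit)
x <- c("(Intercept)" = 1, stay = 8.34,
       region2 = 0, region3 = 1, region4 = 0,
       age = 56.9, xray = 74, nurses = 54,
       "stay:region2" = 0, "stay:region3" = 8.34, "stay:region4" = 0)
by_hand <- sum(b * x[names(b)])  # reorder x to match the coefficients

# Approach 2: predict.lm with a one-row data frame.
new <- data.frame(stay = 8.34, age = 56.9, xray = 74, beds = 107,
                  region = factor(3, levels = levels(hospital$region)),
                  nurses = 54)
pred <- predict(fit, newdata = new, interval = "confidence")

print(by_hand)
print(pred)
```

## The fit column of pred should agree with by_hand; the lwr and upr
## columns give a confidence interval for the mean response.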
## Give an example of extrapolation based on these data. Explain.
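## Extrapolation means predicting at explanatory values outside the
## ranges observed in the data. As a sketch, assuming the column names
## from the prompt, the observed ranges can be listed like this:

```r
hospital <- read.csv("https://roualdes.us/data/hospital.csv")

# Minimum and maximum of each numerical explanatory variable.
ranges <- sapply(hospital[, c("stay", "age", "xray", "beds", "nurses")],
                 range)
rownames(ranges) <- c("min", "max")
print(ranges)
```

## Any prediction that uses a value below a min or above a max listed
## here would be an example of extrapolation.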
## The explanatory values used for the prediction above are the third
## row of the hospital data. Did the model under or over predict?
## Calculate the residual for the third observation. Does your
## answer match the third element of the residuals?
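## A sketch of the residual check, assuming the illustrative model
## with beds dropped and no missing values in the data (so that row 3
## of the data lines up with the third residual).

```r
hospital <- read.csv("https://roualdes.us/data/hospital.csv")
hospital$region <- factor(hospital$region)

# Assumption: beds is the variable dropped (illustration only).
fit <- lm(infection_risk ~ stay * region + age + xray + nurses,
          data = hospital)

# Residual = observed - predicted for the third row.
obs3 <- hospital$infection_risk[3]
pred3 <- unname(predict(fit, newdata = hospital[3, ]))
resid3 <- obs3 - pred3

# A positive residual means the model under predicted; negative, over.
print(c(observed = obs3, predicted = pred3, residual = resid3))
print(unname(resid(fit)[3]))
```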