## Consider the dataset ape::carnivora.
library(ape)
data(carnivora)
library(tidyverse)
## From the variable BW, remove any observations smaller than
## 0.01.
carn <- carnivora %>% filter(BW > 0.01)
## Fit multiple linear regression, predicting BW with GL and
## SuperFamily.
fit <- lm(BW ~ SuperFamily + GL, data=carn)
summary(fit)
## Write the fitted regression equation.
## hat{BW} = -341.82 + 15.29*Feliformia + 8.58*GL
## Translate an intercept in context.
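When a member of Caniformia has a gestation length of zero days, average
birth weight is -341.82 grams. This obviously doesn't make sense, since
birth weight cannot be negative and a gestation length of zero days is
an extrapolation.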
## Translate the slope in context.
For every one-day increase in gestation length, average birth weight
increases by 8.58 grams for members of the order Carnivora, holding
super family constant.
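The slope interpretation can be checked numerically. The sketch below refits the same model on its own and compares predictions for two hypothetical Caniformia animals whose gestation lengths differ by one day. The names carn_demo, fit_demo, and new_animals are illustrative and not part of the original script, and the gestation lengths 60 and 61 are arbitrary.

```r
library(ape)
data(carnivora)

# Same filter and model as above, written with base R so the block runs alone.
carn_demo <- subset(carnivora, BW > 0.01)
fit_demo <- lm(BW ~ SuperFamily + GL, data = carn_demo)

# Two hypothetical animals in the same super family, one day of gestation apart.
new_animals <- data.frame(
  SuperFamily = factor(c("Caniformia", "Caniformia"),
                       levels = levels(carn_demo$SuperFamily)),
  GL = c(60, 61)
)
pred <- predict(fit_demo, newdata = new_animals)

# The difference between the two predictions equals the GL coefficient.
slope_check <- unname(pred[2] - pred[1])
slope_check
coef(fit_demo)["GL"]
```

Because the model has no interaction term, the same difference would appear for two Feliformia animals.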
## What value of the response does the above model predict for a
## member of the super family Feliformia that has a gestation length
## of 59?
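## Build the new observation explicitly instead of relying on a row index.
predict(fit, newdata=data.frame(SuperFamily="Feliformia", GL=59))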
## Calculate the residual for a member of Feliformia that has a
## gestation length of 59 and weighs 20 grams at birth.
## Residual = observed - predicted.
20 - predict(fit, newdata=data.frame(SuperFamily="Feliformia", GL=59))
## Calculate the fitted values and (standardized, if using lm()) residuals
yhat <- fitted(fit)
r <- rstandard(fit)
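As a sketch of what rstandard() returns for an lm fit: each raw residual is divided by the residual standard error times sqrt(1 - leverage). The block below refits the model on its own and reproduces the values by hand. The names carn_demo, fit_demo, and r_manual are illustrative only.

```r
library(ape)
data(carnivora)

# Same filter and model as above, written with base R so the block runs alone.
carn_demo <- subset(carnivora, BW > 0.01)
fit_demo <- lm(BW ~ SuperFamily + GL, data = carn_demo)

# Raw residuals are observed minus fitted, on the rows lm() actually used
# (rows with a missing GL are dropped by lm()).
y <- model.response(model.frame(fit_demo))
e <- y - fitted(fit_demo)

h <- hatvalues(fit_demo)        # leverage of each observation
s <- summary(fit_demo)$sigma    # residual standard error

# Standardized residuals computed by hand.
r_manual <- e / (s * sqrt(1 - h))

all.equal(unname(r_manual), unname(rstandard(fit_demo)))
```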
## With plots, check the assumptions on multiple linear regression.
qplot(yhat, r)
qplot(r, geom="histogram", binwidth=1/3)
## Fit multiple linear regression, predicting log(BW) with log(GL) and SuperFamily.
fitl <- lm(log(BW) ~ SuperFamily + log(GL), data=carn)
summary(fitl)
## Write the fitted regression equation.
## hat{log(BW)} = -6.32 - 0.02*Feliformia + 2.64*log(GL)
## Translate an intercept in context.
When a member of Feliformia has a log gestation length of 0 (gestation
length is equal to 1 day), the average log of birth weight is
-6.32 + -0.02 = -6.34. For a member of Caniformia, the baseline super
family, it is -6.32.
## Translate the slope in context.
For every one unit increase in log(gestation length), average
log(birth weight) increases by 2.64, holding super family constant.
or
For every one percent increase in gestation length, birth weight
increases by approximately 2.64 percent, holding super family constant.
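The percent interpretation is an approximation that holds for small changes. A short arithmetic sketch, using the slope 2.64 copied from the fitted equation (the variable names are illustrative):

```r
# Slope on the log-log scale, taken from the fitted equation above.
b_logGL <- 2.64

# Multiplying GL by 1.01 multiplies predicted BW by 1.01^b_logGL.
pct_1pct <- 100 * (1.01^b_logGL - 1)

# The same calculation for a 10 percent increase in GL.
pct_10pct <- 100 * (1.10^b_logGL - 1)

round(c(one_percent = pct_1pct, ten_percent = pct_10pct), 2)
```

A one percent increase in gestation length corresponds to about a 2.66 percent increase in birth weight, close to the slope itself. A ten percent increase corresponds to about 28.6 percent rather than 26.4, so the shortcut should be reserved for small changes.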
## What value of the response does the above model predict for a
## member of the super family Feliformia that has a gestation length
## of 59?
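## Build the new observation explicitly; the prediction is on the log(BW) scale.
predict(fitl, newdata=data.frame(SuperFamily="Feliformia", GL=59))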
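The prediction from the log model is on the log scale. A minimal sketch of back-transforming it to grams, refitting the model so the block runs on its own (carn_demo, fitl_demo, and new_animal are illustrative names). Note that exp() of a predicted mean log birth weight is not the predicted mean birth weight; under roughly normal errors it is closer to a median.

```r
library(ape)
data(carnivora)

# Same filter and log-log model as above, written with base R.
carn_demo <- subset(carnivora, BW > 0.01)
fitl_demo <- lm(log(BW) ~ SuperFamily + log(GL), data = carn_demo)

# A Feliformia animal with a gestation length of 59 days.
new_animal <- data.frame(
  SuperFamily = factor("Feliformia", levels = levels(carn_demo$SuperFamily)),
  GL = 59
)

log_bw_hat <- predict(fitl_demo, newdata = new_animal)  # log scale
bw_hat_grams <- exp(log_bw_hat)                         # back to grams
c(log_scale = unname(log_bw_hat), grams = unname(bw_hat_grams))
```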
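## Calculate the residual for a member of Feliformia that has a
## gestation length of 59 and weighs 20 grams at birth.
## Residual = observed - predicted, both on the log scale.
log(20) - predict(fitl, newdata=data.frame(SuperFamily="Feliformia", GL=59))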
## Calculate the fitted values and (standardized, if using lm()) residuals
yhat <- fitted(fitl)
r <- rstandard(fitl)
## With plots, check the assumptions on multiple linear regression.
qplot(yhat, r)
qplot(r, geom="histogram", binwidth=1/3)
## Compare your two fitted models -- adjusted R^2, assumptions,
## prediction accuracy, ... Note which assumptions seem more
## important and why. Is there any grand theorem we have that allows
## us to tolerate irregularities in a particular assumption?
Adjusted R^2 increases by about 11 percent with the log-transformed
model. The assumptions of linearity and constant variance look much
better in the transformed model. These assumptions are more important
to us, since the Central Limit Theorem helps justify inference on the
coefficients when the sample is reasonably large, even if the residuals
look a touch less normal in the log-transformed model.
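One way to put numbers behind the comparison is sketched below. It refits both models on their own (fit_raw and fit_log are illustrative names) and reports adjusted R^2 along with a root mean squared error in grams. Adjusted R^2 values computed on different response scales (BW versus log(BW)) are not strictly comparable, so the error on the original gram scale is included as a rough second check.

```r
library(ape)
data(carnivora)

# Same filter and both models as above, written with base R.
carn_demo <- subset(carnivora, BW > 0.01)
fit_raw <- lm(BW ~ SuperFamily + GL, data = carn_demo)
fit_log <- lm(log(BW) ~ SuperFamily + log(GL), data = carn_demo)

# Adjusted R^2 of each model, each on its own response scale.
adj_r2 <- c(raw = summary(fit_raw)$adj.r.squared,
            log = summary(fit_log)$adj.r.squared)

# In-sample RMSE in grams for the untransformed model.
rmse_raw <- sqrt(mean(residuals(fit_raw)^2))

# In-sample RMSE in grams for the log model, after back-transforming
# both the observed and the fitted values with exp().
bw_obs <- exp(model.response(model.frame(fit_log)))
bw_hat <- exp(fitted(fit_log))
rmse_log <- sqrt(mean((bw_obs - bw_hat)^2))

adj_r2
c(raw = rmse_raw, log = rmse_log)
```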