## The dataset bike consists of 731 daily observations of the number
## of bikes rented from the company
## [Capital Bikeshare](https://www.capitalbikeshare.com/) (Washington
## D.C., USA). A number of variables related to the count (cnt) of
## bicycles rented each day in 2011 and 2012 were recorded. Build a
## multiple regression model to predict cnt using the variables
## holiday and temp.
## More information about the dataset is available here:
## https://github.com/roualdes/data/blob/master/bike.txt
## Identify the response variable(s) and its(their) statistical type(s).
## Identify the explanatory variable(s) and its(their) statistical type(s).
## Provide R code to make an informative plot of your data/model.
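## A minimal sketch of one such plot, assuming a parallel-lines model
## cnt ~ holiday + temp. The data frame demo_bike and every value in
## it are simulated stand-ins (hypothetical, not the real bike data),
## and the demo_ names are illustrative only.
set.seed(1)
demo_bike <- data.frame(
    holiday = factor(rbinom(200, 1, 0.1), levels = c(0, 1),
                     labels = c("no", "yes")),
    temp = runif(200))
demo_bike$cnt <- 1500 + 5000 * demo_bike$temp -
    600 * (demo_bike$holiday == "yes") + rnorm(200, sd = 500)
demo_fit <- lm(cnt ~ holiday + temp, data = demo_bike)
demo_b <- coef(demo_fit)
## Points coloured by holiday, with one fitted line per group; the
## lines share a slope and differ only by the holiday offset.
plot(cnt ~ temp, data = demo_bike, col = as.integer(demo_bike$holiday))
abline(a = demo_b["(Intercept)"], b = demo_b["temp"], col = 1)
abline(a = demo_b["(Intercept)"] + demo_b["holidayyes"],
       b = demo_b["temp"], col = 2)
legend("topleft", legend = levels(demo_bike$holiday), col = 1:2,
       lty = 1, title = "holiday")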
## Write 1 complete English sentence describing the estimated
## intercept for holidays.
## Does the estimated intercept for holidays make sense in the context
## of these data? Explain why or why not.
## Write 1 complete English sentence describing the estimated
## intercept for not holidays.
## Does the estimated intercept for not holidays make sense in the
## context of these data? Explain why or why not.
## Write 1 complete English sentence describing the estimated slope
## across temp. State clearly to which type(s) of days this applies.
## Provide R code to calculate the mean of temp by levels of the
## categorical variable holiday. If you use any library, be sure to
## load it.
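## One possible base R approach, shown on simulated stand-in data
## (demo_bike2 and its values are hypothetical). aggregate() ships
## with base R, so no library call is needed for this version.
set.seed(2)
demo_bike2 <- data.frame(
    holiday = rep(c("no", "yes"), times = c(90, 10)),
    temp = runif(100))
## One row per level of holiday, with the mean of temp in that level.
demo_means <- aggregate(temp ~ holiday, data = demo_bike2, FUN = mean)
demo_means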
## Write down both R code and mathematical symbols that would make a
## prediction for the cnt of bicycles rented on a holiday when temp is
## equal to its mean, call it temp_bar.
## Interpret your prediction in context of these data.
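## A sketch of such a prediction under a parallel-lines model, on
## simulated stand-in data (demo_bike3 and its values are
## hypothetical). In symbols, with holiday coded 1 on holidays:
##   cnt_hat = beta0_hat + beta1_hat * 1 + beta2_hat * temp_bar
## Assumption: demo_temp_bar is the overall mean of temp; use the
## holiday-level mean instead if that is what is intended.
set.seed(3)
demo_bike3 <- data.frame(
    holiday = factor(rep(c("no", "yes"), times = c(180, 20))),
    temp = runif(200))
demo_bike3$cnt <- 1500 + 5000 * demo_bike3$temp -
    600 * (demo_bike3$holiday == "yes") + rnorm(200, sd = 500)
demo_fit3 <- lm(cnt ~ holiday + temp, data = demo_bike3)
demo_temp_bar <- mean(demo_bike3$temp)
## The same prediction two ways: by hand from the coefficients
## (intercept, holiday offset, slope), and with predict().
demo_by_hand <- sum(coef(demo_fit3) * c(1, 1, demo_temp_bar))
demo_pred <- predict(demo_fit3,
                     newdata = data.frame(holiday = "yes",
                                          temp = demo_temp_bar))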
## Write down both R code and mathematical symbols that would make a
## prediction for the cnt of bicycles rented on a not holiday when
## temp is equal to 3.
## Using words from our class, which prediction is more reasonable:
## the one at temp equal to temp_bar or the one at temp equal to 3?
## Why?
## Calculate and interpret a 90% confidence interval for an offset
## in context of these data.
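## A sketch of a 90% confidence interval for the holiday offset, on
## simulated stand-in data (demo_bike4 and its values are
## hypothetical). The coefficient name "holidayyes" follows from the
## factor labels chosen here and will differ with other codings.
set.seed(4)
demo_bike4 <- data.frame(
    holiday = factor(rep(c("no", "yes"), times = c(180, 20))),
    temp = runif(200))
demo_bike4$cnt <- 1500 + 5000 * demo_bike4$temp -
    600 * (demo_bike4$holiday == "yes") + rnorm(200, sd = 500)
demo_fit4 <- lm(cnt ~ holiday + temp, data = demo_bike4)
## level = 0.90 requests the 5% and 95% endpoints.
demo_ci <- confint(demo_fit4, parm = "holidayyes", level = 0.90)
demo_ci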
## Do these data suggest a significant difference in number of bikes
## rented between holidays and not holidays? Explain.
## Using the dataset
## [opossum](https://raw.githubusercontent.com/roualdes/data/master/possum.csv),
## create a categorical response variable S with two levels where the
## variable takes on the value 1 for females and 0 otherwise.
## Use the code below to predict the sex of the opossum based on a
## numerical explanatory variable of your choice.
X <- model.matrix()
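## Negative log-likelihood of a logistic regression: beta holds the
## coefficients, y the 0/1 response, and mX the model matrix.
ll <- function(beta, y, mX) {
    lin <- apply(mX, 1, function(row) {sum(beta * row)})
    sum( log1p(exp(lin)) - y*lin )
}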
beta_hat <- optim()$par
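## A sketch of how blanks like those above might be filled in, on
## simulated stand-in data (demo_df, its columns S and x, and the
## demo_ names are hypothetical, not the opossum data). The zero
## starting values and method = "BFGS" are choices made here, not
## requirements.
set.seed(5)
demo_df <- data.frame(x = rnorm(150))
demo_df$S <- rbinom(150, 1, plogis(-0.5 + 1.2 * demo_df$x))
demo_X <- model.matrix(~ x, data = demo_df)
## Logistic negative log-likelihood, written with a matrix product.
demo_nll <- function(beta, y, mX) {
    lin <- drop(mX %*% beta)
    sum(log1p(exp(lin)) - y * lin)
}
demo_beta_hat <- optim(c(0, 0), demo_nll, y = demo_df$S, mX = demo_X,
                       method = "BFGS")$par
## As a check, glm() should give nearly the same estimates.
demo_glm <- coef(glm(S ~ x, data = demo_df, family = binomial))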
## Use the code below to estimate the probability that an opossum is
## female based on the mean of your numerical explanatory
## variable. Interpret, in context of the data, the predicted
## probability.
## Inverse logit of the linear predictor: returns one predicted
## probability per row of mX.
pred_logistic <- function(mX, betahat) {
    lin <- apply(mX, 1, function(row) {sum(betahat * row)})
    1 / (1 + exp(-lin))
}
pred_logistic(matrix(c(1, ?), ncol=?), beta_hat)
## Estimate the change in probability given some change in the
## numerical explanatory variable. Interpret, in context of the data,
## the change in predicted probability.
blogistic <- function(data, idx) {
    ?
    diff(pred_logistic(matrix(c(1, ?,
                                1, ?),
                              ncol=?, byrow=TRUE), beta_hat))
}
b <- boot::boot() #, ncpus=3, parallel="multicore")
boot::boot.ci()
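## A sketch of the bootstrap steps on simulated stand-in data
## (hypothetical names and values throughout). Each resample refits
## the model, then takes the difference in predicted probability
## between two chosen x values, here 0 and 1. R = 200 and the
## percentile interval are choices made for this illustration.
set.seed(6)
demo_d <- data.frame(x = rnorm(150))
demo_d$S <- rbinom(150, 1, plogis(-0.5 + 1.2 * demo_d$x))
demo_nll6 <- function(beta, y, mX) {
    lin <- drop(mX %*% beta)
    sum(log1p(exp(lin)) - y * lin)
}
demo_stat <- function(data, idx) {
    d <- data[idx, ]
    mX <- model.matrix(~ x, data = d)
    bh <- optim(c(0, 0), demo_nll6, y = d$S, mX = mX,
                method = "BFGS")$par
    new_X <- matrix(c(1, 0,
                      1, 1),
                    ncol = 2, byrow = TRUE)
    ## plogis() is the inverse logit from base R.
    diff(plogis(drop(new_X %*% bh)))
}
demo_boot <- boot::boot(demo_d, demo_stat, R = 200)
demo_boot_ci <- boot::boot.ci(demo_boot, conf = 0.95, type = "perc")
demo_boot_ci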
## Estimate the change in probability given some other change in the
## numerical explanatory variable. Interpret, in context of the data,
## the change in predicted probability.
## Are your confidence intervals the same? Should they be in logistic
## regression? Explain.