Due: 2019-11-05 by 11:59pm


You are to fit a two sample mean model. For this you need at least one numerical variable and one categorical variable with two levels (that is, the categorical variable takes on exactly two values). You are encouraged to find a dataset that interests you and, if necessary, create a categorical variable with two levels.

Perform a short two sample mean analysis on your two variables. Your analysis should include:

  1. A sentence or two, in your own words (i.e., not directly copied from the README), explaining what the dataset is all about, what variables you will investigate in your analysis, and why this model is appropriate.

  2. A well-labeled plot of your two variables, with units included. Put axis labels on your plot by using bp.labels(...).

  3. A point estimate of the population mean for each level of the categorical variable. Use SciPy’s function minimize(...) along with the simplified log-likelihood of the Normal distribution to estimate the population means.

  4. Add the means for each group to your plot. Color the means differently to make them obvious.

  5. Write one complete English sentence for each population mean, explaining the values you just found, in context of the data.

  6. Use the bootstrap method to produce a confidence interval for the difference of means. Pick any reasonable confidence percentage you want.

  7. Write one complete English sentence describing the confidence interval you just found, in context of the data.

  8. Add to your existing plot, or make a separate well-labeled plot, so that it includes a visualization of your analysis. Explain your plot in one complete English sentence, in context of the data.
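
As a rough illustration of items 3 and 6, here is a minimal sketch, not a required template. It assumes simulated data in place of your dataset, and the names `neg_log_lik`, `estimate_mean`, `bootstrap_diff_ci`, `group_a`, and `group_b` are hypothetical. The "simplified log-likelihood" is taken here to mean the Normal log-likelihood with the constant terms and the variance dropped, which leaves a sum of squared deviations to minimize; check that this matches the version used in class. Plotting is left out.

```python
import numpy as np
from scipy.optimize import minimize


def neg_log_lik(mu, x):
    # Negative simplified Normal log-likelihood (assumed form):
    # constants and the variance are dropped, leaving squared deviations.
    return 0.5 * np.sum((x - mu[0]) ** 2)


def estimate_mean(x):
    # Point estimate of the population mean via numerical minimization.
    x = np.asarray(x, dtype=float)
    start = np.array([np.median(x)])  # any reasonable starting value works
    result = minimize(neg_log_lik, x0=start, args=(x,), method="BFGS")
    return result.x[0]


def bootstrap_diff_ci(a, b, level=0.95, n_boot=500, seed=0):
    # Percentile bootstrap interval for (mean of a) - (mean of b).
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diffs = np.empty(n_boot)
    for r in range(n_boot):
        # Resample each group separately, with replacement, at its own size.
        resample_a = rng.choice(a, size=a.size, replace=True)
        resample_b = rng.choice(b, size=b.size, replace=True)
        diffs[r] = estimate_mean(resample_a) - estimate_mean(resample_b)
    alpha = 1.0 - level
    low, high = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return low, high


# Simulated stand-in data (made up for illustration only);
# replace these with the numerical variable split by your two levels.
rng = np.random.default_rng(315)
group_a = rng.normal(loc=10.0, scale=2.0, size=40)
group_b = rng.normal(loc=12.0, scale=2.0, size=50)

mean_a = estimate_mean(group_a)
mean_b = estimate_mean(group_b)
ci_low, ci_high = bootstrap_diff_ci(group_a, group_b, level=0.95)

print("estimated mean, level A:", mean_a)
print("estimated mean, level B:", mean_b)
print("95% bootstrap CI for the difference (A - B):", ci_low, ci_high)
```

Resampling each group separately keeps the group sizes fixed across bootstrap replicates, which is one common choice; resampling whole rows of the dataset is another, and either can be defended in your write-up.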