1. Pick a dataset (CSV file) from my GitHub repository named data. Most files have accompanying READMEs as .txt files (see footnote 1). Perform a short analysis on a single numerical variable of your choice. A short analysis should include at least:

    1. A sentence or two, in your own words (i.e., not directly copied from the README), explaining what the dataset is all about and what variable you will investigate in your analysis. Be sure to be explicit about what the observations are.

    2. A well-labeled plot of your variable, units and all. Put axis labels on your plot by using bp.labels(...).

    3. A point estimate of the population mean. Use SciPy's function minimize(...) along with the simplified log-likelihood from the Normal random variable to estimate the population mean.

    4. Write one complete English sentence explaining the value you just found, in context of the data.

    5. Use the bootstrap method to produce a confidence interval, for a percent confidence of your choice.

    6. Write one complete English sentence describing the confidence interval you just found, in context of the data.

    7. Use the bootstrap method to produce a confidence interval, for a different percent confidence of your choice.

    8. Write one complete English sentence describing this new confidence interval you just found, in context of the data.

    9. Which confidence interval is wider? Explain why this makes sense.

    10. Does either confidence interval contain the true population mean? Which is more likely to contain the true population mean? In what sense are you more confident that one of the confidence intervals is more likely to contain the true population mean?

    11. Add to your earlier plot, or make a separate well-labeled plot, so that it includes a visualization of your analysis.
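
As a rough illustration of steps 3, 5, 7, and 9 above, here is a minimal sketch. It assumes the "simplified log-likelihood" means the sum of squared deviations (the Normal negative log-likelihood with constants dropped) and uses a percentile bootstrap. The data are simulated stand-ins, and the names `ll_normal`, `estimate_mean`, and `bootstrap_means` are hypothetical helpers, not part of any course library.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical stand-in for one numerical column of a dataset.
x = rng.normal(loc=50.0, scale=10.0, size=200)


def ll_normal(mu, x):
    # Assumed "simplified" Normal log-likelihood: the sum of squared
    # deviations, i.e. the negative log-likelihood without constants.
    return np.sum((x - mu) ** 2)


def estimate_mean(x):
    # Minimize over mu, starting from the sample median.
    result = minimize(ll_normal, x0=np.median(x), args=(x,), method="BFGS")
    return result.x[0]


def bootstrap_means(x, R, rng):
    # Resample the data with replacement R times, re-estimating each time.
    n = len(x)
    estimates = np.empty(R)
    for r in range(R):
        idx = rng.integers(0, n, size=n)
        estimates[r] = estimate_mean(x[idx])
    return estimates


mu_hat = estimate_mean(x)
boot = bootstrap_means(x, R=500, rng=rng)

# Percentile intervals at two confidence levels.
ci_80 = np.percentile(boot, [10, 90])
ci_95 = np.percentile(boot, [2.5, 97.5])

print("point estimate:", mu_hat)
print("80% interval:", ci_80)
print("95% interval:", ci_95)
```

The 95% interval is at least as wide as the 80% interval because it spans more of the same bootstrap distribution. Neither interval tells you whether it actually contains the true population mean.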

  2. Use the dataset email to perform a short analysis using the paired data model. Provide an answer to the question: Which is the more common word in the average email, “viagra” or “password”?

    Your analysis should include all the same components as above, plus a justification for your choice of a paired data model, i.e., what makes these data paired, or what are they paired on? Please note that the code is not terribly different from the code above, so my main concern is your ability to interpret your data and your analysis of these data in context of the dataset.
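
A paired analysis can be sketched as a single-mean analysis on the within-observation differences. The example below uses simulated per-email word counts rather than the real email dataset. The arrays `word_a` and `word_b` and the helper `bootstrap_mean_diff` are hypothetical. It uses the sample mean directly, since the sample mean is the value that minimizes the sum of squared deviations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical counts of two words, measured on the SAME emails.
n_emails = 300
word_a = rng.poisson(0.3, size=n_emails)
word_b = rng.poisson(0.5, size=n_emails)

# Pairing: one difference per email, so each email acts as its own control.
diff = word_a - word_b


def bootstrap_mean_diff(d, R, rng):
    # Resample whole emails (differences) with replacement, R times.
    idx = rng.integers(0, len(d), size=(R, len(d)))
    return d[idx].mean(axis=1)


mean_diff = diff.mean()
boot = bootstrap_mean_diff(diff, R=2000, rng=rng)
ci_90 = np.percentile(boot, [5, 95])

print("mean difference (a minus b):", mean_diff)
print("90% interval:", ci_90)
```

If the interval sits entirely below zero, the second word is estimated to appear more often in the average email. If it straddles zero, the data do not clearly favor either word.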

  3. Use the dataset finches to perform a short analysis using our multiple means models. This means you must choose one numerical variable and one categorical variable with more than two levels.

    Explain the crucial steps and the purpose of the bootstrap procedure.
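
The crucial bootstrap steps are: resample the observations with replacement, recompute the estimate on each resample, repeat many times, and read an interval off the percentiles of those estimates. A minimal sketch for several group means follows. The measurement `y`, the three-level label `group`, and the helper `bootstrap_group_means` are hypothetical stand-ins, not the finches data or a course function.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: one numerical variable, one categorical with 3 levels.
levels = np.array(["A", "B", "C"])
group = rng.choice(levels, size=150)
y = rng.normal(10.0, 1.0, size=150) + (group == "B") * 1.0 + (group == "C") * 2.0


def bootstrap_group_means(y, group, R, rng):
    n = len(y)
    labels = np.unique(group)
    out = np.full((R, len(labels)), np.nan)
    for r in range(R):
        # Step 1: resample rows, keeping each (y, group) pair together.
        idx = rng.integers(0, n, size=n)
        yb, gb = y[idx], group[idx]
        # Step 2: recompute one mean per level on the resample.
        for j, lab in enumerate(labels):
            mask = gb == lab
            if mask.any():  # leave NaN if a level is missing from a resample
                out[r, j] = yb[mask].mean()
    return labels, out


# Step 3: repeat R times; step 4: take percentiles per level.
labels, boot = bootstrap_group_means(y, group, R=1000, rng=rng)
ci_95 = np.nanpercentile(boot, [2.5, 97.5], axis=0)

for j, lab in enumerate(labels):
    print(lab, "95% interval:", ci_95[:, j])
```

The purpose is to approximate how much each estimated mean would vary from sample to sample, using only the one sample in hand.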


  1. If there isn't an associated README, consider helping me out by writing one and filing a PR.