| Distribution of \(X\) | density function | \(\mathbb{E}[X]\) | \(\mathbb{V}[X]\) | parameter bounds |
| --- | --- | --- | --- | --- |
| Binomial\((K, p)\) | \({K \choose x}p^x(1-p)^{K-x}\) | \(Kp\) | \(Kp(1 - p)\) | \(0 \leq p \leq 1\) |
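
As a quick sanity check of the two formulas in the table, the sketch below compares them with the sample mean and sample variance of simulated draws. The values K = 10 and p = 0.3, the seed, and the number of draws are arbitrary choices made for illustration, not values prescribed by the exercises.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only

K <- 10                          # number of trials (illustrative choice)
p <- 0.3                         # success probability (illustrative choice)

E <- K * p                       # population mean, from the table
V <- K * p * (1 - p)             # population variance, from the table

# A large number of draws, so the sample moments sit close to E and V.
draws <- rbinom(n = 1e5, size = K, prob = p)

print(c(formula = E, simulated = mean(draws)))
print(c(formula = V, simulated = var(draws)))
```

The simulated values will not match the formulas exactly; they should get closer as the number of draws grows.
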
  1. Repeat yesterday’s class on your own.

    1. Choose a distribution to generate data from; it doesn’t have to be Binomial. Let’s refer to it as \(F\).

    2. Generate N = 100 observations from \(F\) and store them in a variable named x.

    3. Calculate and store the sample mean of x.

    4. Wrap steps 2 and 3 in a for loop of length R = 500. Don’t forget to pre-allocate as necessary.

    5. Put your collection of R sample means into a dataframe.

    6. Make a density plot of your R sample means.

    7. Share and discuss with a neighbor. Are your distributions \(F\) the same? Does the density plot still look Normal anyway? Are they the same Normal distribution?

    8. Change the sample size N and see how the almost Normal distribution changes. Does it widen or narrow as your sample size increases?
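
One possible shape for the simulation in exercise 1 is sketched below. It assumes \(F\) is an Exponential distribution with rate 1, which is only one of many valid choices; the seed and the variable names xbar and df are likewise illustrative. The plot is drawn only if ggplot2 is installed.

```r
set.seed(2)                      # arbitrary seed, for reproducibility only

N <- 100                         # sample size per replicate
R <- 500                         # number of replicates

xbar <- numeric(R)               # pre-allocate storage for the R sample means
for (r in seq_len(R)) {
  x <- rexp(N, rate = 1)         # F = Exponential(1), an assumed choice
  xbar[r] <- mean(x)             # sample mean of this replicate
}

df <- data.frame(xbar = xbar)    # collect the R sample means in a dataframe

# Density plot of the R sample means (skipped if ggplot2 is unavailable).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  gp <- ggplot(df, aes(x = xbar)) + geom_density()
  print(gp)
}
```

Exponential(1) has mean 1 and variance 1, so the sample means should center near 1 with a standard deviation of about \(\sqrt{1/100} = 0.1\). Rerunning with a different N is a direct way to explore step 8.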

  2. Consider a \(\text{Binomial}(K, p)\) distribution.

    1. Pick values for the population parameters \(K\) and \(p\), and store them as variables.

    2. Calculate the expected value and variance using the (population) formulas above. Store these in variables named E and V.

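    3. Set N = 1001 and use rbinom(N, K, p) to randomly generate N observations from the Binomial distribution, then store them in a variable. Note that the first argument of rbinom() is named n (lowercase), so rbinom(N = 1001, K, p) would fail with an unused-argument error.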

    4. Put your observations in a dataframe with three variables: your data, an index of the observations, and the cumulative mean of the observations.

    5. Call ggplot() with no arguments to create an empty plot, and store it in a variable named gp.

    6. Add to gp the geometry of a line, by passing to geom_line() your dataframe as the named argument data and passing aes(...) as a second argument.

    7. Confirm that your plot makes sense with a friend. Recall the value your sample mean should converge to.

    8. Explain in one complete English sentence what this plot describes.

    9. Wrap steps 3, 4, and 6 in a for loop of length R = 50 to generate one plot with multiple converging cumulative sample means.

    10. As the sample size increases, what do you notice about the variation in the R lines?

    11. Discuss with a friend how this plot connects to the plot in exercise 1.

    12. Challenge. Add a geom_pointrange at your choice of sample size along the x-axis. Look for an example using geom_pointrange towards the bottom of the following page: ggplot2 vertical intervals. This geometry should be based on a new dataframe with four variables. x, set to the sample size \(N\), is the position along the x-axis at which the vertical pointrange is drawn. y, set to the expected value \(E\), is the position of the point on the y-axis. ymin and ymax, calculated as below, are the lower and upper ends of the line range. Take care when connecting the variable names in the dataframe to the argument names inside aes(...) within geom_pointrange().

    | x | y | ymin | ymax |
    | --- | --- | --- | --- |
    | \(N\) | \(E\) | \(E - 2 \sqrt{V/N}\) | \(E + 2 \sqrt{V/N}\) |

    13. Challengier challenge. Repeat step 12 for evenly spaced values along the x-axis. Aim for 10 point ranges.
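
A minimal sketch of how exercise 2 might fit together is given below, under a few stated assumptions. The parameter values K = 10 and p = 0.3, the seed, and the names paths, run, cmean, n_at, and ranges are all illustrative. Instead of adding one geom_line() per loop iteration as step 9 suggests, this version stacks the R replicates into a single long dataframe and uses the group aesthetic, which produces the same picture. The interval half-width uses \(\sqrt{V/N}\), the standard deviation of the mean of \(N\) independent observations, so about 95% of the lines should pass through each range. The plot is drawn only if ggplot2 is installed.

```r
set.seed(3)                      # arbitrary seed, for reproducibility only

K <- 10                          # number of trials (illustrative choice)
p <- 0.3                         # success probability (illustrative choice)
E <- K * p                       # population mean
V <- K * p * (1 - p)             # population variance

N <- 1001                        # observations per replicate
R <- 50                          # number of replicate lines

paths <- vector("list", R)       # pre-allocate one slot per replicate
for (r in seq_len(R)) {
  x <- rbinom(n = N, size = K, prob = p)
  paths[[r]] <- data.frame(
    run   = r,                       # replicate label, recycled to N rows
    index = seq_len(N),              # observation index 1, ..., N
    cmean = cumsum(x) / seq_len(N)   # cumulative mean after each observation
  )
}
df <- do.call(rbind, paths)      # stack the replicates into one long dataframe

# Ten evenly spaced sample sizes, with E +/- 2 * sqrt(V / N) at each one.
n_at <- round(seq(100, 1000, length.out = 10))
ranges <- data.frame(
  x    = n_at,
  y    = E,
  ymin = E - 2 * sqrt(V / n_at),
  ymax = E + 2 * sqrt(V / n_at)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  gp <- ggplot() +
    geom_line(data = df,
              aes(x = index, y = cmean, group = run), alpha = 0.3) +
    geom_pointrange(data = ranges,
                    aes(x = x, y = y, ymin = ymin, ymax = ymax),
                    colour = "red")
  print(gp)
}
```

The column names in ranges deliberately match the argument names inside aes(...), which keeps the mapping in geom_pointrange() easy to read. The ranges narrow as the sample size grows, mirroring the narrowing spread of the lines.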