| Distribution of \(X\) | density function | \(\mathbb{E}[X]\) | \(\mathbb{V}[X]\) | parameter bounds |
| --- | --- | --- | --- | --- |
| Binomial\((K, p)\) | \({K \choose x}p^x(1-p)^{K-x}\) | \(Kp\) | \(Kp(1 - p)\) | \(0 \leq p \leq 1\) |
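
The Binomial entries above can be checked numerically. The following is a minimal sketch, assuming the illustrative values K = 10 and p = 0.3 (not taken from the text): it evaluates the density formula by hand, compares it with R's built-in dbinom(), and recomputes the mean and variance directly from the probabilities.

```r
# Hypothetical parameter values, chosen only for illustration
K <- 10
p <- 0.3

x <- 0:K                                      # support of Binomial(K, p)
pmf <- choose(K, x) * p^x * (1 - p)^(K - x)   # density formula from the table

# The hand-written formula should agree with R's built-in dbinom()
print(all.equal(pmf, dbinom(x, size = K, prob = p)))

# Mean and variance computed directly from the probabilities
EX <- sum(x * pmf)
VX <- sum((x - EX)^2 * pmf)

# Compare with the closed forms K * p and K * p * (1 - p)
print(c(EX = EX, closed_form = K * p))
print(c(VX = VX, closed_form = K * p * (1 - p)))
```

Any other K and p within the parameter bounds should give the same agreement.
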
  1. Assume the random variable \(X\) follows the Normal distribution, \(X \sim \text{N}(\mu, \sigma^2)\), with \(\mathbb{E}[X] = \mu\) and \(\mathbb{V}[X] = \sigma^2\).

    1. Choose a value of the mean \(\mu\) and standard deviation \(\sigma\), and store them as variables. If you want to use the English language equivalents, use the variable names mu and sigma.

    2. Use seq() to generate a sequence of 501 values from mu - 4 * sigma to mu + 4 * sigma, and name it x. The argument length.out allows you to specify how long you want the vector x to be.

    3. Put x into a dataframe, along with a column of the evaluation of the Normal distribution’s density function, dnorm(x, mean = mu, sd = sigma).

    4. Make a plot of your Normal distribution’s density function using geom_line().

    5. Randomly generate N = 1001 observations from your Normal distribution, using rnorm(...).

    6. Estimate the following probabilities using your vector of observations:
      1. \(\mathbb{P}[\) mu - sigma \(< X <\) mu + sigma \(]\).
      2. \(\mathbb{P}[\) mu - 2*sigma \(< X <\) mu + 2*sigma \(]\).
      3. \(\mathbb{P}[\) mu - 3*sigma \(< X <\) mu + 3*sigma\(]\).
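
One possible way to work through the steps of this exercise is sketched below. The values mu = 2 and sigma = 1.5, the seed, and the names df, obs, within_k, p_hat, and exact are assumptions made for illustration. The plot is wrapped in a check for the ggplot2 package so that the rest of the script still runs without it.

```r
set.seed(1)   # arbitrary seed, only so the run is reproducible

# Parts 1-2: hypothetical parameter choices and the grid of x values
mu <- 2
sigma <- 1.5
x <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 501)

# Part 3: data frame holding x and the Normal density evaluated at x
df <- data.frame(x = x, density = dnorm(x, mean = mu, sd = sigma))

# Part 4: line plot of the density (skipped if ggplot2 is not installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  g <- ggplot2::ggplot(df, ggplot2::aes(x = x, y = density)) +
    ggplot2::geom_line()
  print(g)
}

# Part 5: N random draws from the same Normal distribution
N <- 1001
obs <- rnorm(N, mean = mu, sd = sigma)

# Part 6: proportion of draws within k standard deviations of the mean
within_k <- function(k) mean(obs > mu - k * sigma & obs < mu + k * sigma)
p_hat <- sapply(1:3, within_k)
print(p_hat)

# Exact probabilities from the Normal CDF, for comparison
exact <- pnorm(1:3) - pnorm(-(1:3))
print(exact)
```

The exact values are roughly 0.683, 0.954, and 0.997 for any choice of mu and sigma, so the estimates in p_hat should land near them.
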
  2. Choose parameters \(K, p\) for a Binomial distribution to generate random data from. Let’s call this distribution \(F\).

    1. Generate N = 100 observations from \(F\) and store them in a variable named x.

    2. Calculate and store the sample mean of x, call it Ehat.

    3. Wrap parts 1 and 2 in a for loop of length R = 1000. Don’t forget to pre-allocate as necessary.

    4. Put your collection of R sample means into a dataframe.

    5. Make a density plot of your R sample means.

    6. Use var() to calculate the sample variance of your R sample means, call it Vhat_Ehat.

    7. Estimate the following probabilities using your vector of R sample means. I’m using the symbol \(K\hat{p}\) to represent your vector of sample means, treated as a random variable (recall that estimates are now random variables).

      1. \(\mathbb{P}[\) K*p - sqrt(Vhat_Ehat) \(< K\hat{p} <\) K*p + sqrt(Vhat_Ehat) \(]\).
      2. \(\mathbb{P}[\) K*p - 2*sqrt(Vhat_Ehat) \(< K\hat{p} <\) K*p + 2*sqrt(Vhat_Ehat) \(]\).
      3. \(\mathbb{P}[\) K*p - 3*sqrt(Vhat_Ehat) \(< K\hat{p} <\) K*p + 3*sqrt(Vhat_Ehat) \(]\).
    8. Your answers from part 7 should be close to the calculations in part 6 of exercise 1. Compare the numbers and decide how far off they are. Try increasing or decreasing R to see how close these probabilities are to the ones above.
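
A hedged sketch of this second exercise follows. The parameters K = 20 and p = 0.4, the seed, and the names df, se, within_k, and p_hat are illustrative assumptions. geom_density() is one way to draw the density plot, and the exercise does not prescribe it.

```r
set.seed(2)   # arbitrary seed, only so the run is reproducible

# Hypothetical Binomial parameters defining the distribution F
K <- 20
p <- 0.4
N <- 100      # observations per sample
R <- 1000     # number of repeated samples

# Part 3: pre-allocate, then repeat parts 1 and 2 inside a for loop
Ehat <- numeric(R)
for (r in seq_len(R)) {
  x <- rbinom(N, size = K, prob = p)   # part 1: N draws from F
  Ehat[r] <- mean(x)                   # part 2: sample mean of x
}

# Part 4: collect the R sample means in a data frame
df <- data.frame(Ehat = Ehat)

# Part 5: density plot of the sample means (skipped without ggplot2)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  g <- ggplot2::ggplot(df, ggplot2::aes(x = Ehat)) +
    ggplot2::geom_density()
  print(g)
}

# Part 6: sample variance of the R sample means
Vhat_Ehat <- var(Ehat)
print(Vhat_Ehat)

# Part 7: proportion of sample means within k * sqrt(Vhat_Ehat) of K * p
se <- sqrt(Vhat_Ehat)
within_k <- function(k) mean(Ehat > K * p - k * se & Ehat < K * p + k * se)
p_hat <- sapply(1:3, within_k)
print(p_hat)
```

Under these assumed parameters, the variance of a sample mean is K * p * (1 - p) / N = 0.048, so Vhat_Ehat should be close to that value. Rerunning with a larger or smaller R shows how much p_hat moves from run to run.
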