Distribution of \(X\) | density function | \(\mathbb{E}[X]\) | \(\mathbb{V}[X]\) | parameter bounds |
---|---|---|---|---|
Binomial\((K, p)\) | \({K \choose x}p^x(1-p)^{K-x}\) | \(Kp\) | \(Kp(1 - p)\) | \(0 \leq p \leq 1\) |
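As a quick sanity check on the table, the mean and variance formulas can be recovered numerically from the density itself. This is a minimal sketch, assuming the illustrative values `K = 10` and `p = 0.3`, which are not from the class.

```r
# Illustrative parameter values (assumptions, chosen only for this check).
K <- 10
p <- 0.3

# Support of the Binomial and its density at each point.
x <- 0:K
f <- dbinom(x, size = K, prob = p)

# Mean and variance computed directly from the density.
mean_from_density <- sum(x * f)
var_from_density <- sum((x - mean_from_density)^2 * f)

# Compare against the closed-form entries in the table.
all.equal(mean_from_density, K * p)
all.equal(var_from_density, K * p * (1 - p))
```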

1. Repeat yesterday’s class on your own.

    a. Choose a distribution to generate data from; it doesn’t have to be Binomial. Let’s refer to it as \(F\).

    b. Generate `N = 100` observations from \(F\) and store them in a variable named `x`.

    c. Calculate and store the sample mean of `x`.

    d. Wrap parts `b.` and `c.` in a for loop of length `R = 500`. Don’t forget to pre-allocate as necessary.

    e. Put your collection of `R` sample means into a dataframe.

    f. Make a density plot of your `R` sample means.

    g. Share and discuss with a neighbor. Are your distributions \(F\) the same? Does the density plot still look Normal anyway? Are they the same Normal distributions?

    h. Change the sample size `N` and see how the almost Normal distribution changes. Does the almost Normal distribution widen or narrow as your sample size increases?
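One way parts `a.` through `f.` of the first exercise might fit together is sketched below. It assumes \(F\) is an Exponential distribution with rate 1 (an arbitrary choice) and fixes a seed only for reproducibility. The names `xbar` and `df` are illustrative.

```r
library(ggplot2)

set.seed(314)  # assumption: fixed seed, only for reproducibility

N <- 100  # observations per sample
R <- 500  # number of repeated samples

# Pre-allocate storage for the R sample means.
xbar <- rep(NA_real_, R)

for (r in seq_len(R)) {
  x <- rexp(N, rate = 1)  # assumption: F is Exponential(1)
  xbar[r] <- mean(x)
}

# Collect the sample means in a dataframe.
df <- data.frame(xbar = xbar)

# Density plot of the R sample means.
ggplot(df, aes(x = xbar)) +
  geom_density()
```

Rerunning the sketch with a different `N` and comparing `sd(xbar)` across runs is one way to approach the question about sample size.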

2. Consider a \(\text{Binomial}(K, p)\) distribution.

    a. Pick values for the population parameters \(K\) and \(p\), and store them as variables.

    b. Calculate the expected value and variance using the (population) formulas above. Store these in variables named `E` and `V`.

    c. Set `N = 1001` and use `rbinom(N, K, p)` to randomly generate observations from the Binomial distribution. Store them in a variable.

    d. Put your observations in a dataframe, alongside two other variables. The three variables should be your data, an index of the observations, and the cumulative mean of the observations.

    e. Call `ggplot()` with no arguments to create an empty plot, and store this as a variable `gp`.

    f. Add to `gp` the geometry of a line by passing your dataframe to `geom_line()` as the named argument `data` and passing `aes(...)` as a second argument.

    g. Confirm with a friend that your plot makes sense. Recall the value your sample mean should converge to.

    h. Explain, in one complete English sentence, what this plot describes.

    i. Wrap parts `c.`, `d.`, and `f.` in a for loop of length `R = 50` to generate one plot with multiple converging sample cumulative means.

    j. As the sample size increases, what do you notice about the variation in the `R` lines?

    k. Discuss with a friend how this plot connects to the plot in part `1.`

    l. Challenge. Add a `geom_pointrange` at your choice of sample size along the x-axis. Look for an example using `geom_pointrange` towards the bottom of the following page: ggplot2 vertical intervals. This geometry should be based on a new dataframe with four variables. `N`, the sample size along the x-axis, is the point at which this vertical pointrange will be drawn. `y`, the expected value, records the point on the y-axis to draw. `ymin` and `ymax`, calculated as below, are the extents of the line range. Connecting the named elements in the dataframe to the argument names inside `aes(...)` within `geom_pointrange()` is delicate.

    x | y | ymin | ymax |
    ---|---|---|---|
    \(N\) | \(E\) | \(E - 2\sqrt{V/N}\) | \(E + 2\sqrt{V/N}\) |

    m. Challengier challenge. Repeat the Challenge above for evenly spaced values along the x-axis. Aim for 10 point ranges.
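The second exercise, including the two challenges, could be sketched roughly as follows. The parameter values `K = 10` and `p = 0.3`, the fixed seed, and the names `index`, `cmean`, `n_at`, and `pr` are all assumptions made for illustration. This is one possible arrangement, not the only one.

```r
library(ggplot2)

set.seed(314)  # assumption: fixed seed, only for reproducibility

# Illustrative population parameters (assumptions).
K <- 10
p <- 0.3
E <- K * p            # expected value from the table
V <- K * p * (1 - p)  # variance from the table

N <- 1001  # observations per line
R <- 50    # number of lines

gp <- ggplot()  # empty plot to which layers are added

for (r in seq_len(R)) {
  x <- rbinom(N, K, p)
  df <- data.frame(
    x = x,                          # the data
    index = seq_len(N),             # index of observations
    cmean = cumsum(x) / seq_len(N)  # cumulative mean
  )
  # Each layer stores its own copy of df, so reusing the name is safe.
  gp <- gp + geom_line(data = df, aes(x = index, y = cmean), alpha = 0.3)
}

# Ten evenly spaced sample sizes at which to draw vertical ranges.
n_at <- seq(100, 1000, by = 100)
pr <- data.frame(
  N = n_at,
  y = E,
  ymin = E - 2 * sqrt(V / n_at),
  ymax = E + 2 * sqrt(V / n_at)
)

# Map the dataframe columns onto the argument names geom_pointrange expects.
gp <- gp +
  geom_pointrange(
    data = pr,
    aes(x = N, y = y, ymin = ymin, ymax = ymax),
    colour = "red"
  )

gp
```

Inside `aes(...)`, the left-hand names (`x`, `y`, `ymin`, `ymax`) are the arguments of the geometry, while the right-hand names are columns of `pr`. Mixing these up is the delicate step the Challenge mentions.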