```
N <- 501 # some number
R <- 20 # number of repetitions
means <- rep(NA, R) # instantiate a vector of R NAs
for (r in 1:R) {
  data <- rF(N, ...) # generate N data from F(...); you'll need to change F
  means[r] <- mean(data) # store the rth mean
}
```

# Worksheet 02: Expectations

Please create a folder named `worksheet02` and create a QMD file named `main.qmd` within it. Put your solutions to Worksheet 02 into `main.qmd`. When finished, Render the file, and submit by dragging and dropping `worksheet02`, the entire folder along with all its contents, into our shared Google Drive folder. Thus, you've successfully submitted Worksheet 02 when our shared Google Drive folder contains just one folder, named `worksheet02`, with your Worksheet 02 solutions inside it.

#### Expectation

Let \(X\) be a discrete random variable with density function \(f\). For an arbitrary function \(g\), the expectation of \(g(X)\) is defined as

\[ \mathbb{E}[g(X)] = \sum_{x \in S} g(x) f(x) \]

If \(g(x) = x\), then

\[ \mathbb{E}[X] = \sum_{x \in S} x \cdot f(x) \]

is the expectation of the random variable itself.

If \(g(x) = (x - \mathbb{E}[X])^2\), then we recover the variance

\[ \mathbb{V}[X] = \mathbb{E}[(X - \mathbb{E}[X])^2] = \sum_{x \in S} (x - \mathbb{E}[X])^2 f(x) \]

The variance of a random variable can be re-written as

\[ \mathbb{V}[X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2 \]

which is generally an easier formula to use for calculations by hand.
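The identity follows from expanding the square and using linearity of expectation (note that \(\mathbb{E}[X]\) is a constant):

\[ \mathbb{E}[(X - \mathbb{E}[X])^2] = \mathbb{E}[X^2 - 2 X \mathbb{E}[X] + \mathbb{E}[X]^2] = \mathbb{E}[X^2] - 2 \mathbb{E}[X] \mathbb{E}[X] + \mathbb{E}[X]^2 = \mathbb{E}[X^2] - \mathbb{E}[X]^2 \]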

The standard deviation of a random variable is defined as

\[ \sqrt{\mathbb{V}[X]} \]

If \(X\) is a continuous random variable, then all sums \(\sum\) are replaced with integrals \(\int\) where the bounds of integration are set by the values the random variable can take on.
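To make these definitions concrete, here is a small R check using a fair six-sided die (an illustration of the formulas, not one of the worksheet's distributions):

```r
# Expectation, variance, and standard deviation of a fair six-sided die,
# computed directly from the definitions above
x <- 1:6                 # the support S
f <- rep(1/6, 6)         # the density f(x): each face equally likely
E <- sum(x * f)          # E[X] = sum over S of x * f(x)
V <- sum((x - E)^2 * f)  # V[X] = sum over S of (x - E[X])^2 * f(x)
c(mean = E, var = V, sd = sqrt(V))
```

Here \(\mathbb{E}[X] = 3.5\) and \(\mathbb{V}[X] = 35/12\), matching the shortcut formula \(\mathbb{E}[X^2] - \mathbb{E}[X]^2\).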

## Bernoulli Distribution

The Bernoulli distribution has density function

\[ f(x | p) = p^x (1 - p)^{(1 - x)} \]

for \(x \in \{0, 1\}\) and any \(p \in [0, 1]\). Let \(X \sim \text{Bernoulli}(p)\).

a. Show, using mathematics, that the expectation of \(X\) is equal to \(p\), \(\mathbb{E}[X] = p\).
b. Show, using mathematics, that the standard deviation of \(X\) is equal to \(\sqrt{p (1 - p)}\), \(\sqrt{\mathbb{V}[X]} = \sqrt{p (1 - p)}\).
c. Pick a value for \(p\) and store it into a variable named `p`. Using the R function `rbinom(N, 1, p)`, generate \(N\) Bernoulli observations. Approximate `p` by calculating the mean of the Bernoulli observations.
d. If you increase \(N\), will your approximation tend to get closer to the true value `p` or further away? Write your answer and provide R code to justify it.
e. If you decrease \(N\), will your approximation tend to get closer to the true value `p` or further away? Write your answer and provide R code to justify it.
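A sketch of how the simulation step might start, assuming a hypothetical choice \(p = 0.3\) (any \(p \in [0, 1]\) works):

```r
set.seed(1)           # for reproducibility
p <- 0.3              # hypothetical choice of p
N <- 1000
x <- rbinom(N, 1, p)  # N Bernoulli(p) observations (0s and 1s)
mean(x)               # the sample mean approximates p
```

Rerunning with different values of `N` lets you see how the approximation behaves.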

## Non-Uniform Discrete Distribution

Suppose the class MATH \(-\infty\) has course grades assigned by the following distribution

Component | Percentage |
---|---|
Final | 40% |
Test 2 | 25% |
Test 1 | 25% |
Quizzes | 5% |
Homework | 5% |

Make up two imaginary students, one who tests well but doesn't show up to class (no quizzes are taken) and doesn't do the homework, and one who does all the homework and attends class regularly but doesn't test well. For each student, make up course component percentages. For instance, student 1 might have earned an average of 10% for the Homework component (imagine they complete only 2 of 20 homeworks) but earned As on all tests.

Which student is likely to do better in the course MATH \(-\infty\)? Explain why, using expectations and the course's grade distribution as your evidence. Calculate the grades (as expectations) for each student to back up your explanation.
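As one way such a calculation can look in R, with entirely made-up component scores (the two students' numbers below are illustrative assumptions, not part of the worksheet):

```r
# Grade weights from the table above
weights <- c(Final = 0.40, Test2 = 0.25, Test1 = 0.25, Quizzes = 0.05, Homework = 0.05)
# Made-up component percentages for two hypothetical students
student1 <- c(Final = 95, Test2 = 92, Test1 = 90, Quizzes = 0, Homework = 10)   # tests well, skips class
student2 <- c(Final = 70, Test2 = 65, Test1 = 68, Quizzes = 95, Homework = 100) # attends, tests poorly
sum(weights * student1) # student 1's course grade as an expectation
sum(weights * student2) # student 2's course grade as an expectation
```

Each grade is an expectation: the component scores weighted by the course's grade distribution.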

## Continuous Uniform Distribution

The continuous Uniform distribution has density function

\[ f(x | a, b) = \frac{1}{b - a} \]

for \(x \in [a, b]\) and \(a < b\). Let \(X \sim \text{Uniform}(a, b)\).

a. Show, using mathematics, that the expectation of \(X\) is equal to \(\frac{b + a}{2}\), \(\mathbb{E}[X] = \frac{b + a}{2}\).
b. Show, using mathematics, that the variance of \(X\) is equal to \(\frac{(b - a)^2}{12}\), \(\mathbb{V}[X] = \frac{(b - a)^2}{12}\).
c. Pick values for \(a\) and \(b\) and store them into variables named `a` and `b`. Using the R function `runif(N, a, b)`, generate \(N\) Uniform observations. Calculate the mean and variance of the observations using the R functions `mean` and `var`. Store these calculations into variables named `m` and `v`.
d. In mathematics (I'd start on pen and paper), find solutions (a system of two equations) for \(a\) and \(b\) in terms of the expectations `m` \(= \mathbb{E}[X]\) and `v` \(= \mathbb{V}[X]\).
e. Provide code to estimate \(a\) and \(b\) using only the values of `m` and `v`. They call this cute trick the Method of Moments.
f. If you increase/decrease \(N\), will your approximations of \(a\) and \(b\) tend to get closer to the true values or further away? Write your answer and provide R code to justify your answer in each case (increase and decrease \(N\)).
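A sketch of how the estimation steps can fit together, assuming hypothetical values \(a = 2\) and \(b = 7\): inverting \(m = \frac{a + b}{2}\) and \(v = \frac{(b - a)^2}{12}\) gives \(\hat{a} = m - \sqrt{3v}\) and \(\hat{b} = m + \sqrt{3v}\).

```r
set.seed(2)                # for reproducibility
a <- 2; b <- 7             # hypothetical true values
N <- 10000
x <- runif(N, a, b)        # N Uniform(a, b) observations
m <- mean(x)
v <- var(x)
a_hat <- m - sqrt(3 * v)   # method-of-moments estimate of a
b_hat <- m + sqrt(3 * v)   # method-of-moments estimate of b
c(a_hat = a_hat, b_hat = b_hat)
```

The estimates should land near the true values of `a` and `b`, and varying `N` shows how their accuracy changes.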

## The Sample Mean

Pick your favorite distribution with a finite expectation, \(\mathbb{E}[X] < \infty\). Let’s call your distribution \(F\) (for favorite 😃).

a. Generate \(N\) observations from \(F\) with whatever finite parameter values you want – you can either ask me or Google about how to generate random data from \(F\). Calculate the mean of these observations.
b. Repeat part a. \(R = 20\) times and store the \(R\) means into a variable named `means`. I'd recommend that you write a for-loop to do this; some starter code appears at the top of this worksheet.
c. Calculate the mean of `means`. Is it close to the true expectation based on the parameters you chose?
d. Using the R function `hist(means, breaks = "fd")`, make a histogram of your means. Describe the shape, location, and spread of this distribution.
e. Repeat parts b. through d. with \(R = 400\), but with the same \(N\).
f. What are the major differences (in terms of the shape, location, and spread) between the histograms with \(R = 20\) and \(R = 400\)?
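As one concrete instance of this experiment (assuming \(F\) is the Exponential distribution with rate 1, so \(\mathbb{E}[X] = 1\); your favorite distribution will differ):

```r
set.seed(3)                 # for reproducibility
N <- 501                    # observations per sample
R <- 400                    # number of repeated samples
means <- rep(NA, R)
for (r in 1:R) {
  data <- rexp(N, rate = 1) # F = Exponential(1), so E[X] = 1
  means[r] <- mean(data)    # store the rth mean
}
mean(means)                 # should be close to the true expectation, 1
hist(means, breaks = "fd")  # roughly bell-shaped, centered near 1
```

Swapping `rexp(N, rate = 1)` for your own distribution's random generator reproduces the experiment for \(F\).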