Worksheet 02: Expectations

Due Date

2023-02-17 Friday @ 11:59pm

Please create a folder named worksheet02 and create a QMD file named main.qmd within it. Put your solutions to Worksheet 02 into main.qmd. When finished, Render the file, and submit by dragging and dropping worksheet02, the entire folder along with all its contents, into our shared Google Drive folder. You've successfully submitted Worksheet 02 when our shared Google Drive folder contains exactly one folder, named worksheet02, and your Worksheet 02 solutions are inside it.


Let \(X\) be a discrete random variable with density function \(f\), and let \(S\) be the set of values \(X\) can take on. For an arbitrary function \(g\), the expectation of \(g(X)\) is defined as

\[ \mathbb{E}[g(X)] = \sum_{x \in S} g(x) f(x) \]

If \(g(x) = x\), then

\[ \mathbb{E}[X] = \sum_{x \in S} x \cdot f(x) \]

is the expectation of the random variable itself.

If \(g(x) = (x - \mathbb{E}[X])^2\), then we recover the variance

\[ \mathbb{V}[X] = \mathbb{E}[(X - \mathbb{E}[X])^2] = \sum_{x \in S} (x - \mathbb{E}[X])^2 f(x) \]

The variance of a random variable can be re-written as

\[ \mathbb{V}[X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2 \]

which is generally an easier formula to use for calculations by hand.
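
In case the step from the definition to this formula isn't obvious, here is a short sketch of it for the discrete case. The shorthand \(\mu = \mathbb{E}[X]\) is my own notation, and the last step uses the fact that a density sums to one over \(S\).

\[
\begin{aligned}
\mathbb{V}[X] &= \sum_{x \in S} (x - \mu)^2 f(x) \\
&= \sum_{x \in S} x^2 f(x) - 2 \mu \sum_{x \in S} x \cdot f(x) + \mu^2 \sum_{x \in S} f(x) \\
&= \mathbb{E}[X^2] - 2 \mu \cdot \mu + \mu^2 \cdot 1 \\
&= \mathbb{E}[X^2] - \mathbb{E}[X]^2
\end{aligned}
\]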

The standard deviation of a random variable is defined as

\[ \sqrt{\mathbb{V}[X]} \]

If \(X\) is a continuous random variable, then all sums \(\sum\) are replaced with integrals \(\int\) where the bounds of integration are set by the values the random variable can take on.
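
To see the definitions above in action, here is a minimal R sketch. The two distributions are my own illustrative choices, not part of the problems below: a fair six-sided die for the discrete case, and the density \(f(x) = 2x\) on \([0, 1]\) for the continuous case. The variable names are made up for this example.

```r
# Discrete case: a fair six-sided die (illustrative choice).
x <- 1:6              # the set S of values X can take on
f <- rep(1 / 6, 6)    # density f(x) = 1/6 for each x in S

EX  <- sum(x * f)             # E[X] = sum over S of x * f(x)
VX  <- sum((x - EX)^2 * f)    # V[X] from the definition
VX2 <- sum(x^2 * f) - EX^2    # V[X] from the shortcut formula
SD  <- sqrt(VX)               # standard deviation

# Continuous case: the sum becomes an integral over the values X can take on.
# Illustrative density f(x) = 2x on [0, 1].
f_cont <- function(x) 2 * x
EY <- integrate(function(x) x * f_cont(x), lower = 0, upper = 1)$value
```

Here VX and VX2 should agree up to floating point error, which is a handy check on any by-hand calculation.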

Bernoulli Distribution

The Bernoulli distribution has density function

\[ f(x | p) = p^x (1 - p)^{(1 - x)} \]

for \(x \in \{0, 1\}\) and any \(p \in [0, 1]\). Let \(X \sim \text{Bernoulli}(p)\).

  1. Show, using mathematics, that the expectation of \(X\) is equal to \(p\), \(\mathbb{E}[X] = p\).

  2. Show, using mathematics, that the standard deviation of \(X\) is equal to \(\sqrt{p (1 - p)}\), \(\sqrt{\mathbb{V}[X]} = \sqrt{p (1 - p)}\).

  3. Pick a value for \(p\) and store it into a variable named p. Using the R function rbinom(N, 1, p) generate \(N\) Bernoulli observations. Approximate p by calculating the mean of the Bernoulli observations.

  4. If you increase \(N\), will your approximation tend to get closer to the true value p or further away? Write your answer and provide R code to justify it.

  5. If you decrease \(N\), will your approximation tend to get closer to the true value p or further away? Write your answer and provide R code to justify it.

Non-Uniform Discrete Distribution

Suppose the class MATH \(-\infty\) has course grades assigned by the following distribution

Component    Percentage
Final        40%
Test 2       25%
Test 1       25%
Quizzes      5%
Homework     5%

Make up two imaginary students, one who tests well but doesn’t show up to class (no quizzes are taken) and doesn’t do the homework, and one student who does all the homework and attends class regularly but doesn’t test well. For each student make up course component percentages. For instance, student 1 might have earned an average of 10% for component Homework (imagine they only completed 2 of 20 homework assignments), but earned As on all tests.

Which student is likely to do better in the course MATH \(-\infty\)? Explain why, using expectations and the course’s grade distribution as your evidence. Calculate the grades (as expectations) for each student to back up your explanation.
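
As a rough sketch of what "a grade as an expectation" can look like in R, the code below treats the component percentages as the density and a student's component scores as the values. The student here is a hypothetical third student with made-up scores, and the names weights, scores, and grade are my own; you still need to invent your two students.

```r
# Course grade distribution, written as proportions that sum to 1.
weights <- c(Final = 0.40, Test2 = 0.25, Test1 = 0.25,
             Quizzes = 0.05, Homework = 0.05)

# Made-up component scores (in percent) for a hypothetical student.
scores <- c(Final = 70, Test2 = 80, Test1 = 90,
            Quizzes = 100, Homework = 60)

# The course grade is the weighted sum: sum of score * weight.
grade <- sum(weights * scores)
grade  # 78.5
```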

Continuous Uniform Distribution

The continuous Uniform distribution has density function

\[ f(x | a, b) = \frac{1}{b - a} \]

for \(x \in [a, b]\) and \(a < b\). Let \(X \sim \text{Uniform}(a, b)\).

  1. Show, using mathematics, that the expectation of \(X\) is equal to \(\frac{b + a}{2}\), \(\mathbb{E}[X] = \frac{b + a}{2}\).

  2. Show, using mathematics, that the variance of \(X\) is equal to \(\frac{(b - a)^2}{12}\), \(\mathbb{V}[X] = \frac{(b - a)^2}{12}\).


The difference of two cubes rule establishes \[ x^3 - y^3 = (x - y)(x^2 + xy + y^2) \]

  3. Pick values for \(a\) and \(b\) and store them into variables named a and b. Using the R function runif(N, a, b) generate \(N\) Uniform observations. Calculate the mean and variance of the observations using the R functions mean and var. Store these calculations into variables named m and v.

  4. In mathematics (I’d start on pen and paper), find solutions (a system of two equations) for \(a\) and \(b\) in terms of the expectations m \(= \mathbb{E}[X]\) and v \(= \mathbb{V}[X]\).

  5. Provide code to estimate \(a\) and \(b\) using only the values of m and v. They call this cute trick the Method of Moments.

  6. If you increase/decrease \(N\), will your approximations of \(a\) and \(b\) tend to get closer to the true values or further away? Write your answer and provide R code to justify your answer in each case (increase and decrease \(N\)).
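
If the Method of Moments is new to you, here is a minimal sketch of the same idea on a different distribution, so it doesn't give away the Uniform case. It assumes an Exponential distribution with a rate parameter, for which \(\mathbb{E}[X] = 1 / \text{rate}\); solving that one equation for the rate and plugging in the sample mean gives the estimate. The names true_rate and rate_hat are made up for this example.

```r
set.seed(42)       # make the random draws reproducible

N <- 1e5
true_rate <- 2     # the parameter we pretend not to know
x <- rexp(N, rate = true_rate)

m <- mean(x)       # sample version of E[X]

# E[X] = 1 / rate, so solving for rate gives rate = 1 / E[X].
rate_hat <- 1 / m
rate_hat           # should land near true_rate
```

The Uniform problem follows the same pattern, just with two unknowns and therefore two equations.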

The Sample Mean

Pick your favorite distribution with a finite expectation, \(\mathbb{E}[X] < \infty\). Let’s call your distribution \(F\) (for favorite 😃).


If you’re unsure, check the Wikipedia page for your distribution and compare it to the page about the Cauchy Distribution. Within the table on the right, there should be a row labeled Mean; it’s undefined for the Cauchy distribution. If your distribution’s mean is defined, then it’s finite.
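
To get a feel for why the finite expectation requirement matters, here is a small sketch comparing running means of standard Normal draws (mean 0) with running means of Cauchy draws (mean undefined). The helper running_mean and the choice of the Normal as the comparison are my own, not part of the worksheet.

```r
set.seed(1)   # make the random draws reproducible

N <- 1e5

# Mean of the first n observations, for every n from 1 to N.
running_mean <- function(x) cumsum(x) / seq_along(x)

rm_normal <- running_mean(rnorm(N))
rm_cauchy <- running_mean(rcauchy(N))

# Look at the running means at a few sample sizes.
checkpoints <- c(100, 1000, 10000, 100000)
data.frame(n = checkpoints,
           normal = rm_normal[checkpoints],
           cauchy = rm_cauchy[checkpoints])
```

The Normal column should settle toward 0 as n grows. The Cauchy column has no such guarantee and can jump around even at large n, so try a few different seeds.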

  1. Generate \(N\) observations from \(F\) with whatever finite parameter values you want – you can either ask me or Google about how to generate random data from \(F\). Calculate the mean of these observations.

  2. Repeat part 1 \(R = 20\) times and store the \(R\) means into a variable named means. I’d recommend here that you write a for-loop to do this. Here’s some starter code.

N <- 100 # placeholder; replace with some number of your choosing
R <- 20  # number of repetitions
means <- rep(NA, R) # instantiate a vector of R NAs
for (r in 1:R) {
    data <- rF(N, ...)     # generate N data from F(...); you'll need to change rF and fill in its parameters
    means[r] <- mean(data) # store the rth mean
}

  3. Calculate the mean of means. Is it close to the true expectation based on the parameters you chose?

  4. Using the R function hist(means, breaks = "fd"), make a histogram of your means. Describe the shape, location, and spread of this distribution.

  5. Repeat parts 2 through 4 with \(R = 400\), but with the same \(N\).

  6. What are the major differences (in terms of the shape, location, and spread) between the histogram with \(R = 20\) and \(R = 400\)?