Worksheet 01: Distributions, Density Functions, and Data

Due Date

2023-02-03 Friday @ 11:59pm

Please create a folder named worksheet01 and create a QMD file named main.qmd within it. Put your solutions to Worksheet 01 into main.qmd. When finished, Render the file, and submit by dragging and dropping worksheet01, the entire folder along with all its contents, into our shared Google Drive folder. You have successfully submitted Worksheet 01 when our shared Google Drive folder contains exactly one folder, named worksheet01, with your Worksheet 01 solutions inside it.

For each section below, you should produce exactly one plot. So, despite the length of this assignment, you will submit exactly two plots.

Bernoulli Distribution

Think of a process which could reasonably follow a Bernoulli distribution.

  1. Make up a value for \(p\). Write at least one sentence describing this process and what the value \(p\) suggests.

  2. Store your chosen value of \(p\) into a variable named p.

  3. Assume a random variable \(X\) follows the Bernoulli distribution with your chosen value of \(p\), \(X \sim \text{Bernoulli}(p)\). Show that the expectation of \(X\) is equal to \(p\), \(\mathbb{E}[X] = p\). Type out your solution in mathematics.

  4. Make a plot of the Bernoulli distribution for your chosen value of \(p\). Recall that the only values a Bernoulli random variable takes on are \(0\) and \(1\). The R function dbinom(x, 1, p) will evaluate the density function for you, which, recall, has the mathematical expression

\[ f(x | p) = p^x (1 - p) ^ {(1 - x)} \]

  5. Use the R function rug to put a small tick along the x-axis of your plot at the value of the expectation of \(X\), \(\mathbb{E}[X] = p\).

  6. Use the R function rbinom(N, 1, p) to generate \(N = 50\) random observations from the Bernoulli distribution with your chosen value of \(p\). Store the data into a variable named data.

  7. Calculate the mean of data using the R function mean() and store it into a variable named phat.

  8. Add to your plot, using the R function points(), the values phat and 1 - phat: phat is the estimated probability at \(x = 1\), and 1 - phat is the estimated probability at \(x = 0\). Color these points red by adding col = "red" to the call to points().

  9. Add to your plot, using the R function rug, a tick at the value phat. Color this tick red.

  10. If you increase the value of \(N\) from 50 to something bigger, and then regenerate the plot, is phat closer to or farther from p on average? Why?

  11. If you decrease the value of \(N\) from 50 to something smaller, and then regenerate the plot, is phat closer to or farther from p on average? Why?
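The plotting steps above can be sketched roughly as follows. This is only one possible layout, not the required solution: the value p = 0.3, the fixed seed, the spike-style plot (type = "h"), and the axis limits are all assumptions made for illustration.

```r
# Illustrative value only (an assumption): probability of "success" on one trial.
p <- 0.3
N <- 50

set.seed(1)  # fixed seed so this sketch is reproducible (not required by the worksheet)

# The two values a Bernoulli random variable can take, and their probabilities.
x <- c(0, 1)
fx <- dbinom(x, 1, p)

# Draw vertical spikes at x = 0 and x = 1; ylim leaves room for the estimates.
plot(x, fx, type = "h", xlim = c(-0.25, 1.25), ylim = c(0, 1),
     xlab = "x", ylab = "f(x | p)", main = "Bernoulli distribution")
points(x, fx, pch = 16)

# Tick on the x-axis at the expectation E[X] = p.
rug(p)

# Simulate N observations and estimate p with the sample mean.
data <- rbinom(N, 1, p)
phat <- mean(data)

# Estimated probabilities: 1 - phat at x = 0 and phat at x = 1.
points(x, c(1 - phat, phat), col = "red")

# Red tick on the x-axis at the estimate phat.
rug(phat, col = "red")
```

Changing N and rerunning the block without the set.seed() line gives a different phat each time, which is a convenient way to explore the last two questions.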

Exponential Distribution

Think of a process which could reasonably follow an Exponential distribution.

  1. Make up a value for \(\lambda\). Write at least one sentence describing this process and what the value \(\lambda\) suggests.

  2. Store your chosen value of \(\lambda\) into a variable named l.

  3. Assume a random variable \(Y\) follows the Exponential distribution with your chosen value of \(\lambda\), \(Y \sim \text{Exponential}(\lambda)\). Note that the expectation of \(Y\) is equal to \(1 / \lambda\),

\[ \mathbb{E}[Y] = 1 / \lambda \]

  4. Use the R function plot(x, y, type = "l") to make a plot of the Exponential distribution for your chosen value of \(\lambda\). Recall that a random variable that follows the Exponential distribution can take on any non-negative value, \(x \geq 0\). The R function dexp(x, l) will evaluate the density function for you, which, recall, has the mathematical expression

\[ f(x | \lambda) = \lambda e ^{-x \lambda} \]

  5. Use the R function rug to put a small tick along the x-axis of your plot at the value of the expectation of \(Y\), \(\mathbb{E}[Y] = 1 / \lambda\).

  6. Use the R function rexp(N, l) to generate \(N = 40\) random observations from the Exponential distribution with your chosen value of \(\lambda\). Store the data into a variable named data.

  7. Calculate the mean of data using the R function mean and store it into a variable named invlhat, which reads as “inverse of l-hat”; think \(1 / \hat{l}\).

  8. Create a new variable lhat, calculated as 1 / invlhat; think \(\hat{l} = 1 / (1 / \hat{l})\).

  9. Add to your plot, using the R function lines, the density function of the Exponential distribution defined by the value of lhat computed from the random data.

  10. Add to your plot, using the R function rug, a tick at the value invlhat. Color this tick red.

  11. If you increase the value of \(N\) from 40 to something bigger, and then regenerate the plot, is lhat closer to or farther from l on average?

  12. If you decrease the value of \(N\) from 40 to something smaller, and then regenerate the plot, is lhat closer to or farther from l on average?
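A rough sketch of the Exponential steps above is shown below. It is one possible arrangement rather than the required solution: the rate l = 2, the fixed seed, the grid of x values, the y-axis limits, and coloring the estimated density red are all assumptions made for illustration.

```r
# Illustrative rate only (an assumption): events per unit of time.
l <- 2
N <- 40

set.seed(1)  # fixed seed so this sketch is reproducible (not required by the worksheet)

# Grid of non-negative x values covering most of the probability mass.
x <- seq(0, 6 / l, length.out = 200)

# True density; ylim leaves headroom in case the estimated rate exceeds l.
plot(x, dexp(x, l), type = "l", ylim = c(0, 2 * l),
     xlab = "x", ylab = "f(x | lambda)", main = "Exponential distribution")

# Tick on the x-axis at the expectation E[Y] = 1 / lambda.
rug(1 / l)

# Simulate N observations; the sample mean estimates 1 / lambda.
data <- rexp(N, l)
invlhat <- mean(data)
lhat <- 1 / invlhat

# Density implied by the estimated rate (red is a choice made for this sketch).
lines(x, dexp(x, lhat), col = "red")

# Red tick on the x-axis at the estimated expectation invlhat.
rug(invlhat, col = "red")
```

Changing N and rerunning the block without the set.seed() line shows how much the red curve moves from one sample to the next.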