Worksheet 01: Distributions, Density Functions, and Data
2023-02-03 Friday @ 11:59pm
Please create a folder named `worksheet01` and create a QMD file named `main.qmd` within it. Put your solutions to Worksheet 01 into `main.qmd`. When finished, Render the file, and submit by dragging and dropping `worksheet01`, the entire folder along with all its contents, into our shared Google Drive folder. You have successfully submitted Worksheet 01 when our shared Google Drive folder contains just one folder, named `worksheet01`, with your Worksheet 01 solutions inside it.
For each section below, you should produce exactly one plot. So despite the length of this assignment, you will submit exactly 2 plots.
Bernoulli Distribution
- Think of a process which could reasonably follow a Bernoulli distribution. Make up a value for \(p\). Write at least one sentence describing this process and what the value \(p\) suggests.
- Store your chosen value of \(p\) into a variable named `p`.
- Assume a random variable \(X\) follows the Bernoulli distribution with your chosen value of \(p\), \(X \sim \text{Bernoulli}(p)\). Show that the expectation of \(X\) is equal to \(p\), \(\mathbb{E}[X] = p\). Type out your solution in mathematics.
- Make a plot of the Bernoulli distribution for your chosen value of \(p\). Recall that the only values a Bernoulli random variable takes on are \(0\) and \(1\). The R function `dbinom(x, 1, p)` will evaluate the density function for you, which, recall, has the mathematical expression
\[ f(x | p) = p^x (1 - p) ^ {(1 - x)} \]
- Use the R function `rug` to put a little tick along the x-axis of your plot at the value of the expectation of \(X\), \(\mathbb{E}[X] = p\).
- Use the R function `rbinom(N, 1, p)` to generate \(N = 50\) random data points from the Bernoulli distribution with your chosen value of \(p\). Store the data into a variable named `data`.
- Calculate the mean of `data` using the R function `mean()` and store it into a variable named `phat`.
- Add to your plot, using the R function `points()`, the values `phat` and `1 - phat`. Color these points red by adding `col = "red"` to the function `points`.
- Add to your plot, using the R function `rug`, the value `phat`. Color this line red.
- If you increase the value of \(N\) from 50 to something bigger, and then regenerate the plots, is `phat` closer to or further from `p` on average? Why?
- If you decrease the value of \(N\) from 50 to something smaller, and then regenerate the plots, is `phat` closer to or further from `p` on average? Why?
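The steps in this section could be sketched roughly as follows. This is only one possible layout, not the expected solution: the seed, the value `p <- 0.3`, the `type = "h"` plotting style, the placement of the red points at \(x = 0\) and \(x = 1\), and the helper vector `sds` are all assumptions made for illustration.

```r
# Hypothetical values for illustration only; substitute your own.
set.seed(1)   # assumed seed so the sketch is reproducible
p <- 0.3      # assumed success probability
N <- 50       # sample size stated in the worksheet

x <- c(0, 1)             # the only values a Bernoulli variable takes
fx <- dbinom(x, 1, p)    # density at 0 and 1, i.e. 1 - p and p

data <- rbinom(N, 1, p)  # N Bernoulli draws
phat <- mean(data)       # sample proportion of 1s

# Vertical bars for the density (assumed style), tick at E[X] = p
plot(x, fx, type = "h", ylim = c(0, 1), xlab = "x", ylab = "f(x | p)")
rug(p)

# Estimated density values, assumed to sit at x = 0 and x = 1
points(x, c(1 - phat, phat), col = "red")
rug(phat, col = "red")   # red tick at the sample mean

# Exploring the last two questions: spread of phat over repeated samples
# for a few assumed sample sizes
sizes <- c(10, 50, 500)
sds <- sapply(sizes, function(n) sd(replicate(1000, mean(rbinom(n, 1, p)))))
print(data.frame(N = sizes, sd_of_phat = sds))
```

Changing `N`, or the values in `sizes`, and rerunning is one way to see how the variability of `phat` responds to the sample size before you write your explanation.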
Exponential Distribution
- Think of a process which could reasonably follow an Exponential distribution. Make up a value for \(\lambda\). Write at least one sentence describing this process and what the value \(\lambda\) suggests.
- Store your chosen value of \(\lambda\) into a variable named `l`.
- Assume a random variable \(Y\) follows the Exponential distribution with your chosen value of \(\lambda\), \(Y \sim \text{Exponential}(\lambda)\). Note that the expectation of \(Y\) is equal to \(1 / \lambda\),
\[ \mathbb{E}[Y] = 1 / \lambda \]
- Use the R function `plot(x, y, type = "l")` to make a plot of the Exponential distribution for your chosen value of \(\lambda\). Recall that a random variable that follows the Exponential distribution can take on any non-negative value, \(x \geq 0\). The R function `dexp(x, l)` will evaluate the density function for you, which, recall, has the mathematical expression
\[ f(x | \lambda) = \lambda e ^{-x \lambda} \]
- Use the R function `rug` to put a little tick along the x-axis of your plot at the value of the expectation of \(Y\), \(\mathbb{E}[Y] = 1 / \lambda\).
- Use the R function `rexp(N, l)` to generate \(N = 40\) random data points from the Exponential distribution with your chosen value of \(\lambda\). Store the data into a variable named `data`.
- Calculate the mean of `data` using the R function `mean` and store it into a variable named `invlhat`, which reads as “inverse of l-hat”; think \(1 / \hat{l}\).
- Create a new variable `lhat`, calculated as `1 / invlhat`; think \(\hat{l} = 1 / (1 / \hat{l})\).
- Add to your plot, using the R function `lines`, the density function of the Exponential distribution defined by the value of `lhat` computed from the random data.
- Add to your plot, using the R function `rug`, the value `invlhat`. Color this line red.
- If you increase the value of \(N\) from 40 to something bigger, and then regenerate the plots, is `lhat` closer to or further from `l` on average?
- If you decrease the value of \(N\) from 40 to something smaller, and then regenerate the plots, is `lhat` closer to or further from `l` on average?
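A minimal sketch of how the pieces of this section might fit together is given below. It is not the expected solution: the seed, the rate `l <- 2`, the grid `x` from 0 to 4, and the dashed line style for the estimated density are assumptions chosen for illustration.

```r
# Hypothetical values for illustration only; substitute your own.
set.seed(2)   # assumed seed so the sketch is reproducible
l <- 2        # assumed rate parameter lambda
N <- 40       # sample size stated in the worksheet

# Assumed grid of non-negative x values on which to draw the density
x <- seq(0, 4, length.out = 200)
plot(x, dexp(x, l), type = "l", xlab = "x", ylab = "f(x | lambda)")
rug(1 / l)    # tick at E[Y] = 1 / lambda

data <- rexp(N, l)      # N Exponential draws
invlhat <- mean(data)   # sample mean, an estimate of 1 / lambda
lhat <- 1 / invlhat     # corresponding estimate of lambda

# Density implied by lhat (dashed style is an assumption) and a red tick
# at the sample mean
lines(x, dexp(x, lhat), lty = 2)
rug(invlhat, col = "red")
```

Rerunning the block with a different `N` redraws both curves, which gives a quick visual check of how close the estimated density sits to the one defined by `l`.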