---
title: "Worksheet 01: Distributions, Density Functions, and Data"
format:
  html:
    self-contained: true
    toc: true
    toc-location: right
---
::: {.callout-note}
## Due Date
2023-02-03 Friday @ 11:59pm
:::

Please create a folder named `worksheet01` and create a QMD file named
`main.qmd` within it. Put your solutions to Worksheet 01 into
`main.qmd`. When finished, Render the file and submit by dragging
and dropping the entire `worksheet01` folder, along with all its
contents, into our shared Google Drive folder. You've successfully
submitted Worksheet 01 when our shared Google Drive folder contains
exactly one folder, named `worksheet01`, with your Worksheet 01
solutions inside it.

For each section below, you should produce exactly one plot. So
despite the length of this assignment, you will submit exactly 2
plots.

## Bernoulli Distribution

Think of a process which could reasonably follow a Bernoulli
distribution.
a. Make up a value for $p$. Write at least one sentence describing this
process and what the value $p$ suggests.
a. Store your chosen value of $p$ into a variable named `p`.
b. Assume a random variable $X$ follows the Bernoulli distribution
with your chosen value of $p$, $X \sim \text{Bernoulli}(p)$. Show
that the expectation of $X$ is equal to $p$, $\mathbb{E}[X] = p$.
Type out your solution in mathematics.
c. Make a plot of the Bernoulli distribution for your chosen value
of $p$. Recall that the only values a Bernoulli random variable
takes on are $0$ and $1$. The R function `dbinom(x, 1, p)` will
evaluate the density function for you, which, recall, has the
mathematical expression
$$ f(x | p) = p^x (1 - p) ^ {(1 - x)} $$
d. Use the R function `rug` to put a little tick along the x-axis of
your plot at the value of the expectation of $X$, $\mathbb{E}[X] = p$.
c. Use the R function `rbinom(N, 1, p)` to generate $N = 50$
random data from the Bernoulli distribution with your chosen value
of $p$. Store the data into a variable named `data`.
d. Calculate the mean of `data` using the R function `mean()` and store
it into a variable named `phat`.
a. Add to your plot, using the R function `points()`, the values
`phat` (at $x = 1$) and `1 - phat` (at $x = 0$). Color these points
red by adding `col = "red"` to the function `points`.
a. Add to your plot, using the R function `rug`, the value
`phat`. Color this line red.
a. If you increase the value of $N$ from 50 to something bigger,
and then regenerate the plots, is `phat` closer to or farther from
`p` on average? Why?
a. If you decrease the value of $N$ from 50 to something smaller,
and then regenerate the plots, is `phat` closer to or farther from
`p` on average? Why?
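
The plotting steps in this section can be sketched as follows. This is a starting point under stated assumptions, not a model solution: the value `p <- 0.3`, the fixed seed, the `type = "h"` bar style, and the axis labels are placeholders chosen for illustration, so substitute your own choices.

```r
# Placeholder value for illustration only; substitute your own p.
p <- 0.3
N <- 50

# The two values a Bernoulli random variable can take, and their probabilities.
x <- c(0, 1)
fx <- dbinom(x, 1, p)

# Vertical bars for the distribution; ylim leaves room for the sample estimates.
plot(x, fx, type = "h", ylim = c(0, 1), xlab = "x", ylab = "f(x | p)")
rug(p)  # tick at the expectation, E[X] = p

# Simulate N draws and estimate p by the sample mean.
set.seed(1)  # assumption: a fixed seed so the sketch is reproducible
data <- rbinom(N, 1, p)
phat <- mean(data)

# Estimated probabilities of 0 and 1 in red, plus a red tick at phat.
points(x, c(1 - phat, phat), col = "red")
rug(phat, col = "red")
```

Removing the `set.seed()` line and rerunning with different values of `N` is one way to explore the last two questions.
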

## Exponential Distribution

Think of a process which could reasonably follow an Exponential
distribution.
a. Make up a value for $\lambda$. Write at least one sentence describing this
process and what the value $\lambda$ suggests.
a. Store your chosen value of $\lambda$ into a variable named `l`.
a. Assume a random variable $Y$ follows the Exponential distribution
with your chosen value of $\lambda$, $Y \sim
\text{Exponential}(\lambda)$. Note that the expectation of $Y$ is
equal to $1 / \lambda$,
$$ \mathbb{E}[Y] = 1 / \lambda $$
a. Use the R function `plot(x, y, type = "l")` to make a plot of the
Exponential distribution for your chosen value of $\lambda$. Recall
that a random variable that follows the Exponential distribution can
take on any non-negative value, $x \geq 0$. The R function
`dexp(x, l)` will evaluate the density function for you, which,
recall, has the mathematical expression
$$ f(x | \lambda) = \lambda e ^{-x \lambda} $$
a. Use the R function `rug` to put a little tick along the x-axis of
your plot at the value of the expectation of $Y$, $\mathbb{E}[Y] = 1 /
\lambda$.
a. Use the R function `rexp(N, l)` to generate $N = 40$ random data
from the Exponential distribution with your chosen value of $\lambda$.
Store the data into a variable named `data`.
a. Calculate the mean of `data` using the R function `mean` and store
it into a variable named `invlhat`, which reads as "inverse of l-hat";
think $1 / \hat{l}$.
a. Create a new variable `lhat`, calculated as `1 / invlhat`;
think $\hat{l} = 1 / (1 / \hat{l})$.
a. Add to your plot, using the R function `lines`, the density
function of the Exponential distribution defined by the value of
`lhat` computed from the random data.
a. Add to your plot, using the R function `rug`, the value
`invlhat`. Color this line red.
a. If you increase the value of $N$ from 40 to something bigger, and
then regenerate the plots, is `lhat` closer to or farther from `l` on
average?
a. If you decrease the value of $N$ from 40 to something smaller, and
then regenerate the plots, is `lhat` closer to or farther from `l` on
average?
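
The plotting steps in this section can be sketched in the same way. Again, this is a starting point under stated assumptions rather than a model solution: the rate `l <- 2`, the fixed seed, the grid `x` running from 0 to `5 / l`, and the dashed red line style are all placeholders chosen for illustration.

```r
# Placeholder rate for illustration only; substitute your own lambda.
l <- 2
N <- 40

# Grid of non-negative x values; the upper limit 5 / l is an arbitrary choice
# that covers most of the probability mass.
x <- seq(0, 5 / l, length.out = 200)

# Density of the Exponential distribution at the chosen rate.
plot(x, dexp(x, l), type = "l", xlab = "x", ylab = "f(x | lambda)")
rug(1 / l)  # tick at the expectation, E[Y] = 1 / lambda

# Simulate N draws; the sample mean estimates 1 / lambda.
set.seed(1)  # assumption: a fixed seed so the sketch is reproducible
data <- rexp(N, l)
invlhat <- mean(data)
lhat <- 1 / invlhat

# Density implied by the estimated rate, plus a red tick at the sample mean.
lines(x, dexp(x, lhat), col = "red", lty = 2)
rug(invlhat, col = "red")
```

Removing the `set.seed()` line and rerunning with different values of `N` is one way to explore the last two questions.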