# Worksheet 01: Distributions, Density Functions, and Data

Please create a folder named `worksheet01` and create a QMD file named `main.qmd` within it. Put your solutions to Worksheet 01 into `main.qmd`. When finished, Render the file, and submit by dragging and dropping `worksheet01`, the entire folder along with all its contents, into our shared Google Drive folder. You have successfully submitted Worksheet 01 when our shared Google Drive folder contains just one folder, named `worksheet01`, with your Worksheet 01 solutions inside it.

For each section below, you should produce exactly one plot. So despite the length of this assignment, you will submit exactly 2 plots.

## Bernoulli Distribution

Think of a process which could reasonably follow a Bernoulli distribution.

Make up a value for \(p\). Write at least one sentence describing this process and what the value \(p\) suggests.

Store your chosen value of \(p\) into a variable named `p`.

Assume a random variable \(X\) follows the Bernoulli distribution with your chosen value of \(p\), \(X \sim \text{Bernoulli}(p)\). Show that the expectation of \(X\) is equal to \(p\), \(\mathbb{E}[X] = p\). Type out your solution in mathematics.
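As a possible starting point (a hint rather than a full solution), the expectation of a discrete random variable is the sum of each value it can take, weighted by the probability of that value. For a Bernoulli random variable the sum has only two terms, where \(f(x | p)\) is the density function given later in this section:

\[ \mathbb{E}[X] = \sum_{x \in \{0, 1\}} x \, f(x | p) = 0 \cdot f(0 | p) + 1 \cdot f(1 | p) \]

Evaluating \(f(0 | p)\) and \(f(1 | p)\) and simplifying is left to you.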

Make a plot of the Bernoulli distribution for your chosen value of \(p\). Recall that the only values a Bernoulli random variable takes on are \(0\) and \(1\). The R function `dbinom(x, 1, p)` will evaluate the density function for you, which, recall, has mathematical expression

\[ f(x | p) = p^x (1 - p) ^ {(1 - x)} \]
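To see that `dbinom(x, 1, p)` and the formula above agree, here is a minimal sketch. The value `p <- 0.3` and the names `by_formula` and `by_dbinom` are assumptions chosen only for illustration; substitute your own value of \(p\).

```r
# Hypothetical value of p, chosen only for illustration
p <- 0.3

# The only two values a Bernoulli random variable can take
x <- c(0, 1)

# Density evaluated directly from the formula p^x * (1 - p)^(1 - x)
by_formula <- p^x * (1 - p)^(1 - x)

# Density evaluated with R's binomial density, using a single trial
by_dbinom <- dbinom(x, 1, p)

# The two vectors should agree up to floating-point error
print(all.equal(by_formula, by_dbinom))
```

Note that the two density values sum to 1, as they must for a distribution with only two outcomes.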

Use the R function `rug` to put a little tick along the x-axis of your plot at the value of the expectation of \(X\), \(\mathbb{E}[X] = p\).

Use the R function `rbinom(N, 1, p)` to generate \(N = 50\) random data points from the Bernoulli distribution with your chosen value of \(p\). Store the data into a variable named `data`.

Calculate the mean of `data` using the R function `mean()` and store it into a variable named `phat`.

Add to your plot, using the R function `points()`, the values `phat` and `1 - phat`. Color these points red by adding `col = "red"` to the function `points`.

Add to your plot, using the R function `rug`, the value `phat`. Color this tick red.

If you increase the value of \(N\) from 50 to something bigger, and then regenerate the plots, is `phat` closer to or further from `p` on average? Why?

If you decrease the value of \(N\) from 50 to something smaller, and then regenerate the plots, is `phat` closer to or further from `p` on average? Why?
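One way to check your answer to the last two questions empirically, rather than by regenerating the plot by hand, is to repeat the sampling many times for several values of \(N\). The sketch below is only an illustration: the value `p <- 0.3`, the helper `avg_distance`, the number of repetitions, and the sample sizes are all assumptions, not part of the worksheet.

```r
set.seed(1)  # fixed seed so the sketch is reproducible
p <- 0.3     # hypothetical value of p, chosen only for illustration

# Hypothetical helper: average absolute distance between phat and p
# over `reps` repeated samples, each of size N
avg_distance <- function(N, reps = 2000) {
  phats <- replicate(reps, mean(rbinom(N, 1, p)))
  mean(abs(phats - p))
}

# Sample sizes below, at, and above the worksheet's N = 50
sizes <- c(10, 50, 500)
distances <- sapply(sizes, avg_distance)

print(data.frame(N = sizes, avg_distance = distances))
```

Compare the printed distances across the rows, then explain the pattern in your own words.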

## Exponential Distribution

Think of a process which could reasonably follow an Exponential distribution.

Make up a value for \(\lambda\). Write at least one sentence describing this process and what the value \(\lambda\) suggests.

Store your chosen value of \(\lambda\) into a variable named `l`.

Assume a random variable \(Y\) follows the Exponential distribution with your chosen value of \(\lambda\), \(Y \sim \text{Exponential}(\lambda)\). Note that the expectation of \(Y\) is equal to \(1 / \lambda\),

\[ \mathbb{E}[Y] = 1 / \lambda \]
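If you would like to convince yourself of this expectation numerically, base R's `integrate` can approximate \(\int_0^\infty x \, f(x | \lambda) \, dx\). This is a minimal sketch; the rate `l <- 2` and the names `integrand` and `numeric_mean` are assumptions chosen only for illustration.

```r
# Hypothetical rate, chosen only for illustration
l <- 2

# Integrand of the expectation: x times the Exponential density at x
integrand <- function(x) x * dexp(x, l)

# Numerically integrate over all non-negative values
numeric_mean <- integrate(integrand, lower = 0, upper = Inf)$value

# Compare the numerical result with the closed form 1 / lambda
print(c(numeric = numeric_mean, exact = 1 / l))
```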

Use the R function `plot(x, y, type = "l")` to make a plot of the Exponential distribution for your chosen value of \(\lambda\). Recall that a random variable that follows the Exponential distribution can take on any non-negative value, \(x \geq 0\). The R function `dexp(x, l)` will evaluate the density function for you, which, recall, has mathematical expression

\[ f(x | \lambda) = \lambda e ^{-x \lambda} \]
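As a quick check that `dexp(x, l)` matches the formula above, the following sketch evaluates both on a small grid. The rate `l <- 2`, the grid `x`, and the names `by_formula` and `by_dexp` are assumptions chosen only for illustration.

```r
# Hypothetical rate, chosen only for illustration
l <- 2

# A small grid of non-negative values at which to evaluate the density
x <- seq(0, 3, by = 0.5)

# Density evaluated directly from the formula lambda * exp(-x * lambda)
by_formula <- l * exp(-x * l)

# Density evaluated with R's built-in Exponential density
by_dexp <- dexp(x, l)

# The two vectors should agree up to floating-point error
print(all.equal(by_formula, by_dexp))
```

For your plot you will likely want a much finer grid of `x` values so that the curve looks smooth.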

Use the R function `rug` to put a little tick along the x-axis of your plot at the value of the expectation of \(Y\), \(\mathbb{E}[Y] = 1 / \lambda\).

Use the R function `rexp(N, l)` to generate \(N = 40\) random data points from the Exponential distribution with your chosen value of \(\lambda\). Store the data into a variable named `data`.

Calculate the mean of `data` using the R function `mean` and store it into a variable named `invlhat`, which reads as “inverse of l-hat”; think \(1 / \hat{l}\).

Create a new variable `lhat`, calculated as `1 / invlhat`; think \(\hat{l} = 1 / (1 / \hat{l})\).

Add to your plot, using the R function `lines`, the density function of the Exponential distribution defined by the value of `lhat` computed from the random data.

Add to your plot, using the R function `rug`, the value `invlhat`. Color this tick red.

If you increase the value of \(N\) from 40 to something bigger, and then regenerate the plots, is `lhat` closer to or further from `l` on average?

If you decrease the value of \(N\) from 40 to something smaller, and then regenerate the plots, is `lhat` closer to or further from `l` on average?
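To check your answer empirically, you can repeat the sampling many times for several values of \(N\) and compare how far `lhat` lands from `l` on average. The sketch below is only an illustration: the rate `l <- 2`, the helper `avg_distance`, the number of repetitions, and the sample sizes are all assumptions, not part of the worksheet.

```r
set.seed(1)  # fixed seed so the sketch is reproducible
l <- 2       # hypothetical rate, chosen only for illustration

# Hypothetical helper: average absolute distance between lhat and l
# over `reps` repeated samples, each of size N
avg_distance <- function(N, reps = 2000) {
  lhats <- replicate(reps, 1 / mean(rexp(N, l)))
  mean(abs(lhats - l))
}

# Sample sizes below, at, and above the worksheet's N = 40
sizes <- c(5, 40, 400)
distances <- sapply(sizes, avg_distance)

print(data.frame(N = sizes, avg_distance = distances))
```

Compare the printed distances across the rows and relate the pattern to what you observed for `phat` in the Bernoulli section.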