Bernoulli Distribution

Edward A. Roualdes

Contents

Introduction
Density Function
Estimation

Introduction

The Bernoulli distribution is a simple and flexible statistical model for any event that has only two outcomes. In fact, the Bernoulli distribution is the basis of some very sophisticated binary classifiers in machine learning.

There are two key features of the Bernoulli distribution. First, it takes on only two values, commonly referred to as failure and success; failure is recorded as \(0\) and success as \(1\). Second, a success occurs with a constant probability \(p\). Because there are only two outcomes, the probability of a failure is \(1 - p\).
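To make the two key features concrete, here is a minimal sketch of drawing a single Bernoulli observation in Python. It assumes the standard trick of comparing a uniform random number on \([0, 1)\) against \(p\): the uniform number falls below \(p\) with probability exactly \(p\), which yields a success (\(1\)); otherwise the draw is a failure (\(0\)). The function name `bernoulli_draw`, the value \(p = 0.3\), and the seed are illustrative choices, not part of the original text.

```python
import random

def bernoulli_draw(p: float, rng: random.Random) -> int:
    """Return 1 (success) with probability p and 0 (failure) with probability 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # rng.random() is uniform on [0, 1), so it is below p with probability p.
    return 1 if rng.random() < p else 0

rng = random.Random(42)  # seeded so the run is reproducible
draws = [bernoulli_draw(0.3, rng) for _ in range(10)]
print(draws)  # a list of ten 0s and 1s
```

Passing an explicit `random.Random` instance, rather than using the module-level functions, keeps each simulation reproducible and independent of other code that uses `random`.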

To model a real-world process with the Bernoulli distribution, the researcher first needs to identify a process that has only two outcomes. Label the outcome of interest a success and the other outcome a failure. If it's reasonable to believe that the probability of success is constant in time and/or space, then the Bernoulli distribution is likely to be an appropriate model.

Density Function

The Bernoulli distribution has one of the simpler density functions because the variable \(x\) can only take on two possible values, \(0\) or \(1\). We write this mathematically as \(x \in \{0, 1\}\) and read it as "\(x\) is an element of the set that consists of \(0\) and \(1\)." The Bernoulli density function is \[ f(x) = p^x (1 - p)^{(1 - x)}. \]

Try evaluating the density function on your own at both values that \(x\) can take on. Do you see now why the value \(1\) is associated with the probability \(p\)?
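If you want to check your answer, the sketch below evaluates \(f(x) = p^x (1 - p)^{(1 - x)}\) at both values of \(x\). The function name `bernoulli_density` and the value \(p = 0.3\) are illustrative choices. Plugging in \(x = 1\) leaves \(p^1 (1 - p)^0 = p\), and plugging in \(x = 0\) leaves \(p^0 (1 - p)^1 = 1 - p\).

```python
def bernoulli_density(x: int, p: float) -> float:
    """Evaluate f(x) = p**x * (1 - p)**(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p**x * (1 - p) ** (1 - x)

p = 0.3
print(bernoulli_density(1, p))  # p**1 * (1 - p)**0, which is just p
print(bernoulli_density(0, p))  # p**0 * (1 - p)**1, which is just 1 - p
```

Because the two values of \(f\) are \(p\) and \(1 - p\), they always sum to \(1\), as any density over the two outcomes must.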

The plot below is interactive. Pick a value for the population parameter \(p\) and observe how the plot changes as you change it. Below the plot, you can simulate data from the specified Bernoulli distribution: when you click "sample", \(N\) random observations (\(N = 101\) by default) will appear. The closer \(p\) is to \(1\), the more \(1\)s will appear on the scale below.

Randomly sample Bernoulli observations.

The blue triangle marks the value of the true population parameter \(p\): the point at which we expect the scale, if you will, to balance as an infinite number of observations pile up. The pink triangle marks the estimate \(\hat{p}\) of \(p\) computed from the randomly sampled data, and hence the fulcrum point of the scale for that finite number of observations.
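One way to see why \(\hat{p}\) is the fulcrum, assuming each observation is treated as a unit weight placed at its value \(x_n\) on the scale: the scale balances at the point \(c\) where the signed distances of the weights from \(c\) sum to zero, \[ \sum_{n=1}^{N} (x_n - c) = 0 \quad\Longrightarrow\quad c = \frac{1}{N}\sum_{n=1}^{N} x_n = \hat{p}. \] Because each \(x_n\) is either \(0\) or \(1\), the sum counts the successes, so the balance point is simply the proportion of successes in the sample.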

Sample repeatedly above with varying sample sizes \(N\). Notice that for smaller values of \(N\), the estimate marked by the pink triangle is less often close to the true population parameter marked by the blue triangle. That is, a smaller sample size leads to more error in estimation. For larger \(N\), mathematical statistics tells us that the estimate will, on average, be closer to the true population mean \(p\).
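The same experiment can be run without the interactive display. The sketch below is a small Monte Carlo simulation: for each sample size \(N\) it draws many independent samples, computes \(\hat{p}\) for each, and averages the absolute error \(|\hat{p} - p|\). Using mean absolute error as the measure of "closeness", along with \(p = 0.3\), the number of repetitions, and the seed, are assumptions made for illustration; the helper names are hypothetical.

```python
import random
import statistics

def estimate_p(sample: list[int]) -> float:
    """Sample mean of 0/1 observations."""
    return sum(sample) / len(sample)

def mean_abs_error(p: float, n: int, reps: int, rng: random.Random) -> float:
    """Average |p_hat - p| over `reps` independent Bernoulli samples of size n."""
    errors = []
    for _ in range(reps):
        # rng.random() < p is a success with probability p.
        sample = [1 if rng.random() < p else 0 for _ in range(n)]
        errors.append(abs(estimate_p(sample) - p))
    return statistics.fmean(errors)

rng = random.Random(2024)
p = 0.3
for n in (10, 101, 1000):
    print(n, round(mean_abs_error(p, n, reps=1000, rng=rng), 4))
```

The printed error should shrink as \(N\) grows, mirroring what you see when the pink triangle settles closer to the blue one for larger samples.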

Estimation

Estimating the population parameter \(p\) is as simple as adding up the values of all the Bernoulli observations and dividing by the sample size. That is, calculate the sample mean \( \hat{p} = \frac{1}{N}\sum_{n = 1}^N x_n \), where \( \hat{p} \) is read as "p hat." This is exactly what the pink triangle represents in the interactive display above.
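A minimal sketch of the estimator \( \hat{p} = \frac{1}{N}\sum_{n = 1}^N x_n \) in Python follows. The function name `estimate_p` and the ten observations are hypothetical, chosen only so that the arithmetic is easy to follow: four successes out of ten observations gives \( \hat{p} = 4/10 \).

```python
def estimate_p(sample: list[int]) -> float:
    """Return p_hat = (1/N) * sum(x_n) for a list of 0/1 observations."""
    if not sample:
        raise ValueError("need at least one observation")
    if any(x not in (0, 1) for x in sample):
        raise ValueError("Bernoulli observations must be 0 or 1")
    # Summing 0s and 1s counts the successes; dividing by N gives their proportion.
    return sum(sample) / len(sample)

# Hypothetical sample of N = 10 observations: 4 successes, 6 failures.
x = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
print(estimate_p(x))  # → 0.4
```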

The plot below attempts to display the fact that larger sample sizes lead to greater precision in estimation. If an estimator converges to the true population parameter of interest as the sample size grows without bound, statisticians call the estimator consistent.

Sample from a Bernoulli distribution with probability \(p\).

Consider one displayed line. (If you haven't yet clicked "sample" just above this plot, please do so at least once.) Along the x-axis is the size of a random sample of Bernoulli observations; on the y-axis is the sample mean corresponding to that sample size. After each new observation, the sum of the observations so far is divided by the current sample size. Each displayed line is thus a cumulative mean for a sample of 500 observations. As the sample size increases along the x-axis, the sample mean converges toward the true population mean \(p\).
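One displayed line can be reproduced with a short sketch: draw 500 Bernoulli observations, then compute the running mean after each one. This uses `itertools.accumulate` to build the running sums \(x_1, x_1 + x_2, \ldots\) and divides each by the number of observations so far. The value \(p = 0.3\), the seed, and the function name `cumulative_means` are illustrative assumptions; the code behind the interactive plot itself is not shown in this document.

```python
import random
from itertools import accumulate

def cumulative_means(sample: list[int]) -> list[float]:
    """Return the running sample mean after the 1st, 2nd, ..., Nth observation."""
    running_sums = accumulate(sample)  # x_1, x_1 + x_2, x_1 + x_2 + x_3, ...
    return [s / n for n, s in enumerate(running_sums, start=1)]

rng = random.Random(7)
p = 0.3
sample = [1 if rng.random() < p else 0 for _ in range(500)]
path = cumulative_means(sample)

# The mean after 10, 100, and 500 observations; later values tend to sit nearer p.
print(path[9], path[99], path[499])
```

Plotting `path` against the sample sizes \(1, 2, \ldots, 500\) would give one line like those in the plot above.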


Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International