Basic Probability

Edward A. Roualdes

Contents

Introduction
Probability Over Finite Sets
Axioms of Probability
Independent Events
Conditional Probability
Bayes' Theorem
Law of Total Probability

Introduction

Probability is a seemingly simple and yet deceptively complicated subject. Much of that complication can be concealed, though, and that is the route we'll take here.

These notes first introduce some notation of probability, alongside a core set of beliefs that most people tend to agree on. This is not to say that most people agree on the definition of probability; see, for example, Dr. Martha K. Smith's short essay titled "What is Probability?". With this notation in hand, we state some basic rules for working with probabilities. Note that these two sections are inherently different from the last section, Estimating Probabilities. In Axioms of Probability and Rules of Probability, we claim knowledge of the probabilities of interest. This is not always the case in the real world, and thus one must also attend to making educated guesses about probabilities that we claim exist. In the last section, Estimating Probabilities, we discuss how to estimate unknown probabilities.

Probability Over Finite Sets

Before we get to a more theoretical treatment of probability, let's introduce the notation of probability. Syntactically, probability works like a function. We'll use the blackboard-bold, capital letter P, \( \mathbb{P} \).

A probability function \( \mathbb{P} \) acts on sets, called events in statistics, instead of numbers like you're probably used to. Hence, probability is a set function. (Here's a refresher on sets and set theory, in case you want it.) A set function \( \mathbb{P} \) acts on sets and returns a real number guaranteed to be in the interval \( [0, 1] \). Mathematically, we'd write \( \mathbb{P}: 2^S \to [0, 1] \), where \( 2^S \) denotes the collection of subsets of the sample space \( S \).

Notice that we have yet to specify how a function \( \mathbb{P} \) maps sets to real numbers between \( 0 \) and \(1 \). So far, we've just said that it does this. In fact, the way a function \( \mathbb{P} \) maps sets to real numbers between \( 0 \) and \( 1 \) can be quite complex. Specifying more complex examples of \( \mathbb{P} \) will be discussed in greater detail under the topic of probability distributions. For now, let's consider a case of \( \mathbb{P} \) specified on finite sets.

Consider the set \( S = \{1, 2, 3, 4, 5, 6\} \) and \( A = \{2, 4, 6 \} \), a subset of \( S \). Thinking of \( S \) as the outcomes of a fair die and \( A \) as the even outcomes, we seek to define a set function that produces the probability \( 1/2 \) when applied to \( A \): \( \mathbb{P}[A] = 1/2 \). We'd also like this same set function to be more general, such that it produces similarly intuitive results for other subsets of \( S \).

Luckily, since \( S \) is finite, a relatively simple solution works here. Put \( \mathbb{P} \) to be the set function that maps arbitrary sets \( B \subset S \) to the fraction \[ \mathbb{P}[B] = \frac{|B|}{|S|}. \] Recall from the notes on Basic Set Theory that the cardinality \( | \cdot | \) of a finite set counts the elements. Thus, if we applied this set function to \( A \subset S \), we'd get \( \mathbb{P}[A] = 3/6 = 1/2 \), as desired. With this more general definition of \( \mathbb{P} \) applied to finite sets, we will henceforth refer to it as a probability distribution. (Such functions \( \mathbb{P} \) are referred to as probability distributions because they distribute probability across subsets of the set \( S \) on which they act.)
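To make the definition concrete, here is a minimal Python sketch of this uniform distribution on a finite set. The function name `prob` is our own illustrative choice, not standard notation.

```python
def prob(B, S):
    """Probability of event B under the uniform distribution on sample space S:
    P[B] = |B| / |S|."""
    B, S = set(B), set(S)
    assert B <= S, "an event must be a subset of the sample space"
    return len(B) / len(S)

S = {1, 2, 3, 4, 5, 6}   # outcomes of a fair die
A = {2, 4, 6}            # the even outcomes
print(prob(A, S))        # 0.5
```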

The probability distribution above also yields other intuitive results. For instance, \( \mathbb{P}[\{1\}] = 1/6 \). Or more generally, \( \mathbb{P}[\{s\}] = 1/6 \) for any single element \(s \in S \). This is the same logic that yields equal probabilities and thus fairness in a coin, a die, or a standard deck of cards.

There are other ways one could distribute probability across a set \( S \) than what is defined above. The specific choice above yields intuitive results, but is not otherwise special. You could, for instance, define your own probability distribution that assigns unequal weights to the two sides of a coin.
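For instance, the following sketch defines a hypothetical unfair coin whose two sides receive unequal weights; the weights 0.75 and 0.25 are illustrative, not taken from the text.

```python
# Assign unequal weights to the two outcomes of a hypothetical unfair coin.
weights = {"H": 0.75, "T": 0.25}

def prob(B):
    """Probability of event B: sum the weights of its outcomes."""
    return sum(weights[s] for s in B)

print(prob({"H"}))       # 0.75
print(prob({"H", "T"}))  # 1.0 -- the whole sample space, as the axioms require
```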

We'll defer discussion about more complex versions of \( \mathbb{P} \) until the notes Probability Distributions. Below, we describe some general properties of arbitrary probability distributions. The three axioms of probability are what separate general set functions from probability distributions. As an exercise throughout the next section, verify that our probability distribution defined above meets all the axioms of probability.

Axioms of Probability

Despite lacking one definition that satisfies all statisticians, there are a few well established statements about probability. These are often called the axioms of probability.

Let \( S \) be the set of all possible outcomes of interest, often called the sample space. Let \( A, A_1, A_2, \ldots \) be subsets of \( S \). The first axiom of probability states that the probability of any set, say \( A \), must be greater than or equal to \( 0 \): \( \mathbb{P}[A] \geq 0 \) for any set \( A \subseteq S \). The second axiom of probability states that the probability of the set of all possible events of interest is equal to \( 1 \): \( \mathbb{P}[S] = 1 \). The third axiom of probability states that the probability of a union of disjoint sets is equal to the sum of the probabilities of the sets: \( \mathbb{P}[\cup_{n = 1}^{\infty} A_n] = \sum_{n = 1}^{\infty} \mathbb{P}[A_n] \) whenever \( A_1, A_2, \ldots \) is a countable sequence of pairwise disjoint sets.

The first and second axioms of probability insist that probabilities must be bounded between \( 0 \) and \( 1 \).

The third axiom of probability will be better understood with a simple example. Consider the sample space of a fair die. Let \( S = \{1, 2, 3, 4, 5, 6 \} \) and let \( A_n = \{ n \} \) for \(n = 2, 3, 4 \). Notice that \( \cup_{n = 2}^4 A_n = \{ 2, 3, 4 \} \). We expect the probability of rolling a \( 2 \), a \( 3 \), or a \( 4 \) to be \( 1/2 \) for a fair die. This is exactly what the third axiom of probability is telling us, that you can sum together probabilities across disjoint sets. Visually we can think of this as one half of the sample space split into three equally sized disjoint sets, counted as one event.

Mathematically, it would look like this. \[ \begin{aligned} \mathbb{P}[ \{ 2, 3, 4 \} ] & = \mathbb{P}\left[\bigcup_{n = 2}^4 A_n \right] \\ & = \sum_{n=2}^4 \mathbb{P}[A_n] \\ & = \mathbb{P}[\{ 2 \}] + \mathbb{P}[\{ 3 \}] + \mathbb{P}[\{ 4 \}] \\ & = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} \\ & = \frac{1}{2} \\ \end{aligned} \]
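The same computation can be checked in a few lines of Python, using exact fractions to avoid floating-point rounding:

```python
from fractions import Fraction

# Third axiom on the fair-die example: the probability of a union of
# pairwise disjoint events equals the sum of their probabilities.
S = {1, 2, 3, 4, 5, 6}

def prob(B):
    return Fraction(len(B), len(S))

A = [{n} for n in (2, 3, 4)]   # pairwise disjoint singleton events
union = set().union(*A)        # {2, 3, 4}
assert prob(union) == sum(prob(a) for a in A) == Fraction(1, 2)
```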

Independent Events

Let \( A, B \subseteq S \). Events \( A \) and \( B \) are said to be independent events if \[ \mathbb{P}[A \cap B] = \mathbb{P}[A]\mathbb{P}[B]. \] Let \( A \) be the event that a randomly selected card from a standard deck is a Queen. Let \( B \) be the event that a randomly selected card from a standard deck is a heart. Note that only one card is being drawn from one standard deck of cards. Convince yourself that \( \mathbb{P}[A] = 1/13 \) and \( \mathbb{P}[B] = 1/4 \). Further, \( \mathbb{P}[A \cap B] = 1/52 \), since there is only one Queen of hearts. Because \[ \frac{1}{52} = \mathbb{P}[A \cap B] = \mathbb{P}[A] \mathbb{P}[B] = \frac{1}{13} \cdot \frac{1}{4}, \] the events \( A \) and \( B \) are independent.
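This check can also be carried out by brute force, enumerating a standard 52-card deck and computing both sides of the equation exactly:

```python
from itertools import product
from fractions import Fraction

# Build a standard 52-card deck as (rank, suit) pairs and check independence
# of A = "card is a Queen" and B = "card is a heart".
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

def prob(event):
    """Fraction of cards in the deck satisfying the event predicate."""
    return Fraction(len([c for c in deck if event(c)]), len(deck))

A = lambda c: c[0] == "Q"
B = lambda c: c[1] == "hearts"
both = lambda c: A(c) and B(c)

print(prob(A))                          # 1/13
print(prob(B))                          # 1/4
print(prob(both) == prob(A) * prob(B))  # True -> independent
```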

Note that in the example above, both sides of the equation are calculated first and then compared. If the equation holds with equality, the events in question are said to be independent. If the equation does not hold, the events are said to be dependent.

Further, independence is not the same as disjointness. Two events that each have positive probability can be independent only if their intersection is nonempty: if \( A \) and \( B \) are disjoint, then \( \mathbb{P}[A \cap B] = 0 \neq \mathbb{P}[A]\mathbb{P}[B] \), so such events are always dependent.

Conditional Probability

Let \( A, B \subseteq S \) such that \( \mathbb{P}[B] > 0 \). The probability of the event \(A \) given that the event \( B \) has already taken place is \[ \mathbb{P}[A | B] = \frac{\mathbb{P}[A \cap B]}{\mathbb{P}[B]}\] and is known as conditional probability.

Notice that conditional probability has the intersection of the events \( A \) and \( B \) in the numerator. When \( \mathbb{P}[A \cap B] = 0 \), then \( \mathbb{P}[A | B] = 0 \) also. When the two events do overlap, conditional probability scales the probability of the intersection relative to \( \mathbb{P}[B] \) instead of relative to \( \mathbb{P}[S] = 1 \).
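A small sketch on the fair-die sample space makes this scaling concrete; the events chosen here are our own illustrative examples.

```python
from fractions import Fraction

# Conditional probability on a fair die: P[A | B] = P[A ∩ B] / P[B].
S = {1, 2, 3, 4, 5, 6}

def prob(B):
    return Fraction(len(B), len(S))

def cond_prob(A, B):
    assert prob(B) > 0, "conditioning event must have positive probability"
    return prob(A & B) / prob(B)

A = {2, 4, 6}            # roll is even
B = {1, 2, 3}            # roll is at most three
print(cond_prob(A, B))   # 1/3: only the outcome 2 is both even and <= 3
```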

What conditions on \( B \) are necessary for \( \mathbb{P}[A \cap B] = \mathbb{P}[A | B] \)?

Bayes' Theorem

Let \( A, B \) be any events with positive probability. Bayes' theorem states \[ \mathbb{P}[A | B] = \frac{\mathbb{P}[B | A] \mathbb{P}[A]}{ \mathbb{P}[B]}. \]

As an example, consider again a single card drawn from a standard deck, with \( A \) the event that the card is a Queen and \( B \) the event that the card is a heart. Since \( \mathbb{P}[B | A] = 1/4 \), \( \mathbb{P}[A] = 1/13 \), and \( \mathbb{P}[B] = 1/4 \), Bayes' theorem gives \[ \mathbb{P}[A | B] = \frac{\mathbb{P}[B | A] \mathbb{P}[A]}{\mathbb{P}[B]} = \frac{(1/4)(1/13)}{1/4} = \frac{1}{13}. \] Conditioning on \( B \) leaves the probability of \( A \) unchanged, as we should expect for independent events.
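A quick numerical check of Bayes' theorem, here on the fair-die sample space with illustrative events of our own choosing:

```python
from fractions import Fraction

# Verify Bayes' theorem: P[A | B] from the definition matches
# P[B | A] P[A] / P[B].
S = {1, 2, 3, 4, 5, 6}

def prob(E):
    return Fraction(len(E), len(S))

def cond(A, B):
    return prob(A & B) / prob(B)

A = {2, 4, 6}          # even roll
B = {4, 5, 6}          # roll of at least four
lhs = cond(A, B)
rhs = cond(B, A) * prob(A) / prob(B)   # Bayes' theorem
print(lhs, rhs, lhs == rhs)            # 2/3 2/3 True
```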

Law of Total Probability

Let \( B_1, B_2, \ldots, B_K \) be an exhaustive and disjoint collection of subsets of \( S \), where \( \mathbb{P}[B_k] > 0 \) for all \( k \). For any set \( A \), the law of total probability states \[ \mathbb{P}[A] = \sum_{k = 1}^K \mathbb{P}[A | B_k] \mathbb{P}[B_k]. \]

It is perhaps easiest to understand the law of total probability in a picture and by noticing that the summand, via conditional probability, amounts to \( \mathbb{P}[A | B_k] \mathbb{P}[B_k] = \mathbb{P}[A \cap B_k] \). Then \( \mathbb{P}[A] \) can be found by summing the probabilities of disjoint sets, each one of which is the intersection of \( A \) with one of the \( B_k \)s.

Suppose there are 3 urns, each containing 2 balls. Urn 1 contains 2 white balls, urn 2 contains 1 white ball and 1 red ball, and urn 3 contains 2 red balls. Let \( A = \{ \text{ a red ball is chosen } \} \) and \( B_k = \{ \text{ urn } k \text{ is chosen } \} \) for \( k = 1, 2, 3 \). Then \( \mathbb{P}[B_k] = 1/3, \mathbb{P}[A | B_1] = 0, \mathbb{P}[A | B_2] = 1/2, \text{ and } \mathbb{P}[A|B_3] = 1 \), such that \[ \mathbb{P}[A] = 0 \cdot \frac{1}{3} + \frac{1}{2} \cdot \frac{1}{3} + 1 \cdot \frac{1}{3} = \frac{1}{2}. \]
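The urn calculation can be written out directly as the sum in the law of total probability:

```python
from fractions import Fraction

# Law of total probability for the urn example:
# P[A] = sum over k of P[A | B_k] P[B_k], with each urn equally likely.
p_urn = Fraction(1, 3)                                  # P[B_k], k = 1, 2, 3
p_red_given_urn = [Fraction(0), Fraction(1, 2), Fraction(1)]  # P[A | B_k]

p_red = sum(p * p_urn for p in p_red_given_urn)
print(p_red)  # 1/2
```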


Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International