Likelihood Method

Introduction

The likelihood method estimates parameters using an assumed distribution on the data and a randomly sampled dataset from this distribution. To estimate the parameters, one can use either standard methods of calculus or a computer. This page will cover the aspects of calculus behind the likelihood method.

The goal is to find the most likely value of the parameter(s) given a set of random variables, $X_1, \ldots, X_n$. When the solution to the likelihood method is found using random variables, we call the solution the maximum likelihood estimator. Once you observe some values, and the data assume the role of the random variables, you can plug in the data and calculate an estimate of the parameter(s). The actual value, computed from a specific dataset, is called the maximum likelihood estimate.

The logic underlying the likelihood method goes like this. Set up the likelihood function. The maximum likelihood estimator is the argument that maximizes the likelihood function. Often, this is written as

$$\hat{\theta} = \arg\max_{\theta} L(\theta \mid x_1, \ldots, x_n)$$

to denote that the best guess is the maximal argument to the likelihood function given the data $x_1, \ldots, x_n$. The calculus is then left to the practitioner, where either pen and paper or a computer will do. These notes aim to provide a short introduction to the intuition behind the likelihood function setup and to show the most common analytical strategy for finding the maximum likelihood estimates.

The likelihood function is defined relative to the density function of the distribution that is assumed to have generated the data. The likelihood is defined to be the product of the density function evaluated at each datum in the dataset. We think of the likelihood function as a function of the parameter(s), generalized as $\theta$, given the random variables $X_1, \ldots, X_n$.

Intuition (Bernoulli)

The intuition behind the product of the density functions goes like this. Imagine you have random variables from a Bernoulli distribution with $p = 1/2$; think of a fair coin where $X = 1$ represents flipping heads. Suppose the outcomes are $1, 0, 0, 1$. Assuming the random variables are independent, the probability associated with this event is $\frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \left(\frac{1}{2}\right)^4$.

Now, imagine that you don't know the value of $p$; instead, all you know is that the probability of $X = 1$ is some number $p$. The probability above is then rewritten as

$$p \cdot (1 - p) \cdot (1 - p) \cdot p$$

Next, since we know that the Bernoulli distribution is an appropriate model of coin flips, write this probability using the density function of a Bernoulli distribution, $f(x; p) = p^x (1 - p)^{1 - x}$. Since the Bernoulli distribution's density function maps $1$ to $p$ and $0$ to $1 - p$, we have

$$f(1; p) \cdot f(0; p) \cdot f(0; p) \cdot f(1; p) = p \cdot (1 - p) \cdot (1 - p) \cdot p$$

The last step in understanding the setup of the likelihood function is to recognize that until we observe data such as $1, 0, 0, 1$, we might as well treat these observations as random variables, $X_1, X_2, X_3, X_4$. In this case, the functional form is

$$f(X_1; p) \cdot f(X_2; p) \cdot f(X_3; p) \cdot f(X_4; p)$$

The discussion above captures the intuition behind the setup of the likelihood function. From here, the remaining differences are notational, together with the conceptual step of treating this product as a function of the unknown parameter $p$.
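To make the product concrete in code, here is a minimal Python sketch. The helper names (`bernoulli_density`, `likelihood`) are my own, and the sample is a hypothetical two-heads, two-tails sequence used purely for illustration:

```python
def bernoulli_density(x, p):
    """Bernoulli density: maps x = 1 to p and x = 0 to 1 - p."""
    return p ** x * (1 - p) ** (1 - x)

def likelihood(data, p):
    """Likelihood: the product of the density evaluated at each datum."""
    out = 1.0
    for x in data:
        out *= bernoulli_density(x, p)
    return out

# A hypothetical two-heads, two-tails sample; for a fair coin (p = 1/2)
# the likelihood is (1/2)^4 = 0.0625.
print(likelihood([1, 0, 0, 1], 0.5))  # 0.0625
```

Because multiplication is commutative, only the counts of heads and tails matter, not the order in which they appear.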

To get from

$$f(X_1; p) \cdot f(X_2; p) \cdot f(X_3; p) \cdot f(X_4; p)$$

to the general definition of the likelihood function, we generalize the unknown parameter to $\theta$, thinking that this method should apply to any distribution's density function. Further, we use product notation, which is analogous to summation notation, to generalize to an arbitrary number of random variables $X_1, \ldots, X_n$:

$$L(\theta \mid X_1, \ldots, X_n) = \prod_{i=1}^{n} f(X_i; \theta)$$

Once we have observations, our collection of random variables $X_1, \ldots, X_n$ is bound to specific values $x_1, \ldots, x_n$. On the other hand, the unknown parameter $\theta$ is not specified. The conceptual jump of the likelihood function is to treat the form

$$\prod_{i=1}^{n} f(x_i; \theta)$$

as a function of the unknown parameter $\theta$. We name the likelihood function $L(\theta \mid x_1, \ldots, x_n)$ and think of it as a function of the unknown parameter(s) given a fixed set of data $x_1, \ldots, x_n$. The specific value of $\theta$ that maximizes the likelihood function is the best guess of the unknown parameter.

In an attempt to bring the general likelihood function back down to earth, consider the following plot depicting the scenario introduced above: the observations $1, 0, 0, 1$ from a Bernoulli distribution with unknown parameter $p$. From exactly these four observations, the argument that maximizes the likelihood function is $\hat{p} = 1/2$.
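A crude way to find that maximizing argument numerically, assuming the same hypothetical two-heads, two-tails sample, is a grid search over candidate values of $p$ (the names here are my own):

```python
# Evaluate the Bernoulli likelihood on a grid of candidate p values
# and keep the argument with the largest likelihood.
data = [1, 0, 0, 1]  # hypothetical two-heads, two-tails sample

def likelihood(p):
    out = 1.0
    for x in data:
        out *= p ** x * (1 - p) ** (1 - x)
    return out

candidates = [i / 100 for i in range(1, 100)]  # 0.01, 0.02, ..., 0.99
p_hat = max(candidates, key=likelihood)
print(p_hat)  # 0.5
```

With half the flips landing heads, the grid search lands on $p = 1/2$, matching the plot's argmax.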

Intuition (Normal)

Consider three data from a Normal distribution with parameter values $(\mu, \sigma)$ that I'm keeping secret. Your job is to guess the values of $\mu$ and $\sigma$ that I have in mind.

With only three data points, displayed on the plot below as empty circles, you are unlikely to guess exactly the values of $(\mu, \sigma)$ that I have in mind, but you can still form a guess. The point of this page is that an informed (via the likelihood method) guess will be the values of $(\mu, \sigma)$ that maximize the likelihood:

$$L(\mu, \sigma \mid x_1, x_2, x_3) = \prod_{i=1}^{3} f(x_i; \mu, \sigma)$$

Try it; slide $\mu$ and $\sigma$ around until you maximize the likelihood.
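Away from the interactive plot, the same sliding can be mimicked with a coarse two-dimensional grid search. The three observations below are stand-ins I made up, not the actual draws from the secret parameters:

```python
import math

data = [1.2, 2.9, 2.2]  # made-up stand-ins for the three observations

def normal_density(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood(mu, sigma):
    out = 1.0
    for x in data:
        out *= normal_density(x, mu, sigma)
    return out

# "Slide" mu and sigma across a grid and keep the best pair.
grid_mu = [i / 10 for i in range(0, 51)]      # 0.0 to 5.0
grid_sigma = [i / 10 for i in range(1, 51)]   # 0.1 to 5.0
best = max(((m, s) for m in grid_mu for s in grid_sigma),
           key=lambda ms: likelihood(*ms))
print(best)  # (2.1, 0.7)
```

The winning pair sits at the sample mean and (to grid resolution) the root mean squared deviation of the data, which is exactly what the Normal maximum likelihood estimates turn out to be.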

Example

The last way we'll demonstrate the maximum likelihood method is by walking through an example. Suppose you have a sample of $n$ observations, all randomly sampled from the same distribution. We'll assume we know that a member of the family of Rayleigh distributions generated our data, but that we don't know which member, i.e. we don't know the parameter $\sigma$. We seek to estimate $\sigma$ from the data. The density function of the Rayleigh distribution is

$$f(x; \sigma) = \frac{x}{\sigma^2} e^{-x^2 / (2\sigma^2)}$$

for $x \geq 0$ and $\sigma > 0$.

To find the maximum likelihood estimate of $\sigma$, start by writing out the likelihood function.

$$L(\sigma \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} \frac{x_i}{\sigma^2} e^{-x_i^2 / (2\sigma^2)}$$

The goal is to find the value $\hat{\sigma}$ that maximizes the likelihood function $L(\sigma \mid x_1, \ldots, x_n)$.

Both humans and computers have difficulty working with products and exponents of functions. Therefore, it is common to take the natural log of the likelihood function. This is so common that the log of the likelihood function has its own name, the log-likelihood function. The log-likelihood function is written as

$$\ell(\sigma) = \log L(\sigma \mid x_1, \ldots, x_n) = \sum_{i=1}^{n} \left[ \log x_i - 2 \log \sigma - \frac{x_i^2}{2\sigma^2} \right]$$

where we've used properties of the natural log to turn the product into a sum.
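The sum form is also how you would compute the log-likelihood numerically. A minimal Python sketch (the function name and sample data are my own), using the log of the Rayleigh density term by term:

```python
import math

def rayleigh_log_likelihood(data, sigma):
    """Sum of log f(x_i; sigma) for the Rayleigh density
    f(x; sigma) = (x / sigma^2) * exp(-x^2 / (2 sigma^2))."""
    return sum(math.log(x) - 2 * math.log(sigma) - x ** 2 / (2 * sigma ** 2)
               for x in data)

# The sum of logs agrees with the log of the product of densities.
data = [0.5, 1.0, 2.0]
product = 1.0
for x in data:
    product *= (x / 1.2 ** 2) * math.exp(-x ** 2 / (2 * 1.2 ** 2))
print(abs(rayleigh_log_likelihood(data, 1.2) - math.log(product)) < 1e-9)  # True
```

Summing logs rather than multiplying densities also avoids the numerical underflow that plagues long products of small numbers.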

Below is a plot of the log-likelihood for random data.

Recall from calculus that we can find local maxima/minima by differentiating a function, setting the derivative equal to zero, and solving for the variable of interest. In this scenario, the variable of interest is the unknown parameter, .

Often it's helpful to simplify the log-likelihood function to aid differentiation. In this case, the most helpful simplification is to realize that the first term within the sum, $\log x_i$, is constant with respect to $\sigma$ and so it can be dropped:

$$\ell(\sigma) \propto \sum_{i=1}^{n} \left[ -2 \log \sigma - \frac{x_i^2}{2\sigma^2} \right]$$

The symbol $\propto$ is read as "proportional to": the log-likelihood function for the Rayleigh distribution is proportional to the term on the right with respect to $\sigma$. We call the symbol propto (prop-to), short for proportional to.

To find the maximum of $\ell(\sigma)$, we'll take the derivative with respect to $\sigma$:

$$\frac{d\ell}{d\sigma} = \sum_{i=1}^{n} \left[ -\frac{2}{\sigma} + \frac{x_i^2}{\sigma^3} \right] = -\frac{2n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} x_i^2$$

Next, set the derivative equal to zero and solve for $\sigma$:

$$-\frac{2n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} x_i^2 = 0$$

Manipulate the expression until you find a solution for the parameter of interest. At this point, we put a hat over the parameter to recognize that it is our best guess of the unknown parameter based on the random variables $X_1, \ldots, X_n$:

$$\hat{\sigma} = \sqrt{\frac{1}{2n} \sum_{i=1}^{n} X_i^2}$$

The maximum likelihood estimator $\hat{\sigma}$ is the final solution. With data from a Rayleigh distribution, this solution tells you how to best estimate the unknown parameter $\sigma$.
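As a closing sketch (the sample data and function names here are made up), the estimator is one line of code, and nudging $\sigma$ away from $\hat{\sigma}$ in either direction only lowers the log-likelihood, as a maximum should:

```python
import math

def sigma_hat(data):
    """Closed-form Rayleigh MLE: sqrt( sum(x_i^2) / (2n) )."""
    return math.sqrt(sum(x ** 2 for x in data) / (2 * len(data)))

def log_likelihood(data, sigma):
    """Rayleigh log-likelihood up to the constant sum of log(x_i) terms."""
    return sum(-2 * math.log(sigma) - x ** 2 / (2 * sigma ** 2) for x in data)

data = [0.8, 1.5, 2.1, 0.9, 1.3]  # hypothetical Rayleigh-like sample
s = sigma_hat(data)
# Any nearby value of sigma gives a strictly lower log-likelihood.
print(log_likelihood(data, s) > log_likelihood(data, s * 1.1))  # True
print(log_likelihood(data, s) > log_likelihood(data, s * 0.9))  # True
```

This numerical check is a useful habit whenever you derive an estimator by hand: it catches sign errors and dropped factors before they propagate.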