Logistic Regression Models


Likelihood

The likelihood for logistic regression models begins with the Bernoulli distribution, since the response variable is binary, $y_i \in \{0, 1\}$. We assume the probability from the Bernoulli distribution is decomposed into a linear combination of predictors, $x_i^T \beta$. Let $\eta_i = x_i^T \beta$. To map $\eta_i \in \mathbb{R}$ to the probability $p_i \in (0, 1)$, logistic regression uses the function

$$p_i = \sigma(\eta_i) = \frac{1}{1 + e^{-\eta_i}},$$

known as the sigmoid function, or the cumulative distribution function of the logistic distribution.
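As a quick illustration, the sigmoid function is a one-liner. This is a minimal sketch assuming NumPy; the names `sigmoid` and `eta` are our own:

```python
import numpy as np

def sigmoid(eta):
    """Map a real-valued linear predictor eta onto a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))
```

Note the symmetry $\sigma(-\eta) = 1 - \sigma(\eta)$, which comes up again when simplifying the log-likelihood.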

In logistic regression, we treat the probability of a Bernoulli distribution as a function of the coefficients,

$$p_i(\beta) = \sigma(x_i^T \beta).$$

Suppose we have observations $y_1, \ldots, y_n$ of a response variable and corresponding predictors $x_1, \ldots, x_n$ from which we will attempt to predict the probability $p_i$ using a linear combination of the predictors. Plug $p_i = \sigma(\eta_i)$ into the density function for the Bernoulli distribution to get the log-likelihood function for logistic regression,

$$\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \log \sigma(\eta_i) + (1 - y_i) \log\left(1 - \sigma(\eta_i)\right) \right].$$
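Transcribed directly from the definition, the log-likelihood might look like the following sketch (NumPy; function and variable names are our own). It is faithful to the formula but not yet numerically safe:

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood, transcribed directly from the definition.
    Can overflow or evaluate log(0) when |X @ beta| is large."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))  # p_i = sigmoid(eta_i)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```

For large positive or negative $\eta_i$, `p` rounds to exactly 1.0 or 0.0 in floating point and the logs blow up, which is what motivates the stable version below.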

Numerically Stable Log-Likelihood

Simplifying the expression above using properties of the natural log and the sigmoid function takes some care. Below, we detail some good checkpoints along the way and then present a numerically stable version.

Using properties of the natural log, we find

$$\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \log \frac{\sigma(\eta_i)}{1 - \sigma(\eta_i)} + \log\left(1 - \sigma(\eta_i)\right) \right].$$

Substitute in $\log \frac{\sigma(\eta_i)}{1 - \sigma(\eta_i)} = \eta_i$ and $1 - \sigma(\eta_i) = \frac{1}{1 + e^{\eta_i}}$ to get

$$\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \eta_i - \log\left(1 + e^{\eta_i}\right) \right].$$

When looking at the above expression, remember that we're using $\eta_i$ as a shorthand for a linear combination of the predictors, $\eta_i = x_i^T \beta$.

The form above can be simplified further using properties (of the common implementation) of the function log1p, namely $\log(1 + e^{\eta}) = \max(\eta, 0) + \mathrm{log1p}\!\left(e^{-|\eta|}\right)$. A numerically stable, simplified log-likelihood for logistic regression models is

$$\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \eta_i - \max(\eta_i, 0) - \mathrm{log1p}\!\left(e^{-|\eta_i|}\right) \right].$$
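A sketch of the stable form in NumPy, where `np.log1p` plays the role of log1p (function names are our own):

```python
import numpy as np

def log_likelihood_stable(beta, X, y):
    """Numerically stable log-likelihood using
    log(1 + exp(eta)) = max(eta, 0) + log1p(exp(-|eta|))."""
    eta = X @ beta
    return np.sum(y * eta - np.maximum(eta, 0.0) - np.log1p(np.exp(-np.abs(eta))))
```

Because $e^{-|\eta_i|}$ never exceeds 1, the exponential cannot overflow, and `log1p` keeps full precision when its argument is tiny.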

Gradient of Log-Likelihood

If you find yourself cross-checking the simplified log-likelihood in Julia/Python/R/what-have-you, then the gradient below will come in handy.

Let $X$ be a model matrix with rows $x_i^T$ and $p = \sigma(X\beta)$, applied elementwise, such that

$$\nabla_{\beta}\, \ell(\beta) = X^T (y - p).$$
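One way to cross-check both the simplified log-likelihood and the gradient is to compare $X^T(y - p)$ against a central finite-difference approximation. The sketch below assumes NumPy, and the example data is made up for illustration:

```python
import numpy as np

def log_likelihood_stable(beta, X, y):
    """Numerically stable log-likelihood from the previous section."""
    eta = X @ beta
    return np.sum(y * eta - np.maximum(eta, 0.0) - np.log1p(np.exp(-np.abs(eta))))

def gradient(beta, X, y):
    """Analytic gradient of the log-likelihood: X^T (y - p)."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (y - p)

def finite_difference(beta, X, y, h=1e-6):
    """Central-difference approximation, one coordinate of beta at a time."""
    grad = np.zeros_like(beta)
    for j in range(len(beta)):
        e = np.zeros_like(beta)
        e[j] = h
        grad[j] = (log_likelihood_stable(beta + e, X, y)
                   - log_likelihood_stable(beta - e, X, y)) / (2 * h)
    return grad
```

If the two disagree beyond finite-difference error, the bug is in either the likelihood or the gradient, and checking each term of the derivation above usually locates it quickly.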