Logistic regression
Summary
| Item | Value |
|---|---|
| Type of variable predicted | Binary (yes/no) |
| Format of prediction | Probabilistic. Rather than simply returning a binary answer, the prediction gives the respective probabilities of the two answers. |
| Functional form of model | Computes the probability by applying the logistic function (coordinate-wise) to a linear combination of the features. The coefficients used in the linear combination are the unknown parameters that need to be determined by the learning algorithm. It is an example of a generalized linear model. |
| Typical cost function | As with most probabilistic binary prediction models, logistic regression models are typically scored using the logarithmic cost function. However, they could in principle be scored using the squared error cost function. Note that even then this wouldn't be least-squares regression, because the squared error is computed after applying the logistic function to the linear combination. |
| Typical regularization choices | Both $L^1$- and $L^2$-regularization, as well as combined regularization using both $L^1$ and $L^2$ penalty terms, are common. |
| Learning algorithms | Iterative methods such as gradient descent, stochastic gradient descent, and Newton's method (iteratively reweighted least squares) are typical. |
Definition
The term logistic regression is used both for a model and for the act of finding the parameters of that model, whose goal is to predict binary outputs. It is therefore better viewed as solving a classification problem than a regression problem. However, because the model shares many basic components with linear regression, and is an example of a generalized linear model, it has historically gone by the name logistic regression.
The logistic regression problem attempts to predict a binary output (yes/no) based on a set of inputs (called features). Rather than just predicting a yes/no answer, the logistic regression problem predicts a probability of yes. This is a number in $[0, 1]$. By using a threshold probability (such as 0.5, or another value depending on what sorts of risks we want to avoid), this probability can be converted into a yes/no prediction.
The probability is computed as follows:
Probability = logistic function evaluated at (linear combination of features with initially unknown parameters)
The logistic function is the function:

$$\sigma(x) = \frac{1}{1 + e^{-x}} = \frac{e^x}{e^x + 1}$$
The values of the unknown parameters are determined empirically so as to best fit the training set.
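As a minimal sketch of the prediction step (the function names and parameters here are illustrative, not from the article), the probability and the thresholded yes/no answer can be computed as:

```python
import math

def logistic(x):
    # The logistic function: maps any real number into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def predict_probability(features, weights, intercept):
    # Probability = logistic function evaluated at a linear combination
    # of the features; weights and intercept are the unknown parameters
    # to be fit to the training set.
    z = intercept + sum(w * f for w, f in zip(weights, features))
    return logistic(z)

def predict_label(features, weights, intercept, threshold=0.5):
    # Turn the probability into a yes/no answer via a threshold.
    return predict_probability(features, weights, intercept) >= threshold
```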
Cost function used
The typical cost function used is the logarithmic cost function (also known as logarithmic scoring): if $p$ is the predicted probability that the event happens, this assigns a score of $-\ln p$ if the event happened and a score of $-\ln(1 - p)$ if the event did not happen. The lower the score, the better.
The logarithmic cost function is computed for each of the predictions made by the logistic regression model. We then average the values of the cost functions across all instances to obtain the logarithmic cost function for the specific choice of parameter values on the specific data set.
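A sketch of this per-instance scoring and averaging, assuming outcomes are encoded as True/False for whether the event occurred:

```python
import math

def log_cost(p, occurred):
    # Score -ln(p) if the event happened, -ln(1 - p) if it did not.
    return -math.log(p) if occurred else -math.log(1.0 - p)

def average_log_cost(probabilities, outcomes):
    # Average the per-instance scores across the whole data set.
    return sum(log_cost(p, y) for p, y in zip(probabilities, outcomes)) / len(outcomes)
```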
There are two standard choices of labels for describing whether the event did or did not occur. One choice is to assign a label of 0 if the event did not occur and 1 if the event occurred. Another choice is to assign a label of -1 if the event did not occur and 1 if the event occurred.
Closed form expression for cost function using 0,1-encoding
Suppose we assign a label $y$ with value 0 if the event did not occur and 1 if the event occurred. Then, if $p$ is the predicted probability, the score associated with the prediction is:

$$-\left(y \ln p + (1 - y) \ln(1 - p)\right)$$
Suppose there are $m$ data points. The label vector is the vector $y = (y_1, y_2, \dots, y_m)$ and the probability vector is the vector $p = (p_1, p_2, \dots, p_m)$. The cost function is:

$$J(p, y) = -\frac{1}{m} \sum_{i=1}^m \left( y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \right)$$
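A quick numerical sketch (with hypothetical probabilities and labels) showing that the closed form reproduces the per-case scores $-\ln p$ and $-\ln(1 - p)$:

```python
import math

def score_01(p, y):
    # Labels y in {0, 1}: the closed form -(y ln p + (1 - y) ln(1 - p)).
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def cost_01(probs, labels):
    # Average the closed-form scores over all m data points.
    m = len(labels)
    return sum(score_01(p, y) for p, y in zip(probs, labels)) / m
```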
Closed form expression for cost function using -1,1-encoding
Suppose we assign a label $y$ with value -1 if the event did not occur and 1 if the event occurred. Then, if $p$ is the predicted probability and $z = \ln(p/(1 - p))$ is the corresponding log-odds, the score associated with the prediction is:

$$\ln\left(1 + e^{-yz}\right)$$

For $y = 1$ this equals $-\ln p$, and for $y = -1$ it equals $-\ln(1 - p)$, matching the 0,1-encoding above.
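A sketch, assuming the standard log-odds parameterization $z = \ln(p/(1-p))$, checking that the -1,1 score agrees with the per-case scores of the logarithmic cost function:

```python
import math

def score_pm1(z, y):
    # Labels y in {-1, +1}; z is the log-odds of the prediction.
    # The score ln(1 + exp(-y*z)) equals -ln(p) when y = +1
    # and -ln(1 - p) when y = -1, where p = 1 / (1 + exp(-z)).
    return math.log(1.0 + math.exp(-y * z))
```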
Description as a generalized linear model
The logistic regression model can be viewed as a special case of the generalized linear model, namely the case where the link function is the log-odds (logit) function, whose inverse is the logistic function, and where the cost function is the logarithmic cost function.
The inverse of the logistic function is the log-odds function, and applying it to the probability gives the log-odds (logarithm of odds). Explicitly, we have:

$$\sigma^{-1}(p) = \ln\left(\frac{p}{1 - p}\right)$$
Therefore, the logistic regression problem can be viewed as a linear regression problem:
Log-odds function = Linear combination of features with unknown parameters
However, the cost function now changes as well: we now need to apply the logistic function and then do logarithmic scoring to compute the cost function.
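As a small sketch, the inverse relationship between the logistic function and the log-odds function, on which this linear-regression view rests, can be verified numerically:

```python
import math

def logistic(x):
    # Maps a real number (e.g., a linear combination of features) to (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def log_odds(p):
    # Inverse of the logistic function: ln(p / (1 - p)).
    return math.log(p / (1.0 - p))
```

For any real $z$, `log_odds(logistic(z))` recovers $z$ (up to floating-point error), which is what lets the linear combination of features be equated with the log-odds.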