Understanding the LOGIT function

In logistic regression, the goal is to predict the probability of an outcome, such as YES vs. NO, or the probability that an observation belongs to group A vs. group B.

The linear logistic-regression model, or linear logit model, is given by this equation:

\[ \pi_i = \frac{1}{1+e^{-(\alpha + \beta X_i)}} \]

where \(\pi_i\) is the probability of the desired outcome.
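
As a quick numeric check of this formula, here is a minimal sketch in R; the name inv_logit is ours, purely for illustration, and the values \(\alpha = 2\) and \(\beta = 1.5\) anticipate the hypothetical example below:

inv_logit <- function(alpha, beta, x) {
  1/(1 + exp(-(alpha + beta*x)))     # the equation above
}
inv_logit(2, 1.5, -1)    # pi at X = -1; about 0.6224593
plogis(2 + 1.5*(-1))     # same value via R's built-in inverse-logit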

Given this equation, it then follows that

\[ Odds = \frac{\pi}{1 - \pi} \]

and

\[ Logit = log_e(\frac{\pi}{1 - \pi}) \]

and

\[ Logit = \alpha + \beta X_i \]
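
These definitions are easy to verify numerically in R; qlogis() is the built-in logit function:

p <- 0.75          # an arbitrary probability, purely for illustration
p/(1 - p)          # Odds = 3
log(p/(1 - p))     # Logit; about 1.0986
qlogis(p)          # same value via the built-in logit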

So, let’s look at a hypothetical logistic regression equation where

\[ Logit = 2 + 1.5 X_i \]

We’ll evaluate this function over X values ranging from -5 to 2 in steps of 0.1, computing the Logit, Odds, and Probability at each step, and then make a plot of each one.

x <- seq(from=-5, to=2, by=0.1)    # X values from -5 to 2 in steps of 0.1
y.logit <- 2 + 1.5*x               # Logit = 2 + 1.5*X
y.odds <- exp(y.logit)             # Odds = exp(Logit)
y.prob <- y.odds/(1+y.odds)        # Probability = Odds/(1+Odds)
x.df <- data.frame(x,y.logit,y.odds,y.prob)

Table of X, Logit, Odds, Probability (excerpt: rows 30 to 50 of x.df)

knitr::kable(x.df[30:50,],
             col.names = c("X",
                           "Logit = 2+1.5x",
                           "Odds = P/(1-P)",
                           "Probability P"))
Row      X   Logit = 2+1.5x   Odds = P/(1-P)   Probability P
30    -2.1            -1.15        0.3166368       0.2404891
31    -2.0            -1.00        0.3678794       0.2689414
32    -1.9            -0.85        0.4274149       0.2994329
33    -1.8            -0.70        0.4965853       0.3318122
34    -1.7            -0.55        0.5769498       0.3658644
35    -1.6            -0.40        0.6703200       0.4013123
36    -1.5            -0.25        0.7788008       0.4378235
37    -1.4            -0.10        0.9048374       0.4750208
38    -1.3             0.05        1.0512711       0.5124974
39    -1.2             0.20        1.2214028       0.5498340
40    -1.1             0.35        1.4190675       0.5866176
41    -1.0             0.50        1.6487213       0.6224593
42    -0.9             0.65        1.9155408       0.6570105
43    -0.8             0.80        2.2255409       0.6899745
44    -0.7             0.95        2.5857097       0.7211152
45    -0.6             1.10        3.0041660       0.7502601
46    -0.5             1.25        3.4903430       0.7772999
47    -0.4             1.40        4.0552000       0.8021839
48    -0.3             1.55        4.7114702       0.8249137
49    -0.2             1.70        5.4739474       0.8455347
50    -0.1             1.85        6.3598195       0.8641271

Plot of the Logit

plot(x,y.logit,
     xlab = "X values",
     ylab = "Logit = 2 + 1.5*X")
lines(x,y.logit)

Plot of the Odds

plot(x,y.odds,
     xlab = "X values",
     ylab = "Odds = exp(2 + 1.5*X)")
lines(x,y.odds)

Plot of the Probability

plot(x,y.prob,
     xlab = "X values",
     ylab = "Probability = Odds/(1+Odds)")
lines(x,y.prob)

So, when we “fit” a logistic regression model, we are solving for the best-fit line in this equation:

\[ Logit = log_e(\frac{\pi}{1 - \pi}) = \alpha + \beta X_i \]

where the “logit” function LINKS the outcome (or a mathematical transformation of the outcome), in this case \(\pi\), with the linear predictor \(\alpha + \beta X_i\).

Similarly, if we exponentiate both sides of this equation we get:

\[ Odds = \frac{\pi}{1 - \pi} = e^{\alpha + \beta X_i} \]

This is why logistic regression yields “ODDS RATIOS”; some software lists these as “exp(B)”.
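
To see this in practice, here is a hedged sketch that fits a logistic model with R’s glm() on simulated data (the data are made up solely to show where the odds ratios come from):

set.seed(42)
x.sim <- rnorm(200)                        # simulated predictor
p.sim <- plogis(2 + 1.5*x.sim)             # true probabilities from our hypothetical equation
y.sim <- rbinom(200, size=1, prob=p.sim)   # simulated 1/0 outcomes
fit <- glm(y.sim ~ x.sim, family=binomial(link="logit"))
coef(fit)        # alpha and beta on the Logit scale
exp(coef(fit))   # the same coefficients as odds ratios ("exp B")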

More on Generalized Linear Models

The LOGIT (logistic regression) is your first introduction to the “Generalized” Linear Model. There are several LINK functions that are useful to know for other kinds of outcomes:

Family             Link             Function                          Type of Outcome
Gaussian           Identity         \(\mu_i\)                         Continuous - Normal
Binomial           Logit            \(log_e(\frac{\pi}{1 - \pi})\)    Dichotomous; 2 categories
Poisson            Log              \(log_e(\mu_i)\)                  Count
Gamma              Inverse          \(\mu_i^{-1}\)                    Time to event - Survival
Inverse-Gaussian   Inverse-square   \(\mu_i^{-2}\)                    Continuous - positive
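
In R, each row of this table maps onto the family= argument of glm(); a sketch of the corresponding calls (y and x here are placeholders, not data from this document):

# glm(y ~ x, family = gaussian(link = "identity"))        # Continuous - Normal
# glm(y ~ x, family = binomial(link = "logit"))           # Dichotomous
# glm(y ~ x, family = poisson(link = "log"))              # Count
# glm(y ~ x, family = Gamma(link = "inverse"))            # Time to event
# glm(y ~ x, family = inverse.gaussian(link = "1/mu^2"))  # Inverse-Gaussian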

Poisson Regression

For the Poisson distribution of a count variable, the probability of observing any given count \(k\) is given by this equation, where \(\lambda\) is the mean (expected) count:

\[ P(Y=k) = \frac{\lambda^k e^{- \lambda}}{k!} \]
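
R implements this probability mass function as dpois(); a quick check by hand against the built-in (with \(\lambda = 2.5\) and \(k = 3\) chosen arbitrarily):

lambda <- 2.5
k <- 3
lambda^k * exp(-lambda)/factorial(k)   # by hand: about 0.2138
dpois(k, lambda)                       # same value via the built-in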

So, when we fit a Poisson regression model, we are solving this equation for the log of the expected count \(\mu_i = E(Y_i)\):

\[ log_e(\mu_i) = \alpha + \beta_1 X_i \]

So the expected count is equal to

\[ \mu_i = e^{\alpha + \beta_1 X_i} \]
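
As with the logistic case, a hedged sketch on simulated counts shows how glm() fits this model; the exponentiated coefficients are rate ratios, the count analogue of odds ratios:

set.seed(7)
x.cnt <- runif(150, min=-1, max=1)     # simulated predictor
mu.cnt <- exp(0.5 + 0.8*x.cnt)         # true expected counts (coefficients chosen arbitrarily)
y.cnt <- rpois(150, lambda=mu.cnt)     # simulated counts
fit.pois <- glm(y.cnt ~ x.cnt, family=poisson(link="log"))
coef(fit.pois)        # alpha and beta_1 on the log scale
exp(coef(fit.pois))   # multiplicative effect on the expected count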