
Understanding the LOGIT function

In logistic regression the goal is to predict the probability of an outcome, such as YES vs. NO, or the probability that the regression equation predicts membership in group A or group B.

The linear logistic-regression model, or linear logit model, is given by this equation:

$$\pi_i = \frac{1}{1 + \exp[-(\alpha + \beta X_i)]}$$

where $\pi_i$ is the probability of the desired outcome.

Given this equation, it follows that

$$\text{Odds} = \frac{\pi}{1-\pi}$$

and

$$\text{Logit} = \log_e\left(\frac{\pi}{1-\pi}\right)$$

and

$$\text{Logit} = \alpha + \beta X_i$$
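
To see these identities in action, here is a quick numerical check in R (the values alpha = 2, beta = 1.5, and x = 0 are arbitrary illustrative choices):

alpha <- 2; beta <- 1.5; x <- 0
p     <- 1/(1 + exp(-(alpha + beta*x)))  # probability from the logistic model
odds  <- p/(1 - p)                       # Odds = pi/(1 - pi)
logit <- log(odds)                       # Logit = log-odds
c(probability = p, odds = odds, logit = logit)
all.equal(logit, alpha + beta*x)         # the logit recovers the linear predictor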

So, let’s look at a hypothetical logistic regression equation where

$$\text{Logit} = 2 + 1.5 X_i$$

We’ll look at this function over values of X ranging from -5 to 2, in steps of 0.1, computing the Logit, Odds, and Probability at each step, and then we’ll make a plot of each one.

x <- seq(from=-5, to=2, by=0.1)   # grid of X values
y.logit <- 2 + 1.5*x              # Logit = linear predictor
y.odds <- exp(y.logit)            # Odds = exp(Logit)
y.prob <- y.odds/(1+y.odds)       # Probability = Odds/(1 + Odds)
x.df <- data.frame(x,y.logit,y.odds,y.prob)

Table of X, Logit, Odds, Probability (excerpt of 21 rows)

knitr::kable(x.df[30:50,],
             col.names = c("X",
                           "Logit = 2+1.5x",
                           "Odds = P/(1-P)",
                           "Probability P"))
Row    X    Logit = 2+1.5x    Odds = P/(1-P)    Probability P
30 -2.1 -1.15 0.3166368 0.2404891
31 -2.0 -1.00 0.3678794 0.2689414
32 -1.9 -0.85 0.4274149 0.2994329
33 -1.8 -0.70 0.4965853 0.3318122
34 -1.7 -0.55 0.5769498 0.3658644
35 -1.6 -0.40 0.6703200 0.4013123
36 -1.5 -0.25 0.7788008 0.4378235
37 -1.4 -0.10 0.9048374 0.4750208
38 -1.3 0.05 1.0512711 0.5124974
39 -1.2 0.20 1.2214028 0.5498340
40 -1.1 0.35 1.4190675 0.5866176
41 -1.0 0.50 1.6487213 0.6224593
42 -0.9 0.65 1.9155408 0.6570105
43 -0.8 0.80 2.2255409 0.6899745
44 -0.7 0.95 2.5857097 0.7211152
45 -0.6 1.10 3.0041660 0.7502601
46 -0.5 1.25 3.4903430 0.7772999
47 -0.4 1.40 4.0552000 0.8021839
48 -0.3 1.55 4.7114702 0.8249137
49 -0.2 1.70 5.4739474 0.8455347
50 -0.1 1.85 6.3598195 0.8641271

Plot of the Logit

plot(x,y.logit,
     xlab = "X values",
     ylab = "Logit = 2 + 1.5*X")
lines(x,y.logit)

Plot of the Odds

plot(x,y.odds,
     xlab = "X values",
     ylab = "Odds = exp(2 + 1.5*X)")
lines(x,y.odds)

Plot of the Probability

plot(x,y.prob,
     xlab = "X values",
     ylab = "Probability = Odds/(1+Odds)")
lines(x,y.prob)

So, when we “fit” a logistic regression model, we are solving for the best fit line for this equation:

$$\text{Logit} = \log_e\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta X_i$$

where the “logit” function LINKS the outcome (or a mathematical transformation of the outcome), in this case $\pi$, with the linear predictor equation $\alpha + \beta X_i$.

Similarly, if we exponentiate both sides of this equation we get:

$$\text{Odds} = \frac{\pi}{1-\pi} = e^{\alpha + \beta X_i}$$

This is why logistic regression yields “ODDS RATIOS”; some software lists these as “exp(B)”.
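
As a minimal sketch of how this looks in practice (the data here are simulated for illustration, not real), we can fit a logistic model with R’s glm() and exponentiate the coefficients to get the odds ratios:

set.seed(42)                           # simulated data for illustration only
x <- rnorm(200)
p <- 1/(1 + exp(-(2 + 1.5*x)))         # true probabilities from Logit = 2 + 1.5*X
y <- rbinom(200, size = 1, prob = p)   # simulated 0/1 outcomes
fit <- glm(y ~ x, family = binomial(link = "logit"))
coef(fit)        # alpha and beta estimates on the logit scale
exp(coef(fit))   # the "exp(B)" / odds-ratio scale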

More on Generalized Linear Models

The LOGIT (logistic regression) is your first introduction to the “Generalized” Linear Model. There are several LINK functions that are useful to know for other kinds of outcomes:

Family             Link             Function              Type of Outcome
Gaussian           Identity         mu_i                  Continuous - Normal
Binomial           Logit            log_e(pi/(1-pi))      Dichotomous; 2 categories
Poisson            Log              log_e(mu_i)           Count
Gamma              Inverse          mu_i^(-1)             Time to event - Survival
Inverse-Gaussian   Inverse-square   mu_i^(-2)             Continuous - positive, skewed
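
In R, each of these family/link pairs is requested through the family argument of glm(); the links shown here are the canonical defaults for each family:

gaussian(link = "identity")
binomial(link = "logit")
poisson(link = "log")
Gamma(link = "inverse")
inverse.gaussian(link = "1/mu^2")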

Poisson Regression

For the Poisson distribution for a count variable, the probability of any given count k occurring is given by this equation, where λ is the expected (mean) count:

$$P(Y = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$
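
We can verify this formula against R’s built-in dpois() (λ = 3 and counts 0 through 5 are arbitrary choices for the check):

lambda <- 3
k <- 0:5
manual <- lambda^k * exp(-lambda) / factorial(k)   # the formula above
all.equal(manual, dpois(k, lambda = lambda))       # matches R's Poisson density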

So, when we fit a Poisson regression equation, we are solving this equation

$$\log_e(Y) = \alpha + \beta_1 X_i$$

So, the predicted (expected) count Y is equal to

$$Y = e^{\alpha + \beta_1 X_i}$$
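
As with the logistic model, a minimal sketch with simulated count data shows the fit; exponentiating the coefficients returns them to the count scale (the coefficients 0.5 and 0.8 are arbitrary illustrative values):

set.seed(1)                        # simulated data for illustration only
x <- runif(150, min = 0, max = 2)
lambda <- exp(0.5 + 0.8*x)         # expected count on the original scale
y <- rpois(150, lambda = lambda)   # simulated counts
fit <- glm(y ~ x, family = poisson(link = "log"))
coef(fit)        # alpha and beta1 estimates on the log scale
exp(coef(fit))   # multiplicative effects on the expected count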