In logistic regression the goal is to predict the probability of an outcome, such as YES vs. NO, or the probability that an observation belongs to group A vs. group B.
The linear logistic-regression model, or linear logit model, is given by this equation:

$$\pi_i = \frac{1}{1+\exp[-(\alpha + \beta X_i)]}$$
where $\pi_i$ is the probability of the desired outcome.
Given this equation, it follows that

$$\text{Odds} = \frac{\pi}{1-\pi}$$

and

$$\text{Logit} = \log_e\left(\frac{\pi}{1-\pi}\right)$$

and

$$\text{Logit} = \alpha + \beta X_i$$
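As a quick sanity check of these relationships, base R provides `plogis()` (the inverse logit, i.e. the probability) and `qlogis()` (the logit). A minimal sketch, using the hypothetical $\alpha = 2$, $\beta = 1.5$ model we explore below, evaluated at $X = -1$:

```r
# Check the logit -> odds -> probability chain at a single point
logit <- 2 + 1.5 * (-1)        # 0.5
odds  <- exp(logit)            # odds = exp(logit)
prob  <- odds / (1 + odds)     # probability = odds / (1 + odds)

prob            # 0.6224593
plogis(logit)   # same value: plogis() is the inverse logit
qlogis(prob)    # 0.5: qlogis() recovers the logit
```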
So, let’s look at a hypothetical logistic regression equation where
$$\text{Logit} = 2 + 1.5X_i$$
We’ll evaluate this function over X values ranging from -5 to 2, stepping by 0.1, computing the Logit, Odds, and Probability at each step, and then we’ll plot each one.
```r
# Compute the logit, odds, and probability for X from -5 to 2 in steps of 0.1
x <- seq(from = -5, to = 2, by = 0.1)
y.logit <- 2 + 1.5 * x            # linear predictor (logit scale)
y.odds  <- exp(y.logit)          # odds = exp(logit)
y.prob  <- y.odds / (1 + y.odds) # probability = odds / (1 + odds)
x.df <- data.frame(x, y.logit, y.odds, y.prob)

# Display rows 30 through 50 of the table
knitr::kable(x.df[30:50, ],
             col.names = c("X",
                           "Logit = 2 + 1.5X",
                           "Odds = P/(1-P)",
                           "Probability P"))
```
| | X | Logit = 2 + 1.5X | Odds = P/(1-P) | Probability P |
|---|---|---|---|---|
| 30 | -2.1 | -1.15 | 0.3166368 | 0.2404891 |
| 31 | -2.0 | -1.00 | 0.3678794 | 0.2689414 |
| 32 | -1.9 | -0.85 | 0.4274149 | 0.2994329 |
| 33 | -1.8 | -0.70 | 0.4965853 | 0.3318122 |
| 34 | -1.7 | -0.55 | 0.5769498 | 0.3658644 |
| 35 | -1.6 | -0.40 | 0.6703200 | 0.4013123 |
| 36 | -1.5 | -0.25 | 0.7788008 | 0.4378235 |
| 37 | -1.4 | -0.10 | 0.9048374 | 0.4750208 |
| 38 | -1.3 | 0.05 | 1.0512711 | 0.5124974 |
| 39 | -1.2 | 0.20 | 1.2214028 | 0.5498340 |
| 40 | -1.1 | 0.35 | 1.4190675 | 0.5866176 |
| 41 | -1.0 | 0.50 | 1.6487213 | 0.6224593 |
| 42 | -0.9 | 0.65 | 1.9155408 | 0.6570105 |
| 43 | -0.8 | 0.80 | 2.2255409 | 0.6899745 |
| 44 | -0.7 | 0.95 | 2.5857097 | 0.7211152 |
| 45 | -0.6 | 1.10 | 3.0041660 | 0.7502601 |
| 46 | -0.5 | 1.25 | 3.4903430 | 0.7772999 |
| 47 | -0.4 | 1.40 | 4.0552000 | 0.8021839 |
| 48 | -0.3 | 1.55 | 4.7114702 | 0.8249137 |
| 49 | -0.2 | 1.70 | 5.4739474 | 0.8455347 |
| 50 | -0.1 | 1.85 | 6.3598195 | 0.8641271 |
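As a hand check of one row: at $X = -1.0$ (row 41), $\text{Logit} = 2 + 1.5(-1.0) = 0.5$, $\text{Odds} = e^{0.5} \approx 1.6487$, and $P = 1.6487/(1 + 1.6487) \approx 0.6225$, matching the table.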
```r
# Plot the logit (linear in X)
plot(x, y.logit,
     xlab = "X values",
     ylab = "Logit = 2 + 1.5*X")
lines(x, y.logit)

# Plot the odds (exponential in X)
plot(x, y.odds,
     xlab = "X values",
     ylab = "Odds = exp(2 + 1.5*X)")
lines(x, y.odds)

# Plot the probability (bounded between 0 and 1)
plot(x, y.prob,
     xlab = "X values",
     ylab = "Probability = Odds/(1+Odds)")
lines(x, y.prob)
```
Notice that the probability curve is the classic S-shaped (sigmoid) curve, bounded between 0 and 1. So, when we “fit” a logistic regression model, we are solving for the best-fit line for this equation:
$$\text{Logit} = \log_e\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta X_i$$
where the “logit” function LINKS the outcome (or a mathematical transformation of the outcome), in this case $\pi$, with the linear predictor $\alpha + \beta X_i$.
Similarly, if we exponentiate both sides of this equation, we get:
$$\text{Odds} = \frac{\pi}{1-\pi} = e^{\alpha + \beta X_i}$$
This is why logistic regression yields “ODDS RATIOS”, which some software lists as “exp(B)”.
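To see this in practice, here is a minimal sketch, using simulated data and hypothetical variable names, that fits a logistic regression with R’s `glm()` and exponentiates the coefficients to obtain the odds ratios (the “exp(B)” values):

```r
# Minimal sketch: simulate data from the hypothetical model logit = 2 + 1.5*X,
# fit a logistic regression, and exponentiate the coefficients
set.seed(42)
x.sim <- runif(200, min = -5, max = 2)
p.sim <- plogis(2 + 1.5 * x.sim)              # true probabilities
y.sim <- rbinom(200, size = 1, prob = p.sim)  # simulated 0/1 outcomes

fit <- glm(y.sim ~ x.sim, family = binomial(link = "logit"))
coef(fit)       # estimates on the logit scale (alpha, beta)
exp(coef(fit))  # exponentiated coefficients: the odds ratios / "exp(B)"
```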
The LOGIT (logistic regression) is your first introduction to the “Generalized” Linear Model. There are several LINK functions that are useful to know for other kinds of outcomes (a quick way to inspect these in R follows the table):
| Family | Link | Function | Type of Outcome |
|---|---|---|---|
| Gaussian | Identity | $\mu_i$ | Continuous - Normal |
| Binomial | Logit | $\log_e\left(\frac{\pi}{1-\pi}\right)$ | Dichotomous; 2 categories |
| Poisson | Log | $\log_e(\mu_i)$ | Count |
| Gamma | Inverse | $\mu_i^{-1}$ | Time to event - Survival |
| Inverse-Gaussian | Inverse-square | $\mu_i^{-2}$ | Continuous, positive, right-skewed |
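In R, each row of this table corresponds to a family object you can pass to `glm()`. A quick way to confirm each family’s default (canonical) link:

```r
# Each GLM family object in R carries its default (canonical) link
gaussian()$link          # "identity"
binomial()$link          # "logit"
poisson()$link           # "log"
Gamma()$link             # "inverse"
inverse.gaussian()$link  # "1/mu^2"
```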
For the Poisson distribution for a count variable, the probability of any given count $k$ occurring is given by this equation, where $k! = k(k-1)(k-2)\cdots(3)(2)(1)$:

$$P(Y = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$
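As a quick check, R’s `dpois()` implements exactly this probability; for example, with $\lambda = 3$ and $k = 2$:

```r
# Poisson probability of observing k = 2 events when the mean rate is lambda = 3,
# computed from the formula and with dpois() - the two should match
lambda <- 3
k <- 2
lambda^k * exp(-lambda) / factorial(k)  # 0.2240418
dpois(k, lambda)                        # same value
```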
So, when we fit a Poisson regression equation, we are solving this equation:

$$\log_e(Y) = \alpha + \beta_1 X_i$$
So, the actual count $Y$ is equal to

$$Y = e^{\alpha + \beta_1 X_i}$$
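A minimal sketch of fitting such a model in R, again with simulated data and hypothetical variable names: `glm()` with `family = poisson` uses the log link by default, and exponentiating the coefficients returns them to the count scale as multiplicative “rate ratios”:

```r
# Minimal sketch: Poisson regression on simulated count data
set.seed(7)
x.cnt <- runif(200, min = 0, max = 2)
mu    <- exp(0.5 + 0.8 * x.cnt)   # true mean count (log link assumed)
y.cnt <- rpois(200, lambda = mu)  # simulated counts

fit.pois <- glm(y.cnt ~ x.cnt, family = poisson(link = "log"))
coef(fit.pois)       # estimates on the log-count scale
exp(coef(fit.pois))  # back-transformed: multiplicative effects ("rate ratios")
```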