Poisson and Negative Binomial
Number of absences in school per year
Number of minutes on the internet per day
Number of purchases per year
Number of days since last job
There are several options for modeling counts. The old-school choice is the Poisson distribution.
It assumes that events happen at a constant rate over a fixed amount of time.
An important implication is that the Poisson distribution assumes the mean and the variance are the same (often problematic in practice).
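The mean-equals-variance property can be checked numerically. The sketch below computes both moments straight from the Poisson probability mass function; the function names and the truncation point `max_k` are illustrative choices, not part of the original material.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(Y = k) for a Poisson variable with expected count lam (computed on the log scale)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def poisson_mean_and_variance(lam: float, max_k: int = 200) -> tuple:
    """Mean and variance summed directly from the pmf, truncated at max_k."""
    probs = [poisson_pmf(k, lam) for k in range(max_k + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, variance

for lam in (0.5, 3.0, 12.0):
    mean, variance = poisson_mean_and_variance(lam)
    print(f"lambda={lam}: mean={mean:.4f}, variance={variance:.4f}")
```

Whatever rate is plugged in, the two numbers agree, which is exactly the restriction that real count data often violate.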
\[ Y \sim Poisson(\lambda), \quad \log(\lambda) = \beta \cdot X \]
We are using a log link. Why?
Counts are bounded below by zero but have no upper bound. The log link keeps the predicted mean positive for any value of the linear predictor.
E.g. predicting school absences with math scores and training program type
\[ absences \sim Poisson(\lambda), \quad \log(\lambda) = \beta_0 + \beta_1 \cdot math + \beta_2 \cdot program \]
Coefficients are on the log scale. Example:
A 1-unit increase in math score is associated with a change of -0.01 in log days of absence.
Students in the vocational program have on average 1.28 fewer log days of absence than those in the general program.
Term | Coefficient |
---|---|
(Intercept) | 2.65 |
Program: General | - |
Program: Academic | -0.44 |
Program: Vocational | -1.28 |
Math score | -0.01 |
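
To see what the log-scale coefficients mean in days, the linear predictor can be pushed back through the inverse link. This is a minimal sketch using the rounded coefficients from the table; the math score of 50 is a hypothetical value chosen only for illustration.

```python
import math

# Rounded coefficients from the table (log scale); the general program is the reference.
INTERCEPT = 2.65
PROGRAM_EFFECT = {"general": 0.0, "academic": -0.44, "vocational": -1.28}
MATH_SLOPE = -0.01

def expected_absences(program: str, math_score: float) -> float:
    """Expected days of absence: exp() of the linear predictor, so always positive."""
    log_mean = INTERCEPT + PROGRAM_EFFECT[program] + MATH_SLOPE * math_score
    return math.exp(log_mean)

# Hypothetical student with a math score of 50 in each program.
for program in PROGRAM_EFFECT:
    print(program, round(expected_absences(program, math_score=50), 2))
```

Because the prediction is an exponential, it can never drop below zero, no matter how negative the linear predictor gets.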
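We can exponentiate the coefficients to get multiplicative changes in natural units, e.g. 0.28 means the vocational program has 28% as many expected days of absence as the general program.
Reading a raw coefficient directly as a proportional change (e.g. -0.01 as roughly -1%) is an approximation that only works for small changes, i.e. when the exponentiated coefficients are close to 1 (e.g. between 0.9 and 1.1).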
Term | Exp(Coefficient) |
---|---|
(Intercept) | 14.18 |
Program: General | - |
Program: Academic | 0.64 |
Program: Vocational | 0.28 |
Math score | 0.99 |
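
The gap between the exact multiplicative change and the "read the coefficient as a percentage" shortcut can be made concrete. The sketch below uses the rounded coefficients from the tables; the helper names are illustrative.

```python
import math

# Rounded log-scale coefficients from the table.
coefficients = {
    "Program: Academic": -0.44,
    "Program: Vocational": -1.28,
    "Math score": -0.01,
}

def rate_ratio(coef: float) -> float:
    """Multiplicative change in the expected count per 1-unit increase."""
    return math.exp(coef)

def percent_change(coef: float) -> float:
    """Exact percentage change implied by the coefficient."""
    return (math.exp(coef) - 1) * 100

for term, coef in coefficients.items():
    shortcut = coef * 100  # reading the raw coefficient as a percentage
    print(f"{term}: ratio={rate_ratio(coef):.2f}, "
          f"exact={percent_change(coef):.1f}%, shortcut={shortcut:.0f}%")
```

For the math score the shortcut and the exact value nearly coincide; for the vocational program the shortcut would suggest a drop of more than 100%, which is impossible for a count.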
You can also use (average) marginal effects the way you are used to.
On average, students in the vocational program have 7.5 fewer days of absence than those in the general program.
On average, a one-point increase in the math test score is associated with 0.04 fewer days of absence.
Term | Average Marginal Effect |
---|---|
General - Academic | -3.69 |
General - Vocational | -7.49 |
Math score | -0.04 |
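
One way to think about an average marginal effect for a program is: predict absences for every student as if they were in that program, predict again as if they were in the general program, and average the differences. The sketch below follows that recipe with the rounded coefficients; the math scores are made up (the original data set is not reproduced here), so the output will not match the table.

```python
import math

# Rounded coefficients from the Poisson model (log scale).
INTERCEPT, MATH_SLOPE = 2.65, -0.01
PROGRAM_EFFECT = {"general": 0.0, "academic": -0.44, "vocational": -1.28}

# Hypothetical math scores, for illustration only.
math_scores = [35, 42, 48, 51, 55, 60, 63, 70]

def predict(program: str, score: float) -> float:
    """Expected days of absence for a given program and math score."""
    return math.exp(INTERCEPT + PROGRAM_EFFECT[program] + MATH_SLOPE * score)

def ame_program(program: str, scores: list) -> float:
    """Average difference in predicted absences versus the general program."""
    diffs = [predict(program, s) - predict("general", s) for s in scores]
    return sum(diffs) / len(diffs)

for program in ("academic", "vocational"):
    print(program, round(ame_program(program, math_scores), 2))
```

The result is in days of absence, which is why marginal effects are often easier to communicate than log-scale coefficients.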
Poisson regression assumes the mean and variance are the same.
If not, the inference is off: standard errors are typically too small under overdispersion, and the point estimates can suffer as well!
In practice, variance is usually higher than the mean (overdispersion). Rarely, it’s lower (underdispersion).
(A) solution: use the negative binomial distribution instead.
Similar to Poisson, but the mean and variance are decoupled: an extra dispersion parameter, estimated from the data, lets the variance exceed the mean.
If the mean and variance are actually the same, both models give the same results.
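A small simulation can illustrate the difference. It assumes the common parameterization in which a negative binomial count is a Poisson count whose rate is itself gamma-distributed, giving a variance of mu + mu^2 / theta; the names `MU`, `THETA` and the sample size are illustrative choices.

```python
import math
import random

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson count with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def mean_and_variance(xs: list) -> tuple:
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

rng = random.Random(42)
MU, THETA, N = 5.0, 2.0, 20000  # assumed mean, dispersion parameter, sample size

# Plain Poisson: every observation shares the same rate.
poisson_draws = [sample_poisson(MU, rng) for _ in range(N)]
# Gamma-Poisson mixture: each observation gets its own rate with mean MU.
negbin_draws = [sample_poisson(rng.gammavariate(THETA, MU / THETA), rng) for _ in range(N)]

print("Poisson  mean, variance:", mean_and_variance(poisson_draws))
print("Neg.bin. mean, variance:", mean_and_variance(negbin_draws))
print("Theoretical neg.bin. variance:", MU + MU ** 2 / THETA)
```

Both samples have roughly the same mean, but only the mixture shows a variance well above it. As theta grows, the extra term shrinks and the two distributions become indistinguishable.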
Term | Poisson | Negative Binomial |
---|---|---|
(Intercept) | 2.65 | 2.62 |
Program: General | - | - |
Program: Academic | -0.44 | -0.44 |
Program: Vocational | -1.28 | -1.28 |
Math score | -0.01 | -0.01 |
Which one to use? Just use Negative binomial.
The results will be the same or better.
Gender | Elementary | Highschool (without diploma) | Highschool (with diploma) | University |
---|---|---|---|---|
Male | 85 | 407 | 384 | 201 |
Female | 105 | 396 | 619 | 272 |
That’s right, a chi-squared test.
Results: \(\chi^2\) p value = 3.23e-06
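For reference, the Pearson statistic behind that p value can be computed by hand from the contingency table. This is a minimal standard-library sketch; the closed-form tail probability is valid only for 3 degrees of freedom, which is what a 2 x 4 table gives, and the function names are illustrative.

```python
import math

# Observed counts from the table: rows = gender, columns = education level.
observed = [
    [85, 407, 384, 201],   # Male
    [105, 396, 619, 272],  # Female
]

def pearson_chi2(table: list) -> tuple:
    """Pearson chi-squared statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # count under independence
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with exactly 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

stat, df = pearson_chi2(observed)
print(f"chi2={stat:.3f}, df={df}, p={chi2_sf_df3(stat):.3g}")
```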
Gender | Education | Freq |
---|---|---|
Male | Elementary | 85 |
Female | Elementary | 105 |
Male | Highschool (without diploma) | 407 |
Female | Highschool (without diploma) | 396 |
Male | Highschool (with diploma) | 384 |
Female | Highschool (with diploma) | 619 |
Male | University | 201 |
Female | University | 272 |
And fit a Poisson model with an interaction:
\[ Freq \sim Poisson(\lambda), \quad \log(\lambda) = Gender + Education + Gender \cdot Education \]
Term | Df | Deviance | Resid. Df | Resid. Dev | Pr(>Chi) |
---|---|---|---|---|---|
NULL | NA | NA | 7 | 765.366 | NA |
Education | 3 | 696.833 | 4 | 68.533 | 1.02e−150 |
Gender | 1 | 40.298 | 3 | 28.235 | 2.18e−10 |
Education:Gender | 3 | 28.235 | 0 | 0.000 | 3.24e−06 |
The p value from the Poisson model is 3.24e-06; the \(\chi^2\) one was 3.23e-06. They are (practically) the same!
Why?
The \(\chi^2\) test is an approximate way to compute the Poisson regression test of the interaction (it was invented before Poisson regression even existed)!
Useful to know for two reasons:
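The connection can be checked without fitting any model. The deviance of the interaction term is the likelihood-ratio statistic comparing the counts with what independence of gender and education would predict, and that statistic has a simple closed form. The sketch below computes it from the raw table; the function names are illustrative, and the tail-probability formula holds only for 3 degrees of freedom.

```python
import math

# Observed counts: rows = gender (Male, Female), columns = education level.
observed = [
    [85, 407, 384, 201],
    [105, 396, 619, 272],
]

def likelihood_ratio_statistic(table: list) -> float:
    """G = 2 * sum(obs * log(obs / expected)), with expected counts under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            g += 2 * obs * math.log(obs / expected)
    return g

def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with exactly 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

g = likelihood_ratio_statistic(observed)
print(f"G={g:.3f}, p={chi2_sf_df3(g):.3g}")
```

The statistic should agree with the Education:Gender deviance in the table (28.235), while the Pearson statistic is a slightly different number that approximates it.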
It’s cool.
A good way to model categorical data.
Unlike the \(\chi^2\) test, Poisson regression:
Can control for numerical variables
Can include more than 2 categorical variables
A good way to model categorical data in general
The same old story.
Linearity between log counts and predictors.
The conditional distribution follows a Poisson/negative binomial distribution.