Modeling counts 1

Poisson and Negative Binomial

Aleš Vomáčka

Counts are Pretty Common

  • Number of absences in school per year

  • Number of minutes on the internet per day

  • Number of purchases per year

  • Number of days since last job

Poisson Distribution

  • There are several options for modeling counts. The old-school choice is the Poisson distribution.

  • It assumes that events happen at a constant rate over a fixed amount of time.

  • An important implication is that the Poisson distribution assumes the mean and the variance are equal (often problematic in practice).
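The mean-equals-variance property can be checked numerically. This is a minimal sketch using only the Python standard library; the rate `lam = 4.0` is an arbitrary illustrative value, not taken from any data set.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    # Computed on the log scale to avoid overflow in lam**k and k! for large k.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

lam = 4.0        # illustrative rate (assumption)
ks = range(200)  # truncation point; the tail beyond 200 is negligible for lam = 4
probs = [poisson_pmf(k, lam) for k in ks]

mean = sum(k * p for k, p in zip(ks, probs))
variance = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))

print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

Both numbers should come out equal to `lam`, whatever positive rate is chosen.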

Poisson Regression

\[ Y \sim Poisson(\lambda), \quad log(\lambda) = \beta \cdot X \]

  • We are using a log link. Why?

  • Counts are bounded below by zero but have no upper bound. The log link guarantees that the predicted mean count is always positive.

  • E.g. predicting school absences with math scores and training program type

  • \[ absences \sim Poisson(\lambda), \quad log(\lambda) = \beta_0 + \beta_1 \cdot math + \beta_2 \cdot program \]

Interpreting Poisson models

  • Coefficients are on the log scale. Example:

    • A 1 unit increase in math score is associated with a change of -0.01 in the log of the expected days of absence.

    • Students in the vocational program have, on average, 1.28 fewer log days of absence than those in the general program.

Term                   Coefficient
(Intercept)            2.65
Program: General       - (reference)
Program: Academic      -0.44
Program: Vocational    -1.28
Math score             -0.01
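To see what the log scale means in practice, the sketch below turns the coefficients from the table above into expected days of absence by applying the inverse of the log link. The math score of 50 is a hypothetical value chosen for illustration, and the function name `expected_absences` is not from the original analysis.

```python
import math

# Coefficients on the log scale, copied from the table above.
COEF = {
    "intercept": 2.65,
    "academic": -0.44,    # relative to the general program
    "vocational": -1.28,  # relative to the general program
    "math": -0.01,
}

def expected_absences(program, math_score):
    """Expected days of absence: exp of the linear predictor (inverse log link)."""
    log_mean = COEF["intercept"] + COEF["math"] * math_score
    if program != "general":  # general is the reference category
        log_mean += COEF[program]
    return math.exp(log_mean)

# Hypothetical student with a math score of 50 in each program.
for program in ("general", "academic", "vocational"):
    print(program, round(expected_absences(program, math_score=50), 2))
```

Because the prediction is an exponential, it can never be negative, which is the point of the log link.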

Interpreting Poisson models - The Better Way

  • We can exponentiate the coefficients to get multiplicative changes (rate ratios) in natural units.

    • A 1 unit increase in math score multiplies the expected days of absence by 0.99, i.e. roughly a 1% decrease.
  • Treating these percentage changes as additive (e.g. 10 units = 10% decrease) is an approximation that only works for small changes, i.e. when the exponentiated coefficients are close to 1 (e.g. between 0.9 and 1.1).

Term                   Exp(Coefficient)
(Intercept)            14.18
Program: General       - (reference)
Program: Academic      0.64
Program: Vocational    0.28
Math score             0.99
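Exponentiating the log-scale coefficients reproduces the table above, up to rounding: the inputs here are the rounded coefficients from the earlier slide, so the intercept comes out slightly different from 14.18. The sketch also shows why percentage changes are only approximately additive, using a 10-point change in math score as an illustrative case.

```python
import math

# Log-scale coefficients (rounded values from the coefficient table).
coefs = {
    "(Intercept)": 2.65,
    "Program: Academic": -0.44,
    "Program: Vocational": -1.28,
    "Math score": -0.01,
}

# exp(b) is the factor by which the expected count is multiplied.
rate_ratios = {term: math.exp(b) for term, b in coefs.items()}
for term, ratio in rate_ratios.items():
    print(f"{term:20s} {ratio:6.2f}")

# Effects compound multiplicatively over several units.
per_point = rate_ratios["Math score"]
ten_point_exact = per_point ** 10                 # same as exp(10 * b)
ten_point_additive = 1 - 10 * (1 - per_point)     # naive "10 times 1%" shortcut
print(f"10 points, exact:    x{ten_point_exact:.4f}")
print(f"10 points, additive: x{ten_point_additive:.4f}")
```

For a coefficient this small the two answers are close; for the vocational program (ratio 0.28) the additive shortcut would be badly off.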

Interpreting Poisson models - The Best(?) Way

  • You can also use (average) marginal effects the way you are used to.

    • On average, students in the vocational program have 7.5 fewer days of absence than those in the general program.

    • On average, a one point increase in the math test score is associated with 0.04 fewer days of absence.

Term                     Average Marginal Effect
Academic vs. General     -3.69
Vocational vs. General   -7.49
Math score               -0.04
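For reference, a sketch of what an average marginal effect computes under the log link. The notation \(\hat{\mu}_i = \exp(x_i^\top \hat{\beta})\) for the predicted count of student \(i\) is assumed here and does not come from the slides. For a numerical predictor such as the math score, the effect of a one unit change depends on each student's predicted count, so it is averaged over the sample:

\[ AME_{math} = \frac{1}{n} \sum_{i=1}^{n} \frac{\partial \hat{\mu}_i}{\partial math_i} = \frac{1}{n} \sum_{i=1}^{n} \hat{\beta}_{math} \cdot \exp(x_i^\top \hat{\beta}) \]

For a categorical predictor, the analogue is the average difference between the predictions with every student set to one category and the predictions with every student set to the reference category:

\[ AME_{vocational} = \frac{1}{n} \sum_{i=1}^{n} \left[ \hat{\mu}_i(program = vocational) - \hat{\mu}_i(program = general) \right] \]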

Poisson model visually

Questions?

Over- and Under-dispersion

  • Poisson regression assumes the (conditional) mean and variance are equal.

  • If they are not, the standard errors are wrong (too small under overdispersion), so tests and confidence intervals are unreliable, even when the point estimates barely change.

  • In practice, the variance is usually higher than the mean (overdispersion). Rarely, it’s lower (underdispersion).

  • A (possible) solution: use the negative binomial distribution instead.
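A rough first check for overdispersion is to compare the sample variance of the counts with their mean. The sketch below uses a small made-up vector of absence counts purely for illustration. In a regression setting the comparison should be made conditional on the predictors (e.g. from the residuals of the fitted model), which this simple ratio ignores.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; about 1 is expected under Poisson."""
    return variance(counts) / mean(counts)

# Hypothetical absence counts, invented for illustration only.
absences = [0, 0, 1, 2, 2, 3, 5, 8, 14, 25]

ratio = dispersion_ratio(absences)
print(f"variance / mean = {ratio:.2f}")  # values well above 1 suggest overdispersion
```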

Negative Binomial distribution

  • Similar to Poisson, but the variance is allowed to exceed the mean: the amount of overdispersion is estimated from the data.

  • If the mean and variance are actually the same, both models give (practically) the same results.
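One common parameterization of the negative binomial (often called NB2; assumed here, since the slides do not say which one is used) adds a dispersion parameter \(\theta\) to the variance: \(Var(Y) = \mu + \mu^2 / \theta\). The short sketch below shows that the variance approaches the Poisson value \(\mu\) as \(\theta\) grows, which is why the two models agree when there is no overdispersion. The mean of 5 is an arbitrary illustrative value.

```python
def nb2_variance(mu, theta):
    """Variance of a negative binomial count with mean mu and dispersion theta (NB2)."""
    return mu + mu ** 2 / theta

mu = 5.0  # illustrative mean (assumption)
for theta in (0.5, 2.0, 10.0, 1000.0):
    var = nb2_variance(mu, theta)
    print(f"theta = {theta:7.1f}  variance = {var:8.3f}  (Poisson variance: {mu})")
```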

Term                   Poisson   Negative Binomial
(Intercept)            2.65      2.62
Program: General       -         -
Program: Academic      -0.44     -0.44
Program: Vocational    -1.28     -1.28
Math score             -0.01     -0.01

Poisson vs Negative Binomial

  • Which one to use? Just use the negative binomial.

  • The results will be the same or better.

  • So why learn about the Poisson distribution?

It Was Poisson All Along

  • You have a contingency table. How do you test for a relationship between the two variables?

         Elementary   Highschool (without diploma)   Highschool (with diploma)   University
Male     85           407                            384                         201
Female   105          396                            619                         272

  • That’s right, a chi-squared test.

  • Result: \(\chi^2\) test p value = 3.23e-06
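The test can be reproduced by hand from the table. The sketch below computes the Pearson \(\chi^2\) statistic with the standard library only. The helper `chi2_sf_df3` is a closed-form upper-tail probability that is valid only for 3 degrees of freedom, which is what a 2 x 4 table has.

```python
import math

# Observed counts: rows are Male, Female; columns are the four education levels.
observed = [
    [85, 407, 384, 201],
    [105, 396, 619, 272],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson statistic: sum of (observed - expected)^2 / expected,
# where expected counts assume gender and education are independent.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2 - 1) * (4 - 1) = 3

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_sf_df3(chi2)
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.3g}")
```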

It Was Poisson All Along

  • Now imagine we transformed our data into a long format:

Gender   Education                      Freq
Male     Elementary                     85
Female   Elementary                     105
Male     Highschool (without diploma)   407
Female   Highschool (without diploma)   396
Male     Highschool (with diploma)      384
Female   Highschool (with diploma)      619
Male     University                     201
Female   University                     272

And fitted a Poisson model with an interaction:

\[ Freq \sim Poisson(\lambda), \quad log(\lambda) = Gender + Education + Gender \cdot Education \]

It Was Poisson All Along

  • We can then test whether adding the interaction led to a statistically significant improvement in model fit.

Term               Df   Deviance   Resid. Df   Resid. Dev   Pr(>Chi)
NULL               NA   NA         7           765.366      NA
Education          3    696.833    4           68.533       1.02e-150
Gender             1    40.298     3           28.235       2.18e-10
Education:Gender   3    28.235     0           0.000        3.24e-06

  • The p value from the Poisson model is 3.24e-06, the \(\chi^2\) one was 3.23e-06. They are (practically) the same!

  • Why?

It Was Poisson All Along

  • The \(\chi^2\) test is an approximate way to compute a Poisson regression (devised before Poisson regression even existed)!

  • Useful to know for two reasons:

    • It’s cool.

    • It is a good way to model categorical data.


  • Unlike the \(\chi^2\) test, Poisson regression:

    • Can control for numerical variables

    • Can include more than 2 categorical variables

    • Is a good way to model categorical data in general
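The connection can be checked by hand. For a two-way table, the deviance attributed to the interaction term (28.235 in the analysis of deviance table on the earlier slide) is the likelihood-ratio statistic \(G^2 = 2 \sum O \cdot log(O / E)\), where \(E\) are the usual expected counts under independence. The Pearson \(\chi^2\) statistic is a close approximation of it. This is a minimal sketch with the standard library only; the closed-form p value is specific to 3 degrees of freedom.

```python
import math

# Observed counts: rows are Male, Female; columns are the four education levels.
observed = [
    [85, 407, 384, 201],
    [105, 396, 619, 272],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Likelihood-ratio statistic: deviance of the Poisson model without the
# Gender x Education interaction, whose fitted counts are the expected counts.
g2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        g2 += 2 * obs * math.log(obs / expected)

# Upper-tail probability of a chi-squared variable with 3 degrees of freedom.
p_value = math.erfc(math.sqrt(g2 / 2)) + math.sqrt(2 * g2 / math.pi) * math.exp(-g2 / 2)

print(f"G2 = {g2:.3f}, p = {p_value:.3g}")
```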

Questions?

Assumptions for Poisson/Negative Binomial

  • The same old story.

  • Linear relationship between the log of the expected count and the predictors.

  • The conditional distribution of the outcome follows a Poisson/Negative Binomial distribution.

    • For Poisson, this means no over/underdispersion.

Questions?

InteRmezzo!