The amazing, all-encompassing generalized linear models
- Choosing the right regression model for your data is simpler than it seems once you have a basic understanding of the generalized linear model (GzLM)
- GzLMs have 3 components: a stochastic component, a systematic component, and a link function.
- The systematic component never changes: it is the linear predictor, a weighted sum of the predictor terms. Otherwise, it's not a "linear" model.
- The stochastic component should be chosen to represent the distribution of the observations.
- for linear regression (the general linear model), the stochastic component is a normal distribution (continuous outcome variable, like height)
- for logistic regression, the stochastic component is a Bernoulli distribution (binary outcome variable, like hired/not hired)
- for Poisson regression, the stochastic component is a Poisson distribution (count outcome, like number of parking tickets, or attendees)
- the link function connects the mean of the outcome distribution to the linear predictor, mapping the unbounded predictor onto the natural range of the mean.
- linear regression uses the identity link (the outcome mean == the linear predictor)
- logistic regression uses the logit link (its inverse is the logistic/sigmoid function, which squashes the predictor into (0, 1))
- Poisson regression uses the log link (its inverse, the exponential, keeps the mean positive)
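The three inverse links above can be sketched in a few lines of numpy (illustrative values only): each one maps the same unbounded linear predictor onto the natural range of the outcome's mean.

```python
import numpy as np

# The linear predictor eta can be any real number; each inverse link
# maps it onto the natural range of the outcome's mean.
eta = np.array([-2.0, 0.0, 2.0])

identity = eta                      # linear regression: mean = eta (any real)
logistic = 1 / (1 + np.exp(-eta))   # logistic regression: mean in (0, 1)
poisson_mean = np.exp(eta)          # Poisson regression: mean > 0

print(identity)      # unchanged: [-2.  0.  2.]
print(logistic)      # squashed into (0, 1), with logistic(0) == 0.5
print(poisson_mean)  # strictly positive, with exp(0) == 1
```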
More & Special cases: Probit is an alternative link for binary outcomes (it uses the standard normal CDF instead of the logistic function); multinomial and ordinal models extend binary outcome variables to categorical (# categories > 2) and ordinal categorical (ranked categories).
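A small pure-Python sketch of the probit idea (the standard normal CDF can be written with `math.erf`): both the probit and logit inverse links squash a real-valued linear predictor into (0, 1), but probit approaches 0 and 1 faster in the tails.

```python
import math

def probit_inverse(eta):
    # Standard normal CDF: Phi(eta) = 0.5 * (1 + erf(eta / sqrt(2)))
    return 0.5 * (1 + math.erf(eta / math.sqrt(2)))

def logit_inverse(eta):
    # Logistic (sigmoid) function
    return 1 / (1 + math.exp(-eta))

# Both links agree at eta = 0 (probability 0.5) but diverge in the tails,
# where probit is closer to 0 or 1 than logit for the same eta.
for eta in (-2, 0, 2):
    print(eta, round(probit_inverse(eta), 3), round(logit_inverse(eta), 3))
```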
Gamma distribution for a strictly positive continuous response, like vote proportions in election results: [helpful ipython notebook from statsmodels] (scroll down)
Funky models: tobit (censored) and truncated regression are for data that are continuous but do not meet the stochastic assumption of a normal distribution due to restrictions on the range of the response variable.
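A quick simulation (pure Python, illustrative numbers only) shows why censoring breaks the usual normal-model assumption: if a latent normal outcome is censored at zero, the observed mean is biased away from the latent mean, which is the problem tobit-style models are built to correct.

```python
import random

random.seed(0)

# Latent continuous outcome, normally distributed around 1.0
latent = [random.gauss(1.0, 2.0) for _ in range(100_000)]

# Observed outcome is censored at zero (e.g., spending can't be negative)
censored = [max(y, 0.0) for y in latent]

mean_latent = sum(latent) / len(latent)
mean_censored = sum(censored) / len(censored)
print(mean_latent, mean_censored)  # censored mean is biased upward
```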
Handy blog post for doing GzLM in
This reading has a good narrative and is relatively easier to follow: glm.pdf (90.2 KB)
This one is denser (though probably the gentlest lecture from this whole series) - lecture notes from Princeton CS's [Introduction to Probabilistic Modeling] class: generalized_linear_model.pdf (146.6 KB)
Murphy ([Machine Learning: a Probabilistic Perspective]), ch 9 (p283- )