Review: linear regression model

\(Y \in \mathbb{R}\), \(X_j \in \mathbb{R}\) for \(j = 1, \ldots, p\), \(\beta_j \in \mathbb{R}\) for \(j = 0, \ldots, p\), \(\varepsilon \in \mathbb{R}\)

\[Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \ldots + \beta_pX_p + \varepsilon, \text{with }\varepsilon \sim N(0, \sigma^2)\]

Y: outcome variable or response.
\(X_1, X_2, \ldots, X_p\) are called predictors.
\(\beta_0, \beta_1, \ldots, \beta_p\) are called coefficients, which we will estimate.

Interpretation of linear model coefficients:

\(\beta_1\): a one-unit increase in \(X_1\) is associated with a \(\beta_1\) increase in the expected value of \(Y\), holding the other predictors fixed.
The p-value of \(\beta_1\) is significant: \(X_1\) is a significant predictor that explains variance in \(Y\), adjusting for the other covariates.
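As a quick check of this interpretation, the sketch below simulates data from a linear model and fits it with `lm()`. The sample size, seed, and true coefficients are illustrative assumptions, not values from the lecture.

```r
# Simulate from Y = 1 + 2*X1 - 0.5*X2 + eps (illustrative values)
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 1)

fit_lm <- lm(y ~ x1 + x2)
coef(fit_lm)                  # estimates of beta_0, beta_1, beta_2
summary(fit_lm)$coefficients  # estimates, standard errors, t values, p-values
```

The estimate for `x1` should be close to 2: the expected change in `y` per one-unit change in `x1` with `x2` held fixed.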

Limitation of linear model:

\(\varepsilon\) has to be normally distributed.
\(E(Y|X)\) has to be a linear function of the predictors, \(X^\top\beta\).

How about other data types?

Binary data
- 0, 1
Count data
- 0, 4, 12, …
Ordinal data
- 1,2,3,…

Generalize the “linear regression model” (1)

\[Y = X^\top\beta + \varepsilon, \text{with }\varepsilon \sim N(0, \sigma^2)\]

Random (stochastic) component
- specifies the conditional distribution of the response variable \(Y\) given the explanatory variables \(X\)
- \(\mu = E(Y|X), Y|X \sim N(\mu, \sigma^2)\)
Systematic component – the covariates \(X\) combine linearly with the coefficients to form the linear predictor.
- \(X^\top\beta\)
Link function
- \(X^\top\beta = \mu\)

Generalize the “linear regression model” (2)

We can generalize these conditions of the linear regression model:

Random (stochastic) component
- specifies the conditional distribution of the response variable \(Y\) given the explanatory variables \(X\)
- \(\mu = E(Y|X), Y|X \sim f(\mu)\), \(f\) is a distribution from the exponential family.
- In the linear model case, \(f\) is the Normal distribution.
Systematic component – Still assume covariates \(X\) combine linearly with the coefficients to form the linear predictor.
- \(X^\top\beta\)
Link function
- an invertible, monotone and twice-differentiable function \(g\) which transforms the expectation of the response to the linear predictor
- \(X^\top\beta = g(\mu)\)
- In the linear model case, \(g\) is the identity function.
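To illustrate that the linear model is the special case with a Normal random component and identity link, the sketch below fits the same simulated data with `lm()` and with `glm()`. The data are made up for illustration.

```r
set.seed(2)
x <- rnorm(100)
y <- 0.5 + 1.5 * x + rnorm(100)

fit_lm  <- lm(y ~ x)
fit_glm <- glm(y ~ x, family = gaussian(link = "identity"))

# Both fits give the same coefficient estimates
cbind(lm = coef(fit_lm), glm = coef(fit_glm))
```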

Commonly used distributions \(f\) in GLM

Gaussian distribution: for continuous data
Binomial distribution: for binary data
Poisson distribution: for count data

Standard link functions and their inverses

Link	\(X^\top\beta = g(\mu)\)	\(\mu = g^{-1} (X^\top\beta)\)
identity	\(X^\top\beta = \mu\)	\(\mu = X^\top\beta\)
log	\(X^\top\beta = \log(\mu)\)	\(\mu = \exp(X^\top\beta)\)
logit	\(X^\top\beta = \log \frac{\mu}{1 - \mu}\)	\(\mu = \frac{\exp(X^\top\beta)}{1 + \exp(X^\top\beta)}\)
probit	\(X^\top\beta = \Phi^{-1}({\mu})\)	\(\mu = \Phi (X^\top\beta)\)

\(\Phi\): CDF of standard normal distribution.
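The links in this table are available in base R through `stats::make.link()`, which returns the link function and its inverse. The short sketch below applies each link to a few example means (chosen arbitrarily) and maps back.

```r
mu <- c(0.1, 0.5, 0.9)   # example means, chosen in (0, 1) so every link is defined

for (name in c("identity", "log", "logit", "probit")) {
  lk   <- make.link(name)
  eta  <- lk$linkfun(mu)     # g(mu)
  back <- lk$linkinv(eta)    # g^{-1}(eta), should recover mu
  cat(name, ": eta =", round(eta, 3), "| mu =", round(back, 3), "\n")
}
```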

logistic regression motivating example

X <- matrix(c(189, 104, 10845, 10933), nrow=2,
            dimnames=list(Treatment=c("Placebo","Aspirin"), 
                          "Myocardial Infarction"=c("Yes", "No")))
X

##          Myocardial Infarction
## Treatment Yes    No
##   Placebo 189 10845
##   Aspirin 104 10933

odds.ratio0 <- function(X){
  result <- X[1,1]*X[2,2]/(X[1,2]*X[2,1])
  return(result)
}
odds.ratio0(X)

## [1] 1.832054

logistic regression motivating example

The input data is usually formatted in this way:

ID	Myocardial Infarction (Yes/No)	Treatment(1/0)
1	1	0
2	0	1
3	1	1
4	1	0
…	…	…
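One way to get from the 2x2 table of counts to this one-row-per-subject layout is to repeat each cell's row as many times as its count. This is only a sketch: the 0/1 coding of treatment (0 = placebo, 1 = aspirin) and the column names are assumptions.

```r
# Counts from the aspirin table
counts <- data.frame(
  treatment = c(0, 0, 1, 1),   # 0 = placebo, 1 = aspirin (assumed coding)
  mi        = c(1, 0, 1, 0),   # 1 = yes, 0 = no
  n         = c(189, 10845, 104, 10933)
)

# Repeat each row n times to get one row per subject
long <- counts[rep(seq_len(nrow(counts)), counts$n), c("mi", "treatment")]
long <- cbind(ID = seq_len(nrow(long)), long)
rownames(long) <- NULL
head(long)
table(treatment = long$treatment, mi = long$mi)
```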

logistic regression

Random component: \(Y|X \sim Bern(\mu)\), \(Y\) is binary, either 0 or 1.
- PMF of the Bernoulli distribution: \(f_\mu(y) = \mu^y(1 - \mu)^{1-y}\)
  - \(0 < \mu < 1\)
- \(p(Y = 1 | X) = \mu\)
- \(p(Y = 0 | X) = 1 - \mu\)
- \(E(Y|X) = 0 \times (1 - \mu) + 1 \times \mu = \mu\)
Link function:
- link \(\mu\) and \(X^\top \beta\): logit link.
- logit link: \(logit(\mu) = \log \frac{\mu}{1 - \mu}\)
- \(X^\top \beta = \log \frac{\mu}{1 - \mu}\)
- \(\mu = E(Y|X) = \frac{\exp (X^\top \beta)}{1 + \exp (X^\top \beta)}\)
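The logit link and its inverse can be written directly from the formulas above. The helper names `logit` and `expit` are chosen here for illustration; base R provides the same functions as `qlogis()` and `plogis()`.

```r
logit <- function(mu) log(mu / (1 - mu))
expit <- function(eta) exp(eta) / (1 + exp(eta))

eta <- c(-2, 0, 1.5)     # example values of the linear predictor
mu  <- expit(eta)        # probabilities P(Y = 1 | X), strictly in (0, 1)
mu
logit(mu)                # recovers eta

# Base R equivalents
all.equal(mu, plogis(eta))
all.equal(logit(mu), qlogis(mu))
```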

Prostate cancer data

The data is from the book The Elements of Statistical Learning.

library(ElemStatLearn)
str(prostate)

## 'data.frame':    97 obs. of  10 variables:
##  $ lcavol : num  -0.58 -0.994 -0.511 -1.204 0.751 ...
##  $ lweight: num  2.77 3.32 2.69 3.28 3.43 ...
##  $ age    : int  50 58 74 58 62 50 64 58 47 63 ...
##  $ lbph   : num  -1.39 -1.39 -1.39 -1.39 -1.39 ...
##  $ svi    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ lcp    : num  -1.39 -1.39 -1.39 -1.39 -1.39 ...
##  $ gleason: int  6 6 7 6 6 6 6 6 6 6 ...
##  $ pgg45  : int  0 0 20 0 0 0 0 0 0 0 ...
##  $ lpsa   : num  -0.431 -0.163 -0.163 -0.163 0.372 ...
##  $ train  : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...

prostate$train <- NULL

fit logistic regression in R (logit link)

glm_binomial_logit <- glm(svi ~ lcavol, data = prostate,  family = binomial(link = "logit"))
summary(glm_binomial_logit)

## 
## Call:
## glm(formula = svi ~ lcavol, family = binomial(link = "logit"), 
##     data = prostate)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.79924  -0.48354  -0.21025  -0.04274   2.32135  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -5.0296     1.0429  -4.823 1.42e-06 ***
## lcavol        1.9798     0.4543   4.358 1.31e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 101.35  on 96  degrees of freedom
## Residual deviance:  64.14  on 95  degrees of freedom
## AIC: 68.14
## 
## Number of Fisher Scoring iterations: 6

fit binomial regression in R (probit link)

glm_binomial_probit <- glm(svi ~ lcavol, data = prostate,  family = binomial(link = "probit"))
summary(glm_binomial_probit)

## 
## Call:
## glm(formula = svi ~ lcavol, family = binomial(link = "probit"), 
##     data = prostate)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.77321  -0.49129  -0.16705  -0.00706   2.34194  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -2.9142     0.5508  -5.290 1.22e-07 ***
## lcavol        1.1486     0.2474   4.643 3.43e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 101.353  on 96  degrees of freedom
## Residual deviance:  63.654  on 95  degrees of freedom
## AIC: 67.654
## 
## Number of Fisher Scoring iterations: 7

Compare logit link and probit link

flogit <- function(x) exp(x)/(exp(x) + 1)
fprobit <- pnorm 

curve(flogit, -5, 5, ylab = "f(x)", lwd = 2)
curve(fprobit, -5, 5, add = TRUE, col = 2, lwd = 2)
legend("topleft", c("logit", "probit"), col=c(1,2),lwd = 2)

The logistic curve (inverse logit) is more spread out than the normal CDF (inverse probit): it approaches 0 and 1 more slowly.
Both of them are useful.
Which one is natural?
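The difference in spread can also be seen numerically. The sketch below tabulates both curves at a few points and then rescales the logistic curve; the factor 1.7 is a rough, assumed value, in line with the ratio of the fitted slopes above (1.9798 / 1.1486 is about 1.72).

```r
x <- c(-4, -3, -2, 0, 2, 3, 4)
tails <- data.frame(x = x, logit = plogis(x), probit = pnorm(x))
tails   # the logistic curve is further from 0 and 1 in both tails

# After rescaling x, the two curves are close; 1.7 is a rough, assumed factor
max(abs(plogis(1.7 * x) - pnorm(x)))
```

This is why logit and probit fits usually give similar fitted probabilities even though their coefficients are on different scales.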

Exponential family (1)

The generalized linear model (GLM) allows the user to select a distribution from the exponential family.
The exponential family comprises a flexible set of distributions covering both continuous and discrete random variables.
Many of the probability distributions that we commonly use are special cases of this family.

Exponential family (2)

The natural form of the probability density function (pdf) of a distribution in the exponential family is \[f(y) = \exp (\frac{y\theta - b(\theta)}{a(\phi)} + c(y; \phi))\]
- \(\theta\) is called the canonical parameter or location parameter of the distribution.
- \(\phi\) is called the dispersion parameter or the scale parameter of the distribution, and is constant for all \(y\).
- \(a(\cdot), b(\cdot)\) and \(c(\cdot)\) are known functions.

The role of \(b(\theta)\) in the exponential family

\(E(Y|X) = \mu = b'(\theta)\)
\(Var(Y|X) = a(\phi)b''(\theta)\)

Comments:

The first derivative of \(b(\theta)\) is the mean of the distribution.
The second derivative of \(b(\theta)\) is related to the variance of the distribution.

Prove \(E(Y|X) = b'(\theta)\)

\(f(y; \theta) = \exp (\frac{y\theta - b(\theta)}{a(\phi)} + c(y; \phi))\)
\(\int f(y; \theta) dy = 1\)
\(\frac{\partial}{\partial \theta}\int f(y; \theta) dy = 0\)
\(\int \frac{\partial}{\partial \theta}f(y; \theta) dy = 0\)
\(\frac{\partial}{\partial \theta}f(y; \theta) = f(y; \theta) \frac{y - b'(\theta)}{a}\)
\(\int f(y; \theta) \frac{y - b'(\theta)}{a} dy = 0\)
\(\int y f(y; \theta) dy = \int b'(\theta) f(y; \theta) dy\)
\(E(Y) = b'(\theta) \int f(y; \theta) dy = b'(\theta)\)

Prove \(Var(Y|X) = a(\phi) b''(\theta)\)

\(f(y; \theta) = \exp (\frac{y\theta - b(\theta)}{a(\phi)} + c(y; \phi))\)
\(\int f(y; \theta) dy = 1\)
\(\frac{\partial}{\partial \theta}\int f(y; \theta) dy = 0\)
\(\int \frac{\partial}{\partial \theta}f(y; \theta) dy = 0\)
\(\int \frac{\partial^2}{\partial \theta^2}f(y; \theta) dy = 0\)
\(\frac{\partial}{\partial \theta}f(y; \theta) = f(y; \theta) \frac{y - b'(\theta)}{a}\)
\(\frac{\partial^2}{\partial \theta^2}f(y; \theta) = f(y; \theta) (\frac{y - b'(\theta)}{a})^2 - f(y; \theta) \frac{b''(\theta)}{a}\)
\(\int (f(y; \theta) (\frac{y - b'(\theta)}{a})^2 - f(y; \theta) \frac{b''(\theta)}{a}) dy = 0\)
\(\int f(y; \theta) (\frac{y - b'(\theta)}{a})^2 dy = \int f(y; \theta) \frac{b''(\theta)}{a} dy\)
\(\int f(y; \theta) (y - E(Y))^2 dy = a b''(\theta) \int f(y; \theta) dy\)
\(Var(Y) = a b''(\theta)\)
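Both identities can be checked numerically for a specific family. The sketch below uses the Bernoulli distribution, for which \(b(\theta) = \log(1 + e^\theta)\) and \(a(\phi) = 1\), and approximates the derivatives by finite differences. The value of \(\mu\) and the step size are arbitrary choices.

```r
b <- function(theta) log(1 + exp(theta))   # Bernoulli: b(theta), with a(phi) = 1

mu    <- 0.3
theta <- log(mu / (1 - mu))                # canonical parameter
h     <- 1e-4                              # step size for finite differences

b1 <- (b(theta + h) - b(theta - h)) / (2 * h)               # approximates b'(theta)
b2 <- (b(theta + h) - 2 * b(theta) + b(theta - h)) / h^2    # approximates b''(theta)

c(b1 = b1, mean = mu)
c(b2 = b2, variance = mu * (1 - mu))
```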

Canonical link

The canonical link is the function that expresses the canonical parameter \(\theta\) in terms of the mean of the distribution \(\mu = E(Y|X)\).
The canonical link for each family is the one most commonly used.
It arises naturally from the general formula for distributions in the exponential family.

\[f(y) = \exp (\frac{y\theta - b(\theta)}{a(\phi)} + c(y; \phi))\]

Canonical link (normal distribution)

\[\begin{aligned} f(y) &= \exp (\frac{y\theta - b(\theta)}{a(\phi)} + c(y; \phi)) \\ &= \frac{1}{\sqrt{2\pi \sigma^2}} \exp(- \frac{(y - \mu)^2}{2\sigma^2}) \\ &= \exp( \frac{y \mu - \frac{\mu^2}{2}}{\sigma^2} - \frac{1}{2} [\frac{y^2}{\sigma^2} + \log(2\pi \sigma^2)] ) \end{aligned}\]

Canonical parameter \(\theta = \mu\)

Canonical link (binomial distribution)

\[f(y) = \exp (\frac{y\theta - b(\theta)}{a(\phi)} + c(y; \phi))\] \[f(y) = \mu^y (1 - \mu)^{1 - y}\]

\[f(y) = \exp(\frac{y \log(\frac{\mu}{1 - \mu}) - \log(\frac{1}{1 - \mu})}{1} + 0)\]

Canonical parameter \(\theta = \log(\frac{\mu}{1 - \mu})\)

Canonical link (Poisson distribution)

\[f(y) = \exp (\frac{y\theta - b(\theta)}{a(\phi)} + c(y; \phi))\] \[f(y) = \frac{\mu^y \exp (-\mu)}{y!}\]

\[f(y) = \exp(\frac{y \log(\mu) - \mu}{1} - \log(y!))\]

Canonical parameter \(\theta = \log(\mu)\)

Canonical link functions for distributions in the exponential families

Family	Canonical link	Canonical parameter \(\theta\)	Link function	Mean function
Gaussian	identity	\(\theta = \mu\)	\(X^\top\beta = \mu\)	\(\mu = X^\top\beta\)
Binomial	logit	\(\theta = \log(\frac{\mu}{1 - \mu})\)	\(X^\top\beta = \log(\frac{\mu}{1 - \mu})\)	\(\mu = \frac{\exp (X^\top\beta)}{1 + \exp (X^\top\beta)}\)
Poisson	log	\(\theta = \log(\mu)\)	\(X^\top\beta = \log(\mu)\)	\(\mu = \exp(X^\top\beta)\)
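In R, each family object carries its default link, which for these three families is the canonical link in the table. A small sketch using only base R family objects:

```r
fams <- list(gaussian = gaussian(), binomial = binomial(), poisson = poisson())
sapply(fams, function(f) f$link)   # default link for each family

# The inverse link (mean function) maps the linear predictor to the mean
binomial()$linkinv(0)   # inverse logit at eta = 0
poisson()$linkinv(0)    # exp(0)
```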

fit logistic regression in R (by default using logit link)

glm_binomial_logit <- glm(svi ~ lcavol, data = prostate,  family = binomial())
summary(glm_binomial_logit)

## 
## Call:
## glm(formula = svi ~ lcavol, family = binomial(), data = prostate)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.79924  -0.48354  -0.21025  -0.04274   2.32135  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -5.0296     1.0429  -4.823 1.42e-06 ***
## lcavol        1.9798     0.4543   4.358 1.31e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 101.35  on 96  degrees of freedom
## Residual deviance:  64.14  on 95  degrees of freedom
## AIC: 68.14
## 
## Number of Fisher Scoring iterations: 6

interpret logistic regression coefficient \(\beta\)

\(logit(\mu_i) = \log(\frac{\mu_i}{1 - \mu_i}) = X_i^\top \beta = \beta_0 + \beta_1 x_i\)
\(\frac{\mu_i}{1 - \mu_i} = \exp(\beta_0 + \beta_1 x_i)\)
Suppose there is only one binary predictor \(x\).

\[\hat{OR} = \frac{\hat{odds}|_{x_i=1}}{\hat{odds}|_{x_i=0}} = \frac{\frac{P(y_i=1)}{P(y_i=0)}|_{x_i=1}}{\frac{P(y_i=1)}{P(y_i=0)}|_{x_i=0}} = \frac{\frac{\hat{\mu}_i}{1 - \hat{\mu}_i}|_{x_i=1}}{\frac{\hat{\mu}_i}{1 - \hat{\mu}_i}|_{x_i=0}} = \frac{\exp(\hat{\beta}_0 + 1 \times \hat{\beta}_1)}{\exp(\hat{\beta}_0 + 0 \times \hat{\beta}_1)} = \exp(\hat{\beta}_1) \]

\(\beta_1\) is the log odds ratio:
- The log odds of \(y=1\) (versus \(y=0\)) increase by \(\beta_1\) when moving from \(x=0\) to \(x=1\).
\(\exp(\beta_1)\) is the odds ratio:
- The odds of \(y=1\) (versus \(y=0\)) are multiplied by \(\exp(\beta_1)\) when moving from \(x=0\) to \(x=1\).
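The identity \(\hat{OR} = \exp(\hat{\beta}_1)\) can be checked on the aspirin data from the motivating example. The sketch below fits a logistic regression to the grouped counts, with aspirin as the reference level so that the coefficient compares placebo to aspirin; the data frame layout and column names are assumptions made for illustration.

```r
# Grouped binomial data from the aspirin table: one row per treatment arm
asp <- data.frame(
  placebo = c(1, 0),           # 1 = placebo, 0 = aspirin (reference)
  yes     = c(189, 104),
  no      = c(10845, 10933)
)

fit <- glm(cbind(yes, no) ~ placebo, data = asp, family = binomial())
exp(coef(fit))["placebo"]                 # estimated odds ratio

(189 * 10933) / (10845 * 104)             # cross-product odds ratio from the table
```

With a single binary predictor the model is saturated, so the two numbers agree up to the convergence tolerance of `glm()`.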

interpret logistic regression coefficient \(\beta\) adjusting for other covariates

\(logit(\mu_i) = \log(\frac{\mu_i}{1 - \mu_i}) = \beta_0 + \beta_1 x_i + Z_i^\top\gamma\)
For a binary predictor \(x\):

\[\hat{OR} = \frac{\hat{odds}|_{x_i=1}}{\hat{odds}|_{x_i=0}} = \frac{\frac{\hat{\mu}_i}{1 - \hat{\mu}_i}|_{x_i=1}}{\frac{\hat{\mu}_i}{1 - \hat{\mu}_i}|_{x_i=0}} = \frac{\exp(\hat{\beta}_0 + 1 \times \hat{\beta}_1 + Z_i^\top\hat{\gamma})}{\exp(\hat{\beta}_0 + 0 \times \hat{\beta}_1 + Z_i^\top\hat{\gamma})} = \exp(\hat{\beta}_1) \]

\(\beta_1\) is the log odds ratio:
- The log odds of \(y=1\) (versus \(y=0\)) increase by \(\beta_1\) when moving from \(x=0\) to \(x=1\), after adjusting for other covariates.
\(\exp(\beta_1)\) is the odds ratio:
- The odds of \(y=1\) (versus \(y=0\)) are multiplied by \(\exp(\beta_1)\) when moving from \(x=0\) to \(x=1\), after adjusting for other covariates.
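The cancellation of the \(Z_i\) terms means the fitted odds ratio for \(x\) is the same at every value of the other covariates. The sketch below checks this on simulated data; the coefficients, seed, and the helper `odds()` are illustrative assumptions.

```r
set.seed(3)
n <- 1000
x <- rbinom(n, 1, 0.5)                 # binary predictor of interest
z <- rnorm(n)                          # covariate to adjust for
y <- rbinom(n, 1, plogis(-1 + 0.8 * x + 0.5 * z))

fit <- glm(y ~ x + z, family = binomial())

# Fitted odds at a given x and z
odds <- function(x_val, z_val) {
  p <- predict(fit, newdata = data.frame(x = x_val, z = z_val), type = "response")
  unname(p / (1 - p))
}
odds(1, 0.3) / odds(0, 0.3)   # odds ratio at z = 0.3
odds(1, -2) / odds(0, -2)     # same odds ratio at z = -2
exp(coef(fit)[["x"]])
```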

logistic regression in R

glm_binomial3 <- glm(svi ~ lcavol + lweight + age, data = prostate,  family = binomial())
summary(glm_binomial3) # display results

## 
## Call:
## glm(formula = svi ~ lcavol + lweight + age, family = binomial(), 
##     data = prostate)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.80253  -0.51277  -0.20249  -0.03868   2.29319  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -6.26605    3.85789  -1.624    0.104    
## lcavol       1.99773    0.46970   4.253 2.11e-05 ***
## lweight     -0.24193    0.91732  -0.264    0.792    
## age          0.03203    0.05097   0.628    0.530    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 101.353  on 96  degrees of freedom
## Residual deviance:  63.742  on 93  degrees of freedom
## AIC: 71.742
## 
## Number of Fisher Scoring iterations: 6

confint(glm_binomial3) # 95% CI for the coefficients

## Waiting for profiling to be done...

##                    2.5 %    97.5 %
## (Intercept) -14.28625935 1.0304960
## lcavol        1.18463130 3.0485710
## lweight      -2.09281024 1.5785799
## age          -0.06780234 0.1336198

exp(coef(glm_binomial3)) # exponentiated coefficients

## (Intercept)      lcavol     lweight         age 
## 0.001899712 7.372329880 0.785110058 1.032546950

exp(confint(glm_binomial3)) # 95% CI for exponentiated coefficients

## Waiting for profiling to be done...

##                    2.5 %    97.5 %
## (Intercept) 6.245344e-07  2.802456
## lcavol      3.269481e+00 21.085191
## lweight     1.233400e-01  4.848066
## age         9.344452e-01  1.142958
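`confint()` on a `glm` object reports profile-likelihood intervals (hence the "profiling" message). For comparison, Wald intervals of the form estimate ± 1.96 × standard error can be computed by hand or with `confint.default()`. The sketch below uses simulated data rather than the prostate data, so the numbers are illustrative only.

```r
set.seed(4)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + x))
fit <- glm(y ~ x, family = binomial())

est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))
wald <- cbind(lower = est - qnorm(0.975) * se,
              upper = est + qnorm(0.975) * se)

exp(cbind(OR = est, wald))      # odds ratios with Wald 95% limits
exp(confint.default(fit))       # same limits from stats::confint.default
```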

Model selection for logistic regression (also applies to all other GLMs)

step(glm(svi ~ ., data = prostate,  family = binomial()))

## Start:  AIC=55.72
## svi ~ lcavol + lweight + age + lbph + lcp + gleason + pgg45 + 
##     lpsa
## 
##           Df Deviance    AIC
## - lcavol   1   37.760 53.760
## - pgg45    1   37.772 53.772
## - lweight  1   37.782 53.782
## - gleason  1   37.794 53.794
## - lbph     1   38.248 54.248
## - age      1   38.907 54.907
## <none>         37.721 55.721
## - lcp      1   48.013 64.013
## - lpsa     1   49.249 65.249
## 
## Step:  AIC=53.76
## svi ~ lweight + age + lbph + lcp + gleason + pgg45 + lpsa
## 
##           Df Deviance    AIC
## - pgg45    1   37.809 51.809
## - gleason  1   37.839 51.839
## - lweight  1   37.857 51.857
## - lbph     1   38.248 52.248
## - age      1   38.912 52.912
## <none>         37.760 53.760
## - lcp      1   50.549 64.549
## - lpsa     1   51.558 65.558
## 
## Step:  AIC=51.81
## svi ~ lweight + age + lbph + lcp + gleason + lpsa
## 
##           Df Deviance    AIC
## - lweight  1   37.898 49.898
## - gleason  1   38.076 50.076
## - lbph     1   38.398 50.398
## - age      1   38.920 50.920
## <none>         37.809 51.809
## - lpsa     1   52.945 64.945
## - lcp      1   54.318 66.318
## 
## Step:  AIC=49.9
## svi ~ age + lbph + lcp + gleason + lpsa
## 
##           Df Deviance    AIC
## - gleason  1   38.102 48.102
## - lbph     1   38.758 48.758
## - age      1   39.002 49.002
## <none>         37.898 49.898
## - lpsa     1   53.462 63.462
## - lcp      1   54.379 64.379
## 
## Step:  AIC=48.1
## svi ~ age + lbph + lcp + lpsa
## 
##        Df Deviance    AIC
## - lbph  1   38.959 46.959
## - age   1   39.066 47.066
## <none>      38.102 48.102
## - lpsa  1   53.509 61.509
## - lcp   1   57.700 65.700
## 
## Step:  AIC=46.96
## svi ~ age + lcp + lpsa
## 
##        Df Deviance    AIC
## - age   1   39.653 45.653
## <none>      38.959 46.959
## - lpsa  1   54.840 60.840
## - lcp   1   59.815 65.815
## 
## Step:  AIC=45.65
## svi ~ lcp + lpsa
## 
##        Df Deviance    AIC
## <none>      39.653 45.653
## - lpsa  1   55.355 59.355
## - lcp   1   60.781 64.781

## 
## Call:  glm(formula = svi ~ lcp + lpsa, family = binomial(), data = prostate)
## 
## Coefficients:
## (Intercept)          lcp         lpsa  
##      -8.532        1.343        2.198  
## 
## Degrees of Freedom: 96 Total (i.e. Null);  94 Residual
## Null Deviance:       101.4 
## Residual Deviance: 39.65     AIC: 45.65

Estimation for \(\beta\)

The objective function can be the least-squares loss or the likelihood function.
Two methods are typically used to solve the estimating equations for \(\beta\):
- Newton-Raphson method (an iterative method for solving nonlinear equations)
- Fisher scoring method (similar to the Newton-Raphson method, but using the expected information matrix instead of the observed information, i.e. the negative Hessian of the log-likelihood)
We will learn how to solve this problem in the optimization lecture.
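As a preview, here is a minimal Newton-Raphson sketch for logistic regression on simulated data. For the canonical logit link the observed and expected information coincide, so this is also Fisher scoring. The data, starting value, and stopping rule are assumptions for illustration, not the internals of `glm()`.

```r
set.seed(5)
n <- 300
X <- cbind(1, rnorm(n))                       # design matrix with intercept
y <- rbinom(n, 1, plogis(X %*% c(-0.5, 1)))

beta <- c(0, 0)                               # starting value
for (iter in 1:25) {
  mu    <- as.vector(plogis(X %*% beta))      # current fitted means
  score <- t(X) %*% (y - mu)                  # gradient of the log-likelihood
  info  <- t(X) %*% (X * (mu * (1 - mu)))     # information matrix X' W X
  step  <- solve(info, score)
  beta  <- beta + as.vector(step)
  if (max(abs(step)) < 1e-10) break           # stop when the update is negligible
}

beta
coef(glm(y ~ X[, 2], family = binomial()))    # should agree closely
```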

Multinomial logistic Regression

When the outcome categories \(1,2,\ldots,J\) are unordered, a multinomial logit model defines \(J - 1\) logits with the form (treating \(j=1\) as the reference category):

\[\log (odds_{j1}) = \log(\frac{\mu_{ij}}{\mu_{i1}}) = \log(\frac{P(Y_i = j)}{P(Y_i = 1)}) = \beta_{0j} + \sum_k \beta_{kj} X_{ik}\]

For each observation, there are \(J - 1\) predicted log odds, one for each category relative to the reference category.
Interpretation of the coefficients is the same as in logistic regression, but there are \(J - 1\) intercepts and \(J - 1\) sets of coefficients instead of one.
The coefficient \(\beta_{kj}\) can be interpreted as: the increase in log-odds of falling into category \(j\) versus category 1 (the reference category) resulting from a one-unit increase in covariate \(X_k\) holding the other covariates constant.
When \(J=2\), the multinomial logistic regression reduces to logistic regression.
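The \(J - 1\) logits determine all \(J\) category probabilities, because the reference category has linear predictor 0 and the probabilities must sum to 1. The sketch below uses a hypothetical helper `multinom_probs()` and made-up linear predictors to show this, and that \(J = 2\) gives the logistic regression mean.

```r
# Hypothetical linear predictors for categories 2..J (category 1 is the reference)
multinom_probs <- function(eta) {
  expeta <- exp(c(0, eta))        # reference category has linear predictor 0
  expeta / sum(expeta)
}

p3 <- multinom_probs(c(0.4, -1.2))   # J = 3
p3
log(p3[2:3] / p3[1])                 # recovers the two log odds

p2 <- multinom_probs(0.4)            # J = 2
c(p2[2], plogis(0.4))                # matches the logistic regression mean
```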

Multinomial logistic Regression example

Students choose among three programs: general, vocational and academic.
Their choices can be modeled using their writing score and their socioeconomic status.

ml <- read.csv("https://caleb-huo.github.io/teaching/data/mlogistic_program/hsbdemo.csv",row.names=1)
head(ml)

##    id female    ses schtyp     prog read write math science socst
## 1  45 female    low public vocation   34    35   41      29    26
## 2 108   male middle public  general   34    33   41      36    36
## 3  15   male   high public vocation   39    39   44      26    42
## 4  67   male    low public vocation   37    37   42      33    32
## 5 153   male middle public vocation   39    31   40      39    51
## 6  51 female   high public  general   42    36   42      31    39
##         honors awards cid
## 1 not enrolled      0   1
## 2 not enrolled      0   1
## 3 not enrolled      0   1
## 4 not enrolled      0   1
## 5 not enrolled      0   1
## 6 not enrolled      0   1

Multinomial logistic Regression result

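# read.csv() returns character columns in R >= 4.0, so convert to factor before relevel()
ml$prog2 <- relevel(factor(ml$prog), ref = "vocation")
ml$ses2 <- relevel(factor(ml$ses), ref = "low")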

library("nnet")
test <- multinom(prog2 ~ ses2 + write, data = ml)

## # weights:  15 (8 variable)
## initial  value 219.722458 
## iter  10 value 179.993001
## final  value 179.981726 
## converged

summary(test)

## Call:
## multinom(formula = prog2 ~ ses2 + write, data = ml)
## 
## Coefficients:
##          (Intercept)   ses2high ses2middle      write
## academic   -5.218216  0.9826776 -0.2913902 0.11360290
## general    -2.366021 -0.1801588 -0.8246874 0.05567435
## 
## Std. Errors:
##          (Intercept)  ses2high ses2middle      write
## academic    1.163550 0.5955674  0.4763737 0.02221991
## general     1.174249 0.6484555  0.4901231 0.02333137
## 
## Residual Deviance: 359.9635 
## AIC: 375.9635

Multinomial logistic Regression interpretation

A one-unit increase in write increases the log odds of being in the academic program vs. the vocation program by 0.11.
A one-unit increase in write increases the log odds of being in the general program vs. the vocation program by 0.056.
The log odds of being in the academic program vs. the vocation program increase by 0.98 when moving from ses="low" to ses="high".
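Exponentiating the coefficients turns these log-odds changes into multiplicative changes in the odds relative to the vocation program. The sketch below copies the coefficients from the `multinom()` output above rather than refitting the model.

```r
# Coefficients copied from the multinom() output
coefs <- rbind(
  academic = c(ses2high = 0.9826776, ses2middle = -0.2913902, write = 0.11360290),
  general  = c(ses2high = -0.1801588, ses2middle = -0.8246874, write = 0.05567435)
)
round(exp(coefs), 3)   # multiplicative change in odds vs. the vocation program
```

For example, each additional point in write multiplies the odds of academic vs. vocation by about 1.12.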

ordinal logistic regression

An ordinal logistic regression model has \(J-1\) logit functions

\[\log (\frac{P(Y > j | X)}{P(Y \le j | X)}) = \log (\frac{P(Y > j | X)}{1 - P(Y > j | X)}) = \alpha_j + \beta^T X\] For all \(j = 1, \ldots, J - 1\).

When \(J = 3\)

\[\log (\frac{P(Y > 1 | X)}{P(Y \le 1 | X)}) = \log (\frac{P(Y = 2 | X) + P(Y = 3 | X)}{P(Y = 1 | X)}) = \alpha_1 + \beta^T X\]

\[\log (\frac{P(Y > 2 | X)}{P(Y \le 2 | X)}) = \log (\frac{P(Y = 3 | X)}{P(Y = 1 | X) + P(Y = 2 | X)}) = \alpha_2 + \beta^T X\]

ordinal logistic regression

An ordinal logistic regression model has \(J-1\) logit functions

\[\log (\frac{P(Y > j | X)}{P(Y \le j | X)}) = \log (\frac{P(Y > j | X)}{1 - P(Y > j | X)}) = \alpha_j + \beta^T X\] For all \(j = 1, \ldots, J - 1\).

The sign of the coefficient \(\beta\) indicates that
- If \(\beta > 0\): it is more likely to observe higher values of \(Y\).
- If \(\beta < 0\): it is more likely to observe lower values of \(Y\).
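To see the effect of the sign of \(\beta\), the sketch below converts the two cumulative logits for \(J = 3\) into category probabilities. The helper `ordinal_probs()` and the values of \(\alpha_1\), \(\alpha_2\), and \(\beta\) are made up for illustration.

```r
# Category probabilities for J = 3 from logit P(Y > j | X) = alpha_j + beta * x
ordinal_probs <- function(x, alpha = c(0.5, -1), beta = 0.8) {
  p_gt <- plogis(alpha + beta * x)      # P(Y > 1), P(Y > 2)
  c(y1 = 1 - p_gt[1], y2 = p_gt[1] - p_gt[2], y3 = p_gt[2])
}

ordinal_probs(0)
ordinal_probs(1)   # with beta > 0, mass shifts toward higher categories
```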

Ordinal Logistic Regression example

dat <- read.csv("https://caleb-huo.github.io/teaching/data/ologit/ologit.csv")
head(dat)

##             apply pared public  gpa
## 1     very likely     0      0 3.26
## 2 somewhat likely     1      0 3.21
## 3        unlikely     1      1 3.94
## 4 somewhat likely     0      0 2.81
## 5 somewhat likely     0      0 2.53
## 6        unlikely     0      1 2.59

Ordinal logistic Regression result

library(MASS)
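# polr() needs a factor response with levels in increasing order
dat$apply2 <- factor(dat$apply, levels = c("unlikely", "somewhat likely", "very likely"))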
m <- polr(apply2 ~ pared + public + gpa, data = dat, Hess=TRUE)
summary(m)

## Call:
## polr(formula = apply2 ~ pared + public + gpa, data = dat, Hess = TRUE)
## 
## Coefficients:
##           Value Std. Error t value
## pared   1.04769     0.2658  3.9418
## public -0.05879     0.2979 -0.1974
## gpa     0.61594     0.2606  2.3632
## 
## Intercepts:
##                             Value   Std. Error t value
## unlikely|somewhat likely     2.2039  0.7795     2.8272
## somewhat likely|very likely  4.2994  0.8043     5.3453
## 
## Residual Deviance: 717.0249 
## AIC: 727.0249

exp(coef(m))

##     pared    public       gpa 
## 2.8510579 0.9429088 1.8513972

Ordinal logistic Regression interpretation

For a one-unit increase in parental education, from 0 (Low) to 1 (High), the odds of “very likely” applying versus “somewhat likely” or “unlikely” applying combined are 2.85 times greater, holding the other variables constant.
Likewise, the odds of “very likely” or “somewhat likely” applying versus “unlikely” applying are 2.85 times greater.
For gpa, when a student’s gpa increases by 1 unit, the odds of moving from “unlikely” applying to “somewhat likely” or “very likely” applying (or from the lower and middle categories to the high category) are multiplied by 1.85.
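Predicted category probabilities can be computed by hand from the fitted values. Note that `MASS::polr` parameterizes the model as \(\text{logit}\, P(Y \le j | X) = \zeta_j - \beta^T X\), so its reported intercepts correspond to \(-\alpha_j\) in the notation used above. The sketch below copies the estimates from the output and evaluates them at a hypothetical covariate profile.

```r
# Values copied from the polr() output
zeta <- c(2.2039, 4.2994)                       # intercepts (cut points)
b    <- c(pared = 1.04769, public = -0.05879, gpa = 0.61594)

# Hypothetical covariate profile, chosen only for illustration
xnew <- c(pared = 0, public = 0, gpa = 3.0)
eta  <- sum(b * xnew)

# MASS::polr parameterizes logit P(Y <= j) = zeta_j - eta
cum   <- plogis(zeta - eta)
probs <- c(unlikely = cum[1], somewhat = cum[2] - cum[1], very = 1 - cum[2])
round(probs, 3)
```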

GLM for count data

Poisson regression
Negative binomial

Poisson Model (Components of GLM for count data)

To analyze data where the outcome \(Y\) is a count, we can use GLM.
The random component is the Poisson distribution.
- \(Y|X \sim Poisson(\mu)\)
The systematic component is a linear combination of the explanatory variables, \(X^\top \beta\).
The most common link is the log link, which is the canonical link for the Poisson family.
The Poisson model for counts with log link has the form \(\log (\mu) = X^\top \beta\), which is also called the Poisson log-linear model.

Poisson distribution

Let \(\mu\) be the rate of occurrence of an event, or the expected number of times an event will occur during a given period.
Let \(Y\) be a random variable indicating the number of times the event did occur. If \(Y \sim Poisson(\mu)\), then \[P(Y = y;\mu) = \frac{\exp(-\mu)\mu^y}{y!},\]
\(y = 0,1,2, \ldots\)
\(E(Y) = \mu = Var(Y)\)
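The pmf and the mean-variance equality can be verified numerically with base R's `dpois()`. The rate \(\mu = 2\) is an arbitrary example value.

```r
mu <- 2
y  <- 0:5

manual <- exp(-mu) * mu^y / factorial(y)   # pmf written out from the formula
cbind(y, manual, dpois = dpois(y, mu))

# Mean and variance both equal mu (sum truncated where the tail is negligible)
yy <- 0:100
c(mean = sum(yy * dpois(yy, mu)),
  var  = sum((yy - mu)^2 * dpois(yy, mu)))
```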

Poisson Model

For the Poisson model, can we use the identity link instead of the log link?
This would be a linear model and give \(E(Y|X) = \mu = X^\top \beta\)
The problem with this formulation is that it can yield values of \(\mu < 0\)

fit Poisson Model in R

library(pscl)

## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis

head(bioChemists)

##   art   fem     mar kid5  phd ment
## 1   0   Men Married    0 2.52    7
## 2   0 Women  Single    0 2.05    6
## 3   0 Women  Single    0 3.75    6
## 4   0   Men Married    1 1.18    3
## 5   0 Women  Single    0 3.75   26
## 6   0 Women Married    2 3.59    2

fit Poisson Model in R

glm_poisson <- glm(art ~ . , data = bioChemists,  family = poisson())
summary(glm_poisson) # display results

## 
## Call:
## glm(formula = art ~ ., family = poisson(), data = bioChemists)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.5672  -1.5398  -0.3660   0.5722   5.4467  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.304617   0.102981   2.958   0.0031 ** 
## femWomen    -0.224594   0.054613  -4.112 3.92e-05 ***
## marMarried   0.155243   0.061374   2.529   0.0114 *  
## kid5        -0.184883   0.040127  -4.607 4.08e-06 ***
## phd          0.012823   0.026397   0.486   0.6271    
## ment         0.025543   0.002006  12.733  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1817.4  on 914  degrees of freedom
## Residual deviance: 1634.4  on 909  degrees of freedom
## AIC: 3314.1
## 
## Number of Fisher Scoring iterations: 5

confint(glm_poisson) # 95% CI for the coefficients

## Waiting for profiling to be done...

##                   2.5 %      97.5 %
## (Intercept)  0.10155469  0.50526581
## femWomen    -0.33193995 -0.11781821
## marMarried   0.03520230  0.27584836
## kid5        -0.26422746 -0.10689876
## phd         -0.03881122  0.06467552
## ment         0.02154163  0.02940716

exp(coef(glm_poisson)) # exponentiated coefficients

## (Intercept)    femWomen  marMarried        kid5         phd        ment 
##   1.3561053   0.7988403   1.1679422   0.8312018   1.0129051   1.0258718

exp(confint(glm_poisson)) # 95% CI for exponentiated coefficients

## Waiting for profiling to be done...

##                 2.5 %    97.5 %
## (Intercept) 1.1068905 1.6574260
## femWomen    0.7175304 0.8888576
## marMarried  1.0358292 1.3176480
## kid5        0.7677989 0.8986166
## phd         0.9619323 1.0668128
## ment        1.0217753 1.0298438

Interpretation of \(\beta\) for Poisson model

Suppose there is only one covariate x
\(\log E(Y|X) = \log(\mu) = \beta_0 + \beta_1 X\)
- \(X = 0\), \(\log(\mu | X=0) = \beta_0\), \(\mu_0 = \exp (\beta_0)\)
- \(X = 1\), \(\log(\mu | X=1) = \beta_0 + \beta_1\), \(\mu_1 = \exp (\beta_0 + \beta_1)\)
\(\log(\mu_1) - \log(\mu_0) = \beta_1\)
\(\log(\frac{\mu_1}{\mu_0}) = \beta_1\)
\(\mu_1 = \exp(\beta_1) \mu_0\)

\(\beta_1\) is the difference in the log of expected counts when \(X\) increases by 1 unit.

\(\exp (\beta_1)\) is the multiplicative effect on the mean of \(Y\) when \(X\) increases by 1 unit.
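With a single binary covariate, \(\exp(\hat{\beta}_1)\) is exactly the ratio of the two group means. The sketch below checks this on simulated counts; the true coefficients and seed are illustrative assumptions.

```r
set.seed(6)
x <- rbinom(500, 1, 0.5)
y <- rpois(500, exp(0.2 + 0.7 * x))     # illustrative true coefficients

fit <- glm(y ~ x, family = poisson())
exp(coef(fit)[["x"]])                   # estimated multiplicative effect
mean(y[x == 1]) / mean(y[x == 0])       # ratio of the two sample means
```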

Model selection for Poisson regression (also applies to all other GLMs)

step(glm_poisson)

## Start:  AIC=3314.11
## art ~ fem + mar + kid5 + phd + ment
## 
##        Df Deviance    AIC
## - phd   1   1634.6 3312.3
## <none>      1634.4 3314.1
## - mar   1   1640.8 3318.5
## - fem   1   1651.5 3329.2
## - kid5  1   1656.5 3334.2
## - ment  1   1766.2 3444.0
## 
## Step:  AIC=3312.35
## art ~ fem + mar + kid5 + ment
## 
##        Df Deviance    AIC
## <none>      1634.6 3312.3
## - mar   1   1640.8 3316.6
## - fem   1   1651.8 3327.5
## - kid5  1   1656.7 3332.4
## - ment  1   1776.7 3452.5

## 
## Call:  glm(formula = art ~ fem + mar + kid5 + ment, family = poisson(), 
##     data = bioChemists)
## 
## Coefficients:
## (Intercept)     femWomen   marMarried         kid5         ment  
##     0.34517     -0.22530      0.15218     -0.18499      0.02576  
## 
## Degrees of Freedom: 914 Total (i.e. Null);  910 Residual
## Null Deviance:       1817 
## Residual Deviance: 1635  AIC: 3312

Poisson regression for rate data

Events can occur over time and the length of time can vary from observation to observation.
A rate is given by \(\lambda_i = \mu_i/n_i = E(Y_i|X_i)/n_i\), where \(n_i\) is follow-up time.
We can write a Poisson model in terms of the rates using \(\log(\lambda_i) = \log(\mu_i/n_i) = \log(\mu_i) - \log(n_i)\)
The following Poisson log-linear models for rates are equivalent:
- \(\log(\lambda_i) = X_i^\top \beta\)
- \(\log(\mu_i) - \log(n_i) = X_i^\top \beta\)
- \(\log(\mu_i) = \log(n_i) + X_i^\top \beta\)

Modeling the rate

Poisson model for rate: \(\log(\mu_i) = \log(n_i) + X_i^\top \beta\)
- \(n_i\) is called the exposure.
- \(\log (n_i)\) is called the offset term
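The role of the offset can be seen on simulated data with varying follow-up times. In the sketch below (all values made up), the fitted rate for the group with \(x = 0\) equals the total number of events divided by the total follow-up time in that group.

```r
set.seed(7)
n_obs <- 400
x     <- rbinom(n_obs, 1, 0.5)                   # binary covariate
fup   <- runif(n_obs, 1, 10)                     # follow-up time n_i (exposure)
y     <- rpois(n_obs, fup * exp(-1 + 0.5 * x))   # counts with rate exp(-1 + 0.5 x)

fit <- glm(y ~ x + offset(log(fup)), family = poisson())

exp(coef(fit)[["(Intercept)"]])          # fitted rate when x = 0
sum(y[x == 0]) / sum(fup[x == 0])        # events per unit follow-up time when x = 0
```

The offset enters the linear predictor with its coefficient fixed at 1, so no parameter is estimated for it.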

burnData <- read.csv("https://caleb-huo.github.io/teaching/data/Burn/burn.csv", row.names = 1)
head(burnData)

##   treatment gender race percent head buttock trunk upper_leg lower_leg
## 1         0      0    0      15    0       0     1         1         0
## 2         0      0    1      20    0       0     1         0         0
## 3         0      0    1      15    0       0     0         1         1
## 4         0      0    0      20    1       0     1         0         0
## 5         0      0    1      70    1       1     1         1         0
## 6         0      0    1      20    1       0     1         0         0
##   resp_tract type numfup infection
## 1          0    2     12         0
## 2          0    4      9         0
## 3          0    2      7         1
## 4          0    2     29         0
## 5          0    2      4         1
## 6          0    4      8         1

Modeling the rate

glm_poisson_rate <- glm(infection ~ . - numfup + offset(log(numfup)), data = burnData,  family = poisson())
summary(glm_poisson_rate) # display results

## 
## Call:
## glm(formula = infection ~ . - numfup + offset(log(numfup)), family = poisson(), 
##     data = burnData)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0008  -0.8023  -0.5350   1.0419   2.5011  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -5.603081   1.201833  -4.662 3.13e-06 ***
## treatment   -0.622350   0.316669  -1.965   0.0494 *  
## gender      -0.724283   0.396879  -1.825   0.0680 .  
## race         2.155901   1.016616   2.121   0.0340 *  
## percent      0.003402   0.009854   0.345   0.7299    
## head        -0.023392   0.353237  -0.066   0.9472    
## buttock      0.612703   0.421929   1.452   0.1465    
## trunk        0.058203   0.500184   0.116   0.9074    
## upper_leg   -0.472179   0.373669  -1.264   0.2064    
## lower_leg   -0.299164   0.387843  -0.771   0.4405    
## resp_tract  -0.042332   0.364401  -0.116   0.9075    
## type        -0.045439   0.171668  -0.265   0.7912    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 207.46  on 153  degrees of freedom
## Residual deviance: 184.46  on 142  degrees of freedom
## AIC: 304.46
## 
## Number of Fisher Scoring iterations: 7
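
Coefficients of the rate model are on the log scale, so exponentiating gives a rate ratio. As a small sketch, the estimate and standard error for `treatment` are copied by hand from the summary above (assuming `treatment` is coded 0/1, as `head(burnData)` suggests); the Wald interval is one common choice, not the only one.

```r
# Estimate and standard error for `treatment`, copied from the printed summary
est <- -0.622350
se  <-  0.316669

# Rate ratio: multiplicative change in the infection rate per unit of numfup,
# treatment = 1 versus treatment = 0, other covariates held fixed
rate_ratio <- exp(est)

# Wald 95% confidence interval, computed on the log scale and then exponentiated
wald_ci <- exp(est + c(-1, 1) * qnorm(0.975) * se)

round(c(rate_ratio = rate_ratio, lower = wald_ci[1], upper = wald_ci[2]), 3)
```

The rate ratio is about 0.54, and the upper limit falls just below 1, which matches the borderline p-value of 0.0494.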

Poisson Overdispersion

Poisson regression is the standard method for modeling count (and rate) response data.
The Poisson distribution assumes that the mean equals the variance, a property that is rarely found in real data.
When the response variance is greater than the mean in a Poisson model, this is called (Poisson) overdispersion.

bioChemists data

mean(bioChemists$art)

## [1] 1.692896

var(bioChemists$art)

## [1] 3.709742
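
Comparing the raw mean and variance ignores the covariates. A common model-based check is the Pearson dispersion statistic of a fitted Poisson model. The sketch below uses simulated data (all names and values are assumptions) rather than the bioChemists data, so it runs without extra packages.

```r
set.seed(2)
n <- 1000
x <- rnorm(n)
mu <- exp(0.5 + 0.3 * x)

# Overdispersed counts: Var(Y) = mu + mu^2 / 2, larger than the Poisson variance mu
y <- rnbinom(n, size = 2, mu = mu)

fit_pois <- glm(y ~ x, family = poisson())

# Pearson chi-square divided by residual degrees of freedom;
# close to 1 under a correct Poisson model, well above 1 under overdispersion
pearson_dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
pearson_dispersion
```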

Negative Binomial Regression

Negative binomial (NB) regression is a standard method for modeling overdispersed count data.
The negative binomial can be derived from a Poisson-gamma mixture model.
- Let \(Y|\lambda \sim Poisson(\lambda)\) and \(\lambda \sim gamma(\alpha, \beta)\), with shape \(\alpha\) and scale \(\beta\).
- The marginal distribution of \(Y\) is negative binomial with \(r = \alpha\) and \(p = 1/(\beta + 1)\).

\[Y|\lambda \sim Poisson(\lambda)\] \[\lambda \sim gamma(\alpha, \beta)\] \[Y \sim NB(\alpha, 1/(\beta + 1))\]
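
The mixture result can be checked by simulation. This is a sketch under the assumption that \(\beta\) is the gamma scale parameter (with a rate parameterization the success probability would differ); the values of `alpha` and `beta` are arbitrary.

```r
set.seed(3)
n_draws <- 200000
alpha <- 2      # gamma shape
beta  <- 1.5    # gamma scale (assumption: scale, not rate)

# Two-stage draw: lambda from the gamma, then Y | lambda from the Poisson
lambda <- rgamma(n_draws, shape = alpha, scale = beta)
y <- rpois(n_draws, lambda)

# Compare the empirical pmf with NB(r = alpha, p = 1 / (beta + 1))
support <- 0:15
empirical   <- as.numeric(table(factor(y, levels = support))) / n_draws
theoretical <- dnbinom(support, size = alpha, prob = 1 / (beta + 1))

max_abs_diff <- max(abs(empirical - theoretical))
max_abs_diff   # should be small (Monte Carlo error only)
```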

Negative Binomial Distribution

Negative binomial pmf (the number of failures \(y\) before the \(r\)th success occurs): \[f(y;r,p) = \binom{y + r - 1}{r - 1} p^r (1 - p)^y\]
- \(0 < p \le 1\) is the success probability and \(r\) is a positive integer
- \(y + r\) is the total number of trials.
Exponential family form: \[f(y;r,p) = \exp\{y\log(1 - p) + r\log p + \log \binom{y + r - 1}{r - 1} \}\]
- Canonical parameter: \(\theta = \log (1 - p) \rightarrow p = 1 - \exp(\theta)\)
- \(b(\theta) = -r \log (p) = -r \log (1 - \exp(\theta))\)
- Scale parameter: \(a(\phi) = 1\)

The canonical link for the negative binomial distribution is rather complicated and hard to interpret, so it is rarely used. Instead, to facilitate comparisons with the Poisson generalized linear model, a log link is typically used.
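
As a quick numerical check, the pmf, its exponential family form, and R's built-in `dnbinom()` (which also counts failures before the \(r\)th success) should all agree. The values of `r` and `p` below are arbitrary.

```r
r <- 4
p <- 0.3
y <- 0:20

# Direct pmf: choose(y + r - 1, r - 1) * p^r * (1 - p)^y
pmf_direct <- choose(y + r - 1, r - 1) * p^r * (1 - p)^y

# Exponential family form: exp{ y * theta - b(theta) + c(y) }
theta <- log(1 - p)                     # canonical parameter
b_theta <- -r * log(1 - exp(theta))     # b(theta) = -r * log(p)
pmf_expfam <- exp(y * theta - b_theta + lchoose(y + r - 1, r - 1))

pmf_builtin <- dnbinom(y, size = r, prob = p)

c(direct_vs_expfam  = isTRUE(all.equal(pmf_direct, pmf_expfam)),
  direct_vs_builtin = isTRUE(all.equal(pmf_direct, pmf_builtin)))
```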

Negative Binomial Distribution

Negative binomial mean (using \(\frac{\partial p}{\partial \theta} = -\exp(\theta) = -(1-p)\)): \[b'(\theta) = \frac{\partial b}{\partial p} \times \frac{\partial p}{\partial \theta} = -\frac{r}{p}\{-(1-p)\} = \frac{r(1-p)}{p} \]
Negative binomial variance:

\[b''(\theta) = \frac{\partial^2 b}{\partial p^2} \times (\frac{\partial p}{\partial \theta})^2 + \frac{\partial b}{\partial p} \times \frac{\partial^2 p}{\partial \theta^2} = \frac{r(1-p)}{p^2} \]

Reparametrization (here \(\alpha\) denotes the dispersion parameter, not the gamma shape used in the mixture derivation):
- \(\alpha = 1/r\)
- \(\mu = \frac{r(1-p)}{p}\)
Negative binomial mean: \[b'(\theta) = \frac{\partial b}{\partial p} \times \frac{\partial p}{\partial \theta} = \mu \]
Negative binomial variance:

\[b''(\theta) = \frac{\partial^2 b}{\partial p^2} \times (\frac{\partial p}{\partial \theta})^2 + \frac{\partial b}{\partial p} \times \frac{\partial^2 p}{\partial \theta^2} = \mu + \alpha \mu^2 \]

For \(\alpha > 0\), the negative binomial variance is always larger than the mean.

The NB reduces to the Poisson in the limit \(\alpha \rightarrow 0\).
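
The mean and variance formulas can be verified numerically by differentiating \(b(\theta) = -r \log(1 - \exp(\theta))\) with finite differences. This is only a sketch; `r`, `p`, and the step size `h` are arbitrary choices.

```r
r <- 3
p <- 0.4
theta <- log(1 - p)                         # canonical parameter

b <- function(th) -r * log(1 - exp(th))     # cumulant function b(theta)

h <- 1e-4
b_prime  <- (b(theta + h) - b(theta - h)) / (2 * h)             # approximates b'(theta)
b_second <- (b(theta + h) - 2 * b(theta) + b(theta - h)) / h^2  # approximates b''(theta)

# Closed forms in the (mu, alpha) parameterization
mu    <- r * (1 - p) / p
alpha <- 1 / r

rbind(mean     = c(numeric = b_prime,  closed_form = mu),
      variance = c(numeric = b_second, closed_form = mu + alpha * mu^2))
```

With these values the mean is 4.5 and the variance is 11.25, which also equals \(r(1-p)/p^2\).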

Negative binomial example 1

library(MASS)
glm_nb <- glm.nb(art ~ . , data = bioChemists)
summary(glm_nb) # display results

## 
## Call:
## glm.nb(formula = art ~ ., data = bioChemists, init.theta = 2.264387695, 
##     link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1678  -1.3617  -0.2806   0.4476   3.4524  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.256144   0.137348   1.865 0.062191 .  
## femWomen    -0.216418   0.072636  -2.979 0.002887 ** 
## marMarried   0.150489   0.082097   1.833 0.066791 .  
## kid5        -0.176415   0.052813  -3.340 0.000837 ***
## phd          0.015271   0.035873   0.426 0.670326    
## ment         0.029082   0.003214   9.048  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(2.2644) family taken to be 1)
## 
##     Null deviance: 1109.0  on 914  degrees of freedom
## Residual deviance: 1004.3  on 909  degrees of freedom
## AIC: 3135.9
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  2.264 
##           Std. Err.:  0.271 
## 
##  2 x log-likelihood:  -3121.917
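
In the `glm.nb` output, `Theta` plays the role of \(r\), so the dispersion is \(\alpha = 1/\theta\) and the variance is \(\mu + \mu^2/\theta\). For the fit above this gives \(\alpha \approx 1/2.264 \approx 0.44\). The sketch below repeats the idea on simulated data (names and values are assumptions) where the true \(\theta\) is known, and compares the NB fit with a Poisson fit by AIC.

```r
library(MASS)

set.seed(5)
n <- 5000
x <- rnorm(n)
# True theta = 2, i.e. alpha = 0.5
sim <- data.frame(x = x, y = rnbinom(n, size = 2, mu = exp(0.5 + 0.3 * x)))

fit_nb   <- glm.nb(y ~ x, data = sim)
fit_pois <- glm(y ~ x, family = poisson(), data = sim)

theta_hat <- fit_nb$theta        # estimate of r
alpha_hat <- 1 / theta_hat       # dispersion in Var(Y) = mu + alpha * mu^2
c(theta = theta_hat, alpha = alpha_hat)

# Lower AIC indicates the better trade-off between fit and complexity
aic_table <- AIC(fit_pois, fit_nb)
aic_table
```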

Negative binomial example 2

glm_nb_rate <- glm.nb(infection ~ . - numfup + offset(log(numfup)), data = burnData)

## Warning in glm.nb(infection ~ . - numfup + offset(log(numfup)), data =
## burnData): alternation limit reached

summary(glm_nb_rate)

## 
## Call:
## glm.nb(formula = infection ~ . - numfup + offset(log(numfup)), 
##     data = burnData, init.theta = 0.6478173549, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7731  -0.8564  -0.5674   0.5065   2.3192  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -5.285673   1.300638  -4.064 4.83e-05 ***
## treatment   -0.840907   0.371826  -2.262   0.0237 *  
## gender      -1.037906   0.464514  -2.234   0.0255 *  
## race         2.620297   1.081204   2.423   0.0154 *  
## percent      0.005769   0.011903   0.485   0.6279    
## head        -0.208180   0.410433  -0.507   0.6120    
## buttock      0.899329   0.500276   1.798   0.0722 .  
## trunk        0.339320   0.565532   0.600   0.5485    
## upper_leg   -0.692250   0.446461  -1.551   0.1210    
## lower_leg   -0.180517   0.428152  -0.422   0.6733    
## resp_tract  -0.191947   0.437243  -0.439   0.6607    
## type        -0.149369   0.201242  -0.742   0.4579    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.6478) family taken to be 1)
## 
##     Null deviance: 149.78  on 153  degrees of freedom
## Residual deviance: 126.24  on 142  degrees of freedom
## AIC: 299.93
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.648 
##           Std. Err.:  0.246 
## Warning while fitting theta: alternation limit reached 
## 
##  2 x log-likelihood:  -273.928

Interpretation Using the Count (or Rate)

Because the mean structure for negative binomial regression is identical to that for Poisson regression, the same interpretation based on \(E(Y|Z=z,X=x)\) can be used.
Model: \(\log E(Y|Z,X) = \gamma Z + \beta X\) \[\frac{E(Y|Z=z,X=x+\Delta)}{E(Y|Z=z,X=x)} = \exp(\beta \Delta)\]

For a change of \(\Delta\) in \(x\), the expected count changes by a factor of \(\exp(\beta \Delta)\), holding the other variables \(Z = z\) constant.
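
The ratio identity holds exactly for any fitted log-link model. A minimal sketch on simulated data (variable names, coefficients, and the chosen covariate values are all assumptions): the ratio of two predicted means that differ only by \(\Delta\) in `x` equals \(\exp(\hat\beta \Delta)\).

```r
set.seed(7)
n <- 1000
z <- rbinom(n, 1, 0.5)
x <- rnorm(n)
y <- rpois(n, exp(0.2 + 0.4 * z + 0.3 * x))
sim <- data.frame(y = y, z = z, x = x)

fit <- glm(y ~ z + x, family = poisson(), data = sim)

delta <- 2
at_x       <- data.frame(z = 1, x = 0.5)           # Z = z, X = x
at_x_delta <- data.frame(z = 1, x = 0.5 + delta)   # Z = z, X = x + delta

# Ratio of expected counts from the fitted model
ratio <- unname(predict(fit, at_x_delta, type = "response") /
                predict(fit, at_x, type = "response"))

# Same quantity from the coefficient alone
factor_from_beta <- unname(exp(coef(fit)["x"] * delta))

c(ratio = ratio, exp_beta_delta = factor_from_beta)
```

The two numbers agree regardless of the values chosen for `z` and the baseline `x`.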

Other popular GLM for count data

Zero-inflated count data (excess zeros)
- zero-inflated Poisson regression
- zero-inflated negative binomial regression
Zero-truncated count data (zero cannot occur)
- zero-truncated Poisson regression
- zero-truncated negative binomial regression

zero inflated Poisson regression

The zero-inflated Poisson (ZIP) regression is used for count data with excess zeros.
The data distribution mixes a point mass at zero, chosen with Bernoulli probability \(\pi_i\), with a Poisson distribution with mean \(\mu_i\).

\[ P(y_i = j)= \begin{cases} \pi_i + (1 - \pi_i) \exp (-\mu_i), & \mbox{if } j=0\\ (1 - \pi_i) \frac{\mu_i^{j} \exp(-\mu_i)}{j!}, & \mbox{if } j>0 \end{cases} \]
- The Poisson component is \[\log(\mu_i) = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip}\]
- The zero proportion component is \[\pi_i = \frac{\exp(\gamma_0 + \gamma_1 z_{i1} + \ldots + \gamma_q z_{iq})}{1 + \exp(\gamma_0 + \gamma_1 z_{i1} + \ldots + \gamma_q z_{iq})}\]
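
The ZIP pmf is easy to write down directly. The helper `dzip()` below is a hypothetical function written for illustration (it is not part of any package); the sketch checks that the probabilities sum to 1 and that the mean is \((1 - \pi)\mu\).

```r
# ZIP pmf: extra mass pi0 at zero, Poisson(mu) scaled by (1 - pi0) elsewhere
dzip <- function(j, mu, pi0) {
  ifelse(j == 0,
         pi0 + (1 - pi0) * exp(-mu),
         (1 - pi0) * dpois(j, mu))
}

mu  <- 2
pi0 <- 0.3
j   <- 0:100

total_prob <- sum(dzip(j, mu, pi0))       # should be 1 up to rounding
zip_mean   <- sum(j * dzip(j, mu, pi0))   # should equal (1 - pi0) * mu

c(total_prob = total_prob, zip_mean = zip_mean,
  p_zero_zip = dzip(0, mu, pi0), p_zero_poisson = dpois(0, mu))
```

The zero probability under ZIP is larger than under a plain Poisson with the same \(\mu\), which is the "excess zeros" feature.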

ZIP in R

zeroinfl() from the pscl package:
- formula: y ~ x | z
- x: covariates for the Poisson component
- z: covariates for the zero proportion component

fm_pois <- glm(art ~ ., data = bioChemists, family = poisson) 
## without inflation

fm_zip <- zeroinfl(art ~ . | 1, data = bioChemists) 
## with simple inflation (no regressors for zero component)

fm_zip2 <- zeroinfl(art ~ . | ., data = bioChemists) 
## inflation with regressors
## ("art ~ . | ." is "art ~ fem + mar + kid5 + phd + ment | fem + mar + kid5 + phd + ment")
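
To see what a ZIP fit does, the likelihood can be maximized by hand with base R. This is a sketch on simulated data, analogous in spirit to `y ~ x | 1` (one covariate in the Poisson part, intercept only in the zero part with a logit link); it is not the code used inside `zeroinfl`, and all names and values are assumptions.

```r
set.seed(4)
n <- 2000
x <- rnorm(n)
true_mu  <- exp(0.5 + 0.4 * x)
true_pi0 <- 0.25

# Structural zeros with probability pi0, Poisson counts otherwise
structural_zero <- rbinom(n, 1, true_pi0)
y <- ifelse(structural_zero == 1, 0, rpois(n, true_mu))

# Negative mean log-likelihood; par = (beta0, beta1, gamma0)
# (averaging keeps the optimizer's first steps moderate)
neg_mean_loglik <- function(par) {
  mu_i <- exp(par[1] + par[2] * x)
  pi_i <- plogis(par[3])                       # logit link for the zero part
  p0   <- pi_i + (1 - pi_i) * exp(-mu_i)       # P(Y = 0)
  ll   <- ifelse(y == 0,
                 log(p0),
                 log(1 - pi_i) + dpois(y, mu_i, log = TRUE))
  -mean(ll)
}

opt <- optim(c(0, 0, 0), neg_mean_loglik, method = "BFGS")
est <- c(beta0 = opt$par[1], beta1 = opt$par[2], pi0 = plogis(opt$par[3]))
round(est, 3)
```

The estimates should be close to the simulated values 0.5, 0.4, and 0.25.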

ZINB in R

The zero-inflated negative binomial (ZINB) regression is used for count data with overdispersion and excess zeros.

fm_nb <- MASS::glm.nb(art ~ ., data = bioChemists)
## without inflation

fm_zinb <- zeroinfl(art ~ . | 1, data = bioChemists, dist = "negbin")
## with simple inflation (no regressors for zero component)

fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")
## inflation with regressors
## ("art ~ . | ." is "art ~ fem + mar + kid5 + phd + ment | fem + mar + kid5 + phd + ment")
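
The ZINB pmf replaces the Poisson part of ZIP with a negative binomial. The helper `dzinb()` below is hypothetical and uses the mean/size parameterization of `dnbinom()` (size corresponds to \(\theta = 1/\alpha\)); the sketch shows that the resulting distribution has both extra zeros and a variance above its mean.

```r
# ZINB pmf: extra mass pi0 at zero plus (1 - pi0) times an NB(mu, theta) pmf
dzinb <- function(j, mu, theta, pi0) {
  (j == 0) * pi0 + (1 - pi0) * dnbinom(j, size = theta, mu = mu)
}

mu    <- 2
theta <- 1.5
pi0   <- 0.3
j     <- 0:2000     # wide enough that the remaining tail mass is negligible

probs      <- dzinb(j, mu, theta, pi0)
total_prob <- sum(probs)
zinb_mean  <- sum(j * probs)                    # (1 - pi0) * mu
zinb_var   <- sum((j - zinb_mean)^2 * probs)

c(total_prob = total_prob, mean = zinb_mean, variance = zinb_var,
  p_zero_zinb = probs[1], p_zero_nb = dnbinom(0, size = theta, mu = mu))
```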

Further extensions

We can extend these conditions of the linear regression model further.

Random (stochastic) component
- \(\mu = E(Y|X), Y|X \sim f(\mu)\), \(f\) is a distribution from the exponential family.
Systematic component – Assume linear model: \(X^\top\beta\)
Link function
- \(X^\top\beta = g(\mu)\)

What about a nonlinear systematic component?

Generalized additive model
- Assume an additive model on the link scale: \(g(\mu) = \beta_0 + \sum_{j=1}^p r_j (x_j)\), where each \(r_j\) is a smooth function of \(x_j\).
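
A rough way to try an additive structure with base R alone is to expand each covariate in a spline basis inside `glm()`. This is a sketch, not a penalized GAM fit (dedicated GAM software chooses the smoothness automatically); the simulated functions, the variable names, and `df = 5` are assumptions.

```r
library(splines)

set.seed(6)
n  <- 2000
x1 <- runif(n, -2, 2)
x2 <- runif(n, -2, 2)
eta <- 0.3 + sin(2 * x1) + 0.5 * x2^2     # additive but nonlinear on the log scale
sim <- data.frame(y = rpois(n, exp(eta)), x1 = x1, x2 = x2)

# Linear systematic component versus additive spline terms r_1(x1) + r_2(x2)
fit_linear   <- glm(y ~ x1 + x2, family = poisson(), data = sim)
fit_additive <- glm(y ~ ns(x1, df = 5) + ns(x2, df = 5), family = poisson(), data = sim)

AIC(fit_linear, fit_additive)

# Agreement between the fitted and the true linear predictor
eta_cor <- cor(predict(fit_additive, type = "link"), eta)
eta_cor
```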

Biostatistical Computing, PHC 6068

Generalized Linear Model
