Biostatistical Computing, PHC 6068

Generalized Linear Model

Zhiguang Huo (Caleb)

Monday October 9, 2017

Recap: linear regression model

\[Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \ldots + \beta_pX_p + \varepsilon, \text{with }\varepsilon \sim N(0, \sigma^2)\]

Interpretation of linear model coefficients:

Limitation of linear model:

How about other data types?

Generalize the “linear regression model” (1)

\[Y = X^\top\beta + \varepsilon, \text{with }\varepsilon \sim N(0, \sigma^2)\]

Generalize the “linear regression model” (2)

We can generalize these conditions of the linear regression model.

Commonly used distribution \(f\) in GLM

logistic regression motivating example

X <- matrix(c(189, 104, 10845, 10933), nrow=2,
            dimnames=list(Treatment=c("Placebo","Aspirin"), 
                          "Myocardial Infarction"=c("Yes", "No")))
X
##          Myocardial Infarction
## Treatment Yes    No
##   Placebo 189 10845
##   Aspirin 104 10933
odds.ratio0 <- function(X){
  result <- X[1,1]*X[2,2]/(X[1,2]*X[2,1])
  return(result)
}
odds.ratio0(X)
## [1] 1.832054
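
A natural follow-up (a sketch, not from the original slides) is a large-sample 95% confidence interval for the odds ratio, based on the normal approximation for \(\log(OR)\) with \(SE(\log OR) = \sqrt{1/a + 1/b + 1/c + 1/d}\):

odds.ratio.CI <- function(X, alpha = 0.05){
  logOR <- log(X[1,1] * X[2,2] / (X[1,2] * X[2,1]))  # log odds ratio
  se    <- sqrt(sum(1 / X))                          # SE of log(OR)
  z     <- qnorm(1 - alpha / 2)
  exp(c(OR = logOR, lower = logOR - z * se, upper = logOR + z * se))
}
odds.ratio.CI(X)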

logistic regression motivating example

ID  Y (Yes/No)  Treatment (1/0)
 1      1             0
 2      0             1
 3      1             1
 4      1             0

logistic regression

Prostate cancer data

The data are from the book "The Elements of Statistical Learning".

library(ElemStatLearn)
str(prostate)
## 'data.frame':    97 obs. of  10 variables:
##  $ lcavol : num  -0.58 -0.994 -0.511 -1.204 0.751 ...
##  $ lweight: num  2.77 3.32 2.69 3.28 3.43 ...
##  $ age    : int  50 58 74 58 62 50 64 58 47 63 ...
##  $ lbph   : num  -1.39 -1.39 -1.39 -1.39 -1.39 ...
##  $ svi    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ lcp    : num  -1.39 -1.39 -1.39 -1.39 -1.39 ...
##  $ gleason: int  6 6 7 6 6 6 6 6 6 6 ...
##  $ pgg45  : int  0 0 20 0 0 0 0 0 0 0 ...
##  $ lpsa   : num  -0.431 -0.163 -0.163 -0.163 0.372 ...
##  $ train  : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
prostate$train <- NULL

Exponential family (1)

Exponential family (2)

The role of \(b(\theta)\) in the exponential family

Comments:

Prove \(E(Y|X) = b'(\theta)\)
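
A sketch of the standard argument (assuming the exponential-family density \(f(y; \theta, \phi) = \exp\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\}\) and differentiation under the integral sign): since \(\int f(y; \theta, \phi) dy = 1\) for all \(\theta\),

\[0 = \frac{\partial}{\partial \theta}\int f \, dy = \int \frac{y - b'(\theta)}{a(\phi)} f \, dy = \frac{E(Y|X) - b'(\theta)}{a(\phi)} \;\Rightarrow\; E(Y|X) = b'(\theta)\]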

Prove \(Var(Y|X) = a(\phi) b''(\theta)\)
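
Continuing the same sketch, differentiating a second time with respect to \(\theta\):

\[0 = \frac{\partial^2}{\partial \theta^2}\int f \, dy = \int \Big[\Big(\frac{y - b'(\theta)}{a(\phi)}\Big)^2 - \frac{b''(\theta)}{a(\phi)}\Big] f \, dy = \frac{Var(Y|X)}{a(\phi)^2} - \frac{b''(\theta)}{a(\phi)} \;\Rightarrow\; Var(Y|X) = a(\phi) b''(\theta)\]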

interpret logistic regression

\[y|X \sim f(\mu)\]

flogit_plus <- function(x) exp(x)/(exp(x) + 1)
flogit_minus <- function(x) exp(-x)/(exp(-x) + 1)

par(mfrow = c(1,2))
curve(flogit_plus, -5, 5, ylab = expression(beta > 0), lwd = 2)
curve(flogit_minus, -5, 5,  ylab = expression(beta < 0), lwd = 2)

interpret logistic regression coefficient \(\beta\)

\[\hat{OR} = \frac{\hat{odds}|_{x_1=1}}{\hat{odds}|_{x_1=0}} = \frac{\frac{\hat{\mu}_i}{1 - \hat{\mu}_i}|_{x_1=1}}{\frac{\hat{\mu}_i}{1 - \hat{\mu}_i}|_{x_1=0}} = \frac{\exp(\hat{\beta}_0 + 1 \times \hat{\beta}_1)}{\exp(\hat{\beta}_0 + 0 \times \hat{\beta}_1)} = \exp(\hat{\beta}_1) \]

interpret logistic regression coefficient \(\beta\) adjusting for other covariates

\[\hat{OR} = \frac{\hat{odds}|_{x_1=1}}{\hat{odds}|_{x_1=0}} = \frac{\frac{\hat{\mu}_i}{1 - \hat{\mu}_i}|_{x_1=1}}{\frac{\hat{\mu}_i}{1 - \hat{\mu}_i}|_{x_1=0}} = \frac{\exp(\hat{\beta}_0 + 1 \times \hat{\beta}_1 + Z_i^\top\gamma)}{\exp(\hat{\beta}_0 + 0 \times \hat{\beta}_1 + Z_i^\top\gamma)} = \exp(\hat{\beta}_1) \]

logistic regression in R

glm_binomial3 <- glm(svi ~ lcavol + lweight + age, data = prostate,  family = binomial())
summary(glm_binomial3) # display results
## 
## Call:
## glm(formula = svi ~ lcavol + lweight + age, family = binomial(), 
##     data = prostate)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.80253  -0.51277  -0.20249  -0.03868   2.29319  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -6.26605    3.85789  -1.624    0.104    
## lcavol       1.99773    0.46970   4.253 2.11e-05 ***
## lweight     -0.24193    0.91732  -0.264    0.792    
## age          0.03203    0.05097   0.628    0.530    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 101.353  on 96  degrees of freedom
## Residual deviance:  63.742  on 93  degrees of freedom
## AIC: 71.742
## 
## Number of Fisher Scoring iterations: 6
confint(glm_binomial3) # 95% CI for the coefficients
## Waiting for profiling to be done...
##                    2.5 %    97.5 %
## (Intercept) -14.28625935 1.0304960
## lcavol        1.18463130 3.0485710
## lweight      -2.09281024 1.5785799
## age          -0.06780234 0.1336198
exp(coef(glm_binomial3)) # exponentiated coefficients
## (Intercept)      lcavol     lweight         age 
## 0.001899712 7.372329880 0.785110058 1.032546950
exp(confint(glm_binomial3)) # 95% CI for exponentiated coefficients
## Waiting for profiling to be done...
##                    2.5 %    97.5 %
## (Intercept) 6.245344e-07  2.802456
## lcavol      3.269481e+00 21.085191
## lweight     1.233400e-01  4.848066
## age         9.344452e-01  1.142958
head(predict(glm_binomial3, type="response")) # predicted values; type = c("link", "response", "terms")
##            1            2            3            4            5 
## 0.0015117583 0.0007476917 0.0038053593 0.0004963039 0.0263507563 
##            6 
## 0.0005295007
head(residuals(glm_binomial3)) # residuals
##           1           2           3           4           5           6 
## -0.05500731 -0.03867742 -0.08732260 -0.03150959 -0.23110240 -0.03254661
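
To predict the probability of svi for a new subject (the covariate values below are made up for illustration and are not from the notes):

newpatient <- data.frame(lcavol = 2, lweight = 3.5, age = 65)   # hypothetical subject
predict(glm_binomial3, newdata = newpatient, type = "response") # predicted P(svi = 1)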

Model selection for logistic regression (also applies to all other GLMs)

step(glm(svi ~ ., data = prostate,  family = binomial()))
## Start:  AIC=55.72
## svi ~ lcavol + lweight + age + lbph + lcp + gleason + pgg45 + 
##     lpsa
## 
##           Df Deviance    AIC
## - lcavol   1   37.760 53.760
## - pgg45    1   37.772 53.772
## - lweight  1   37.782 53.782
## - gleason  1   37.794 53.794
## - lbph     1   38.248 54.248
## - age      1   38.907 54.907
## <none>         37.721 55.721
## - lcp      1   48.013 64.013
## - lpsa     1   49.249 65.249
## 
## Step:  AIC=53.76
## svi ~ lweight + age + lbph + lcp + gleason + pgg45 + lpsa
## 
##           Df Deviance    AIC
## - pgg45    1   37.809 51.809
## - gleason  1   37.839 51.839
## - lweight  1   37.857 51.857
## - lbph     1   38.248 52.248
## - age      1   38.912 52.912
## <none>         37.760 53.760
## - lcp      1   50.549 64.549
## - lpsa     1   51.558 65.558
## 
## Step:  AIC=51.81
## svi ~ lweight + age + lbph + lcp + gleason + lpsa
## 
##           Df Deviance    AIC
## - lweight  1   37.898 49.898
## - gleason  1   38.076 50.076
## - lbph     1   38.398 50.398
## - age      1   38.920 50.920
## <none>         37.809 51.809
## - lpsa     1   52.945 64.945
## - lcp      1   54.318 66.318
## 
## Step:  AIC=49.9
## svi ~ age + lbph + lcp + gleason + lpsa
## 
##           Df Deviance    AIC
## - gleason  1   38.102 48.102
## - lbph     1   38.758 48.758
## - age      1   39.002 49.002
## <none>         37.898 49.898
## - lpsa     1   53.462 63.462
## - lcp      1   54.379 64.379
## 
## Step:  AIC=48.1
## svi ~ age + lbph + lcp + lpsa
## 
##        Df Deviance    AIC
## - lbph  1   38.959 46.959
## - age   1   39.066 47.066
## <none>      38.102 48.102
## - lpsa  1   53.509 61.509
## - lcp   1   57.700 65.700
## 
## Step:  AIC=46.96
## svi ~ age + lcp + lpsa
## 
##        Df Deviance    AIC
## - age   1   39.653 45.653
## <none>      38.959 46.959
## - lpsa  1   54.840 60.840
## - lcp   1   59.815 65.815
## 
## Step:  AIC=45.65
## svi ~ lcp + lpsa
## 
##        Df Deviance    AIC
## <none>      39.653 45.653
## - lpsa  1   55.355 59.355
## - lcp   1   60.781 64.781
## 
## Call:  glm(formula = svi ~ lcp + lpsa, family = binomial(), data = prostate)
## 
## Coefficients:
## (Intercept)          lcp         lpsa  
##      -8.532        1.343        2.198  
## 
## Degrees of Freedom: 96 Total (i.e. Null);  94 Residual
## Null Deviance:       101.4 
## Residual Deviance: 39.65     AIC: 45.65

Estimation for \(\beta\)
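
As a hedged sketch (the details below are illustrative, not taken from the notes): the MLE of \(\beta\) is typically computed by Newton-Raphson, equivalently iteratively re-weighted least squares (IRLS). A minimal version for logistic regression:

irls_logistic <- function(X, y, maxit = 25, tol = 1e-8){
  beta <- rep(0, ncol(X))
  for(it in 1:maxit){
    eta <- drop(X %*% beta)
    mu  <- 1 / (1 + exp(-eta))                    # inverse logit
    W   <- mu * (1 - mu)                          # working weights
    z   <- eta + (y - mu) / W                     # working response
    beta_new <- drop(solve(crossprod(X, W * X), crossprod(X, W * z)))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if(converged) break
  }
  beta
}
Xmat <- cbind(1, as.matrix(prostate[, c("lcavol", "lweight", "age")]))
irls_logistic(Xmat, prostate$svi)   # should agree with coef(glm_binomial3)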

Multinomial logistic Regression

When the outcome categories \(1,2,\ldots,J\) are unordered, a multinomial logit model defines \(J - 1\) logits with the form (treating \(j=1\) as the reference category):

\[\log (odds_{j1}) = \log(\frac{\mu_{ij}}{\mu_{i1}}) = \log(\frac{P(Y_i = j)}{P(Y_i = 1)}) = \beta_{0j} + \beta_j^\top X_i\]
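
Solving these \(J - 1\) equations for the category probabilities (a standard identity, included here for completeness):

\[P(Y_i = j) = \frac{\exp(\beta_{0j} + \beta_j^\top X_i)}{1 + \sum_{k=2}^{J} \exp(\beta_{0k} + \beta_k^\top X_i)} \text{ for } j \ne 1, \quad P(Y_i = 1) = \frac{1}{1 + \sum_{k=2}^{J} \exp(\beta_{0k} + \beta_k^\top X_i)}\]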

Multinomial logistic Regression example

ml <- read.csv(url("https://raw.githubusercontent.com/rlowrance/re/master/hsbdemo.csv"), row.names=1)

head(ml)
##    id female    ses schtyp     prog read write math science socst
## 1  45 female    low public vocation   34    35   41      29    26
## 2 108   male middle public  general   34    33   41      36    36
## 3  15   male   high public vocation   39    39   44      26    42
## 4  67   male    low public vocation   37    37   42      33    32
## 5 153   male middle public vocation   39    31   40      39    51
## 6  51 female   high public  general   42    36   42      31    39
##         honors awards cid
## 1 not enrolled      0   1
## 2 not enrolled      0   1
## 3 not enrolled      0   1
## 4 not enrolled      0   1
## 5 not enrolled      0   1
## 6 not enrolled      0   1

Multinomial logistic Regression result

ml$prog2 <- relevel(ml$prog, ref = "vocation")
ml$ses2 <- relevel(ml$ses, ref = "low")

library("nnet")
test <- multinom(prog2 ~ ses2 + write, data = ml)
## # weights:  15 (8 variable)
## initial  value 219.722458 
## iter  10 value 179.993001
## final  value 179.981726 
## converged
summary(test)
## Call:
## multinom(formula = prog2 ~ ses2 + write, data = ml)
## 
## Coefficients:
##          (Intercept)   ses2high ses2middle      write
## academic   -5.218216  0.9826776 -0.2913902 0.11360290
## general    -2.366021 -0.1801588 -0.8246874 0.05567435
## 
## Std. Errors:
##          (Intercept)  ses2high ses2middle      write
## academic    1.163550 0.5955674  0.4763737 0.02221991
## general     1.174249 0.6484555  0.4901231 0.02333137
## 
## Residual Deviance: 359.9635 
## AIC: 375.9635

Multinomial logistic Regression interpretation

  1. A one-unit increase in write increases the log odds of being in the academic program vs. the vocation program by 0.11.
  2. A one-unit increase in write increases the log odds of being in the general program vs. the vocation program by 0.056.
  3. The log odds of being in the academic program rather than the vocation program increase by 0.98 when moving from ses="low" to ses="high".
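
Exponentiating the coefficients gives relative risk ratios with respect to the reference category (a quick follow-up, not shown in the original output):

exp(coef(test))  # relative risk ratios vs. the "vocation" reference category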

ordinal logistic regression

\[\log \Big(\frac{P(Y\le j | X)}{P(Y > j | X)}\Big) = \log \Big(\frac{P(Y\le j | X)}{1 - P(Y \le j | X)}\Big) = \alpha_j - \beta^\top X\] for all \(j = 1, \ldots, J - 1\).

Ordinal Logistic Regression example

dat <- read.csv(url("https://raw.githubusercontent.com/DragonflyStats/MS4024Rstats/master/datasets/ologit.csv"))
head(dat)
##   X           apply pared public  gpa
## 1 1     very likely     0      0 3.26
## 2 2 somewhat likely     1      0 3.21
## 3 3        unlikely     1      1 3.94
## 4 4 somewhat likely     0      0 2.81
## 5 5 somewhat likely     0      0 2.53
## 6 6        unlikely     0      1 2.59

Ordinal logistic Regression result

library(MASS)
dat$apply2 <- relevel(dat$apply, ref = "unlikely")
m <- polr(apply2 ~ pared + public + gpa, data = dat, Hess=TRUE)
summary(m)
## Call:
## polr(formula = apply2 ~ pared + public + gpa, data = dat, Hess = TRUE)
## 
## Coefficients:
##           Value Std. Error t value
## pared   1.04769     0.2658  3.9418
## public -0.05879     0.2979 -0.1974
## gpa     0.61594     0.2606  2.3632
## 
## Intercepts:
##                             Value   Std. Error t value
## unlikely|somewhat likely     2.2039  0.7795     2.8272
## somewhat likely|very likely  4.2994  0.8043     5.3453
## 
## Residual Deviance: 717.0249 
## AIC: 727.0249
exp(coef(m))
##     pared    public       gpa 
## 2.8510579 0.9429088 1.8513972
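
summary(polr) reports t values but no p-values; a common approximation (assuming the t values are roughly standard normal, an assumption not stated in the notes) is:

ctable <- coef(summary(m))                                        # Value, Std. Error, t value
pvals  <- 2 * pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) # two-sided normal p-values
cbind(ctable, "p value" = pvals)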

Ordinal logistic Regression interpretation

  1. For a one-unit increase in parental education, from 0 (low) to 1 (high), the odds of "very likely" applying versus "somewhat likely" or "unlikely" applying combined are 2.85 times greater.

  2. Likewise, the odds of "very likely" or "somewhat likely" applying versus "unlikely" applying are 2.85 times greater (the proportional odds assumption gives the same odds ratio at every cutpoint).

  3. For gpa, when a student's gpa increases by 1 unit, the odds of moving from "unlikely" applying to "somewhat likely" or "very likely" applying (or from the lower and middle categories to the high category) are multiplied by 1.85.

GLM for count data

Poisson Model (Components of GLM for count data)

Poisson distribution
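
Writing the Poisson pmf in exponential-family form (a standard rearrangement, included to connect with the earlier exponential-family slides):

\[P(Y = y) = \frac{\lambda^y e^{-\lambda}}{y!} = \exp\{y \log\lambda - \lambda - \log(y!)\},\] so \(\theta = \log \lambda\), \(b(\theta) = e^\theta\), \(a(\phi) = 1\), and hence \(E(Y) = b'(\theta) = \lambda\) and \(Var(Y) = a(\phi) b''(\theta) = \lambda\).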

Poisson Model

fit Poisson Model in R

library(pscl)
## Loading required package: lattice
## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis
head(bioChemists)
##   art   fem     mar kid5  phd ment
## 1   0   Men Married    0 2.52    7
## 2   0 Women  Single    0 2.05    6
## 3   0 Women  Single    0 3.75    6
## 4   0   Men Married    1 1.18    3
## 5   0 Women  Single    0 3.75   26
## 6   0 Women Married    2 3.59    2

fit Poisson Model in R

glm_poisson <- glm(art ~ . , data = bioChemists,  family = poisson())
summary(glm_poisson) # display results
## 
## Call:
## glm(formula = art ~ ., family = poisson(), data = bioChemists)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.5672  -1.5398  -0.3660   0.5722   5.4467  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.304617   0.102981   2.958   0.0031 ** 
## femWomen    -0.224594   0.054613  -4.112 3.92e-05 ***
## marMarried   0.155243   0.061374   2.529   0.0114 *  
## kid5        -0.184883   0.040127  -4.607 4.08e-06 ***
## phd          0.012823   0.026397   0.486   0.6271    
## ment         0.025543   0.002006  12.733  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1817.4  on 914  degrees of freedom
## Residual deviance: 1634.4  on 909  degrees of freedom
## AIC: 3314.1
## 
## Number of Fisher Scoring iterations: 5
confint(glm_poisson) # 95% CI for the coefficients
## Waiting for profiling to be done...
##                   2.5 %      97.5 %
## (Intercept)  0.10155469  0.50526581
## femWomen    -0.33193995 -0.11781821
## marMarried   0.03520230  0.27584836
## kid5        -0.26422746 -0.10689876
## phd         -0.03881122  0.06467552
## ment         0.02154163  0.02940716
exp(coef(glm_poisson)) # exponentiated coefficients
## (Intercept)    femWomen  marMarried        kid5         phd        ment 
##   1.3561053   0.7988403   1.1679422   0.8312018   1.0129051   1.0258718
exp(confint(glm_poisson)) # 95% CI for exponentiated coefficients
## Waiting for profiling to be done...
##                 2.5 %    97.5 %
## (Intercept) 1.1068905 1.6574260
## femWomen    0.7175304 0.8888576
## marMarried  1.0358292 1.3176480
## kid5        0.7677989 0.8986166
## phd         0.9619323 1.0668128
## ment        1.0217753 1.0298438
head(predict(glm_poisson)) # predicted values; type = c("link", "response", "terms")
##           1           2           3           4           5           6 
##  0.67097234  0.25956537  0.28136376  0.36673640  0.79221866 -0.03738085
head(residuals(glm_poisson)) # residuals
##         1         2         3         4         5         6 
## -1.977948 -1.610197 -1.627842 -1.698833 -2.101566 -1.388027

Model selection for Poisson regression (also applies to all other GLMs)

step(glm_poisson)
## Start:  AIC=3314.11
## art ~ fem + mar + kid5 + phd + ment
## 
##        Df Deviance    AIC
## - phd   1   1634.6 3312.3
## <none>      1634.4 3314.1
## - mar   1   1640.8 3318.5
## - fem   1   1651.5 3329.2
## - kid5  1   1656.5 3334.2
## - ment  1   1766.2 3444.0
## 
## Step:  AIC=3312.35
## art ~ fem + mar + kid5 + ment
## 
##        Df Deviance    AIC
## <none>      1634.6 3312.3
## - mar   1   1640.8 3316.6
## - fem   1   1651.8 3327.5
## - kid5  1   1656.7 3332.4
## - ment  1   1776.7 3452.5
## 
## Call:  glm(formula = art ~ fem + mar + kid5 + ment, family = poisson(), 
##     data = bioChemists)
## 
## Coefficients:
## (Intercept)     femWomen   marMarried         kid5         ment  
##     0.34517     -0.22530      0.15218     -0.18499      0.02576  
## 
## Degrees of Freedom: 914 Total (i.e. Null);  910 Residual
## Null Deviance:       1817 
## Residual Deviance: 1635  AIC: 3312

Interpretation of \(\beta\) for Poisson model

\(\beta_1\) is the difference in the log of expected counts when \(X\) increases by 1 unit.

\(\exp (\beta_1)\) is the multiplicative effect on the mean of \(Y\) when \(X\) increases by 1 unit.

If \(\beta_1 > 0\), then \(\exp(\beta_1) > 1\) and \(\mu_1 > \mu_0\): \(\mu_1\) is \(\exp (\beta_1)\) times as large as \(\mu_0\).

If \(\beta_1 < 0\), then \(\exp(\beta_1) < 1\) and \(\mu_1 < \mu_0\): \(\mu_1\) is still \(\mu_0\) multiplied by \(\exp (\beta_1)\), now a factor less than 1.
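
A quick numerical check of this interpretation using the fitted Poisson model (an illustration; the baseline observation is chosen arbitrarily):

nd  <- bioChemists[1, ]              # one observation as a baseline
nd2 <- nd; nd2$ment <- nd$ment + 1   # same covariates, ment increased by 1
mu  <- predict(glm_poisson, newdata = rbind(nd, nd2), type = "response")
mu[2] / mu[1]                        # equals exp(coef(glm_poisson)["ment"])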

Poisson regression for rate data

Modeling the rate

burnData <- read.csv(url("https://caleb-huo.github.io/teaching/data/Burn/burn.csv"), row.names = 1)
head(burnData)
##   treatment gender race percent head buttock trunk upper_leg lower_leg
## 1         0      0    0      15    0       0     1         1         0
## 2         0      0    1      20    0       0     1         0         0
## 3         0      0    1      15    0       0     0         1         1
## 4         0      0    0      20    1       0     1         0         0
## 5         0      0    1      70    1       1     1         1         0
## 6         0      0    1      20    1       0     1         0         0
##   resp_tract type numfup infection
## 1          0    2     12         0
## 2          0    4      9         0
## 3          0    2      7         1
## 4          0    2     29         0
## 5          0    2      4         1
## 6          0    4      8         1

Modeling the rate
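
With follow-up time \(t_i\) (here numfup), the Poisson rate model assumes (a standard rearrangement motivating the offset used below):

\[\log\Big(\frac{E(Y_i)}{t_i}\Big) = X_i^\top\beta \quad\Leftrightarrow\quad \log E(Y_i) = \log t_i + X_i^\top\beta,\] so \(\log(t_i)\) enters the linear predictor as an offset with coefficient fixed at 1.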

glm_poisson_rate <- glm(infection ~ . - numfup + offset(log(numfup)), data = burnData,  family = poisson())
summary(glm_poisson_rate) # display results
## 
## Call:
## glm(formula = infection ~ . - numfup + offset(log(numfup)), family = poisson(), 
##     data = burnData)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0008  -0.8023  -0.5350   1.0419   2.5011  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -5.603081   1.201833  -4.662 3.13e-06 ***
## treatment   -0.622350   0.316669  -1.965   0.0494 *  
## gender      -0.724283   0.396879  -1.825   0.0680 .  
## race         2.155901   1.016616   2.121   0.0340 *  
## percent      0.003402   0.009854   0.345   0.7299    
## head        -0.023392   0.353237  -0.066   0.9472    
## buttock      0.612703   0.421929   1.452   0.1465    
## trunk        0.058203   0.500184   0.116   0.9074    
## upper_leg   -0.472179   0.373669  -1.264   0.2064    
## lower_leg   -0.299164   0.387843  -0.771   0.4405    
## resp_tract  -0.042332   0.364401  -0.116   0.9075    
## type        -0.045439   0.171668  -0.265   0.7912    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 207.46  on 153  degrees of freedom
## Residual deviance: 184.46  on 142  degrees of freedom
## AIC: 304.46
## 
## Number of Fisher Scoring iterations: 7

Poisson Overdispersion

bioChemists data

mean(bioChemists$art)
## [1] 1.692896
var(bioChemists$art)
## [1] 3.709742
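
A quick check (a common rule of thumb, not from the original slides): the Pearson chi-square statistic divided by the residual degrees of freedom should be close to 1 if the Poisson mean-variance assumption holds; values well above 1 suggest overdispersion.

sum(residuals(glm_poisson, type = "pearson")^2) / df.residual(glm_poisson)  # estimated dispersion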

Negative Binomial Regression

\[Y|\lambda \sim Poisson(\lambda)\] \[\lambda \sim gamma(\alpha, \beta)\] \[Y \sim NB(\alpha, 1/(\beta + 1))\]

Negative Binomial Distribution

The canonical link for the negative binomial distribution is rather complicated and hard to interpret, so it is rarely used. Instead, to facilitate comparisons with the Poisson generalized linear model, a log link is typically used.

Negative Binomial Distribution

\[b''(\theta) = \frac{\partial^2 b}{\partial p^2} \times (\frac{\partial p}{\partial \theta})^2 + \frac{\partial b}{\partial p} \times \frac{\partial^2 p}{\partial \theta^2} = \frac{r(1-p)}{p^2} \]

\[b''(\theta) = \frac{\partial^2 b}{\partial p^2} \times (\frac{\partial p}{\partial \theta})^2 + \frac{\partial b}{\partial p} \times \frac{\partial^2 p}{\partial \theta^2} = \mu + \alpha \mu^2 \] The variance is always larger than the mean for the negative binomial.

NB reduces to Poisson when \(\alpha = 0\)

Negative binomial example 1

library(MASS)
glm_nb <- glm.nb(art ~ . , data = bioChemists)
summary(glm_nb) # display results
## 
## Call:
## glm.nb(formula = art ~ ., data = bioChemists, init.theta = 2.264387695, 
##     link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1678  -1.3617  -0.2806   0.4476   3.4524  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.256144   0.137348   1.865 0.062191 .  
## femWomen    -0.216418   0.072636  -2.979 0.002887 ** 
## marMarried   0.150489   0.082097   1.833 0.066791 .  
## kid5        -0.176415   0.052813  -3.340 0.000837 ***
## phd          0.015271   0.035873   0.426 0.670326    
## ment         0.029082   0.003214   9.048  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(2.2644) family taken to be 1)
## 
##     Null deviance: 1109.0  on 914  degrees of freedom
## Residual deviance: 1004.3  on 909  degrees of freedom
## AIC: 3135.9
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  2.264 
##           Std. Err.:  0.271 
## 
##  2 x log-likelihood:  -3121.917

Negative binomial example 2

glm_nb_rate <- glm.nb(infection ~ . - numfup + offset(log(numfup)), data = burnData)
## Warning in glm.nb(infection ~ . - numfup + offset(log(numfup)), data =
## burnData): alternation limit reached
summary(glm_nb_rate)
## 
## Call:
## glm.nb(formula = infection ~ . - numfup + offset(log(numfup)), 
##     data = burnData, init.theta = 0.6478173549, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7731  -0.8564  -0.5674   0.5065   2.3192  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -5.285673   1.300638  -4.064 4.83e-05 ***
## treatment   -0.840907   0.371826  -2.262   0.0237 *  
## gender      -1.037906   0.464514  -2.234   0.0255 *  
## race         2.620297   1.081204   2.423   0.0154 *  
## percent      0.005769   0.011903   0.485   0.6279    
## head        -0.208180   0.410433  -0.507   0.6120    
## buttock      0.899329   0.500276   1.798   0.0722 .  
## trunk        0.339320   0.565532   0.600   0.5485    
## upper_leg   -0.692250   0.446461  -1.551   0.1210    
## lower_leg   -0.180517   0.428152  -0.422   0.6733    
## resp_tract  -0.191947   0.437243  -0.439   0.6607    
## type        -0.149369   0.201242  -0.742   0.4579    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.6478) family taken to be 1)
## 
##     Null deviance: 149.78  on 153  degrees of freedom
## Residual deviance: 126.24  on 142  degrees of freedom
## AIC: 299.93
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.648 
##           Std. Err.:  0.246 
## Warning while fitting theta: alternation limit reached 
## 
##  2 x log-likelihood:  -273.928

Interpretation Using the Count (or Rate)

For a change of \(\Delta\) in \(x\), the expected count (or rate) is multiplied by \(\exp(\gamma \Delta)\), holding the other covariates \(Z = z\) constant.
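
For example, using the rate models fitted earlier (an illustration of this interpretation, not part of the original notes):

exp(coef(glm_poisson_rate)["treatment"])  # multiplicative effect of treatment on the infection rate
exp(coef(glm_nb_rate)["treatment"])       # same quantity under the negative binomial model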

generate .R file

knitr::purl("glm.rmd", output = "glm.R ", documentation = 2)
## 
## 
## processing file: glm.rmd
## 
## output file: glm.R
## [1] "glm.R "