Zhiguang Huo (Caleb)
Wednesday October 11, 2017
Credits give to:
Linear model:
\[Y = X^\top\beta + \varepsilon\]
Linear mixed model:
## 'data.frame': 84 obs. of 5 variables:
## $ subject : Factor w/ 6 levels "F1","F2","F3",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ gender : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
## $ scenario : int 1 1 2 2 3 3 4 4 5 5 ...
## $ attitude : Factor w/ 2 levels "inf","pol": 2 1 2 1 2 1 2 1 2 1 ...
## $ frequency: num 213 204 285 260 204 ...
## subject gender scenario attitude frequency scenario0
## 1 F1 F 1 pol 213.3 1
## 2 F1 F 1 inf 204.5 1
## 3 F1 F 2 pol 285.1 2
## 4 F1 F 2 inf 259.7 2
## 5 F1 F 3 pol 203.9 3
## 6 F1 F 3 inf 286.9 3
## gender
## subject F M
## F1 14 0
## F2 14 0
## F3 14 0
## M3 0 14
## M4 0 14
## M7 0 14
## [1] 39
library(ggplot2)
ggplot(aes(attitude, frequency, color = subject), data=d) + facet_grid(~subject) + geom_boxplot() + geom_jitter() + theme_bw()
## Warning: Removed 1 rows containing non-finite values (stat_boxplot).
## Warning: Removed 1 rows containing missing values (geom_point).
\[pitch \sim politeness + gender + \varepsilon\]
##
## Call:
## lm(formula = frequency ~ attitude + gender, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -82.409 -26.561 -4.262 24.690 100.140
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 256.762 6.756 38.006 <2e-16 ***
## attitudepol -19.553 7.833 -2.496 0.0146 *
## genderM -108.349 7.833 -13.832 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 35.68 on 80 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.7109, Adjusted R-squared: 0.7037
## F-statistic: 98.38 on 2 and 80 DF, p-value: < 2.2e-16
Note: 1 observation deleted due to missingness
pol_subj = subset(d, attitude == "pol")
inf_subj = subset(d, attitude == "inf")
pol_subj$attitude <- NULL
inf_subj$attitude <- NULL
names(pol_subj)[names(pol_subj) == "frequency"] <- "frequency_pol"
names(inf_subj)[names(inf_subj) == "frequency"] <- "frequency_inf"
subj_merge <- merge(pol_subj, inf_subj)
head(subj_merge)
## subject gender scenario scenario0 frequency_pol frequency_inf
## 1 F1 F 1 1 213.3 204.5
## 2 F1 F 2 2 285.1 259.7
## 3 F1 F 3 3 203.9 286.9
## 4 F1 F 4 4 250.8 276.8
## 5 F1 F 5 5 231.9 252.4
## 6 F1 F 6 6 181.2 230.7
ggplot(data=subj_merge, aes(frequency_pol, frequency_inf)) +
geom_point(aes(col=subject)) +
geom_smooth(method="lm")
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
ggplot(data=subj_merge, aes(frequency_pol, frequency_inf)) +
geom_point(aes(col=subject)) +
stat_ellipse(aes(col=subject)) +
geom_smooth(method="lm")
## Warning: Removed 1 rows containing non-finite values (stat_ellipse).
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
ggplot(data=subj_merge, aes(frequency_pol, frequency_inf)) +
geom_point(aes(col=scenario0)) +
geom_smooth(method="lm")
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
ggplot(data=subj_merge, aes(frequency_pol, frequency_inf)) +
geom_point(aes(col=scenario0)) +
stat_ellipse(aes(col=scenario0)) +
geom_smooth(method="lm")
## Warning: Removed 1 rows containing non-finite values (stat_ellipse).
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
linear model \[pitch \sim politeness + gender + \varepsilon\]
linear model with random effect (random intercept) \[pitch \sim politeness + gender + (1|subject) + \varepsilon\] \((1|subject)\) is the R syntax for a random intercept.
the formula still contains a general error term \(\varepsilon\)
##
## Call:
## lm(formula = frequency ~ attitude, data = d)
##
## Coefficients:
## (Intercept) attitudepol
## 202.59 -18.23
## Loading required package: Matrix
##
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
##
## lmer
## The following object is masked from 'package:stats':
##
## step
## <simpleError: No random effects terms specified in formula>
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: frequency ~ attitude + (1 | subject)
## Data: d
##
## REML criterion at convergence: 804.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2953 -0.6018 -0.2005 0.4774 3.1772
##
## Random effects:
## Groups Name Variance Std.Dev.
## subject (Intercept) 3982 63.10
## Residual 851 29.17
## Number of obs: 83, groups: subject, 6
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 202.588 26.151 5.150 7.747 0.000501 ***
## attitudepol -19.376 6.407 76.003 -3.024 0.003399 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## attitudepol -0.121
## $subject
## (Intercept) attitudepol
## F1 241.1352 -19.37584
## F2 266.8920 -19.37584
## F3 259.5540 -19.37584
## M3 179.0262 -19.37584
## M4 155.6906 -19.37584
## M7 113.2306 -19.37584
##
## attr(,"class")
## [1] "coef.mer"
## [1] 812.7023
\[Y_{ijk} = \mu + \beta_j + \alpha_i + \varepsilon_{ijk}\]
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: frequency ~ attitude + (1 | subject)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 826.5 836.2 -409.3 818.5 79
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3067 -0.6051 -0.2042 0.4793 3.1931
##
## Random effects:
## Groups Name Variance Std.Dev.
## subject (Intercept) 3308.7 57.52
## Residual 839.9 28.98
## Number of obs: 83, groups: subject, 6
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 202.588 23.905 6.214 8.475 0.000122 ***
## attitudepol -19.372 6.365 77.003 -3.043 0.003199 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## attitudepol -0.131
REML represents restricted maximum likelihood
##
## Call:
## lm(formula = frequency ~ attitude, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -103.488 -62.122 9.044 51.178 105.044
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 202.59 10.08 20.107 <2e-16 ***
## attitudepol -18.23 14.34 -1.272 0.207
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 65.3 on 81 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.01958, Adjusted R-squared: 0.007475
## F-statistic: 1.618 on 1 and 81 DF, p-value: 0.2071
Without adjusting for random effect, attitude won’t be a significant predictor.
rs_subj_reml = lmer(frequency ~ attitude + (1 | subject), data = d)
anova(rs_subj_reml, ddf = "Kenward-Roger")
## Type III Analysis of Variance Table with Kenward-Roger's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## attitude 7782.8 7782.8 1 76.002 9.1456 0.003399 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## likelihood ratio test: comparing full model and reduced model
## get the p-value for reduced variable
## have to use REML = FALSE
rs_subj_ml = lmer(frequency ~ attitude + (1 | subject), REML = FALSE, data = d)
rs_null_ml = lmer(frequency ~ 1 + (1 | subject), REML = FALSE, data = d)
anova(rs_null_ml, rs_subj_ml)
## Data: d
## Models:
## rs_null_ml: frequency ~ 1 + (1 | subject)
## rs_subj_ml: frequency ~ attitude + (1 | subject)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## rs_null_ml 3 833.25 840.51 -413.62 827.25
## rs_subj_ml 4 826.51 836.18 -409.25 818.51 8.7444 1 0.003106 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: frequency ~ attitude * gender + (1 | subject)
## Data: d
##
## REML criterion at convergence: 778.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2334 -0.5105 -0.1551 0.4738 3.0875
##
## Random effects:
## Groups Name Variance Std.Dev.
## subject (Intercept) 605.4 24.61
## Residual 844.1 29.05
## Number of obs: 83, groups: subject, 6
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 260.686 15.557 4.754 16.757 2.07e-05 ***
## attitudepol -27.400 8.966 75.009 -3.056 0.00311 **
## genderM -116.195 22.000 4.754 -5.282 0.00376 **
## attitudepol:genderM 16.190 12.763 75.018 1.269 0.20852
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) atttdp gendrM
## attitudepol -0.288
## genderM -0.707 0.204
## atttdpl:gnM 0.202 -0.703 -0.286
ggplot(aes(1, frequency, col=scenario0), data=d) +
geom_boxplot() +
geom_jitter() +
facet_grid(attitude ~ scenario0) +
theme_bw()
## Warning: Removed 1 rows containing non-finite values (stat_boxplot).
## Warning: Removed 1 rows containing missing values (geom_point).
rs_intergen_subjscene_ml = lmer(frequency ~ attitude + gender + (1 | subject) +
(1 | scenario), REML = FALSE, data = d)
summary(rs_intergen_subjscene_ml)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: frequency ~ attitude + gender + (1 | subject) + (1 | scenario)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 807.1 821.6 -397.6 795.1 77
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2958 -0.6456 -0.0777 0.5449 3.5121
##
## Random effects:
## Groups Name Variance Std.Dev.
## scenario (Intercept) 205.2 14.32
## subject (Intercept) 417.1 20.42
## Residual 637.4 25.25
## Number of obs: 83, groups: scenario, 7; subject, 6
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 256.847 13.828 8.497 18.575 3.55e-08 ***
## attitudepol -19.722 5.547 70.920 -3.555 0.000677 ***
## genderM -108.517 17.573 5.928 -6.175 0.000868 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) atttdp
## attitudepol -0.201
## genderM -0.635 0.004
## $scenario
## (Intercept) attitudepol genderM
## 1 243.4865 -19.72206 -108.5173
## 2 263.3589 -19.72206 -108.5173
## 3 268.1317 -19.72206 -108.5173
## 4 277.2536 -19.72206 -108.5173
## 5 254.9320 -19.72206 -108.5173
## 6 244.8021 -19.72206 -108.5173
## 7 245.9623 -19.72206 -108.5173
##
## $subject
## (Intercept) attitudepol genderM
## F1 243.3681 -19.72206 -108.5173
## F2 266.9444 -19.72206 -108.5173
## F3 260.2277 -19.72206 -108.5173
## M3 284.3541 -19.72206 -108.5173
## M4 262.0576 -19.72206 -108.5173
## M7 224.1286 -19.72206 -108.5173
##
## attr(,"class")
## [1] "coef.mer"
ggplot(aes(attitude, frequency, color = subject), data=d) +
geom_point() +
geom_smooth(method="lm", aes(group=1)) +
facet_wrap(~subject ) +
theme_bw()
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
ggplot(data=subj_merge, aes(frequency_pol, frequency_inf, col=scenario0)) +
geom_point() +
geom_smooth(method="lm", se = F)
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
politeness.model.rs = lmer(frequency ~ attitude + gender + (1 + attitude | subject) +
(1 | scenario), REML = FALSE, data = d)
## boundary (singular) fit: see ?isSingular
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: frequency ~ attitude + gender + (1 + attitude | subject) + (1 |
## scenario)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 811.1 830.4 -397.5 795.1 75
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2800 -0.6555 -0.0803 0.5369 3.5016
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## scenario (Intercept) 205.052 14.320
## subject (Intercept) 395.469 19.886
## attitudepol 1.251 1.118 1.00
## Residual 637.069 25.240
## Number of obs: 83, groups: scenario, 7; subject, 6
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 257.802 13.683 7.975 18.842 6.75e-08 ***
## attitudepol -19.715 5.565 58.506 -3.543 0.000785 ***
## genderM -110.428 17.531 5.927 -6.299 0.000782 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) atttdp
## attitudepol -0.153
## genderM -0.641 0.003
## convergence code: 0
## boundary (singular) fit: see ?isSingular
## $scenario
## (Intercept) attitudepol genderM
## 1 244.4392 -19.71467 -110.4277
## 2 264.3110 -19.71467 -110.4277
## 3 269.0836 -19.71467 -110.4277
## 4 278.2053 -19.71467 -110.4277
## 5 255.8843 -19.71467 -110.4277
## 6 245.7753 -19.71467 -110.4277
## 7 246.9149 -19.71467 -110.4277
##
## $subject
## (Intercept) attitudepol genderM
## F1 243.7394 -20.50555 -110.4277
## F2 266.7317 -19.21246 -110.4277
## F3 260.1367 -19.58336 -110.4277
## M3 285.4172 -18.16159 -110.4277
## M4 263.9094 -19.37119 -110.4277
## M7 226.8772 -21.45387 -110.4277
##
## attr(,"class")
## [1] "coef.mer"
rs_gen_subjscene_con_reml = lmer(frequency ~ attitude + gender + (1 + attitude |
subject) + (1 | scenario), REML = TRUE, data = d)
## boundary (singular) fit: see ?isSingular
rs_gen_subjscene_reml = lmer(frequency ~ attitude + gender + (1 | subject) + (1 |
scenario), REML = TRUE, data = d)
anova(rs_gen_subjscene_reml, rs_gen_subjscene_con_reml)
## refitting model(s) with ML (instead of REML)
## Data: d
## Models:
## rs_gen_subjscene_reml: frequency ~ attitude + gender + (1 | subject) + (1 | scenario)
## rs_gen_subjscene_con_reml: frequency ~ attitude + gender + (1 + attitude | subject) + (1 |
## rs_gen_subjscene_con_reml: scenario)
## Df AIC BIC logLik deviance Chisq Chi Df
## rs_gen_subjscene_reml 6 807.10 821.61 -397.55 795.10
## rs_gen_subjscene_con_reml 8 811.07 830.42 -397.54 795.07 0.0275 2
## Pr(>Chisq)
## rs_gen_subjscene_reml
## rs_gen_subjscene_con_reml 0.9863
If the outcome variable is
Function lmer is used to fit linear mixed models, function glmer is used to fit generalized (non-Gaussian) linear mixed models.
sleepStudy <- read.table("https://caleb-huo.github.io/teaching/data/sleep/sleepstudy.txt", header=T, row.names = 1)
sleepStudy$Subject <- as.factor(sleepStudy$Subject)
head(sleepStudy, n=10)
## Reaction Days Subject
## 1 249.5600 0 308
## 2 258.7047 1 308
## 3 250.8006 2 308
## 4 321.4398 3 308
## 5 356.8519 4 308
## 6 414.6901 5 308
## 7 382.2038 6 308
## 8 290.1486 7 308
## 9 430.5853 8 308
## 10 466.3535 9 308
## 'data.frame': 180 obs. of 3 variables:
## $ Reaction: num 250 259 251 321 357 ...
## $ Days : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Subject : Factor w/ 18 levels "308","309","310",..: 1 1 1 1 1 1 1 1 1 1 ...
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (Days | Subject)
## Data: sleepstudy
##
## REML criterion at convergence: 1743.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.9536 -0.4634 0.0231 0.4633 5.1793
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject (Intercept) 611.90 24.737
## Days 35.08 5.923 0.07
## Residual 654.94 25.592
## Number of obs: 180, groups: Subject, 18
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 251.405 6.824 17.005 36.843 < 2e-16 ***
## Days 10.467 1.546 16.995 6.771 3.27e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Days -0.138
## $Subject
## (Intercept) Days
## 308 253.6626 19.6665597
## 309 211.0108 1.8467699
## 310 212.4488 5.0177063
## 330 275.0940 5.6531411
## 331 273.6636 7.3976093
## 332 260.4439 10.1952325
## 333 268.2441 10.2438881
## 334 244.1731 11.5417935
## 335 251.0724 -0.2851939
## 337 286.2916 19.0963068
## 349 226.1971 11.6403856
## 350 238.3357 17.0815045
## 351 255.9828 7.4520035
## 352 272.2666 14.0036922
## 369 254.6802 11.3395736
## 370 225.7940 15.2895377
## 371 252.2122 9.4791130
## 372 263.7185 11.7515240
##
## attr(,"class")
## [1] "coef.mer"
##
##
## processing file: mixedModel.rmd
##
|
| | 0%
|
|. | 2%
|
|.. | 3%
|
|.... | 5%
|
|..... | 7%
|
|...... | 9%
|
|....... | 10%
|
|........ | 12%
|
|.......... | 14%
|
|........... | 16%
|
|............ | 17%
|
|............. | 19%
|
|.............. | 21%
|
|................ | 22%
|
|................. | 24%
|
|.................. | 26%
|
|................... | 28%
|
|..................... | 29%
|
|...................... | 31%
|
|....................... | 33%
|
|........................ | 34%
|
|......................... | 36%
|
|........................... | 38%
|
|............................ | 40%
|
|............................. | 41%
|
|.............................. | 43%
|
|............................... | 45%
|
|................................. | 47%
|
|.................................. | 48%
|
|................................... | 50%
|
|.................................... | 52%
|
|..................................... | 53%
|
|....................................... | 55%
|
|........................................ | 57%
|
|......................................... | 59%
|
|.......................................... | 60%
|
|........................................... | 62%
|
|............................................. | 64%
|
|.............................................. | 66%
|
|............................................... | 67%
|
|................................................ | 69%
|
|................................................. | 71%
|
|................................................... | 72%
|
|.................................................... | 74%
|
|..................................................... | 76%
|
|...................................................... | 78%
|
|........................................................ | 79%
|
|......................................................... | 81%
|
|.......................................................... | 83%
|
|........................................................... | 84%
|
|............................................................ | 86%
|
|.............................................................. | 88%
|
|............................................................... | 90%
|
|................................................................ | 91%
|
|................................................................. | 93%
|
|.................................................................. | 95%
|
|.................................................................... | 97%
|
|..................................................................... | 98%
|
|......................................................................| 100%
## output file: mixedModel.R
## [1] "mixedModel.R "