Zhiguang Huo (Caleb)
Wednesday October 28, 2020
\[Y = X^\top\beta + \varepsilon\]
## 'data.frame': 84 obs. of 5 variables:
## $ subject : Factor w/ 6 levels "F1","F2","F3",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ gender : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
## $ scenario : int 1 1 2 2 3 3 4 4 5 5 ...
## $ attitude : Factor w/ 2 levels "inf","pol": 2 1 2 1 2 1 2 1 2 1 ...
## $ frequency: num 213 204 285 260 204 ...
## subject gender scenario attitude frequency scenario0
## 1 F1 F 1 pol 213.3 1
## 2 F1 F 1 inf 204.5 1
## 3 F1 F 2 pol 285.1 2
## 4 F1 F 2 inf 259.7 2
## 5 F1 F 3 pol 203.9 3
## 6 F1 F 3 inf 286.9 3
## gender
## subject F M
## F1 14 0
## F2 14 0
## F3 14 0
## M3 0 14
## M4 0 14
## M7 0 14
## [1] 39
## ── Attaching packages ───────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1 ✓ purrr 0.3.3
## ✓ tibble 2.1.3 ✓ dplyr 1.0.1
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Warning: Removed 1 rows containing non-finite values (stat_boxplot).
## Warning: Removed 1 rows containing missing values (geom_point).
\[frequency \sim \mu + politeness + \varepsilon\]
##
## Call:
## lm(formula = frequency ~ attitude, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -103.488 -62.122 9.044 51.178 105.044
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 202.59 10.08 20.107 <2e-16 ***
## attitudepol -18.23 14.34 -1.272 0.207
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 65.3 on 81 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.01958, Adjusted R-squared: 0.007475
## F-statistic: 1.618 on 1 and 81 DF, p-value: 0.2071
Note:
\[frequency_{ijk} \sim \mu + politeness_{k} + \varepsilon_{ijk},\] where \(i\) is a subject index, \(j\) is a scenario index, and \(k\) is the politeness index.
Independence:
subj_spread <- d %>% spread(attitude, frequency)
subj_spread$scenario0 <- as.factor(subj_spread$scenario)
head(subj_spread)
## subject gender scenario scenario0 inf pol
## 1 F1 F 1 1 204.5 213.3
## 2 F1 F 2 2 259.7 285.1
## 3 F1 F 3 3 286.9 203.9
## 4 F1 F 4 4 276.8 250.8
## 5 F1 F 5 5 252.4 231.9
## 6 F1 F 6 6 230.7 181.2
ggplot(data=subj_spread, aes(pol, inf)) + theme_bw() +
geom_point(aes(col=subject)) +
stat_ellipse(aes(col=subject)) +
geom_smooth(method="lm")
## Warning: Removed 1 rows containing non-finite values (stat_ellipse).
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
ggplot(data=subj_spread, aes(pol, inf)) + theme_bw() +
geom_point(aes(col=scenario0)) +
stat_ellipse(aes(col=scenario0)) +
geom_smooth(method="lm")
## Warning: Removed 1 rows containing non-finite values (stat_ellipse).
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
When to use random effects?
alm <- lm(frequency ~ attitude, data = d)
suppressMessages(library(lme4))
suppressMessages(library(lmerTest)) ## this package helps provide a p-value
tryCatch({lmer(frequency ~ attitude, data = d)}, error=function(e) print(e))
## <simpleError: No random effects terms specified in formula>
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: frequency ~ attitude + (1 | subject)
## Data: d
##
## REML criterion at convergence: 804.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2953 -0.6018 -0.2005 0.4774 3.1772
##
## Random effects:
## Groups Name Variance Std.Dev.
## subject (Intercept) 3982 63.10
## Residual 851 29.17
## Number of obs: 83, groups: subject, 6
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 202.588 26.151 5.150 7.747 0.000501 ***
## attitudepol -19.376 6.407 76.003 -3.024 0.003399 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## attitudepol -0.121
\[Y_{ijk} = \mu + \gamma \mathbb{I}(k=pol) + \alpha_i + \varepsilon_{ijk}\]
## (Intercept) attitudepol
## 202.5881 -18.2320
## $subject
## (Intercept) attitudepol
## F1 241.1352 -19.37584
## F2 266.8920 -19.37584
## F3 259.5540 -19.37584
## M3 179.0262 -19.37584
## M4 155.6906 -19.37584
## M7 113.2306 -19.37584
##
## attr(,"class")
## [1] "coef.mer"
## [1] 812.7023
linear model \[frequency \sim 1 + politeness + \varepsilon\]
linear model with random effect (random intercept) \[frequency \sim 1 + politeness +(1|subject) + \varepsilon\] \((1|subject)\) is the R syntax for a random intercept.
the formula still contains a general error term \(\varepsilon\)
## Warning: Removed 1 rows containing missing values (geom_point).
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: frequency ~ attitude + (1 | subject)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 826.5 836.2 -409.3 818.5 79
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3067 -0.6051 -0.2042 0.4793 3.1931
##
## Random effects:
## Groups Name Variance Std.Dev.
## subject (Intercept) 3308.7 57.52
## Residual 839.9 28.98
## Number of obs: 83, groups: subject, 6
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 202.588 23.905 6.214 8.475 0.000122 ***
## attitudepol -19.372 6.365 77.003 -3.043 0.003199 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## attitudepol -0.131
REML represents restricted maximum likelihood
##
## Call:
## lm(formula = frequency ~ attitude, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -103.488 -62.122 9.044 51.178 105.044
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 202.59 10.08 20.107 <2e-16 ***
## attitudepol -18.23 14.34 -1.272 0.207
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 65.3 on 81 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.01958, Adjusted R-squared: 0.007475
## F-statistic: 1.618 on 1 and 81 DF, p-value: 0.2071
Without adjusting for random effect, attitude won’t be a significant predictor.
rs_subj_ml = lmer(frequency ~ attitude + (1 | subject), REML = FALSE, data = d)
rs_null_ml = lmer(frequency ~ 1 + (1 | subject), REML = FALSE, data = d)
anova(rs_null_ml, rs_subj_ml)
## Data: d
## Models:
## rs_null_ml: frequency ~ 1 + (1 | subject)
## rs_subj_ml: frequency ~ attitude + (1 | subject)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## rs_null_ml 3 833.25 840.51 -413.62 827.25
## rs_subj_ml 4 826.51 836.18 -409.25 818.51 8.7444 1 0.003106 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: frequency ~ attitude * gender + (1 | subject)
## Data: d
##
## REML criterion at convergence: 778.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2334 -0.5105 -0.1551 0.4738 3.0875
##
## Random effects:
## Groups Name Variance Std.Dev.
## subject (Intercept) 605.4 24.61
## Residual 844.1 29.05
## Number of obs: 83, groups: subject, 6
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 260.686 15.557 4.754 16.757 2.07e-05 ***
## attitudepol -27.400 8.966 75.009 -3.056 0.00311 **
## genderM -116.195 22.000 4.754 -5.282 0.00376 **
## attitudepol:genderM 16.190 12.763 75.018 1.269 0.20852
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) atttdp gendrM
## attitudepol -0.288
## genderM -0.707 0.204
## atttdpl:gnM 0.202 -0.703 -0.286
rs_intergen_subjscene_ml = lmer(frequency ~ attitude + gender + (1 | subject) +
(1 | scenario), REML = FALSE, data = d)
summary(rs_intergen_subjscene_ml)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: frequency ~ attitude + gender + (1 | subject) + (1 | scenario)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 807.1 821.6 -397.6 795.1 77
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2958 -0.6456 -0.0777 0.5449 3.5121
##
## Random effects:
## Groups Name Variance Std.Dev.
## scenario (Intercept) 205.2 14.32
## subject (Intercept) 417.1 20.42
## Residual 637.4 25.25
## Number of obs: 83, groups: scenario, 7; subject, 6
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 256.847 13.828 8.497 18.575 3.55e-08 ***
## attitudepol -19.722 5.547 70.920 -3.555 0.000677 ***
## genderM -108.517 17.573 5.928 -6.175 0.000868 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) atttdp
## attitudepol -0.201
## genderM -0.635 0.004
## $scenario
## (Intercept) attitudepol genderM
## 1 243.4865 -19.72206 -108.5173
## 2 263.3589 -19.72206 -108.5173
## 3 268.1317 -19.72206 -108.5173
## 4 277.2536 -19.72206 -108.5173
## 5 254.9320 -19.72206 -108.5173
## 6 244.8021 -19.72206 -108.5173
## 7 245.9623 -19.72206 -108.5173
##
## $subject
## (Intercept) attitudepol genderM
## F1 243.3681 -19.72206 -108.5173
## F2 266.9444 -19.72206 -108.5173
## F3 260.2277 -19.72206 -108.5173
## M3 284.3541 -19.72206 -108.5173
## M4 262.0576 -19.72206 -108.5173
## M7 224.1286 -19.72206 -108.5173
##
## attr(,"class")
## [1] "coef.mer"
\[Y_{ijk} = (\mu + \alpha_i) + \gamma \mathbb{I}(k=pol) + \varepsilon_{ijk}\]
\[Y_{ijk} = (\mu + \alpha_i) + (\gamma + \eta_i) \mathbb{I}(k=pol) + \varepsilon_{ijk}\]
## boundary (singular) fit: see ?isSingular
## Warning: Removed 1 rows containing missing values (geom_point).
## boundary (singular) fit: see ?isSingular
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: frequency ~ attitude + gender + (1 + attitude | subject)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 820.3 837.2 -403.2 806.3 76
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3848 -0.5270 -0.2221 0.5179 3.2462
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## subject (Intercept) 359.959 18.97
## attitudepol 1.564 1.25 1.00
## Residual 839.418 28.97
## Number of obs: 83, groups: subject, 6
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 257.501 12.396 6.275 20.774 5.06e-07 ***
## attitudepol -19.408 6.384 63.996 -3.040 0.003422 **
## genderM -109.826 17.167 6.022 -6.398 0.000677 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) atttdp
## attitudepol -0.205
## genderM -0.692 0.004
## convergence code: 0
## boundary (singular) fit: see ?isSingular
## $subject
## (Intercept) attitudepol genderM
## F1 244.1800 -20.28634 -109.8262
## F2 266.1361 -18.83927 -109.8262
## F3 259.8312 -19.25481 -109.8262
## M3 283.4803 -17.69615 -109.8262
## M4 263.7697 -18.99523 -109.8262
## M7 227.6097 -21.37845 -109.8262
##
## attr(,"class")
## [1] "coef.mer"
rs_gen_subjscene_con_reml = lmer(frequency ~ attitude + gender + (1 + attitude |
subject), REML = TRUE, data = d)
## boundary (singular) fit: see ?isSingular
rs_gen_subjscene_reml = lmer(frequency ~ attitude + gender + (1 | subject), REML = TRUE, data = d)
anova(rs_gen_subjscene_reml, rs_gen_subjscene_con_reml)
## refitting model(s) with ML (instead of REML)
## Data: d
## Models:
## rs_gen_subjscene_reml: frequency ~ attitude + gender + (1 | subject)
## rs_gen_subjscene_con_reml: frequency ~ attitude + gender + (1 + attitude | subject)
## Df AIC BIC logLik deviance Chisq Chi Df
## rs_gen_subjscene_reml 5 816.34 828.43 -403.17 806.34
## rs_gen_subjscene_con_reml 7 820.31 837.24 -403.15 806.31 0.0282 2
## Pr(>Chisq)
## rs_gen_subjscene_reml
## rs_gen_subjscene_con_reml 0.986
If the outcome variable is
Function lmer is used to fit linear mixed models, function glmer is used to fit generalized (non-Gaussian) linear mixed models.
sleepStudy <- read.table("https://caleb-huo.github.io/teaching/data/sleep/sleepstudy.txt", header=T, row.names = 1)
sleepStudy$Subject <- as.factor(sleepStudy$Subject)
head(sleepStudy, n=10)
## Reaction Days Subject
## 1 249.5600 0 308
## 2 258.7047 1 308
## 3 250.8006 2 308
## 4 321.4398 3 308
## 5 356.8519 4 308
## 6 414.6901 5 308
## 7 382.2038 6 308
## 8 290.1486 7 308
## 9 430.5853 8 308
## 10 466.3535 9 308
## 'data.frame': 180 obs. of 3 variables:
## $ Reaction: num 250 259 251 321 357 ...
## $ Days : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Subject : Factor w/ 18 levels "308","309","310",..: 1 1 1 1 1 1 1 1 1 1 ...
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (1 | Subject)
## Data: sleepstudy
##
## REML criterion at convergence: 1786.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2257 -0.5529 0.0109 0.5188 4.2506
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 1378.2 37.12
## Residual 960.5 30.99
## Number of obs: 180, groups: Subject, 18
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 251.4051 9.7467 22.8102 25.79 <2e-16 ***
## Days 10.4673 0.8042 161.0000 13.02 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Days -0.371
fm1 <- lmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy)
#fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy) ## same as this one
summary(fm1)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (1 + Days | Subject)
## Data: sleepstudy
##
## REML criterion at convergence: 1743.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.9536 -0.4634 0.0231 0.4633 5.1793
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject (Intercept) 611.90 24.737
## Days 35.08 5.923 0.07
## Residual 654.94 25.592
## Number of obs: 180, groups: Subject, 18
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 251.405 6.824 17.005 36.843 < 2e-16 ***
## Days 10.467 1.546 16.995 6.771 3.27e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Days -0.138
## $Subject
## (Intercept) Days
## 308 253.6626 19.6665597
## 309 211.0108 1.8467699
## 310 212.4488 5.0177063
## 330 275.0940 5.6531411
## 331 273.6636 7.3976093
## 332 260.4439 10.1952325
## 333 268.2441 10.2438881
## 334 244.1731 11.5417935
## 335 251.0724 -0.2851939
## 337 286.2916 19.0963068
## 349 226.1971 11.6403856
## 350 238.3357 17.0815045
## 351 255.9828 7.4520035
## 352 272.2666 14.0036922
## 369 254.6802 11.3395736
## 370 225.7940 15.2895377
## 371 252.2122 9.4791130
## 372 263.7185 11.7515240
##
## attr(,"class")
## [1] "coef.mer"
Credits give to: