Zhiguang Huo (Caleb)
Wednesday October 27, 2021
\[Y = X^\top\beta + \varepsilon\]
## 'data.frame': 84 obs. of 5 variables:
## $ subject : chr "F1" "F1" "F1" "F1" ...
## $ gender : chr "F" "F" "F" "F" ...
## $ scenario : int 1 1 2 2 3 3 4 4 5 5 ...
## $ attitude : chr "pol" "inf" "pol" "inf" ...
## $ frequency: num 213 204 285 260 204 ...
## subject gender scenario attitude frequency scenario0
## 1 F1 F 1 pol 213.3 1
## 2 F1 F 1 inf 204.5 1
## 3 F1 F 2 pol 285.1 2
## 4 F1 F 2 inf 259.7 2
## 5 F1 F 3 pol 203.9 3
## 6 F1 F 3 inf 286.9 3
## gender
## subject F M
## F1 14 0
## F2 14 0
## F3 14 0
## M3 0 14
## M4 0 14
## M7 0 14
which(apply(d,1,function(x) any(is.na(x)))) ## one subject contails missing data (missing outcome variable)
## [1] 39
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.0.4 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
ggplot(aes(attitude, frequency, color = subject), data=d) + facet_grid(~subject) + geom_boxplot() + geom_jitter() + theme_bw()
## Warning: Removed 1 rows containing non-finite values (stat_boxplot).
## Warning: Removed 1 rows containing missing values (geom_point).
\[frequency \sim \mu + politeness + \varepsilon\]
##
## Call:
## lm(formula = frequency ~ attitude, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -103.488 -62.122 9.044 51.178 105.044
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 202.59 10.08 20.107 <2e-16 ***
## attitudepol -18.23 14.34 -1.272 0.207
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 65.3 on 81 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.01958, Adjusted R-squared: 0.007475
## F-statistic: 1.618 on 1 and 81 DF, p-value: 0.2071
Note:
\[frequency_{ijk} \sim \mu + politeness_{k} + \varepsilon_{ijk},\] where \(i\) is a subject index, \(j\) is a scenario index, and \(k\) is the politeness index.
Dependent structure:
subj_spread <- d %>% spread(attitude, frequency)
subj_spread$scenario0 <- as.factor(subj_spread$scenario)
head(subj_spread)
## subject gender scenario scenario0 inf pol
## 1 F1 F 1 1 204.5 213.3
## 2 F1 F 2 2 259.7 285.1
## 3 F1 F 3 3 286.9 203.9
## 4 F1 F 4 4 276.8 250.8
## 5 F1 F 5 5 252.4 231.9
## 6 F1 F 6 6 230.7 181.2
ggplot(data=subj_spread, aes(pol, inf)) + theme_bw() +
geom_point(aes(col=subject)) +
stat_ellipse(aes(col=subject)) +
geom_smooth(method="lm")
## Warning: Removed 1 rows containing non-finite values (stat_ellipse).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
When to use random effects?
alm <- lm(frequency ~ attitude, data = d)
suppressMessages(library(lme4))
suppressMessages(library(lmerTest)) ## this package helps provide a p-value
lmer(frequency ~ attitude, data = d)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: frequency ~ attitude + (1 | subject)
## Data: d
##
## REML criterion at convergence: 804.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2953 -0.6018 -0.2005 0.4774 3.1772
##
## Random effects:
## Groups Name Variance Std.Dev.
## subject (Intercept) 3982 63.10
## Residual 851 29.17
## Number of obs: 83, groups: subject, 6
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 202.588 26.151 5.150 7.747 0.000501 ***
## attitudepol -19.376 6.407 76.003 -3.024 0.003399 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## attitudepol -0.121
\[Y_{ijk} = \mu + \gamma \mathbb{I}(k=pol) + \alpha_i + \varepsilon_{ijk}\]
## (Intercept) attitudepol
## 202.5881 -18.2320
## $subject
## (Intercept) attitudepol
## F1 241.1352 -19.37584
## F2 266.8920 -19.37584
## F3 259.5540 -19.37584
## M3 179.0262 -19.37584
## M4 155.6906 -19.37584
## M7 113.2306 -19.37584
##
## attr(,"class")
## [1] "coef.mer"
## [1] 812.7023
linear model \[frequency \sim 1 + politeness + \varepsilon\]
linear model with random effect (random intercept) \[frequency \sim 1 + politeness +(1|subject) + \varepsilon\] \((1|subject)\) is the R syntax for a random intercept.
intercepts are different for each subject
“\(1\)” stands for the intercept in R
this formula indicates multiple responses per subject
resolves the non-independence multiple responses by the same subject
the formula still contains a general error term \(\varepsilon\)
## Warning: Removed 1 rows containing missing values (geom_point).
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: frequency ~ attitude + (1 | subject)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 826.5 836.2 -409.3 818.5 79
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3067 -0.6051 -0.2042 0.4793 3.1931
##
## Random effects:
## Groups Name Variance Std.Dev.
## subject (Intercept) 3308.7 57.52
## Residual 839.9 28.98
## Number of obs: 83, groups: subject, 6
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 202.588 23.905 6.214 8.475 0.000122 ***
## attitudepol -19.372 6.365 77.003 -3.043 0.003199 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## attitudepol -0.131
REML represents restricted maximum likelihood
##
## Call:
## lm(formula = frequency ~ attitude, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -103.488 -62.122 9.044 51.178 105.044
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 202.59 10.08 20.107 <2e-16 ***
## attitudepol -18.23 14.34 -1.272 0.207
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 65.3 on 81 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.01958, Adjusted R-squared: 0.007475
## F-statistic: 1.618 on 1 and 81 DF, p-value: 0.2071
Without adjusting for random effect, attitude won’t be a significant predictor.
rs_subj_ml = lmer(frequency ~ attitude + (1 | subject), REML = FALSE, data = d)
rs_null_ml = lmer(frequency ~ 1 + (1 | subject), REML = FALSE, data = d)
anova(rs_null_ml, rs_subj_ml)
## Data: d
## Models:
## rs_null_ml: frequency ~ 1 + (1 | subject)
## rs_subj_ml: frequency ~ attitude + (1 | subject)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## rs_null_ml 3 833.25 840.51 -413.62 827.25
## rs_subj_ml 4 826.51 836.18 -409.25 818.51 8.7444 1 0.003106 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: frequency ~ attitude + gender + (1 | subject)
## Data: d
##
## REML criterion at convergence: 786.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3619 -0.5305 -0.1724 0.4647 3.2260
##
## Random effects:
## Groups Name Variance Std.Dev.
## subject (Intercept) 603.9 24.57
## Residual 850.9 29.17
## Number of obs: 83, groups: subject, 6
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 256.691 15.226 4.378 16.859 3.78e-05 ***
## attitudepol -19.410 6.407 76.018 -3.030 0.00334 **
## genderM -108.205 21.063 4.009 -5.137 0.00677 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) atttdp
## attitudepol -0.210
## genderM -0.692 0.004
\[Y_{ijk} = (\mu + \alpha_i) + \gamma \mathbb{I}(k=pol) + \varepsilon_{ijk}\]
\[Y_{ijk} = (\mu + \alpha_i) + (\gamma + \eta_i) \mathbb{I}(k=pol) + \varepsilon_{ijk}\]
## boundary (singular) fit: see ?isSingular
## Warning: Removed 1 rows containing missing values (geom_point).
politeness.model.rs = lmer(frequency ~ attitude + gender + (1 + attitude | subject), REML = FALSE, data = d)
## boundary (singular) fit: see ?isSingular
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: frequency ~ attitude + gender + (1 + attitude | subject)
## Data: d
##
## AIC BIC logLik deviance df.resid
## 820.3 837.2 -403.2 806.3 76
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3848 -0.5270 -0.2221 0.5179 3.2462
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## subject (Intercept) 359.95 18.972
## attitudepol 1.57 1.253 1.00
## Residual 839.41 28.973
## Number of obs: 83, groups: subject, 6
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 257.503 12.396 6.274 20.773 5.06e-07 ***
## attitudepol -19.408 6.384 63.951 -3.040 0.003423 **
## genderM -109.829 17.167 6.021 -6.398 0.000677 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) atttdp
## attitudepol -0.205
## genderM -0.692 0.004
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular
## $subject
## (Intercept) attitudepol genderM
## F1 244.1807 -20.28814 -109.8295
## F2 266.1358 -18.83824 -109.8295
## F3 259.8311 -19.25459 -109.8295
## M3 283.4821 -17.69271 -109.8295
## M4 263.7729 -18.99429 -109.8295
## M7 227.6144 -21.38216 -109.8295
##
## attr(,"class")
## [1] "coef.mer"
rs_gen_subjscene_con_reml = lmer(frequency ~ attitude + gender + (1 + attitude |
subject), REML = TRUE, data = d)
## boundary (singular) fit: see ?isSingular
rs_gen_subjscene_reml = lmer(frequency ~ attitude + gender + (1 | subject), REML = TRUE, data = d)
anova(rs_gen_subjscene_reml, rs_gen_subjscene_con_reml)
## refitting model(s) with ML (instead of REML)
## Data: d
## Models:
## rs_gen_subjscene_reml: frequency ~ attitude + gender + (1 | subject)
## rs_gen_subjscene_con_reml: frequency ~ attitude + gender + (1 + attitude | subject)
## npar AIC BIC logLik deviance Chisq Df
## rs_gen_subjscene_reml 5 816.34 828.43 -403.17 806.34
## rs_gen_subjscene_con_reml 7 820.31 837.24 -403.15 806.31 0.0282 2
## Pr(>Chisq)
## rs_gen_subjscene_reml
## rs_gen_subjscene_con_reml 0.986
If the outcome variable is
Function lmer is used to fit linear mixed models, function glmer is used to fit generalized (non-Gaussian) linear mixed models.
sleepStudy <- read.table("https://caleb-huo.github.io/teaching/data/sleep/sleepstudy.txt", header=T, row.names = 1)
sleepStudy$Subject <- as.factor(sleepStudy$Subject)
head(sleepStudy, n=10)
## Reaction Days Subject
## 1 249.5600 0 308
## 2 258.7047 1 308
## 3 250.8006 2 308
## 4 321.4398 3 308
## 5 356.8519 4 308
## 6 414.6901 5 308
## 7 382.2038 6 308
## 8 290.1486 7 308
## 9 430.5853 8 308
## 10 466.3535 9 308
## 'data.frame': 180 obs. of 3 variables:
## $ Reaction: num 250 259 251 321 357 ...
## $ Days : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Subject : Factor w/ 18 levels "308","309","310",..: 1 1 1 1 1 1 1 1 1 1 ...
ggplot(aes(Days, Reaction, col=Subject), data = sleepStudy) + geom_point() + geom_smooth(method="lm") + facet_wrap(~Subject)
## `geom_smooth()` using formula 'y ~ x'
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (1 | Subject)
## Data: sleepstudy
##
## REML criterion at convergence: 1786.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2257 -0.5529 0.0109 0.5188 4.2506
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 1378.2 37.12
## Residual 960.5 30.99
## Number of obs: 180, groups: Subject, 18
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 251.4051 9.7467 22.8102 25.79 <2e-16 ***
## Days 10.4673 0.8042 161.0000 13.02 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Days -0.371
fm1 <- lmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy)
#fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy) ## same as this one
summary(fm1)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (1 + Days | Subject)
## Data: sleepstudy
##
## REML criterion at convergence: 1743.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.9536 -0.4634 0.0231 0.4634 5.1793
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject (Intercept) 612.10 24.741
## Days 35.07 5.922 0.07
## Residual 654.94 25.592
## Number of obs: 180, groups: Subject, 18
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 251.405 6.825 17.000 36.838 < 2e-16 ***
## Days 10.467 1.546 17.000 6.771 3.26e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Days -0.138
## $Subject
## (Intercept) Days
## 308 253.6637 19.6662617
## 309 211.0064 1.8476053
## 310 212.4447 5.0184295
## 330 275.0957 5.6529356
## 331 273.6654 7.3973743
## 332 260.4447 10.1951090
## 333 268.2456 10.2436499
## 334 244.1725 11.5418676
## 335 251.0714 -0.2848792
## 337 286.2956 19.0955511
## 349 226.1949 11.6407181
## 350 238.3351 17.0815038
## 351 255.9830 7.4520239
## 352 272.2688 14.0032871
## 369 254.6806 11.3395008
## 370 225.7921 15.2897709
## 371 252.2122 9.4791297
## 372 263.7197 11.7513080
##
## attr(,"class")
## [1] "coef.mer"
Credits give to: