voice pitch data

d = read.csv("https://caleb-huo.github.io/teaching/data/politeness/politeness_data.csv")
str(d)

## 'data.frame':    84 obs. of  5 variables:
##  $ subject  : Factor w/ 6 levels "F1","F2","F3",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ gender   : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
##  $ scenario : int  1 1 2 2 3 3 4 4 5 5 ...
##  $ attitude : Factor w/ 2 levels "inf","pol": 2 1 2 1 2 1 2 1 2 1 ...
##  $ frequency: num  213 204 285 260 204 ...

d$attitude <- as.factor(d$attitude)
d$scenario0 <- as.factor(d$scenario)

subject: (6 subjects)
gender: (F/M)
scenario: different scenario (7 categories)
attitude:
- inf: informal situations (e.g., explaining to a friend why you’re late)
- pol: politeness (e.g., giving an excuse for being late to a professor)
frequency: outcome variable, frequency of voice pitch

voice pitch data

head(d)

##   subject gender scenario attitude frequency scenario0
## 1      F1      F        1      pol     213.3         1
## 2      F1      F        1      inf     204.5         1
## 3      F1      F        2      pol     285.1         2
## 4      F1      F        2      inf     259.7         2
## 5      F1      F        3      pol     203.9         3
## 6      F1      F        3      inf     286.9         3

with(d, table(subject, gender))

##        gender
## subject  F  M
##      F1 14  0
##      F2 14  0
##      F3 14  0
##      M3  0 14
##      M4  0 14
##      M7  0 14

which(apply(d,1,function(x) any(is.na(x)))) ## one subject contails missing data (missing outcome variable)

## [1] 39

visualize the data

library(tidyverse)

## ── Attaching packages ───────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   1.0.1
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0

## ── Conflicts ──────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

ggplot(aes(attitude, frequency, color = subject), data=d) + facet_grid(~subject) + geom_boxplot() + geom_jitter() + theme_bw()

## Warning: Removed 1 rows containing non-finite values (stat_boxplot).

## Warning: Removed 1 rows containing missing values (geom_point).

Fit linear model

Outcome variable: pitch (frequency)
primary predictor: attitude (politeness)
\(\varepsilon\): deviations from our predictions due to random factors that we cannot control experimentally

\[frequency \sim \mu + politeness + \varepsilon\]

Linear model in R

lmFit <- lm(frequency ~ attitude, data = d)
summary(lmFit)

## 
## Call:
## lm(formula = frequency ~ attitude, data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -103.488  -62.122    9.044   51.178  105.044 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   202.59      10.08  20.107   <2e-16 ***
## attitudepol   -18.23      14.34  -1.272    0.207    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 65.3 on 81 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.01958,    Adjusted R-squared:  0.007475 
## F-statistic: 1.618 on 1 and 81 DF,  p-value: 0.2071

Note:

\(1\) observation deleted due to missingness
attitude is not a significant predictor

Problem with linear model

\[frequency_{ijk} \sim \mu + politeness_{k} + \varepsilon_{ijk},\] where \(i\) is a subject index, \(j\) is a scenario index, and \(k\) is the politeness index.

Independence:

Multiple responses from the same subject \(i\) cannot be regarded as independent from each other.
- \(frequency_{i1k}, frequency_{i2k}, \ldots, frequency_{iJk}\) should be correlated
Under the same scenario \(j\), people should also react similarly, not totally independent.
- \(frequency_{1jk}, frequency_{2jk}, \ldots, frequency_{Ijk}\) should be correlated

data independence

subj_spread <- d %>% spread(attitude, frequency)
subj_spread$scenario0 <- as.factor(subj_spread$scenario)
head(subj_spread)

##   subject gender scenario scenario0   inf   pol
## 1      F1      F        1         1 204.5 213.3
## 2      F1      F        2         2 259.7 285.1
## 3      F1      F        3         3 286.9 203.9
## 4      F1      F        4         4 276.8 250.8
## 5      F1      F        5         5 252.4 231.9
## 6      F1      F        6         6 230.7 181.2

Subject dependence

ggplot(data=subj_spread, aes(pol, inf)) + theme_bw() +
  geom_point(aes(col=subject)) + 
  stat_ellipse(aes(col=subject)) +
  geom_smooth(method="lm")

## Warning: Removed 1 rows containing non-finite values (stat_ellipse).

## Warning: Removed 1 rows containing non-finite values (stat_smooth).

## Warning: Removed 1 rows containing missing values (geom_point).

scenario independence

ggplot(data=subj_spread, aes(pol, inf)) + theme_bw() +
  geom_point(aes(col=scenario0)) + 
  stat_ellipse(aes(col=scenario0)) +
  geom_smooth(method="lm")

## Warning: Removed 1 rows containing non-finite values (stat_ellipse).

## Warning: Removed 1 rows containing non-finite values (stat_smooth).

## Warning: Removed 1 rows containing missing values (geom_point).

Random effects:

Fixed effects:
- Categorical variables with well-defined categories (e.g., gender, race)
Random effects:
- We only sample a subset of the entire population
  - The 6 participants in the data are a subset of the entire population
  - Each participant has repeated measurement.
  - The “group” effect \(\alpha_i\) for subject \(i\) is best thought of as random: \(\alpha_i \sim N(0, \sigma_0^2)\)

When to use random effects?

A “group” effect is random if we can think of the levels we observe in that group to be samples from a larger population.
Example: we are investigating the GPA of UF undergraduate students by surveying a small group of students.
- Gender and race are considered fixed effect since their categories are well-defined.
- Department is considered as random effect, since our survey may not covering all departments in UF.

Mixed model

Linear model is “fixed-effects only” model
- one or more fixed effects
- a general error term \(\varepsilon\)
- error term is considered no structure.
Mixed model is a mixture of fixed effect and random effect
- Random effect: error term is considered to be correlated.
- Repeated measurement from the same subject are correlated.

Benefit of random effect models

Allows us to resolve this non-independence data structure for each subject
Allows missing data (missing outcome variable)

Fit linear mixed model in R

In R, lmer() function in lme4 package will do the job

alm <- lm(frequency ~ attitude, data = d)
suppressMessages(library(lme4))
suppressMessages(library(lmerTest)) ## this package helps provide a p-value
tryCatch({lmer(frequency ~ attitude, data = d)}, error=function(e) print(e))

## <simpleError: No random effects terms specified in formula>

We have to specify random effect when using lmer funciton.

Fit linear mixed model in R

rs_subj_reml = lmer(frequency ~ attitude + (1 | subject), data = d)
summary(rs_subj_reml)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: frequency ~ attitude + (1 | subject)
##    Data: d
## 
## REML criterion at convergence: 804.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2953 -0.6018 -0.2005  0.4774  3.1772 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept) 3982     63.10   
##  Residual              851     29.17   
## Number of obs: 83, groups:  subject, 6
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)  202.588     26.151   5.150   7.747 0.000501 ***
## attitudepol  -19.376      6.407  76.003  -3.024 0.003399 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## attitudepol -0.121

How to write the linear mixed model mathmatically

\[Y_{ijk} = \mu + \gamma \mathbb{I}(k=pol) + \alpha_i + \varepsilon_{ijk}\]

\(i\): subject index
\(j\): scenario index
\(k\): attitude index
\(\mu\): intercept
\(\gamma\): attitude coefficient of the polite group
\(\alpha_i \sim N(0, \sigma^2_\alpha)\): random intercept for the \(i^{th}\) subject
\(\varepsilon_{ijk} \sim N(0, \sigma^2)\):
Total error term: \(\alpha_i + \varepsilon_{ijk}\)
- so the error for \(i1k\) and \(i2k\) are dependent
If there is no \(\alpha_i\), the above model will reduce to linear regression model.

Further explore rs_subj_reml

coef(alm)

## (Intercept) attitudepol 
##    202.5881    -18.2320

coef(rs_subj_reml)

## $subject
##    (Intercept) attitudepol
## F1    241.1352   -19.37584
## F2    266.8920   -19.37584
## F3    259.5540   -19.37584
## M3    179.0262   -19.37584
## M4    155.6906   -19.37584
## M7    113.2306   -19.37584
## 
## attr(,"class")
## [1] "coef.mer"

AIC(rs_subj_reml)

## [1] 812.7023

Further explore rs_subj_reml

Subject intercept: \[\mu + \alpha_i\]
- vary among different subjects
Attitude effect \[\beta_j\]
- same among different subjects

Random Intercepts

linear model \[frequency \sim 1 + politeness + \varepsilon\]
linear model with random effect (random intercept) \[frequency \sim 1 + politeness +(1|subject) + \varepsilon\] \((1|subject)\) is the R syntax for a random intercept.
intercepts are different for each subject
“\(1\)” stands for the intercept in R
this formula indicates multiple responses per subject
resolves the non-independence multiple responses by the same subject
the formula still contains a general error term \(\varepsilon\)

Visualization for the random intercept

## Warning: Removed 1 rows containing missing values (geom_point).

Fit linear mixed model in R (REML = FALSE)

rs_subj_ml = lmer(frequency ~ attitude + (1 | subject), REML = FALSE, data = d)
summary(rs_subj_ml)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: frequency ~ attitude + (1 | subject)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##    826.5    836.2   -409.3    818.5       79 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.3067 -0.6051 -0.2042  0.4793  3.1931 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept) 3308.7   57.52   
##  Residual              839.9   28.98   
## Number of obs: 83, groups:  subject, 6
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)  202.588     23.905   6.214   8.475 0.000122 ***
## attitudepol  -19.372      6.365  77.003  -3.043 0.003199 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## attitudepol -0.131

REML option

REML represents restricted maximum likelihood

if REML=FALSE, maximum likelihood estimation will be used (biased estimation)
if REML=TRUE, its estimation does not base on a maximum likelihood, but instead uses a likelihood function calculated from a transformed set of data (unbiased estimation)

Compare to linear model

summary(lm(frequency ~ attitude, data = d))

## 
## Call:
## lm(formula = frequency ~ attitude, data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -103.488  -62.122    9.044   51.178  105.044 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   202.59      10.08  20.107   <2e-16 ***
## attitudepol   -18.23      14.34  -1.272    0.207    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 65.3 on 81 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.01958,    Adjusted R-squared:  0.007475 
## F-statistic: 1.618 on 1 and 81 DF,  p-value: 0.2071

Without adjusting for random effect, attitude won’t be a significant predictor.

Alternative ways to get p-values (likelihood ratio test)

likelihood ratio test: comparing full model and reduced model
get the p-value for reduced variable
need to use REML = FALSE, otherwise, anova will re-fit the reml model.

rs_subj_ml = lmer(frequency ~ attitude + (1 | subject), REML = FALSE, data = d)
rs_null_ml = lmer(frequency ~ 1 + (1 | subject), REML = FALSE, data = d)

anova(rs_null_ml, rs_subj_ml)

## Data: d
## Models:
## rs_null_ml: frequency ~ 1 + (1 | subject)
## rs_subj_ml: frequency ~ attitude + (1 | subject)
##            Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)   
## rs_null_ml  3 833.25 840.51 -413.62   827.25                            
## rs_subj_ml  4 826.51 836.18 -409.25   818.51 8.7444      1   0.003106 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

add another covariate gender and its interaction with attitude

summary(lmer(frequency ~ attitude * gender +  (1 | subject), data = d))

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: frequency ~ attitude * gender + (1 | subject)
##    Data: d
## 
## REML criterion at convergence: 778.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2334 -0.5105 -0.1551  0.4738  3.0875 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept) 605.4    24.61   
##  Residual             844.1    29.05   
## Number of obs: 83, groups:  subject, 6
## 
## Fixed effects:
##                     Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)          260.686     15.557    4.754  16.757 2.07e-05 ***
## attitudepol          -27.400      8.966   75.009  -3.056  0.00311 ** 
## genderM             -116.195     22.000    4.754  -5.282  0.00376 ** 
## attitudepol:genderM   16.190     12.763   75.018   1.269  0.20852    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) atttdp gendrM
## attitudepol -0.288              
## genderM     -0.707  0.204       
## atttdpl:gnM  0.202 -0.703 -0.286

Two random effects (two random intercepts)

rs_intergen_subjscene_ml = lmer(frequency ~ attitude + gender + (1 | subject) + 
    (1 | scenario), REML = FALSE, data = d)
summary(rs_intergen_subjscene_ml)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: frequency ~ attitude + gender + (1 | subject) + (1 | scenario)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##    807.1    821.6   -397.6    795.1       77 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2958 -0.6456 -0.0777  0.5449  3.5121 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  scenario (Intercept) 205.2    14.32   
##  subject  (Intercept) 417.1    20.42   
##  Residual             637.4    25.25   
## Number of obs: 83, groups:  scenario, 7; subject, 6
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  256.847     13.828    8.497  18.575 3.55e-08 ***
## attitudepol  -19.722      5.547   70.920  -3.555 0.000677 ***
## genderM     -108.517     17.573    5.928  -6.175 0.000868 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) atttdp
## attitudepol -0.201       
## genderM     -0.635  0.004

Two random effects (intercepts)

coef(rs_intergen_subjscene_ml)

## $scenario
##   (Intercept) attitudepol   genderM
## 1    243.4865   -19.72206 -108.5173
## 2    263.3589   -19.72206 -108.5173
## 3    268.1317   -19.72206 -108.5173
## 4    277.2536   -19.72206 -108.5173
## 5    254.9320   -19.72206 -108.5173
## 6    244.8021   -19.72206 -108.5173
## 7    245.9623   -19.72206 -108.5173
## 
## $subject
##    (Intercept) attitudepol   genderM
## F1    243.3681   -19.72206 -108.5173
## F2    266.9444   -19.72206 -108.5173
## F3    260.2277   -19.72206 -108.5173
## M3    284.3541   -19.72206 -108.5173
## M4    262.0576   -19.72206 -108.5173
## M7    224.1286   -19.72206 -108.5173
## 
## attr(,"class")
## [1] "coef.mer"

Randon intercept and random slope

Random intercept only:

\[Y_{ijk} = (\mu + \alpha_i) + \gamma \mathbb{I}(k=pol) + \varepsilon_{ijk}\]

Random intercept and random slope:

\[Y_{ijk} = (\mu + \alpha_i) + (\gamma + \eta_i) \mathbb{I}(k=pol) + \varepsilon_{ijk}\]

\((\alpha_i, \eta_i)^\top \sim N(0, \Sigma)\): random intercept and random slope for the \(i^{th}\) subject
\(\varepsilon_{ijk} \sim N(0, \sigma^2)\):

Visualization for the random slope

## boundary (singular) fit: see ?isSingular

## Warning: Removed 1 rows containing missing values (geom_point).

Random Slopes in R

politeness.model.rs = lmer(frequency ~  attitude + gender + (1 + attitude | subject), REML = FALSE, data = d)

## boundary (singular) fit: see ?isSingular

summary(politeness.model.rs)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: frequency ~ attitude + gender + (1 + attitude | subject)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##    820.3    837.2   -403.2    806.3       76 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.3848 -0.5270 -0.2221  0.5179  3.2462 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  subject  (Intercept) 359.959  18.97        
##           attitudepol   1.564   1.25    1.00
##  Residual             839.418  28.97        
## Number of obs: 83, groups:  subject, 6
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  257.501     12.396    6.275  20.774 5.06e-07 ***
## attitudepol  -19.408      6.384   63.996  -3.040 0.003422 ** 
## genderM     -109.826     17.167    6.022  -6.398 0.000677 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) atttdp
## attitudepol -0.205       
## genderM     -0.692  0.004
## convergence code: 0
## boundary (singular) fit: see ?isSingular

Random Slopes in R

coef(politeness.model.rs)

## $subject
##    (Intercept) attitudepol   genderM
## F1    244.1800   -20.28634 -109.8262
## F2    266.1361   -18.83927 -109.8262
## F3    259.8312   -19.25481 -109.8262
## M3    283.4803   -17.69615 -109.8262
## M4    263.7697   -18.99523 -109.8262
## M7    227.6097   -21.37845 -109.8262
## 
## attr(,"class")
## [1] "coef.mer"

Is random slope necessary

rs_gen_subjscene_con_reml = lmer(frequency ~ attitude + gender + (1 + attitude | 
    subject), REML = TRUE, data = d)

## boundary (singular) fit: see ?isSingular

rs_gen_subjscene_reml = lmer(frequency ~ attitude + gender + (1 | subject), REML = TRUE, data = d)

anova(rs_gen_subjscene_reml, rs_gen_subjscene_con_reml)

## refitting model(s) with ML (instead of REML)

## Data: d
## Models:
## rs_gen_subjscene_reml: frequency ~ attitude + gender + (1 | subject)
## rs_gen_subjscene_con_reml: frequency ~ attitude + gender + (1 + attitude | subject)
##                           Df    AIC    BIC  logLik deviance  Chisq Chi Df
## rs_gen_subjscene_reml      5 816.34 828.43 -403.17   806.34              
## rs_gen_subjscene_con_reml  7 820.31 837.24 -403.15   806.31 0.0282      2
##                           Pr(>Chisq)
## rs_gen_subjscene_reml               
## rs_gen_subjscene_con_reml      0.986

it appears that we don’t need to include random slope for condition in the model
because the likelihood ratio test is not significant
however, others would argue that we should keep our models maximal

Beyond linear mixed model

If the outcome variable is

Continuous: use a linear regression model with mixed effects
Binary: use a logistic regression model with mixed effects
Count: use a Poisson/negative binomial regression model with mixed effects

Function lmer is used to fit linear mixed models, function glmer is used to fit generalized (non-Gaussian) linear mixed models.

Another example sleep study

sleepStudy <- read.table("https://caleb-huo.github.io/teaching/data/sleep/sleepstudy.txt", header=T, row.names = 1)
sleepStudy$Subject <- as.factor(sleepStudy$Subject)

head(sleepStudy, n=10)

##    Reaction Days Subject
## 1  249.5600    0     308
## 2  258.7047    1     308
## 3  250.8006    2     308
## 4  321.4398    3     308
## 5  356.8519    4     308
## 6  414.6901    5     308
## 7  382.2038    6     308
## 8  290.1486    7     308
## 9  430.5853    8     308
## 10 466.3535    9     308

str(sleepStudy)

## 'data.frame':    180 obs. of  3 variables:
##  $ Reaction: num  250 259 251 321 357 ...
##  $ Days    : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ Subject : Factor w/ 18 levels "308","309","310",..: 1 1 1 1 1 1 1 1 1 1 ...

Sleep deprivation data

This laboratory experiment measured the effect of sleep deprivation on cognitive performance.
There were 18 subjects, chosen from the population of interest (long-distance truck drivers), in the 10 day trial. These subjects were restricted to 3 hours sleep per night during the trial.
On each day of the trial each subject’s reaction time was measured. The reaction time shown here is the average of several measurements.
These data are balanced in that each subject is measured the same number of times and on the same occasions.

Longitudinal data

Longitudinal data are repeated measures data in which the observations are taken over time.
We wish to characterize the response over time within subjects and the variation in the time trends between subjects.

Data exploratory

ggplot(aes(Days, Reaction, col=Subject), data = sleepStudy) + geom_point() + geom_smooth(method="lm") + facet_wrap(~Subject)

Fit linear mixed model in R

Examine the relationship of Reaction time with respect to Days
Each subject has repeated measurement of Reaction time
Consider subject specific random intercept only

fm1 <- lmer(Reaction ~ Days + (1|Subject), sleepstudy)
summary(fm1)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (1 | Subject)
##    Data: sleepstudy
## 
## REML criterion at convergence: 1786.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2257 -0.5529  0.0109  0.5188  4.2506 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Subject  (Intercept) 1378.2   37.12   
##  Residual              960.5   30.99   
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept) 251.4051     9.7467  22.8102   25.79   <2e-16 ***
## Days         10.4673     0.8042 161.0000   13.02   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.371

Fit linear mixed model in R

Examine the relationship of Reaction time with respect to Days
Each subject has repeated measurement of Reaction time
Consider both subject specific random intercept and subject specific random slope

fm1 <- lmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy)
#fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy) ## same as this one
summary(fm1)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (1 + Days | Subject)
##    Data: sleepstudy
## 
## REML criterion at convergence: 1743.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9536 -0.4634  0.0231  0.4633  5.1793 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  Subject  (Intercept) 611.90   24.737       
##           Days         35.08    5.923   0.07
##  Residual             654.94   25.592       
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)  251.405      6.824  17.005  36.843  < 2e-16 ***
## Days          10.467      1.546  16.995   6.771 3.27e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.138

coef

coef(fm1)

## $Subject
##     (Intercept)       Days
## 308    253.6626 19.6665597
## 309    211.0108  1.8467699
## 310    212.4488  5.0177063
## 330    275.0940  5.6531411
## 331    273.6636  7.3976093
## 332    260.4439 10.1952325
## 333    268.2441 10.2438881
## 334    244.1731 11.5417935
## 335    251.0724 -0.2851939
## 337    286.2916 19.0963068
## 349    226.1971 11.6403856
## 350    238.3357 17.0815045
## 351    255.9828  7.4520035
## 352    272.2666 14.0036922
## 369    254.6802 11.3395736
## 370    225.7940 15.2895377
## 371    252.2122  9.4791130
## 372    263.7185 11.7515240
## 
## attr(,"class")
## [1] "coef.mer"

Biostatistical Computing, PHC 6068

Linear Mixed Model

From linear model to linear mixed model

voice pitch data

voice pitch data

visualize the data

Fit linear model

Linear model in R

Problem with linear model

data independence

Subject dependence

scenario independence

Random effects:

Mixed model

Benefit of random effect models

Fit linear mixed model in R

Fit linear mixed model in R

How to write the linear mixed model mathmatically

Further explore rs_subj_reml

Further explore rs_subj_reml

Random Intercepts

Visualization for the random intercept

Fit linear mixed model in R (REML = FALSE)

REML option

Compare to linear model

Alternative ways to get p-values (likelihood ratio test)

add another covariate gender and its interaction with attitude

Two random effects (two random intercepts)

Two random effects (intercepts)

Randon intercept and random slope

Visualization for the random slope

Random Slopes in R

Random Slopes in R

Is random slope necessary

Beyond linear mixed model

Another example sleep study

Sleep deprivation data

Longitudinal data

Data exploratory

Fit linear mixed model in R

Fit linear mixed model in R

coef

References: