Frontiers of Biostatistics

Mendelian Randomization

Zhiguang Huo (Caleb)

Friday Feb 28, 2020

Outline

Review for Mendelian randomization

Assumptions

G: genetic variant; Y: outcome variable; X: exposure variable; U: unknown confounders

  1. Independence between \(G\) and \(U\) \[G \perp U\]
  2. Established association between \(G\) and \(X\) \[P(X|G) \ne P(X)\]
  3. No alternative pathway from \(G\) to \(Y\), (exclusion restriction) \[G \perp Y | X, U\]

Relaxed Assumptions

G: genetic variant; Y: outcome variable; X: exposure variable; U: unknown confounders; W: known confounders;

  1. Independence between \(G\) and \(U\) \[G \perp U | W\]
  2. Established association between \(G\) and \(X\) \[P(X|G, W) \ne P(X|W)\]
  3. No alternative pathway from \(G\) to \(Y\), (exclusion restriction) \[G \perp Y | X, U, W\]

Instrumental variable regression

2 Stage Least Square

\[ \hat{\boldsymbol{X}} = \boldsymbol{G} (\boldsymbol{G}^\top \boldsymbol{G})^{-1} \boldsymbol{G} \boldsymbol{X} \]

\[\boldsymbol{\beta}_{IV} = (\hat{\boldsymbol{X}}^\top \hat{\boldsymbol{X}} )^{-1} \hat{\boldsymbol{X}} ^\top \boldsymbol{Y} \]

library(AER)
## Loading required package: car
## Loading required package: carData
## Loading required package: lmtest
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: survival
# ?ivreg

Example data: CigarettesSW

data("CigarettesSW")
head(CigarettesSW)
##   state year   cpi population    packs    income  tax     price     taxs
## 1    AL 1985 1.076    3973000 116.4863  46014968 32.5 102.18167 33.34834
## 2    AR 1985 1.076    2327000 128.5346  26210736 37.0 101.47500 37.00000
## 3    AZ 1985 1.076    3184000 104.5226  43956936 31.0 108.57875 36.17042
## 4    CA 1985 1.076   26444000 100.3630 447102816 26.0 107.83734 32.10400
## 5    CO 1985 1.076    3209000 112.9635  49466672 31.0  94.26666 31.00000
## 6    CT 1985 1.076    3201000 109.2784  60063368 42.0 128.02499 51.48333

CigarettesSW

CigarettesSW$rprice <- with(CigarettesSW, price/cpi)
CigarettesSW$rincome <- with(CigarettesSW, income/population/cpi)
CigarettesSW$tdiff <- with(CigarettesSW, (taxs - tax)/cpi)
CigarettesSW$ratio_tc <- with(CigarettesSW, tax/cpi)

Goal: examine whether higher cigarette price (price) could reduce smoking (pack)

Valid instrument

Assumptoin 2: Established association between \(G\) and \(X\) \[P(X|G, W) \ne P(X|W)\]

alm <- lm(log(rprice) ~ tdiff + log(rincome), data = CigarettesSW) 
## alm <- lm(log(rprice) ~ ratio_tc + log(rincome), data = CigarettesSW) 
summary(alm)
## 
## Call:
## lm(formula = log(rprice) ~ tdiff + log(rincome), data = CigarettesSW)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.219336 -0.056866  0.001329  0.062840  0.245333 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.384429   0.157936  21.429  < 2e-16 ***
## tdiff        0.031206   0.003187   9.790 5.68e-16 ***
## log(rincome) 0.443583   0.061291   7.237 1.28e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08865 on 93 degrees of freedom
## Multiple R-squared:  0.6733, Adjusted R-squared:  0.6662 
## F-statistic: 95.82 on 2 and 93 DF,  p-value: < 2.2e-16

2SLS Model fitting

ivreg(y ~ x + w | w + g)

\[ \hat{\boldsymbol{X}} = \boldsymbol{G} (\boldsymbol{G}^\top \boldsymbol{G})^{-1} \boldsymbol{G} \boldsymbol{X} \]

\[\boldsymbol{\beta}_{IV} = (\hat{\boldsymbol{X}}^\top \hat{\boldsymbol{X}} )^{-1} \hat{\boldsymbol{X}} ^\top \boldsymbol{Y} \]

2SLS Model fitting

fm <- ivreg(log(packs) ~ log(rprice) + log(rincome) | log(rincome) + tdiff + ratio_tc,
  data = CigarettesSW)
summary(fm)
## 
## Call:
## ivreg(formula = log(packs) ~ log(rprice) + log(rincome) | log(rincome) + 
##     tdiff + ratio_tc, data = CigarettesSW)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.61378 -0.07232  0.01435  0.09486  0.37504 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.7365     0.5687  17.122  < 2e-16 ***
## log(rprice)   -1.2291     0.1552  -7.922 4.94e-12 ***
## log(rincome)   0.2568     0.1434   1.791   0.0765 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1654 on 93 degrees of freedom
## Multiple R-Squared: 0.5486,  Adjusted R-squared: 0.5389 
## Wald test: 39.93 on 2 and 93 DF,  p-value: 3.025e-13

2SLS Model diagnostic

summary(fm, vcov = sandwich, df = Inf, diagnostics = TRUE)
## 
## Call:
## ivreg(formula = log(packs) ~ log(rprice) + log(rincome) | log(rincome) + 
##     tdiff + ratio_tc, data = CigarettesSW)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.61378 -0.07232  0.01435  0.09486  0.37504 
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    9.7365     0.5060  19.243  < 2e-16 ***
## log(rprice)   -1.2291     0.1522  -8.078 6.59e-16 ***
## log(rincome)   0.2568     0.1503   1.709   0.0874 .  
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value    
## Weak instruments   2  92   145.502  <2e-16 ***
## Wu-Hausman         1  92     2.204   0.141    
## Sargan             1  NA     0.018   0.893    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1654 on Inf degrees of freedom
## Multiple R-Squared: 0.5486,  Adjusted R-squared: 0.5389 
## Wald test: 101.5 on 2 DF,  p-value: < 2.2e-16

2SLS Model diagnostic

Two sample MR

(Bowden et al. 2015), each dot represents the effects from one instrumental variable

(Bowden et al. 2015), each dot represents the effects from one instrumental variable

Two sample MR

library(TwoSampleMR)
## TwoSampleMR 0.5.1
## [>] IMPORTANT: Some datasets have been updated, and some are disabled while we update them.
##     Apologies for this inconvenience, they will be back up as soon as possible.
##     See news(package='TwoSampleMR') and https://gwas.mrcieu.ac.uk for more information
## [>] To temporarily revert to the previous database, see here:
##     https://mrcieu.github.io/TwoSampleMR/articles/gwas2020.html

Goal: investigate whether higher BMI (exposure) causes Coronary heart disease (outcome)

Datasets

ao <- available_outcomes()
## API: public: http://gwas-api.mrcieu.ac.uk/
head(subset(ao, select=c(trait, id)))
## # A tibble: 6 x 2
##   trait                                                 id                 
##   <chr>                                                 <chr>              
## 1 ENSG00000134884                                       eqtl-a-ENSG0000013…
## 2 Operative procedures - main OPCS: B27.4 Total mastec… ukb-b-20278        
## 3 ENSG00000170464                                       eqtl-a-ENSG0000017…
## 4 ENSG00000090263                                       eqtl-a-ENSG0000009…
## 5 Lamina-associated polypeptide 2, isoforms beta/gamma  prot-a-3021        
## 6 ENSG00000251637                                       eqtl-a-ENSG0000025…

Get instruments and exposure variable (BMI)

bmi_exp_dat0 <- extract_instruments("ieu-a-2")
head(bmi_exp_dat0)
##   pval.exposure samplesize.exposure chr.exposure se.exposure beta.exposure
## 1   2.18198e-08              339152            1      0.0030       -0.0168
## 2   4.56773e-11              339065            1      0.0031        0.0201
## 3   5.05941e-14              313621            1      0.0087        0.0659
## 4   1.93299e-08              294030            3      0.0033        0.0183
## 5   2.53501e-10              337750            6      0.0033        0.0209
## 6   4.52376e-31              339197            6      0.0038        0.0444
##   pos.exposure id.exposure        SNP effect_allele.exposure
## 1     47684677     ieu-a-2   rs977747                      G
## 2     78048331     ieu-a-2 rs17381664                      C
## 3    110082886     ieu-a-2  rs7550711                      T
## 4     81792112     ieu-a-2  rs3849570                      A
## 5     34828553     ieu-a-2  rs6457796                      C
## 6     50865820     ieu-a-2   rs943005                      T
##   other_allele.exposure eaf.exposure                      exposure
## 1                     T       0.5333 Body mass index || id:ieu-a-2
## 2                     T       0.4250 Body mass index || id:ieu-a-2
## 3                     C       0.0339 Body mass index || id:ieu-a-2
## 4                     C       0.3667 Body mass index || id:ieu-a-2
## 5                     T       0.2583 Body mass index || id:ieu-a-2
## 6                     C       0.1000 Body mass index || id:ieu-a-2
##   mr_keep.exposure pval_origin.exposure data_source.exposure
## 1             TRUE             reported                  igd
## 2             TRUE             reported                  igd
## 3             TRUE             reported                  igd
## 4             TRUE             reported                  igd
## 5             TRUE             reported                  igd
## 6             TRUE             reported                  igd
dim(bmi_exp_dat0)
## [1] 79 15

Exposure data explanation

The minimum information required for MR analysis is the following:

Data clumping

bmi_exp_dat <- clump_data(bmi_exp_dat0)
## Warning: 'clump_data' is deprecated.
## Use 'ieugwasr::ld_clump()' instead.
## See help("Deprecated")
## Warning: since v0.4.2 the default r2 value for clumping has changed from 0.01 to 0.001
## Clumping ieu-a-2, 79 variants
## Removing 1 of 79 variants due to LD with other variants or absence from LD reference panel
dim(bmi_exp_dat)
## [1] 78 15

Outcome data

chd_out_dat <- extract_outcome_data(snps=bmi_exp_dat$SNP, outcomes="ieu-a-7")
## Extracting data for 78 SNP(s) from 1 GWAS(s)
head(chd_out_dat)
##          SNP chr       pos beta.outcome se.outcome samplesize.outcome
## 1   rs977747   1  47684677    -0.013896  0.0096146             184305
## 2 rs17203016   2 208255518     0.020352  0.0122315             184305
## 3  rs7550711   1 110082886    -0.048354  0.0305635             184305
## 4  rs9374842   6 120185665     0.013555  0.0106053             184305
## 5  rs4740619   9  15634326    -0.006546  0.0092497             184305
## 6  rs2112347   5  75015242    -0.005855  0.0096581             184305
##   pval.outcome eaf.outcome effect_allele.outcome other_allele.outcome
## 1    0.1483750    0.550761                     G                    T
## 2    0.0961324    0.186737                     G                    A
## 3    0.1136310    0.027650                     T                    C
## 4    0.2012030    0.723965                     T                    C
## 5    0.4791310    0.467784                     C                    T
## 6    0.5443640    0.401399                     G                    T
##                                outcome id.outcome   originalname.outcome
## 1 Coronary heart disease || id:ieu-a-7    ieu-a-7 Coronary heart disease
## 2 Coronary heart disease || id:ieu-a-7    ieu-a-7 Coronary heart disease
## 3 Coronary heart disease || id:ieu-a-7    ieu-a-7 Coronary heart disease
## 4 Coronary heart disease || id:ieu-a-7    ieu-a-7 Coronary heart disease
## 5 Coronary heart disease || id:ieu-a-7    ieu-a-7 Coronary heart disease
## 6 Coronary heart disease || id:ieu-a-7    ieu-a-7 Coronary heart disease
##               outcome.deprecated mr_keep.outcome data_source.outcome
## 1 Coronary heart disease ||  ||             TRUE                 igd
## 2 Coronary heart disease ||  ||             TRUE                 igd
## 3 Coronary heart disease ||  ||             TRUE                 igd
## 4 Coronary heart disease ||  ||             TRUE                 igd
## 5 Coronary heart disease ||  ||             TRUE                 igd
## 6 Coronary heart disease ||  ||             TRUE                 igd

Harmonise data

This means that the effect of a SNP on the exposure and the effect of that SNP on the outcome must each correspond to the same allele.

dat <- harmonise_data(
    exposure_dat = bmi_exp_dat, 
    outcome_dat = chd_out_dat
)
## Harmonising Body mass index || id:ieu-a-2 (ieu-a-2) and Coronary heart disease || id:ieu-a-7 (ieu-a-7)
head(dat)
##          SNP effect_allele.exposure other_allele.exposure
## 1  rs1000940                      G                     A
## 2 rs10132280                      A                     C
## 3  rs1016287                      C                     T
## 4 rs10182181                      G                     A
## 5 rs10733682                      G                     A
## 6 rs10840100                      G                     A
##   effect_allele.outcome other_allele.outcome beta.exposure beta.outcome
## 1                     G                    A        0.0184    -0.014538
## 2                     A                    C       -0.0221    -0.012169
## 3                     C                    T       -0.0228    -0.014087
## 4                     G                    A        0.0309     0.018295
## 5                     G                    A       -0.0188    -0.004541
## 6                     G                    A        0.0206     0.014599
##   eaf.exposure eaf.outcome remove palindromic ambiguous id.outcome chr
## 1       0.2250    0.337103  FALSE       FALSE     FALSE    ieu-a-7  17
## 2       0.3333    0.282164  FALSE       FALSE     FALSE    ieu-a-7  14
## 3       0.6750    0.699031  FALSE       FALSE     FALSE    ieu-a-7   2
## 4       0.5000    0.473525  FALSE       FALSE     FALSE    ieu-a-7   2
## 5       0.5750    0.491116  FALSE       FALSE     FALSE    ieu-a-7   9
## 6       0.7250    0.607618  FALSE       FALSE     FALSE    ieu-a-7  11
##         pos se.outcome samplesize.outcome pval.outcome
## 1   5283252  0.0098395             184305    0.1395380
## 2  25928179  0.0106640             184305    0.2538160
## 3  59305625  0.0105738             184305    0.1827780
## 4  25150296  0.0092790             184305    0.0486486
## 5 129460914  0.0095245             184305    0.6335250
## 6   8669437  0.0095234             184305    0.1252860
##                                outcome   originalname.outcome
## 1 Coronary heart disease || id:ieu-a-7 Coronary heart disease
## 2 Coronary heart disease || id:ieu-a-7 Coronary heart disease
## 3 Coronary heart disease || id:ieu-a-7 Coronary heart disease
## 4 Coronary heart disease || id:ieu-a-7 Coronary heart disease
## 5 Coronary heart disease || id:ieu-a-7 Coronary heart disease
## 6 Coronary heart disease || id:ieu-a-7 Coronary heart disease
##               outcome.deprecated mr_keep.outcome data_source.outcome
## 1 Coronary heart disease ||  ||             TRUE                 igd
## 2 Coronary heart disease ||  ||             TRUE                 igd
## 3 Coronary heart disease ||  ||             TRUE                 igd
## 4 Coronary heart disease ||  ||             TRUE                 igd
## 5 Coronary heart disease ||  ||             TRUE                 igd
## 6 Coronary heart disease ||  ||             TRUE                 igd
##   pval.exposure samplesize.exposure chr.exposure se.exposure pos.exposure
## 1   1.81201e-08              338903           17      0.0033      5283252
## 2   1.40088e-11              338856           14      0.0033     25928179
## 3   4.35512e-12              339033            2      0.0033     59305625
## 4   8.07049e-26              338829            2      0.0029     25150296
## 5   2.45499e-10              336886            9      0.0030    129460914
## 6   6.66653e-12              339135           11      0.0030      8669437
##   id.exposure                      exposure mr_keep.exposure
## 1     ieu-a-2 Body mass index || id:ieu-a-2             TRUE
## 2     ieu-a-2 Body mass index || id:ieu-a-2             TRUE
## 3     ieu-a-2 Body mass index || id:ieu-a-2             TRUE
## 4     ieu-a-2 Body mass index || id:ieu-a-2             TRUE
## 5     ieu-a-2 Body mass index || id:ieu-a-2             TRUE
## 6     ieu-a-2 Body mass index || id:ieu-a-2             TRUE
##   pval_origin.exposure data_source.exposure action mr_keep
## 1             reported                  igd      2    TRUE
## 2             reported                  igd      2    TRUE
## 3             reported                  igd      2    TRUE
## 4             reported                  igd      2    TRUE
## 5             reported                  igd      2    TRUE
## 6             reported                  igd      2    TRUE

Perform MR

res_single <- mr_singlesnp(dat)
res <- mr(dat, method_list=c("mr_egger_regression", "mr_ivw"))
## Analysing 'ieu-a-2' on 'ieu-a-7'
res
##   id.exposure id.outcome                              outcome
## 1     ieu-a-2    ieu-a-7 Coronary heart disease || id:ieu-a-7
## 2     ieu-a-2    ieu-a-7 Coronary heart disease || id:ieu-a-7
##                        exposure                    method nsnp         b
## 1 Body mass index || id:ieu-a-2                  MR Egger   78 0.4926008
## 2 Body mass index || id:ieu-a-2 Inverse variance weighted   78 0.4505027
##           se         pval
## 1 0.14436715 1.035793e-03
## 2 0.05913769 2.579139e-14

References