Zhiguang Huo (Caleb)
Friday Feb 28, 2020
G: genetic variant; Y: outcome variable; X: exposure variable; U: unknown confounders
G: genetic variant; Y: outcome variable; X: exposure variable; U: unknown confounders; W: known confounders;
\[ \hat{\boldsymbol{X}} = \boldsymbol{G} (\boldsymbol{G}^\top \boldsymbol{G})^{-1} \boldsymbol{G} \boldsymbol{X} \]
\[\boldsymbol{\beta}_{IV} = (\hat{\boldsymbol{X}}^\top \hat{\boldsymbol{X}} )^{-1} \hat{\boldsymbol{X}} ^\top \boldsymbol{Y} \]
## Loading required package: car
## Loading required package: carData
## Loading required package: lmtest
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: survival
##   state year   cpi population    packs    income  tax     price     taxs
## 1    AL 1985 1.076    3973000 116.4863  46014968 32.5 102.18167 33.34834
## 2    AR 1985 1.076    2327000 128.5346  26210736 37.0 101.47500 37.00000
## 3    AZ 1985 1.076    3184000 104.5226  43956936 31.0 108.57875 36.17042
## 4    CA 1985 1.076   26444000 100.3630 447102816 26.0 107.83734 32.10400
## 5    CO 1985 1.076    3209000 112.9635  49466672 31.0  94.26666 31.00000
## 6    CT 1985 1.076    3201000 109.2784  60063368 42.0 128.02499 51.48333
CigarettesSW$rprice <- with(CigarettesSW, price/cpi)
CigarettesSW$rincome <- with(CigarettesSW, income/population/cpi)
CigarettesSW$tdiff <- with(CigarettesSW, (taxs - tax)/cpi)
CigarettesSW$ratio_tc <- with(CigarettesSW, tax/cpi)Goal: examine whether higher cigarette price (price) could reduce smoking (pack)
Assumptoin 2: Established association between \(G\) and \(X\) \[P(X|G, W) \ne P(X|W)\]
alm <- lm(log(rprice) ~ tdiff + log(rincome), data = CigarettesSW) 
## alm <- lm(log(rprice) ~ ratio_tc + log(rincome), data = CigarettesSW) 
summary(alm)## 
## Call:
## lm(formula = log(rprice) ~ tdiff + log(rincome), data = CigarettesSW)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.219336 -0.056866  0.001329  0.062840  0.245333 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.384429   0.157936  21.429  < 2e-16 ***
## tdiff        0.031206   0.003187   9.790 5.68e-16 ***
## log(rincome) 0.443583   0.061291   7.237 1.28e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08865 on 93 degrees of freedom
## Multiple R-squared:  0.6733, Adjusted R-squared:  0.6662 
## F-statistic: 95.82 on 2 and 93 DF,  p-value: < 2.2e-16
\[ \hat{\boldsymbol{X}} = \boldsymbol{G} (\boldsymbol{G}^\top \boldsymbol{G})^{-1} \boldsymbol{G} \boldsymbol{X} \]
\[\boldsymbol{\beta}_{IV} = (\hat{\boldsymbol{X}}^\top \hat{\boldsymbol{X}} )^{-1} \hat{\boldsymbol{X}} ^\top \boldsymbol{Y} \]
fm <- ivreg(log(packs) ~ log(rprice) + log(rincome) | log(rincome) + tdiff + ratio_tc,
  data = CigarettesSW)
summary(fm)## 
## Call:
## ivreg(formula = log(packs) ~ log(rprice) + log(rincome) | log(rincome) + 
##     tdiff + ratio_tc, data = CigarettesSW)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.61378 -0.07232  0.01435  0.09486  0.37504 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.7365     0.5687  17.122  < 2e-16 ***
## log(rprice)   -1.2291     0.1552  -7.922 4.94e-12 ***
## log(rincome)   0.2568     0.1434   1.791   0.0765 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1654 on 93 degrees of freedom
## Multiple R-Squared: 0.5486,  Adjusted R-squared: 0.5389 
## Wald test: 39.93 on 2 and 93 DF,  p-value: 3.025e-13
## 
## Call:
## ivreg(formula = log(packs) ~ log(rprice) + log(rincome) | log(rincome) + 
##     tdiff + ratio_tc, data = CigarettesSW)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.61378 -0.07232  0.01435  0.09486  0.37504 
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    9.7365     0.5060  19.243  < 2e-16 ***
## log(rprice)   -1.2291     0.1522  -8.078 6.59e-16 ***
## log(rincome)   0.2568     0.1503   1.709   0.0874 .  
## 
## Diagnostic tests:
##                  df1 df2 statistic p-value    
## Weak instruments   2  92   145.502  <2e-16 ***
## Wu-Hausman         1  92     2.204   0.141    
## Sargan             1  NA     0.018   0.893    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1654 on Inf degrees of freedom
## Multiple R-Squared: 0.5486,  Adjusted R-squared: 0.5389 
## Wald test: 101.5 on 2 DF,  p-value: < 2.2e-16
(Bowden et al. 2015), each dot represents the effects from one instrumental variable
## TwoSampleMR 0.5.1
## [>] IMPORTANT: Some datasets have been updated, and some are disabled while we update them.
##     Apologies for this inconvenience, they will be back up as soon as possible.
##     See news(package='TwoSampleMR') and https://gwas.mrcieu.ac.uk for more information
## [>] To temporarily revert to the previous database, see here:
##     https://mrcieu.github.io/TwoSampleMR/articles/gwas2020.html
Goal: investigate whether higher BMI (exposure) causes Coronary heart disease (outcome)
## API: public: http://gwas-api.mrcieu.ac.uk/
## # A tibble: 6 x 2
##   trait                                                 id                 
##   <chr>                                                 <chr>              
## 1 ENSG00000134884                                       eqtl-a-ENSG0000013…
## 2 Operative procedures - main OPCS: B27.4 Total mastec… ukb-b-20278        
## 3 ENSG00000170464                                       eqtl-a-ENSG0000017…
## 4 ENSG00000090263                                       eqtl-a-ENSG0000009…
## 5 Lamina-associated polypeptide 2, isoforms beta/gamma  prot-a-3021        
## 6 ENSG00000251637                                       eqtl-a-ENSG0000025…
##   pval.exposure samplesize.exposure chr.exposure se.exposure beta.exposure
## 1   2.18198e-08              339152            1      0.0030       -0.0168
## 2   4.56773e-11              339065            1      0.0031        0.0201
## 3   5.05941e-14              313621            1      0.0087        0.0659
## 4   1.93299e-08              294030            3      0.0033        0.0183
## 5   2.53501e-10              337750            6      0.0033        0.0209
## 6   4.52376e-31              339197            6      0.0038        0.0444
##   pos.exposure id.exposure        SNP effect_allele.exposure
## 1     47684677     ieu-a-2   rs977747                      G
## 2     78048331     ieu-a-2 rs17381664                      C
## 3    110082886     ieu-a-2  rs7550711                      T
## 4     81792112     ieu-a-2  rs3849570                      A
## 5     34828553     ieu-a-2  rs6457796                      C
## 6     50865820     ieu-a-2   rs943005                      T
##   other_allele.exposure eaf.exposure                      exposure
## 1                     T       0.5333 Body mass index || id:ieu-a-2
## 2                     T       0.4250 Body mass index || id:ieu-a-2
## 3                     C       0.0339 Body mass index || id:ieu-a-2
## 4                     C       0.3667 Body mass index || id:ieu-a-2
## 5                     T       0.2583 Body mass index || id:ieu-a-2
## 6                     C       0.1000 Body mass index || id:ieu-a-2
##   mr_keep.exposure pval_origin.exposure data_source.exposure
## 1             TRUE             reported                  igd
## 2             TRUE             reported                  igd
## 3             TRUE             reported                  igd
## 4             TRUE             reported                  igd
## 5             TRUE             reported                  igd
## 6             TRUE             reported                  igd
## [1] 79 15
The minimum information required for MR analysis is the following:
## Warning: 'clump_data' is deprecated.
## Use 'ieugwasr::ld_clump()' instead.
## See help("Deprecated")
## Warning: since v0.4.2 the default r2 value for clumping has changed from 0.01 to 0.001
## Clumping ieu-a-2, 79 variants
## Removing 1 of 79 variants due to LD with other variants or absence from LD reference panel
## [1] 78 15
## Extracting data for 78 SNP(s) from 1 GWAS(s)
##          SNP chr       pos beta.outcome se.outcome samplesize.outcome
## 1   rs977747   1  47684677    -0.013896  0.0096146             184305
## 2 rs17203016   2 208255518     0.020352  0.0122315             184305
## 3  rs7550711   1 110082886    -0.048354  0.0305635             184305
## 4  rs9374842   6 120185665     0.013555  0.0106053             184305
## 5  rs4740619   9  15634326    -0.006546  0.0092497             184305
## 6  rs2112347   5  75015242    -0.005855  0.0096581             184305
##   pval.outcome eaf.outcome effect_allele.outcome other_allele.outcome
## 1    0.1483750    0.550761                     G                    T
## 2    0.0961324    0.186737                     G                    A
## 3    0.1136310    0.027650                     T                    C
## 4    0.2012030    0.723965                     T                    C
## 5    0.4791310    0.467784                     C                    T
## 6    0.5443640    0.401399                     G                    T
##                                outcome id.outcome   originalname.outcome
## 1 Coronary heart disease || id:ieu-a-7    ieu-a-7 Coronary heart disease
## 2 Coronary heart disease || id:ieu-a-7    ieu-a-7 Coronary heart disease
## 3 Coronary heart disease || id:ieu-a-7    ieu-a-7 Coronary heart disease
## 4 Coronary heart disease || id:ieu-a-7    ieu-a-7 Coronary heart disease
## 5 Coronary heart disease || id:ieu-a-7    ieu-a-7 Coronary heart disease
## 6 Coronary heart disease || id:ieu-a-7    ieu-a-7 Coronary heart disease
##               outcome.deprecated mr_keep.outcome data_source.outcome
## 1 Coronary heart disease ||  ||             TRUE                 igd
## 2 Coronary heart disease ||  ||             TRUE                 igd
## 3 Coronary heart disease ||  ||             TRUE                 igd
## 4 Coronary heart disease ||  ||             TRUE                 igd
## 5 Coronary heart disease ||  ||             TRUE                 igd
## 6 Coronary heart disease ||  ||             TRUE                 igd
This means that the effect of a SNP on the exposure and the effect of that SNP on the outcome must each correspond to the same allele.
## Harmonising Body mass index || id:ieu-a-2 (ieu-a-2) and Coronary heart disease || id:ieu-a-7 (ieu-a-7)
##          SNP effect_allele.exposure other_allele.exposure
## 1  rs1000940                      G                     A
## 2 rs10132280                      A                     C
## 3  rs1016287                      C                     T
## 4 rs10182181                      G                     A
## 5 rs10733682                      G                     A
## 6 rs10840100                      G                     A
##   effect_allele.outcome other_allele.outcome beta.exposure beta.outcome
## 1                     G                    A        0.0184    -0.014538
## 2                     A                    C       -0.0221    -0.012169
## 3                     C                    T       -0.0228    -0.014087
## 4                     G                    A        0.0309     0.018295
## 5                     G                    A       -0.0188    -0.004541
## 6                     G                    A        0.0206     0.014599
##   eaf.exposure eaf.outcome remove palindromic ambiguous id.outcome chr
## 1       0.2250    0.337103  FALSE       FALSE     FALSE    ieu-a-7  17
## 2       0.3333    0.282164  FALSE       FALSE     FALSE    ieu-a-7  14
## 3       0.6750    0.699031  FALSE       FALSE     FALSE    ieu-a-7   2
## 4       0.5000    0.473525  FALSE       FALSE     FALSE    ieu-a-7   2
## 5       0.5750    0.491116  FALSE       FALSE     FALSE    ieu-a-7   9
## 6       0.7250    0.607618  FALSE       FALSE     FALSE    ieu-a-7  11
##         pos se.outcome samplesize.outcome pval.outcome
## 1   5283252  0.0098395             184305    0.1395380
## 2  25928179  0.0106640             184305    0.2538160
## 3  59305625  0.0105738             184305    0.1827780
## 4  25150296  0.0092790             184305    0.0486486
## 5 129460914  0.0095245             184305    0.6335250
## 6   8669437  0.0095234             184305    0.1252860
##                                outcome   originalname.outcome
## 1 Coronary heart disease || id:ieu-a-7 Coronary heart disease
## 2 Coronary heart disease || id:ieu-a-7 Coronary heart disease
## 3 Coronary heart disease || id:ieu-a-7 Coronary heart disease
## 4 Coronary heart disease || id:ieu-a-7 Coronary heart disease
## 5 Coronary heart disease || id:ieu-a-7 Coronary heart disease
## 6 Coronary heart disease || id:ieu-a-7 Coronary heart disease
##               outcome.deprecated mr_keep.outcome data_source.outcome
## 1 Coronary heart disease ||  ||             TRUE                 igd
## 2 Coronary heart disease ||  ||             TRUE                 igd
## 3 Coronary heart disease ||  ||             TRUE                 igd
## 4 Coronary heart disease ||  ||             TRUE                 igd
## 5 Coronary heart disease ||  ||             TRUE                 igd
## 6 Coronary heart disease ||  ||             TRUE                 igd
##   pval.exposure samplesize.exposure chr.exposure se.exposure pos.exposure
## 1   1.81201e-08              338903           17      0.0033      5283252
## 2   1.40088e-11              338856           14      0.0033     25928179
## 3   4.35512e-12              339033            2      0.0033     59305625
## 4   8.07049e-26              338829            2      0.0029     25150296
## 5   2.45499e-10              336886            9      0.0030    129460914
## 6   6.66653e-12              339135           11      0.0030      8669437
##   id.exposure                      exposure mr_keep.exposure
## 1     ieu-a-2 Body mass index || id:ieu-a-2             TRUE
## 2     ieu-a-2 Body mass index || id:ieu-a-2             TRUE
## 3     ieu-a-2 Body mass index || id:ieu-a-2             TRUE
## 4     ieu-a-2 Body mass index || id:ieu-a-2             TRUE
## 5     ieu-a-2 Body mass index || id:ieu-a-2             TRUE
## 6     ieu-a-2 Body mass index || id:ieu-a-2             TRUE
##   pval_origin.exposure data_source.exposure action mr_keep
## 1             reported                  igd      2    TRUE
## 2             reported                  igd      2    TRUE
## 3             reported                  igd      2    TRUE
## 4             reported                  igd      2    TRUE
## 5             reported                  igd      2    TRUE
## 6             reported                  igd      2    TRUE
## Analysing 'ieu-a-2' on 'ieu-a-7'
##   id.exposure id.outcome                              outcome
## 1     ieu-a-2    ieu-a-7 Coronary heart disease || id:ieu-a-7
## 2     ieu-a-2    ieu-a-7 Coronary heart disease || id:ieu-a-7
##                        exposure                    method nsnp         b
## 1 Body mass index || id:ieu-a-2                  MR Egger   78 0.4926008
## 2 Body mass index || id:ieu-a-2 Inverse variance weighted   78 0.4505027
##           se         pval
## 1 0.14436715 1.035793e-03
## 2 0.05913769 2.579139e-14