Zhiguang Huo (Caleb)
Friday Feb 28, 2020
G: genetic variant; Y: outcome variable; X: exposure variable; U: unknown confounders
G: genetic variant; Y: outcome variable; X: exposure variable; U: unknown confounders; W: known confounders;
\[ \hat{\boldsymbol{X}} = \boldsymbol{G} (\boldsymbol{G}^\top \boldsymbol{G})^{-1} \boldsymbol{G} \boldsymbol{X} \]
\[\boldsymbol{\beta}_{IV} = (\hat{\boldsymbol{X}}^\top \hat{\boldsymbol{X}} )^{-1} \hat{\boldsymbol{X}} ^\top \boldsymbol{Y} \]
## Loading required package: car
## Loading required package: carData
## Loading required package: lmtest
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: survival
## state year cpi population packs income tax price taxs
## 1 AL 1985 1.076 3973000 116.4863 46014968 32.5 102.18167 33.34834
## 2 AR 1985 1.076 2327000 128.5346 26210736 37.0 101.47500 37.00000
## 3 AZ 1985 1.076 3184000 104.5226 43956936 31.0 108.57875 36.17042
## 4 CA 1985 1.076 26444000 100.3630 447102816 26.0 107.83734 32.10400
## 5 CO 1985 1.076 3209000 112.9635 49466672 31.0 94.26666 31.00000
## 6 CT 1985 1.076 3201000 109.2784 60063368 42.0 128.02499 51.48333
CigarettesSW$rprice <- with(CigarettesSW, price/cpi)
CigarettesSW$rincome <- with(CigarettesSW, income/population/cpi)
CigarettesSW$tdiff <- with(CigarettesSW, (taxs - tax)/cpi)
CigarettesSW$ratio_tc <- with(CigarettesSW, tax/cpi)
Goal: examine whether higher cigarette price (price) could reduce smoking (pack)
Assumptoin 2: Established association between \(G\) and \(X\) \[P(X|G, W) \ne P(X|W)\]
alm <- lm(log(rprice) ~ tdiff + log(rincome), data = CigarettesSW)
## alm <- lm(log(rprice) ~ ratio_tc + log(rincome), data = CigarettesSW)
summary(alm)
##
## Call:
## lm(formula = log(rprice) ~ tdiff + log(rincome), data = CigarettesSW)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.219336 -0.056866 0.001329 0.062840 0.245333
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.384429 0.157936 21.429 < 2e-16 ***
## tdiff 0.031206 0.003187 9.790 5.68e-16 ***
## log(rincome) 0.443583 0.061291 7.237 1.28e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08865 on 93 degrees of freedom
## Multiple R-squared: 0.6733, Adjusted R-squared: 0.6662
## F-statistic: 95.82 on 2 and 93 DF, p-value: < 2.2e-16
\[ \hat{\boldsymbol{X}} = \boldsymbol{G} (\boldsymbol{G}^\top \boldsymbol{G})^{-1} \boldsymbol{G} \boldsymbol{X} \]
\[\boldsymbol{\beta}_{IV} = (\hat{\boldsymbol{X}}^\top \hat{\boldsymbol{X}} )^{-1} \hat{\boldsymbol{X}} ^\top \boldsymbol{Y} \]
fm <- ivreg(log(packs) ~ log(rprice) + log(rincome) | log(rincome) + tdiff + ratio_tc,
data = CigarettesSW)
summary(fm)
##
## Call:
## ivreg(formula = log(packs) ~ log(rprice) + log(rincome) | log(rincome) +
## tdiff + ratio_tc, data = CigarettesSW)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.61378 -0.07232 0.01435 0.09486 0.37504
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.7365 0.5687 17.122 < 2e-16 ***
## log(rprice) -1.2291 0.1552 -7.922 4.94e-12 ***
## log(rincome) 0.2568 0.1434 1.791 0.0765 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1654 on 93 degrees of freedom
## Multiple R-Squared: 0.5486, Adjusted R-squared: 0.5389
## Wald test: 39.93 on 2 and 93 DF, p-value: 3.025e-13
##
## Call:
## ivreg(formula = log(packs) ~ log(rprice) + log(rincome) | log(rincome) +
## tdiff + ratio_tc, data = CigarettesSW)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.61378 -0.07232 0.01435 0.09486 0.37504
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.7365 0.5060 19.243 < 2e-16 ***
## log(rprice) -1.2291 0.1522 -8.078 6.59e-16 ***
## log(rincome) 0.2568 0.1503 1.709 0.0874 .
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 2 92 145.502 <2e-16 ***
## Wu-Hausman 1 92 2.204 0.141
## Sargan 1 NA 0.018 0.893
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1654 on Inf degrees of freedom
## Multiple R-Squared: 0.5486, Adjusted R-squared: 0.5389
## Wald test: 101.5 on 2 DF, p-value: < 2.2e-16
## TwoSampleMR 0.5.1
## [>] IMPORTANT: Some datasets have been updated, and some are disabled while we update them.
## Apologies for this inconvenience, they will be back up as soon as possible.
## See news(package='TwoSampleMR') and https://gwas.mrcieu.ac.uk for more information
## [>] To temporarily revert to the previous database, see here:
## https://mrcieu.github.io/TwoSampleMR/articles/gwas2020.html
Goal: investigate whether higher BMI (exposure) causes Coronary heart disease (outcome)
## API: public: http://gwas-api.mrcieu.ac.uk/
## # A tibble: 6 x 2
## trait id
## <chr> <chr>
## 1 ENSG00000134884 eqtl-a-ENSG0000013…
## 2 Operative procedures - main OPCS: B27.4 Total mastec… ukb-b-20278
## 3 ENSG00000170464 eqtl-a-ENSG0000017…
## 4 ENSG00000090263 eqtl-a-ENSG0000009…
## 5 Lamina-associated polypeptide 2, isoforms beta/gamma prot-a-3021
## 6 ENSG00000251637 eqtl-a-ENSG0000025…
## pval.exposure samplesize.exposure chr.exposure se.exposure beta.exposure
## 1 2.18198e-08 339152 1 0.0030 -0.0168
## 2 4.56773e-11 339065 1 0.0031 0.0201
## 3 5.05941e-14 313621 1 0.0087 0.0659
## 4 1.93299e-08 294030 3 0.0033 0.0183
## 5 2.53501e-10 337750 6 0.0033 0.0209
## 6 4.52376e-31 339197 6 0.0038 0.0444
## pos.exposure id.exposure SNP effect_allele.exposure
## 1 47684677 ieu-a-2 rs977747 G
## 2 78048331 ieu-a-2 rs17381664 C
## 3 110082886 ieu-a-2 rs7550711 T
## 4 81792112 ieu-a-2 rs3849570 A
## 5 34828553 ieu-a-2 rs6457796 C
## 6 50865820 ieu-a-2 rs943005 T
## other_allele.exposure eaf.exposure exposure
## 1 T 0.5333 Body mass index || id:ieu-a-2
## 2 T 0.4250 Body mass index || id:ieu-a-2
## 3 C 0.0339 Body mass index || id:ieu-a-2
## 4 C 0.3667 Body mass index || id:ieu-a-2
## 5 T 0.2583 Body mass index || id:ieu-a-2
## 6 C 0.1000 Body mass index || id:ieu-a-2
## mr_keep.exposure pval_origin.exposure data_source.exposure
## 1 TRUE reported igd
## 2 TRUE reported igd
## 3 TRUE reported igd
## 4 TRUE reported igd
## 5 TRUE reported igd
## 6 TRUE reported igd
## [1] 79 15
The minimum information required for MR analysis is the following:
## Warning: 'clump_data' is deprecated.
## Use 'ieugwasr::ld_clump()' instead.
## See help("Deprecated")
## Warning: since v0.4.2 the default r2 value for clumping has changed from 0.01 to 0.001
## Clumping ieu-a-2, 79 variants
## Removing 1 of 79 variants due to LD with other variants or absence from LD reference panel
## [1] 78 15
## Extracting data for 78 SNP(s) from 1 GWAS(s)
## SNP chr pos beta.outcome se.outcome samplesize.outcome
## 1 rs977747 1 47684677 -0.013896 0.0096146 184305
## 2 rs17203016 2 208255518 0.020352 0.0122315 184305
## 3 rs7550711 1 110082886 -0.048354 0.0305635 184305
## 4 rs9374842 6 120185665 0.013555 0.0106053 184305
## 5 rs4740619 9 15634326 -0.006546 0.0092497 184305
## 6 rs2112347 5 75015242 -0.005855 0.0096581 184305
## pval.outcome eaf.outcome effect_allele.outcome other_allele.outcome
## 1 0.1483750 0.550761 G T
## 2 0.0961324 0.186737 G A
## 3 0.1136310 0.027650 T C
## 4 0.2012030 0.723965 T C
## 5 0.4791310 0.467784 C T
## 6 0.5443640 0.401399 G T
## outcome id.outcome originalname.outcome
## 1 Coronary heart disease || id:ieu-a-7 ieu-a-7 Coronary heart disease
## 2 Coronary heart disease || id:ieu-a-7 ieu-a-7 Coronary heart disease
## 3 Coronary heart disease || id:ieu-a-7 ieu-a-7 Coronary heart disease
## 4 Coronary heart disease || id:ieu-a-7 ieu-a-7 Coronary heart disease
## 5 Coronary heart disease || id:ieu-a-7 ieu-a-7 Coronary heart disease
## 6 Coronary heart disease || id:ieu-a-7 ieu-a-7 Coronary heart disease
## outcome.deprecated mr_keep.outcome data_source.outcome
## 1 Coronary heart disease || || TRUE igd
## 2 Coronary heart disease || || TRUE igd
## 3 Coronary heart disease || || TRUE igd
## 4 Coronary heart disease || || TRUE igd
## 5 Coronary heart disease || || TRUE igd
## 6 Coronary heart disease || || TRUE igd
This means that the effect of a SNP on the exposure and the effect of that SNP on the outcome must each correspond to the same allele.
## Harmonising Body mass index || id:ieu-a-2 (ieu-a-2) and Coronary heart disease || id:ieu-a-7 (ieu-a-7)
## SNP effect_allele.exposure other_allele.exposure
## 1 rs1000940 G A
## 2 rs10132280 A C
## 3 rs1016287 C T
## 4 rs10182181 G A
## 5 rs10733682 G A
## 6 rs10840100 G A
## effect_allele.outcome other_allele.outcome beta.exposure beta.outcome
## 1 G A 0.0184 -0.014538
## 2 A C -0.0221 -0.012169
## 3 C T -0.0228 -0.014087
## 4 G A 0.0309 0.018295
## 5 G A -0.0188 -0.004541
## 6 G A 0.0206 0.014599
## eaf.exposure eaf.outcome remove palindromic ambiguous id.outcome chr
## 1 0.2250 0.337103 FALSE FALSE FALSE ieu-a-7 17
## 2 0.3333 0.282164 FALSE FALSE FALSE ieu-a-7 14
## 3 0.6750 0.699031 FALSE FALSE FALSE ieu-a-7 2
## 4 0.5000 0.473525 FALSE FALSE FALSE ieu-a-7 2
## 5 0.5750 0.491116 FALSE FALSE FALSE ieu-a-7 9
## 6 0.7250 0.607618 FALSE FALSE FALSE ieu-a-7 11
## pos se.outcome samplesize.outcome pval.outcome
## 1 5283252 0.0098395 184305 0.1395380
## 2 25928179 0.0106640 184305 0.2538160
## 3 59305625 0.0105738 184305 0.1827780
## 4 25150296 0.0092790 184305 0.0486486
## 5 129460914 0.0095245 184305 0.6335250
## 6 8669437 0.0095234 184305 0.1252860
## outcome originalname.outcome
## 1 Coronary heart disease || id:ieu-a-7 Coronary heart disease
## 2 Coronary heart disease || id:ieu-a-7 Coronary heart disease
## 3 Coronary heart disease || id:ieu-a-7 Coronary heart disease
## 4 Coronary heart disease || id:ieu-a-7 Coronary heart disease
## 5 Coronary heart disease || id:ieu-a-7 Coronary heart disease
## 6 Coronary heart disease || id:ieu-a-7 Coronary heart disease
## outcome.deprecated mr_keep.outcome data_source.outcome
## 1 Coronary heart disease || || TRUE igd
## 2 Coronary heart disease || || TRUE igd
## 3 Coronary heart disease || || TRUE igd
## 4 Coronary heart disease || || TRUE igd
## 5 Coronary heart disease || || TRUE igd
## 6 Coronary heart disease || || TRUE igd
## pval.exposure samplesize.exposure chr.exposure se.exposure pos.exposure
## 1 1.81201e-08 338903 17 0.0033 5283252
## 2 1.40088e-11 338856 14 0.0033 25928179
## 3 4.35512e-12 339033 2 0.0033 59305625
## 4 8.07049e-26 338829 2 0.0029 25150296
## 5 2.45499e-10 336886 9 0.0030 129460914
## 6 6.66653e-12 339135 11 0.0030 8669437
## id.exposure exposure mr_keep.exposure
## 1 ieu-a-2 Body mass index || id:ieu-a-2 TRUE
## 2 ieu-a-2 Body mass index || id:ieu-a-2 TRUE
## 3 ieu-a-2 Body mass index || id:ieu-a-2 TRUE
## 4 ieu-a-2 Body mass index || id:ieu-a-2 TRUE
## 5 ieu-a-2 Body mass index || id:ieu-a-2 TRUE
## 6 ieu-a-2 Body mass index || id:ieu-a-2 TRUE
## pval_origin.exposure data_source.exposure action mr_keep
## 1 reported igd 2 TRUE
## 2 reported igd 2 TRUE
## 3 reported igd 2 TRUE
## 4 reported igd 2 TRUE
## 5 reported igd 2 TRUE
## 6 reported igd 2 TRUE
## Analysing 'ieu-a-2' on 'ieu-a-7'
## id.exposure id.outcome outcome
## 1 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
## 2 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
## exposure method nsnp b
## 1 Body mass index || id:ieu-a-2 MR Egger 78 0.4926008
## 2 Body mass index || id:ieu-a-2 Inverse variance weighted 78 0.4505027
## se pval
## 1 0.14436715 1.035793e-03
## 2 0.05913769 2.579139e-14