CMStatistics 2021
B1149
Title: Ultra-high dimensional learning of polygenic risk scores for Mendelian randomization studies
Authors:  Xinyi Zhang - University of Toronto (Canada)
Linbo Wang - University of Toronto (Canada) [presenting]
Stanislav Volgushev - University of Toronto (Canada)
Dehan Kong - University of Toronto (Canada)
Abstract: Mendelian randomization (MR) is a method that leverages genetic variants as instrumental variables (IVs) to investigate the causal relationship between a modifiable exposure or risk factor and a clinically relevant outcome using observational data. To provide reliable causal evidence, a key step in MR analysis is to identify the valid instruments among all candidate genetic variants. Current methods work well when the number of candidate instruments is moderate. In ultrahigh-dimensional settings, which are common in practice, empirical evidence suggests that existing procedures may miss many or even all valid instrumental variables because the candidate set includes irrelevant variables that exhibit spurious correlation with the exposure in observational data. To overcome this challenge, we propose a novel approach that first removes irrelevant variables from the candidate set and then applies existing methods to the remaining candidates to make valid causal inferences. Theoretically, we prove that the causal effect estimates from selected irrelevant variables are also centered around a single value, but one that is distinct from the true causal effect with high probability, which makes selected irrelevant variables and valid instruments separable. Simulation studies and a data application further demonstrate that the proposed procedure outperforms existing methods in ultrahigh-dimensional settings.
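The separation described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' procedure: the data-generating model, all parameter values, the marginal screening rule, and the function names (`sample_cov`, `wald_ratio`, `simulate`) are assumptions made for illustration only. It computes the standard single-instrument ratio (Wald) estimate for a few valid instruments and for irrelevant candidates that happen to show the largest sample covariance with the exposure, so the two groups of estimates can be compared.

```python
import random
import statistics


def sample_cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def wald_ratio(z, d, y):
    """Single-instrument ratio estimate of the effect of d on y: cov(z, y) / cov(z, d)."""
    return sample_cov(z, y) / sample_cov(z, d)


def simulate(n=200, n_valid=5, n_irrelevant=1000, beta=1.0, keep=20, seed=0):
    """Toy model (an assumption, not the paper's setup).

    Exposure d depends on the valid instruments and an unmeasured confounder u;
    outcome y depends on d and u. Irrelevant candidates are pure noise.
    """
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(n)]
    valid = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_valid)]
    irrelevant = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_irrelevant)]

    d = [0.5 * sum(z[i] for z in valid) + u[i] + rng.gauss(0, 1) for i in range(n)]
    y = [beta * d[i] + u[i] + rng.gauss(0, 1) for i in range(n)]

    valid_est = [wald_ratio(z, d, y) for z in valid]

    # Hypothetical marginal screening: keep the irrelevant candidates whose
    # sample covariance with the exposure is largest in absolute value.
    screened = sorted(irrelevant, key=lambda z: abs(sample_cov(z, d)), reverse=True)[:keep]
    spurious_est = [wald_ratio(z, d, y) for z in screened]
    return valid_est, spurious_est


if __name__ == "__main__":
    valid_est, spurious_est = simulate()
    print("median ratio estimate, valid instruments:   ", statistics.median(valid_est))
    print("median ratio estimate, screened irrelevant: ", statistics.median(spurious_est))
```

In this toy model one would expect the valid-instrument estimates to gather near `beta`, and the screened irrelevant estimates to gather near a different value pulled toward the confounded regression slope. The exact numbers depend on the seed and the assumed parameters.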