A1428
Title: Robust and efficient high-dimensional inference with surrogate outcomes
Authors: Yong Chen - Univ. of Pennsylvania (United States) [presenting]
Abstract: Electronic health records (EHR) offer a valuable resource for discovering novel disease risk factors. However, the common issue of missingness in the primary phenotype of interest often leads to efficiency loss in inferential methods that rely solely on fully observed samples. Additionally, the prevalent misclassification of EHR-derived phenotypes can result in systematic bias, thereby affecting the reproducibility of findings. In response to these challenges, a robust and efficient framework for high-dimensional EHR-based discovery is introduced. Based on a class of surrogate models for EHR-based phenotypes, an augmented score function is constructed, and a corresponding test statistic is developed. The statistic not only maintains correct coverage under the null hypothesis but also exhibits enhanced power under local alternatives, outperforming tests that use only fully observed samples. Surprisingly, it achieves the correct coverage even in scenarios with arbitrary misclassification of EHR-based phenotypes and misspecified surrogate models. The statistical effectiveness of the proposed method is evaluated through extensive simulations and a real-world data application.
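The idea of augmenting a score with a surrogate can be illustrated with a minimal low-dimensional sketch. It is not the method of the talk: it assumes a single covariate, a primary phenotype that is missing completely at random, a surrogate observed for everyone, and a simple covariance-type score. The function name `augmented_score_test` and all variable names are hypothetical. The sketch subtracts a mean-zero contrast built from the surrogate, so the variance of the score shrinks when the surrogate is informative, while the centering of the score does not depend on how accurate the surrogate is.

```python
import numpy as np


def augmented_score_test(x, y, s, labeled):
    """Toy augmented score test of H0: no association between x and y.

    Assumptions (illustrative only, not taken from the abstract):
      x, s    : length-N arrays, covariate and surrogate, observed for everyone
      y       : length-N array; only entries where `labeled` is True are used
      labeled : boolean mask, assumed missing completely at random
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    s = np.asarray(s, dtype=float)
    labeled = np.asarray(labeled, dtype=bool)
    N, n = x.size, int(labeled.sum())

    xc = x - x.mean()
    # Score contributions for the primary phenotype (labeled subjects only).
    psi = xc[labeled] * (y[labeled] - y[labeled].mean())
    # Score contributions for the surrogate (all subjects).
    phi = xc * (s - s.mean())

    u_lab = psi.mean()
    # Labeled-minus-full contrast: approximately mean zero under random
    # labeling, whatever the quality of the surrogate.
    contrast = phi[labeled].mean() - phi.mean()

    cov = np.cov(psi, phi[labeled])  # 2x2 sample covariance matrix
    gamma = cov[0, 1] / cov[1, 1] if cov[1, 1] > 0 else 0.0
    u_aug = u_lab - gamma * contrast

    var_lab = cov[0, 0] / n
    # Variance after augmentation with the variance-minimizing gamma.
    var_aug = var_lab - gamma**2 * cov[1, 1] * (1.0 / n - 1.0 / N)

    return {
        "z_labeled": u_lab / np.sqrt(var_lab),
        "z_augmented": u_aug / np.sqrt(var_aug),
        "var_labeled": var_lab,
        "var_augmented": var_aug,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N = 5000
    x = rng.normal(size=N)
    y = (rng.normal(size=N) + 0.3 * x > 0).astype(float)  # true phenotype
    s = np.where(rng.random(N) < 0.15, 1.0 - y, y)         # misclassified copy
    labeled = rng.random(N) < 0.1                          # about 10% labeled
    print(augmented_score_test(x, y, s, labeled))
```

In this sketch the augmented variance can never exceed the labeled-only variance, because the correction term is non-negative by construction. A surrogate that is pure noise gives a coefficient `gamma` near zero, and the statistic falls back to the labeled-only test.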