CMStatistics 2023
B0203
Title: Robust and efficient transfer learning of high dimensional EHR-linked biobank data
Authors: Doudou Zhou - UC Davis (United States)
Tianxi Cai - Harvard School of Public Health (United States)
Molei Liu - Harvard T.H. Chan School of Public Health (United States) [presenting]
Abstract: Due to the increasing availability of electronic health records (EHR) and the linkage of EHR with bio-repositories, large biobank data has become an important resource for biomedical studies. Nevertheless, realizing the potential of EHR-linked biobank data remains challenging due to several practical and methodological obstacles, including the paucity of accurate labels, data heterogeneity, and high dimensionality. These challenges motivated us to develop novel transfer learning approaches that robustly and effectively leverage sizable source data sets to assist learning on a target sample that lacks accurate labels. Techniques such as doubly robust inference, debiasing, and prior-guided regularized regression are incorporated and extended to address the covariate shift, model heterogeneity, and high dimensionality that co-occur in this process. The methods are applied to the Massachusetts General and Brigham (MGB) Healthcare Biobank to realize real-world transfer learning across subjects from different ethnic groups or time windows.
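As a rough illustration of the general idea behind prior-guided regularized regression, the sketch below shrinks a target-sample estimate toward a coefficient vector learned on source data rather than toward zero. It is a generic ridge-type simplification on simulated data, not the authors' method; the function name `prior_guided_ridge`, the penalty weight `lam`, and all simulation settings are assumptions made for illustration.

```python
import numpy as np

def prior_guided_ridge(X, y, beta_source, lam):
    """Minimize ||y - X b||^2 + lam * ||b - beta_source||^2 in closed form.

    Hypothetical helper: the penalty pulls the target estimate toward a
    source-derived coefficient vector instead of toward zero.
    """
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)          # regularized Gram matrix
    rhs = X.T @ y + lam * beta_source      # data term plus prior term
    return np.linalg.solve(A, rhs)

# Simulated setting (assumed): few labelled target samples, many features.
rng = np.random.default_rng(0)
p, n_target = 50, 30
beta_target = np.zeros(p)
beta_target[:5] = 1.0
# Source coefficients are similar to, but not identical with, the target's.
beta_source = beta_target + 0.1 * rng.standard_normal(p)

X = rng.standard_normal((n_target, p))
y = X @ beta_target + 0.5 * rng.standard_normal(n_target)

# Estimate guided by the source prior vs. ordinary ridge (prior = 0).
beta_hat = prior_guided_ridge(X, y, beta_source, lam=10.0)
beta_plain = prior_guided_ridge(X, y, np.zeros(p), lam=10.0)

print("error with source prior:", np.linalg.norm(beta_hat - beta_target))
print("error with zero prior:  ", np.linalg.norm(beta_plain - beta_target))
```

When the source and target models are close, a larger `lam` leans more heavily on the source estimate; when they differ substantially, a smaller `lam` limits the damage from an unhelpful prior. Handling covariate shift and debiasing, as the abstract describes, would require additional steps not shown here.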