EcoSta 2024
A0399
Title: Semi-supervised triply robust inductive transfer learning
Authors: Mengyan Li - Bentley University (United States) [presenting]
Tianxi Cai - Harvard School of Public Health (United States)
Molei Liu - Columbia Mailman School of Public Health (United States)
Abstract: A semi-supervised triply robust inductive transfer learning (STRIFLE) approach is proposed. It integrates heterogeneous data from a label-rich source population and a label-scarce target population, while simultaneously using a large amount of unlabeled data, to improve learning accuracy in the target population. Specifically, a high-dimensional covariate shift setting is considered, and two nuisance models, a density ratio model and an imputation model, are employed to organically combine transfer learning and surrogate-assisted semi-supervised learning strategies and achieve triple robustness. Unlike doubly robust estimators, when the shifted source population and the target population share enough similarities, the triply robust STRIFLE estimator can still partially utilize the source population even if both nuisance models are misspecified or the distribution of Y given the surrogates and covariates differs between the two populations; it is also guaranteed to be no worse than the target-only surrogate-assisted semi-supervised estimator, up to negligible errors. These desirable properties of the estimator are established theoretically and verified in finite samples via extensive simulation studies. STRIFLE is applied to train a Type II diabetes polygenic risk prediction model for the African American target population by transferring knowledge from electronic health records linked to genomic data observed in a larger European source population.
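To make the roles of the two nuisance models concrete, here is a minimal sketch of a generic *doubly* robust estimator of the target-population mean of Y under covariate shift. It is not the STRIFLE estimator: it ignores surrogates, unlabeled target outcomes, high dimensionality and the third layer of robustness. All function names are hypothetical. A logistic-regression density ratio model (source vs. target classifier) and a linear imputation model are assumptions chosen for illustration. The estimate combines the imputation model's average prediction on the target with a density-ratio-weighted average of source residuals. It stays consistent if either nuisance model is correct, provided Y given X is the same in both populations.

```python
import numpy as np

def add_intercept(X):
    """Prepend a column of ones to a 2-D covariate matrix."""
    return np.column_stack([np.ones(len(X)), X])

def fit_logistic(X, t, n_iter=25):
    """Logistic regression via Newton-Raphson; X must already include an intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (t - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta

def dr_target_mean(X_s, y_s, X_t):
    """Hypothetical doubly robust estimate of E_target[Y] from labeled source data
    (X_s, y_s) and unlabeled target covariates X_t."""
    Xs1, Xt1 = add_intercept(X_s), add_intercept(X_t)
    # Density ratio model: classify target (1) vs source (0); the fitted odds equal
    # p_t(x)/p_s(x) times n_t/n_s, so rescale by n_s/n_t.
    labels = np.r_[np.zeros(len(X_s)), np.ones(len(X_t))]
    beta = fit_logistic(np.vstack([Xs1, Xt1]), labels)
    w = np.exp(Xs1 @ beta) * len(X_s) / len(X_t)
    w = w / w.mean()  # self-normalize the importance weights
    # Imputation model: linear regression of Y on X fitted in the source.
    gamma, *_ = np.linalg.lstsq(Xs1, y_s, rcond=None)
    # Plug-in prediction on the target plus a weighted residual correction.
    return (Xt1 @ gamma).mean() + np.mean(w * (y_s - Xs1 @ gamma))

# Toy example: Y = 1 + 2x + x^2 + noise, source X ~ N(0, 1), target X ~ N(1, 1).
rng = np.random.default_rng(0)
n = 20000
X_s = rng.normal(0.0, 1.0, size=(n, 1))
X_t = rng.normal(1.0, 1.0, size=(n, 1))
y_s = 1 + 2 * X_s[:, 0] + X_s[:, 0] ** 2 + rng.normal(0.0, 1.0, size=n)
print(dr_target_mean(X_s, y_s, X_t))
```

In this toy setup the true target mean is 5 and the source mean is 2. The linear imputation model is misspecified (it misses the x^2 term) and on its own would give roughly 4. The log-linear density ratio model is correct here, so the weighted residual term removes most of that bias and the printed estimate should land near 5.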