EcoSta 2024
A0181
Title: Transfer learning methods to get more reliable estimates of effects and predictions
Authors: Haotian Zheng - Vertex Pharmaceuticals (United States) [presenting]
Sai Li - Renmin University of China (China)
Hongzhe Li - University of Pennsylvania (United States)
Abstract: A common problem in modern genomics and multiomics studies is that models often have weak classification or prediction performance because the number of training samples is small relative to the number of genomic features. On the other hand, auxiliary data sets related to the target learning problem are often available and can potentially be transferred to improve parameter estimation or prediction. Estimation and prediction methods for high-dimensional transfer learning are introduced for two problems. The first is transfer learning for high-dimensional linear discriminant analysis (LDA), one of the most commonly used methods for building a classification rule when the data are approximately Gaussian. The second is transfer learning for high-dimensional linear regression in settings where only summary statistics are available from the auxiliary studies. It is shown that such summary statistics, together with external data for estimating the feature covariance matrix (e.g., a linkage disequilibrium (LD) matrix), can be used effectively in transfer learning. Under certain assumptions, the transfer learning methods are shown to achieve lower error rates in estimating effect sizes and in classification/prediction than methods that use the target data alone. The methods are demonstrated through numerical studies and applications to several data sets, including polygenic risk score (PRS) prediction of blood-related phenotypes using Penn Medicine Biobank genotype data and UK Biobank summary statistics.
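Why summary statistics plus an external covariance estimate can replace individual-level auxiliary data follows from a standard identity: for least squares, (1/2n)||y - Xb||^2 = 0.5 * b'(X'X/n)b - b'(X'y/n) + const, so a Lasso-type fit needs only the feature covariance X'X/n and the marginal statistics X'y/n. The Python sketch below illustrates this under stated assumptions and is not the authors' estimator. It assumes standardized features, auxiliary marginal statistics r = X'y/n, and an LD matrix taken from a separate reference sample. It also uses a generic two-step transfer scheme chosen here for illustration: a Lasso fit on the auxiliary summaries, followed by a sparse correction fitted on the target sample. The function names, tuning values, and simulated data are all hypothetical.

```python
import numpy as np


def soft_threshold(z: float, lam: float) -> float:
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    return float(np.sign(z) * max(abs(z) - lam, 0.0))


def lasso_from_summary(sigma: np.ndarray, r: np.ndarray, lam: float,
                       n_iter: int = 500, tol: float = 1e-10) -> np.ndarray:
    """Minimize 0.5 * b'Sigma b - r'b + lam * ||b||_1 by cyclic coordinate descent.

    Only summary-level inputs are used: a feature covariance `sigma`
    (e.g. an LD matrix from a reference panel) and marginal statistics
    r = X'y / n. Assumes sigma is positive semidefinite with positive diagonal.
    """
    b = np.zeros(r.shape[0])
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(r.shape[0]):
            # r_j minus the fitted contribution of all other coordinates.
            z = r[j] - sigma[j] @ b + sigma[j, j] * b[j]
            new_bj = soft_threshold(z, lam) / sigma[j, j]
            max_change = max(max_change, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_change < tol:
            break
    return b


def transfer_from_summary(r_aux: np.ndarray, ld: np.ndarray,
                          x_t: np.ndarray, y_t: np.ndarray,
                          lam_aux: float, lam_delta: float) -> np.ndarray:
    """Hypothetical two-step transfer estimator (illustration only).

    Step 1: fit w from auxiliary summary statistics and an external LD matrix.
    Step 2: fit a sparse correction delta on the target residuals
            y_t - x_t @ w, and return w + delta.
    """
    w = lasso_from_summary(ld, r_aux, lam_aux)
    n_t = x_t.shape[0]
    sigma_t = x_t.T @ x_t / n_t
    r_t = x_t.T @ (y_t - x_t @ w) / n_t
    delta = lasso_from_summary(sigma_t, r_t, lam_delta)
    return w + delta


if __name__ == "__main__":
    # Hypothetical simulated data; not the Penn Medicine or UK Biobank data.
    rng = np.random.default_rng(0)
    p, n_aux, n_ref, n_t = 50, 2000, 500, 60
    beta = np.zeros(p)
    beta[:5] = 0.5                      # sparse target effects
    beta_aux = beta.copy()
    beta_aux[5] = 0.1                   # auxiliary effects differ slightly

    x_aux = rng.standard_normal((n_aux, p))
    y_aux = x_aux @ beta_aux + rng.standard_normal(n_aux)
    r_aux = x_aux.T @ y_aux / n_aux     # the only auxiliary quantity shared

    x_ref = rng.standard_normal((n_ref, p))
    ld = x_ref.T @ x_ref / n_ref        # external covariance (LD) estimate

    x_t = rng.standard_normal((n_t, p))
    y_t = x_t @ beta + rng.standard_normal(n_t)

    b_tl = transfer_from_summary(r_aux, ld, x_t, y_t, lam_aux=0.02, lam_delta=0.1)
    b_t = lasso_from_summary(x_t.T @ x_t / n_t, x_t.T @ y_t / n_t, lam=0.1)
    print("transfer error:", np.linalg.norm(b_tl - beta))
    print("target-only error:", np.linalg.norm(b_t - beta))
```

Whether the transferred estimate beats the target-only fit in a simulation like this depends on how close the auxiliary effects are to the target effects and on how well the reference panel approximates the true feature covariance. The penalty levels here are arbitrary and would normally be tuned, for example on held-out target data.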