CMStatistics 2023
B0671
Title: Statistical learning for constructing genetic risk scores
Authors: Michael Lau - Heinrich Heine University Duesseldorf (Germany) [presenting]
Tamara Schikowski - IUF - Leibniz Research Institute for Environmental Medicine (Germany)
Holger Schwender - Heinrich Heine University Duesseldorf (Germany)
Abstract: Genetic risk scores (GRS) are an important tool in genetic epidemiology for inferring how phenotypes manifest. Most commonly, GRS are constructed using linear statistical approaches such as regularized generalized linear models. Such models are interpretable and easy to fit. However, since genetic loci might interact with each other or with environmental risk factors, these models might not properly capture important underlying biological mechanisms. It is investigated how established tree-based statistical learning methods could improve the predictive ability of GRS models. An observed shortcoming of tree-based ensemble methods is the lack of interpretability of the fitted models. To overcome this problem, a novel statistical learning method called BITS (boosting interaction tree stumps) is developed, in which an interpretable and highly predictive generalized linear model is fitted by autonomously including interaction terms. In BITS, interaction tree stumps are fitted as base learners in gradient boosting for identifying predictive marginal or interaction terms. These interaction tree stumps are fitted by a branch-and-bound search that discards irrelevant terms without a full evaluation. In simulations and real data applications, it is shown that BITS achieves high predictive performance, especially in comparison to other interpretability-focused methods.
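
The general idea of boosting a linear model over marginal and interaction terms can be illustrated with a minimal sketch. This is not the BITS implementation, and several simplifying assumptions are made that the abstract does not state: a continuous outcome with squared-error loss, interaction terms encoded as products of input columns (a stand-in for interaction tree stumps), interactions limited to order two, and an exhaustive search over all candidate terms in place of the branch-and-bound search. The function names and parameters (`fit_boosted_interaction_terms`, `predict_terms`, `n_iter`, `learning_rate`, `max_order`) are hypothetical.

```python
import itertools
import numpy as np


def fit_boosted_interaction_terms(X, y, n_iter=200, learning_rate=0.1, max_order=2):
    """Toy componentwise L2 boosting over marginal and product terms.

    Illustrative only: exhaustive term search, squared-error loss.
    Returns an intercept and a dict mapping index tuples to coefficients.
    """
    n, p = X.shape
    # Candidate terms: every subset of up to max_order columns, encoded as a product.
    terms = [c for k in range(1, max_order + 1)
             for c in itertools.combinations(range(p), k)]
    Z = np.column_stack([X[:, list(t)].prod(axis=1) for t in terms])
    means = Z.mean(axis=0)
    Zc = Z - means                      # centered term columns
    ss = (Zc ** 2).sum(axis=0)          # per-term sum of squares
    usable = ss > 1e-12                 # constant terms carry no information

    beta = np.zeros(len(terms))
    resid = y - y.mean()
    for _ in range(n_iter):
        # Least-squares slope of the current residuals on each single term.
        slopes = np.zeros(len(terms))
        slopes[usable] = Zc[:, usable].T @ resid / ss[usable]
        # Reduction in squared error achieved by each term on its own.
        gain = slopes ** 2 * ss
        j = int(np.argmax(gain))
        step = learning_rate * slopes[j]  # shrunken update of the best term
        beta[j] += step
        resid = resid - step * Zc[:, j]

    # Undo the centering so the model applies to raw term values.
    intercept = float(y.mean() - beta @ means)
    coefs = {terms[j]: float(beta[j]) for j in np.flatnonzero(beta)}
    return intercept, coefs


def predict_terms(X, intercept, coefs):
    """Evaluate the fitted linear model with product terms on new data."""
    out = np.full(X.shape[0], intercept, dtype=float)
    for t, b in coefs.items():
        out += b * X[:, list(t)].prod(axis=1)
    return out


if __name__ == "__main__":
    # Simulated binary markers with one interaction effect and one marginal effect.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(500, 6)).astype(float)
    y = 2.0 * X[:, 0] * X[:, 1] + X[:, 2] + 0.1 * rng.standard_normal(500)
    intercept, coefs = fit_boosted_interaction_terms(X, y)
    for term, b in sorted(coefs.items(), key=lambda kv: -abs(kv[1])):
        print(term, round(b, 3))
```

The fitted model in this sketch is a plain linear model whose terms can be read off directly, which is the kind of interpretability the abstract contrasts with tree-based ensembles. The exhaustive search evaluates every candidate term in each iteration and scales poorly with the number of loci, which is the cost a branch-and-bound search is meant to avoid.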