COMPSTAT 2024
A0366
Title: Issues with the R-squared for the evaluation of polygenic prediction models across diverse ancestries
Authors: Christian Staerk - IUF - Leibniz Research Institute for Environmental Medicine (Germany) [presenting]
Hannah Klinkhammer - University of Bonn (Germany)
Tobias Wistuba - University of Bonn (Germany)
Carlo Maj - University of Marburg (Germany)
Andreas Mayr - University of Bonn (Germany)
Abstract: Polygenic risk scores (PRS) quantify genetic predispositions for traits and clinical outcomes based on genotype data. For effective personalized risk assessment, polygenic prediction models should generalize well across diverse ancestries. However, the commonly used R-squared measure is not unambiguously defined for test data, which complicates both the assessment of the prediction accuracy of PRS models and the interpretation of results. Recent scalable regression methods, including statistical boosting, are applied to large-scale individual-level genotype data from the UK Biobank, and three R-squared definitions are compared for evaluating the predictive performance of PRS models in different populations. It is found that the choice of R-squared definition considerably affects the results: while the squared correlation between predicted and observed phenotypes always lies between 0 and 1, R-squared definitions incorporating the squared prediction error can yield negative values, particularly for miscalibrated prediction models. It is argued that the most appropriate definition of the R-squared depends on the aim of the PRS analysis, i.e., whether the PRS is mainly intended for risk stratification in a given cohort or also for predicting continuous traits for individual risk assessment. Further research is needed to develop and evaluate well-calibrated polygenic models across diverse ancestries for use in clinical practice.
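To illustrate why the choice of definition matters, here is a minimal sketch that compares two common textbook forms of the R-squared on test data: the squared Pearson correlation between predicted and observed values, and 1 - SSE/SST with SST taken around the mean of the test outcomes. These are assumptions chosen for illustration and are not necessarily identical to the three definitions compared in the study; the function names and toy data are hypothetical.

```python
"""Sketch: two common R-squared definitions evaluated on test data.

Assumption: standard textbook definitions, used only to illustrate how a
correlation-based and an error-based R-squared can diverge. They are not
necessarily the exact definitions compared in the study.
"""
from statistics import fmean


def r2_correlation(y, y_hat):
    """Squared Pearson correlation between observed and predicted values.

    Always lies in [0, 1] and is unchanged by adding a constant to, or
    rescaling, the predictions. Assumes neither list is constant.
    """
    my, mp = fmean(y), fmean(y_hat)
    cov = sum((a - my) * (b - mp) for a, b in zip(y, y_hat))
    var_y = sum((a - my) ** 2 for a in y)
    var_p = sum((b - mp) ** 2 for b in y_hat)
    if var_y == 0 or var_p == 0:
        raise ValueError("observed and predicted values must not be constant")
    return cov ** 2 / (var_y * var_p)


def r2_prediction_error(y, y_hat):
    """1 - SSE / SST, with SST computed around the mean of the test outcomes.

    Uses the squared prediction error directly, so it drops below zero
    whenever the predictions are worse than simply predicting the test mean.
    """
    my = fmean(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - my) ** 2 for a in y)
    if sst == 0:
        raise ValueError("observed values must not be constant")
    return 1 - sse / sst


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 5.0]
    calibrated = [1.2, 1.8, 3.1, 3.9, 5.0]
    # Same ordering of individuals, but every prediction is offset by +3.
    shifted = [p + 3.0 for p in calibrated]
    for name, pred in [("calibrated", calibrated), ("shifted", shifted)]:
        print(name,
              round(r2_correlation(y, pred), 3),
              round(r2_prediction_error(y, pred), 3))
    # calibrated 0.99 0.99
    # shifted 0.99 -3.51
```

Shifting every prediction by a constant leaves the ranking of individuals, and hence any risk stratification, untouched, so the squared correlation does not change. The predicted trait values are now systematically wrong, however, and only the error-based definition reflects this, here by turning strongly negative.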