CMStatistics 2023
B1522
Title: Stratified learning: A general-purpose statistical method for improved learning under covariate shift
Authors: Maximilian Autenrieth - Imperial College London (United Kingdom)
David van Dyk - Imperial College London (United Kingdom) [presenting]
David Stenning - Simon Fraser University (Canada)
Roberto Trotta - SISSA (Italy)
Abstract: A simple, statistically principled, and theoretically justified method is proposed to improve supervised learning when the training set is not representative, a situation known as covariate shift. Building on well-established methodology in causal inference, it is shown that the effect of covariate shift can be reduced or eliminated by conditioning on propensity scores. In practice, this is achieved by fitting learners within strata constructed by partitioning the data on the estimated propensity scores, which leads to approximately balanced covariates and much-improved target prediction. The overall method is referred to as Stratified Learning, or StratLearn. The effectiveness of this general-purpose method is demonstrated on contemporary research questions in cosmology, including the Supernovae photometric classification challenge, conditional density estimation of galaxy redshift from photometric data, and redshift calibration for weak lensing. Taken together, these examples illustrate how StratLearn outperforms state-of-the-art importance weighting methods.
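The stratification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices here are assumptions made only for illustration: a single covariate, a logistic model for the propensity score (the probability that a point belongs to the training set given its covariate), equal-size strata formed on the pooled scores, a least-squares line as the within-stratum learner, and a fallback to a learner fitted on all training data when a stratum contains no training points. All function names (`fit_propensity`, `assign_strata`, `fit_line`, `strat_learn`) are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_propensity(xs, in_source, steps=1000, lr=0.1):
    """Logistic regression of training-set membership on a 1-D covariate,
    fitted by plain gradient ascent (an illustrative choice of model)."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        ga = gb = 0.0
        for x, s in zip(xs, in_source):
            r = s - sigmoid(a + b * x)
            ga += r
            gb += r * x
        a += lr * ga / n
        b += lr * gb / n
    return lambda x: sigmoid(a + b * x)


def assign_strata(scores, k):
    """Assign each point to one of k roughly equal-size strata by score rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    strata = [0] * n
    for rank, i in enumerate(order):
        strata[i] = rank * k // n
    return strata


def fit_line(xs, ys):
    """Least-squares line; reduces to the mean of y if x has no spread."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x - mx)


def strat_learn(src_x, src_y, tgt_x, k=5):
    """Fit one learner per propensity-score stratum and predict target points."""
    pooled = list(src_x) + list(tgt_x)
    member = [1] * len(src_x) + [0] * len(tgt_x)
    score = fit_propensity(pooled, member)
    strata = assign_strata([score(x) for x in pooled], k)
    src_strata, tgt_strata = strata[:len(src_x)], strata[len(src_x):]

    fallback = fit_line(src_x, src_y)  # assumption: used for empty strata
    models = {}
    for j in range(k):
        xs = [x for x, s in zip(src_x, src_strata) if s == j]
        ys = [y for y, s in zip(src_y, src_strata) if s == j]
        models[j] = fit_line(xs, ys) if xs else fallback

    preds = [models[s](x) for x, s in zip(tgt_x, tgt_strata)]
    return preds, tgt_strata
```

Calling `strat_learn(src_x, src_y, tgt_x, k=5)` returns one prediction and one stratum index per target point. In a realistic setting the propensity model and the within-stratum learner would be replaced by whatever classifier or regressor suits the problem; the sketch only shows how the data are partitioned before the learners are fitted.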