View Submission - CFE-CMStatistics 2025
A0589
Title: Multi-target semi-supervised learning with application to small area estimation
Authors: Katarzyna Reluga - Humboldt University of Berlin (Germany) [presenting]
Nicola Salvati - University of Pisa (Italy)
Mark van der Laan - University of California at Berkeley (United States)
Abstract: In the classical single-target semi-supervised learning (SSL) setting, one has access to (i) a moderately sized labeled dataset containing both response values and associated features, and (ii) a much larger unlabeled dataset with only covariates observed. SSL naturally arises in settings where collecting features is easy, but obtaining labels is expensive or time-consuming, for example, in electronic health records or survey data, where full data are available for only a small subset of the population. This framework is extended to multi-target semi-supervised learning, where the goal is to estimate several parameters of interest across different subpopulations, but labeled data are sparse. Classical SSL methods can suffer from excessive variability in this setting. Novel estimation methods tailored to this problem are proposed, and it is demonstrated how they improve stability and efficiency. Finally, it is shown how small area estimation emerges as a special case of this broader learning framework.
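
Illustration (not from the abstract): the single-target setting described above can be sketched in a few lines of Python. This is a generic, textbook-style semi-supervised estimate of a population mean, not the authors' proposed method. The function name ssl_mean, the linear working model, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def ssl_mean(X_lab, y_lab, X_unlab):
    """Semi-supervised estimate of E[Y] using a linear working model (illustrative only)."""
    X_all = np.vstack([X_lab, X_unlab])
    # Add an intercept column to the labeled and the pooled design matrices.
    A_lab = np.column_stack([np.ones(len(X_lab)), X_lab])
    A_all = np.column_stack([np.ones(len(X_all)), X_all])
    # Fit the working model by least squares on the labeled data only.
    beta, *_ = np.linalg.lstsq(A_lab, y_lab, rcond=None)
    # Average the predictions over all observed covariates, then add the mean
    # labeled residual. With an intercept in a least-squares fit this correction
    # is numerically zero; it matters for other working models.
    resid = y_lab - A_lab @ beta
    return (A_all @ beta).mean() + resid.mean()

# Hypothetical simulated data: few labels, many unlabeled covariates.
rng = np.random.default_rng(0)
X_lab = rng.normal(size=(50, 2))
y_lab = 1.0 + X_lab @ np.array([2.0, -1.0]) + rng.normal(size=50)
X_unlab = rng.normal(size=(5000, 2))

print("labeled-only mean:", y_lab.mean())
print("semi-supervised mean:", ssl_mean(X_lab, y_lab, X_unlab))
```

In a multi-target version of this sketch, the same computation would be repeated within each subpopulation. With only a handful of labels per subpopulation, the fitted coefficients become unstable, which is the variability problem the abstract refers to.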