View Submission - EcoSta2024
A0360
Title: Semi-supervised U-statistics
Authors: Ilmun Kim - Yonsei University (Korea, South) [presenting]
Larry Wasserman - Carnegie Mellon University (United States)
Sivaraman Balakrishnan - Carnegie Mellon University (United States)
Matey Neykov - Northwestern (United States)
Abstract: Semi-supervised datasets are ubiquitous across diverse domains where obtaining fully labeled data is costly or time-consuming. The prevalence of such datasets has consistently driven the demand for new tools and methods that exploit the potential of unlabeled data. In response to this demand, semi-supervised U-statistics enhanced by the abundance of unlabeled data are introduced, and their statistical properties are investigated. The proposed approach is shown to be asymptotically normal and to exhibit notable efficiency gains over classical U-statistics by effectively integrating various powerful prediction tools into the framework. To understand the fundamental difficulty of the problem, minimax lower bounds are derived in semi-supervised settings, and the procedure is shown to be semi-parametrically efficient under regularity conditions. Moreover, for bivariate kernels, a refined approach is proposed that outperforms the classical U-statistic across all degeneracy regimes, and its optimality properties are demonstrated.
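For readers unfamiliar with the baseline, the following is a minimal Python sketch of a classical order-2 U-statistic, together with a generic prediction-adjusted estimator for the simplest (order-1, mean) case. It is an illustration under stated assumptions, not the estimator proposed in the talk: the function names `u_statistic`, `variance_kernel`, and `semi_supervised_mean`, and the fixed predictor `predict`, are hypothetical and chosen only for this example.

```python
from itertools import combinations
from statistics import fmean


def u_statistic(sample, kernel):
    """Classical order-2 U-statistic: the average of a symmetric
    kernel over all unordered pairs of observations."""
    return fmean(kernel(a, b) for a, b in combinations(sample, 2))


def variance_kernel(a, b):
    """Kernel whose U-statistic is the unbiased sample variance."""
    return 0.5 * (a - b) ** 2


def semi_supervised_mean(labeled, unlabeled_x, predict):
    """Illustrative order-1 case only (an assumption for this sketch,
    not the method of the talk): estimate E[Y] by averaging the
    residuals Y - predict(X) on labeled pairs and adding the average
    prediction on unlabeled covariates. With a fixed `predict`, the
    two terms together remain unbiased for E[Y]."""
    residual_mean = fmean(y - predict(x) for x, y in labeled)
    prediction_mean = fmean(predict(x) for x in unlabeled_x)
    return residual_mean + prediction_mean


if __name__ == "__main__":
    ys = [1.0, 2.0, 4.0, 7.0]
    print(u_statistic(ys, variance_kernel))

    labeled = [(1.0, 2.5), (2.0, 3.5)]   # (x, y) pairs
    unlabeled_x = [1.0, 3.0, 5.0]        # covariates without labels
    print(semi_supervised_mean(labeled, unlabeled_x, lambda x: 2.0 * x))
```

The intuition behind the second function is that a good predictor leaves small residuals, so most of the variability is carried by the prediction average, which can be computed from the larger unlabeled sample.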