A0721
Title: Enhancing efficiency and robustness in high-dimensional linear regression with additional unlabeled data
Authors: Kai Chen - Renmin University of China (China)
Yuqian Zhang - Renmin University of China (China) [presenting]
Abstract: In semi-supervised learning, the prevailing understanding is that observing additional unlabeled samples improves estimation accuracy for linear parameters only when the model is misspecified. The aim is to challenge this notion by demonstrating that it does not hold in high dimensions. Focusing first on a dense scenario, robust semi-supervised estimators of the regression coefficient are introduced that do not rely on sparse structures in the population slope. Even when the true underlying model is linear, it is shown that leveraging information from large-scale unlabeled data improves both estimation accuracy and inference robustness. Moreover, semi-supervised methods are proposed to further enhance efficiency in scenarios with a sparse linear slope. In contrast to the standard semi-supervised literature, covariate shifts are also allowed. The performance of the proposed methods is illustrated through extensive numerical studies, including simulations and a real-data application to the AIDS Clinical Trials Group Protocol 175 (ACTG175).