View Submission - EcoSta 2025
A1251
Title: Sparse robust discriminant analysis for high-dimensional and heavy-tailed data
Authors: Jing Zeng - University of Science and Technology of China (China) [presenting]
Abstract: With advances in data-collection techniques, large-scale data has become increasingly prevalent in medical science. For instance, gene expression data provides information on tens of thousands of genes, while diagnostic imaging, such as magnetic resonance imaging, generates a vast number of pixels. Although various sparse linear discriminant analysis methods have been developed to handle high-dimensional medical data, they often assume light-tailed predictors, an assumption that is frequently violated in real applications. In this paper, we propose a robust classifier under an elliptically contoured discriminant analysis (EDA) model, which accommodates both light-tailed and heavy-tailed data. In addition, we assess prediction accuracy using the balanced rate, a more appropriate metric when the data is imbalanced. Under the EDA model, we identify the intrinsic dimension-reduction subspace that captures all the information in the predictors needed to achieve the lowest balanced rate. Leveraging this dimension-reduction subspace, we propose a robust high-dimensional classifier that reduces data dimensionality through subspace projection and then performs prediction on the reduced data. Theoretically, our proposal simultaneously enjoys consistency in subspace estimation, variable selection, and prediction accuracy under only finite fourth-moment conditions on the predictors.
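As an illustration of why a balanced metric matters for imbalanced data, the following is a minimal sketch, assuming that the "balanced rate" in the abstract refers to the average of the per-class misclassification rates (the abstract itself does not give a formula). The function name `balanced_error_rate` and the toy labels are hypothetical and are not taken from the paper.

```python
def balanced_error_rate(y_true, y_pred):
    """Average of per-class misclassification rates.

    Assumption: this is one common definition of a balanced error
    metric; the abstract does not state the authors' exact formula.
    """
    classes = sorted(set(y_true))
    per_class_errors = []
    for c in classes:
        # Predictions made for the samples whose true label is c.
        preds_for_c = [p for t, p in zip(y_true, y_pred) if t == c]
        errors = sum(p != c for p in preds_for_c)
        per_class_errors.append(errors / len(preds_for_c))
    return sum(per_class_errors) / len(per_class_errors)


# Hypothetical imbalanced toy data: 8 samples of class 0, 2 of class 1.
y_true = [0] * 8 + [1] * 2
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 10

overall_error = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
ber = balanced_error_rate(y_true, y_pred)

print(f"overall error rate:  {overall_error}")
print(f"balanced error rate: {ber}")
```

In this toy case the majority-class predictor has a low overall error rate but misclassifies every minority-class sample, which a per-class average makes visible.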