CMStatistics 2023
B0501
Title: Discriminant analysis in high-dimensional Gaussian mixtures
Authors: Xin Bing - University of Toronto (Canada)
Marten Wegkamp - Cornell (United States) [presenting]
Abstract: Binary classification of high-dimensional features is considered under a postulated model with a low-dimensional latent Gaussian mixture structure and non-vanishing noise. A computationally efficient classifier is proposed that uses certain principal components (PCs) of the observed features as projections, with the number of retained PCs selected in a data-driven way. Explicit rates of convergence of the excess risk of the proposed PC-based classifier are derived, and the obtained rates are proven to be optimal in the minimax sense, up to a logarithmic factor. All PCs are then retained to estimate the direction of the optimal separating hyperplane. The estimated hyperplane is shown to interpolate the training data. While the direction vector can be consistently estimated, as might be expected from recent results in linear regression, a naive plug-in estimate fails to consistently estimate the intercept. A simple correction, which requires an independent hold-out sample, renders the procedure consistent and even minimax optimal in many scenarios. The interpolation property of the corrected procedure can be retained but, surprisingly, depends on the way the labels are encoded.
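To make the first stage concrete, the following is a minimal sketch of a generic PC-based discriminant rule: project the centred features onto the leading k principal components, then fit ordinary linear discriminant analysis (pooled covariance, log prior-ratio intercept) in that k-dimensional space. It is an illustration under stated assumptions, not the authors' procedure. The number of PCs k is fixed by hand here rather than selected from the data, the hold-out intercept correction and the all-PC interpolating variant are not reproduced, and the function names (pc_lda_fit, pc_lda_predict, simulate) and the simulation settings are hypothetical.

```python
import numpy as np


def pc_lda_fit(X, y, k):
    """Hypothetical PC-projected LDA: X is (n, p), y in {0, 1}, k = retained PCs."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # Leading right singular vectors of the centred data are the top-k PC directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                          # (p, k) projection matrix
    Z = Xc @ V                            # (n, k) projected features
    z0, z1 = Z[y == 0], Z[y == 1]
    m0, m1 = z0.mean(axis=0), z1.mean(axis=0)
    # Pooled within-class covariance in the k-dimensional PC space.
    R0, R1 = z0 - m0, z1 - m1
    S = (R0.T @ R0 + R1.T @ R1) / (len(y) - 2)
    beta = np.linalg.solve(S, m1 - m0)    # LDA direction in PC coordinates
    pi1 = y.mean()
    # Plug-in intercept: midpoint of class means plus log prior ratio.
    beta0 = -0.5 * (m0 + m1) @ beta + np.log(pi1 / (1 - pi1))
    return mu, V, beta, beta0


def pc_lda_predict(model, X):
    """Assign label 1 when the linear score exceeds 0."""
    mu, V, beta, beta0 = model
    return ((X - mu) @ V @ beta + beta0 > 0).astype(int)


def simulate(n, A, rng):
    """Assumed latent mixture: Z | y ~ N(+/-2 e_1, I_K), observed X = Z A^T + noise."""
    K = A.shape[1]
    y = rng.integers(0, 2, size=n)
    means = np.zeros((n, K))
    means[:, 0] = np.where(y == 1, 2.0, -2.0)
    Z = means + rng.standard_normal((n, K))
    X = Z @ A.T + rng.standard_normal((n, A.shape[0]))
    return X, y


rng = np.random.default_rng(0)
p, K = 200, 2
A = rng.standard_normal((p, K))           # hypothetical loading matrix
X_train, y_train = simulate(400, A, rng)
X_test, y_test = simulate(400, A, rng)

model = pc_lda_fit(X_train, y_train, k=K)
acc = (pc_lda_predict(model, X_test) == y_test).mean()
print(f"test accuracy: {acc:.3f}")
```

Projecting before fitting LDA keeps the covariance that must be inverted at size k by k instead of p by p. This is what makes a PC-based rule cheap and stable when p is large relative to n.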