A0531

Title: Optimal subsampling for large-scale linear classification
Authors: Jun Yu - Beijing Institute of Technology (China) [presenting]
Abstract: Subsampling is a popular way to balance statistical and computational efficiency in the big data era. A new optimal subsampling framework is presented for linear classifiers based on a piecewise linear-quadratic loss. The classifier not only selects an informative subset of the training sample to reduce the data size but also embeds summary statistics to maintain high accuracy. A novel view of hinge loss-based classifiers under the general subsampling framework will be presented, with rigorously proven statistical properties. Numerical results demonstrate that the proposed classifiers outperform existing methods in terms of estimation, computation, and prediction.