B1229
Title: SGD algorithms based on incomplete U-statistics: Large-scale minimization of empirical risk
Authors: Guillaume Papa - Telecom ParisTech (France) [presenting]
Stephan Clemencon - Telecom ParisTech (France)
Abstract: In many learning problems, from clustering and metric learning to ranking, empirical estimates of the risk functional take the form of an average over tuples of observations. We argue that, in the large-scale setting, the inductive principle of stochastic approximation for risk minimization should be implemented in a specific manner: the gradient estimates should be obtained by sampling tuples of data points with replacement (incomplete U-statistics) rather than by sampling data points without replacement (complete U-statistics based on subsamples). We develop a theoretical framework accounting for the considerable impact of this sampling strategy on the generalization ability of the prediction model returned by the Stochastic Gradient Descent (SGD) algorithm. The analysis reveals that the promoted method achieves a much better trade-off between statistical accuracy and computational cost. Beyond the rate-bound analysis, numerical experiments on AUC maximization and metric learning provide strong empirical evidence of the superiority of the proposed approach.
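
To make the sampling scheme concrete, the following is a minimal sketch of SGD driven by incomplete U-statistic gradient estimates, here for AUC maximization with a linear scorer. The logistic pairwise surrogate loss, the step-size schedule, and all function names are illustrative assumptions, not the authors' implementation.

import numpy as np

def grad_pair(w, xi, yi, xj, yj):
    # Gradient of a logistic pairwise surrogate for the AUC
    # (illustrative choice; the abstract does not fix the loss).
    # Pairs with equal labels contribute zero to the estimate.
    if yi == yj:
        return np.zeros_like(w)
    d = xi - xj if yi > yj else xj - xi   # positive-class point minus negative
    return -d / (1.0 + np.exp(w @ d))    # d/dw of log(1 + exp(-w.d))

def sgd_incomplete_ustat(X, y, n_iters=2000, batch_pairs=32, lr=0.5, seed=0):
    # Each gradient estimate is an incomplete U-statistic: an average of
    # grad_pair over batch_pairs index pairs drawn uniformly WITH replacement
    # from the set of all distinct pairs, redrawn at every iteration.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.zeros(p)
    for t in range(n_iters):
        g = np.zeros(p)
        drawn = 0
        while drawn < batch_pairs:
            i, j = rng.integers(n), rng.integers(n)
            if i == j:                    # reject coincident draws
                continue
            g += grad_pair(w, X[i], y[i], X[j], y[j])
            drawn += 1
        w -= (lr / np.sqrt(t + 1)) * (g / batch_pairs)  # decaying step size
    return w

# Toy usage on synthetic data (hypothetical).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = (X @ rng.normal(size=5) > 0).astype(int)
w = sgd_incomplete_ustat(X, y)

Because the pairs are redrawn with replacement from all O(n^2) pairs at every step, successive gradient estimates are unbiased and cheap to form; the alternative the abstract contrasts with would instead fix a subsample drawn without replacement and average the pair gradients over it (a complete U-statistic on the subsample).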