CMStatistics 2016
B1025
Title: Trade-off between computational cost and estimate accuracy: Some attempts
Authors: Christophe Biernacki - Inria (France) [presenting]
Alain Celisse - Lille University (France)
Maxime Brunin - Inria (France)
Abstract: Most estimates in practice arise from algorithmic processes that optimize standard, but usually only asymptotically relevant, criteria. The quality of the resulting estimate is therefore a function of both the iteration number and the sample size involved. An important question is how to design accurate estimates while saving computation time, and we address it here in the simplified context of linear regression. First, with the sample size fixed, we focus on estimating an early stopping time for a gradient descent process that maximizes the likelihood. The accuracy gain from such a stopping time increases with the number of covariates, indicating the potential interest of the method in real situations involving many covariates. Second, we allow both the number of iterations and the (sub)sample size to be estimated so as to provide the best estimate accuracy while respecting a given maximum computational cost. Indeed, restricting estimation to subsamples is standard practice in a ``Big Data'' context for computational reasons. Our aim is thus to formalize such an empirical process, in order to provide well-founded recommendations for jointly selecting the iteration number and the subsample size.
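The early-stopping idea in the first part of the abstract can be illustrated with a minimal sketch: run gradient descent on the least-squares criterion (equivalent to likelihood maximization under Gaussian noise) and pick the iteration that minimizes held-out error. The data-generating setup, step size, and validation-based stopping rule below are illustrative assumptions, not the authors' actual estimator of the stopping time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data (illustrative setup, not from the paper).
n, p = 200, 20
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Hold out a validation split to choose the stopping iteration
# (a simple proxy for an early-stopping-time estimator).
X_tr, X_va = X[:150], X[150:]
y_tr, y_va = y[:150], y[150:]

def gradient_descent_path(X, y, step, n_iter):
    """Return the sequence of gradient-descent iterates on the squared loss."""
    beta = np.zeros(X.shape[1])
    path = [beta.copy()]
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / len(y)  # gradient of mean squared error
        beta = beta - step * grad
        path.append(beta.copy())
    return path

path = gradient_descent_path(X_tr, y_tr, step=0.05, n_iter=500)
val_err = [np.mean((X_va @ b - y_va) ** 2) for b in path]
t_stop = int(np.argmin(val_err))  # estimated early stopping time
```

Stopping at `t_stop` trades a small optimization bias for a saving of the remaining iterations; the abstract's point is that this trade-off becomes more favorable as the number of covariates grows.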