CMStatistics 2016
B1672
Title: Greedy variable selection on the Lasso solution grid
Authors: Piotr Pokarowski - University of Warsaw (Poland) [presenting]
Abstract: The Lasso estimator is a very popular tool for fitting sparse models to high-dimensional data. However, theoretical studies and simulations have established that the model selected by the Lasso is usually too large. Concave regularizations (SCAD, MCP or capped-l1) are closer than the Lasso to the maximum l0-penalized likelihood estimator, that is, to the Generalized Information Criterion (GIC), and correct the Lasso's intrinsic estimation bias. These methods use the Lasso as a starting set of models and try to improve it using local optimization. We propose a greedy method of improving the Lasso solution grid for Generalized Linear Models. For a given penalty, the algorithm orders the Lasso non-zero coordinates according to their Wald statistics and then selects the model from a small family by GIC. We derive an upper bound on the selection error of the method and show in numerical experiments on synthetic and real-world data sets that the algorithm is more accurate than concave regularizations.
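For a single point of the Lasso penalty grid in the linear-model case, the order-then-select step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Lasso penalty `alpha=0.1`, the GIC penalty constant `log(log(n)) * log(p)`, and the use of OLS-refit Wald statistics for the ordering are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: 3 strong true signals among p = 30 predictors.
rng = np.random.default_rng(0)
n, p = 100, 30
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)

def gic(rss, k, n, c=np.log(np.log(n)) * np.log(p)):
    # Generalized Information Criterion; the constant c is one common
    # choice, not necessarily the one used in the talk.
    return n * np.log(rss / n) + c * k

# Step 1: Lasso fit for a given penalty; take its non-zero support.
lasso = Lasso(alpha=0.1).fit(X, y)
support = np.flatnonzero(lasso.coef_)

# Step 2: order the support by squared Wald statistics from an OLS refit.
Xs = X[:, support]
ols, *_ = np.linalg.lstsq(Xs, y, rcond=None)
resid = y - Xs @ ols
sigma2 = resid @ resid / (n - len(support))
wald = ols**2 / np.diag(sigma2 * np.linalg.inv(Xs.T @ Xs))
order = support[np.argsort(-wald)]

# Step 3: score the small nested family along the ordering; keep the
# GIC minimizer.
best = (np.inf, np.array([], dtype=int))
for k in range(len(order) + 1):
    S = order[:k]
    if k == 0:
        rss = y @ y
    else:
        b, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
        rss = np.sum((y - X[:, S] @ b) ** 2)
    score = gic(rss, k, n)
    if score < best[0]:
        best = (score, S)

print(sorted(best[1]))
```

In the full method this loop would run over every penalty on the Lasso grid, so only a small number of candidate models (one nested family per grid point) ever needs to be scored by GIC.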