A0408
Title: Computational strategies for regression model selection in the high-dimensional case
Authors: Marios Demosthenous - Cyprus University of Technology (Cyprus) [presenting]
Cristian Gatu - Alexandru Ioan Cuza University of Iasi (Romania)
Erricos Kontoghiorghes - Cyprus University of Technology and Birkbeck University of London (Cyprus)
Abstract: Computational strategies for finding the best-subset regression models are proposed. The case of high-dimensional (HD) data, where the number of variables exceeds the number of available observations, is considered. Within this context, a theoretical combinatorial solution is proposed. It is based on a regression tree structure that generates all possible submodels in an optimal manner. The traversal of this regression tree corresponds to an effective enumeration scheme of all possible unique combinations of variables up to a specified size. The size of each submodel should be less than the number of available observations so that these submodels can be estimated numerically. An efficient branch-and-bound algorithm adapted to the HD case finds the best submodels without generating the entire tree. The best submodels are determined according to selection criteria from the AIC family. In order to further reduce the computational burden, an approximate version of the branch-and-bound strategy that prunes non-optimal subtrees using a tolerance parameter is also proposed. The approximate algorithm finds models close to the best ones that are bounded by the specified tolerance. It is shown that the relative error of the AIC criterion of the computed subset is bounded above by the chosen tolerance parameter. Computational results and experiments on artificial datasets are presented and analyzed.