View Submission - HiTECCoDES2024
A0219
Title: Combinatorial algorithms for variable selection in regression
Authors: Cristian Gatu - Alexandru Ioan Cuza University of Iasi (Romania) [presenting]
M Hofmann - University of Oviedo (Spain)
Marios Demosthenous - Cyprus University of Technology (Cyprus)
Erricos Kontoghiorghes - Cyprus University of Technology and Birkbeck University of London, UK (Cyprus)
Abstract: Computational strategies for finding the best-subset regression models are proposed. The algorithms are based on a regression tree structure that generates all possible subset models. An efficient branch-and-bound algorithm that finds the best submodels without generating the entire tree is described. Approximate algorithms that improve the computational performance are investigated. Further, these strategies are adapted to solve the problem of regression subset selection under the condition of non-negative coefficients. The solution is based on an alternative approach to quadratic programming that derives the non-negative least squares solution by solving the normal equations for a number of unrestricted least squares subproblems. The R package "lmSubsets" for regression subset selection is introduced and described. The package aims to provide a versatile tool for subset regression. Finally, the case of high-dimensional data, where the number of variables exceeds the number of observations, is considered. Within this context, a novel combinatorial solution is proposed. It generates a high-dimensional regression tree to select the optimal model of size up to k variables, where k is smaller than the number of available observations. It avoids evaluating the same model more than once and reuses previous computations when evaluating subsequent combinations of variables, thus reducing the computational cost. Experimental results are presented and analyzed.
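As an illustration of the regression tree and branch-and-bound idea described in the abstract, the following Python sketch enumerates subset models by dropping variables from the full model and prunes a branch when its residual sum of squares (RSS) cannot improve on any incumbent best model of smaller size. It is a minimal sketch under stated assumptions, not the authors' implementation and not the "lmSubsets" API: the function names `rss` and `best_subsets_bnb` are hypothetical, no intercept is fitted, and each model is refitted from scratch with `numpy.linalg.lstsq` instead of reusing earlier computations.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of the least squares fit of y on X[:, cols]."""
    A = X[:, cols]
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    r = y - A @ beta
    return float(r @ r)


def best_subsets_bnb(X, y):
    """Best subset (lowest RSS) for every model size, found by branch and bound.

    Hypothetical helper written for illustration. Returns a dict mapping
    size -> (rss, variables) and the number of models actually evaluated.
    """
    n_vars = X.shape[1]
    best = {m: (np.inf, None) for m in range(1, n_vars + 1)}
    visited = 0

    def visit(S, k):
        # S: variables in the current model; the first k are fixed and
        # cannot be dropped further down this branch.
        nonlocal visited
        visited += 1
        value = rss(X, y, S)
        if value < best[len(S)][0]:
            best[len(S)] = (value, tuple(S))
        # Submodels below this node have sizes max(k, 1) .. len(S) - 1 and,
        # because dropping variables never lowers the RSS, each has
        # RSS >= value. If no incumbent can be beaten, cut the branch.
        reachable_sizes = range(max(k, 1), len(S))
        if all(value >= best[m][0] for m in reachable_sizes):
            return
        for i in range(k, len(S)):
            child = S[:i] + S[i + 1:]  # drop variable S[i]; S[:i] become fixed
            if child:
                visit(child, i)

    visit(list(range(n_vars)), 0)
    return best, visited


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((40, 6))
    y = X[:, [0, 3]] @ np.array([2.0, -1.5]) + 0.1 * rng.standard_normal(40)
    best, visited = best_subsets_bnb(X, y)
    for size, (value, variables) in best.items():
        print(size, variables, round(value, 4))
    print("models evaluated:", visited, "of", 2 ** X.shape[1] - 1)
```

Fixing the variables that precede a dropped one ensures each subset appears exactly once in the tree, which mirrors the abstract's point about not evaluating the same model twice. The pruning test relies only on the fact that removing variables cannot decrease the RSS.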