A1394
Title: Heuristic algorithms for subset regression model selection
Authors: Cristian Gatu - Alexandru Ioan Cuza University of Iasi (Romania) [presenting]
Georgiana-Elena Pascaru - Alexandru Ioan Cuza University of Iasi (Romania)
Petru Sebastian Drumia - Alexandru Ioan Cuza University of Iasi (Romania)
Erricos Kontoghiorghes - Cyprus University of Technology and Birkbeck University of London (Cyprus)
Abstract: Heuristic stepwise algorithms are an established approach to the regression model selection problem. A main drawback of these methods is that they evaluate only a small number of submodels when selecting a solution and thus, in general, fail to find the optimum. Three strategies that aim to overcome this issue are investigated. The first one (SEL-k) builds on standard forward selection, but selects at each step the best $k$ variables instead of only the single best one. The second method (TREE-k) explores a combinatorial search space, thus increasing the number of submodels that are investigated. Specifically, at each step of the algorithm a new search branch is considered for each of the $k$ most significant variables. A branch terminates either when there are no more significant variables to choose from, or when all variables have been considered. The third one (SHUTTLE-k) aims to obtain good solutions while avoiding a prohibitive computational cost. A list of $k$ submodels is stored. During one iteration, for each of the $k$ submodels, the best $k$ variables are chosen, yielding $k^2$ submodels (augmentation step). From the resulting list of $k^2$ submodels, the best $k$ are kept for the subsequent iteration (reduction step). Various experiments are conducted on both real and artificially generated datasets in order to assess the three proposed algorithms. The results are presented and discussed.
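
The augmentation and reduction steps of SHUTTLE-k can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that submodels are ranked by residual sum of squares (RSS), that the search starts from the empty model, and that it stops at a fixed target size; the abstract specifies none of these. The names `rss`, `shuttle_k` and `size` are hypothetical.

```python
import numpy as np


def rss(X, y, cols):
    """RSS of the least-squares fit of y on the columns of X listed in cols."""
    A = X[:, sorted(cols)]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def shuttle_k(X, y, k, size):
    """Keep a list of k submodels; grow each one by its k best variables.

    Assumptions (not from the abstract): RSS as the selection criterion,
    the empty model as the starting point, and 1 <= size <= X.shape[1].
    Returns up to k (variables, RSS) pairs, best first.
    """
    n_vars = X.shape[1]
    models = [frozenset()]
    candidates = {}
    for _ in range(size):
        candidates = {}
        for m in models:
            # Augmentation step: the k best single-variable additions to m.
            scored = sorted(
                (rss(X, y, m | {j}), j) for j in range(n_vars) if j not in m
            )
            for score, j in scored[:k]:
                # Identical submodels reached from different parents collapse.
                candidates[m | {j}] = score
        # Reduction step: keep the k best of the (at most k^2) candidates.
        models = sorted(candidates, key=candidates.get)[:k]
    return [(sorted(m), candidates[m]) for m in models]


if __name__ == "__main__":
    # Artificial data: only variables 0 and 2 enter the true model.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 6))
    y = 3 * X[:, 0] - 2 * X[:, 2] + 0.01 * rng.standard_normal(100)
    for variables, score in shuttle_k(X, y, k=2, size=2):
        print(variables, score)
```

In this sketch, setting `k=1` reduces the search to standard forward selection under the RSS criterion. A larger `k` increases the number of submodels evaluated per iteration, and the reduction step caps the stored list at $k$ submodels.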