CFE-CMStatistics 2024
A1659
Title: Combinatorial strategies for greedy regression model selection
Authors: Cristian Gatu - Alexandru Ioan Cuza University of Iasi (Romania) [presenting]
Georgiana-Elena Pascaru - Alexandru Ioan Cuza University of Iasi (Romania)
Petru Sebastian Drumia - Alexandru Ioan Cuza University of Iasi (Romania)
Erricos Kontoghiorghes - Cyprus University of Technology and Birkbeck University of London, UK (Cyprus)
Abstract: Greedy stepwise algorithms are an established approach to the regression model selection problem. Their main drawback is that they evaluate only a small number of submodels before selecting a solution and therefore, in general, fail to find the optimal one. Three strategies that aim to overcome this issue are investigated: SEL-k, TREE-k and SHUTTLE-k. SEL-k builds on standard forward selection but, at each step, selects the best $k$ variables instead of only the single best one. TREE-k explores a combinatorial search space, which increases the number of submodels that are investigated. Specifically, at each step of the algorithm, a new search branch is opened for each of the $k$ most significant variables. A branch terminates either when no significant variables remain to choose from or when all variables have been considered. SHUTTLE-k aims to obtain good solutions while avoiding a prohibitive computational cost. It maintains a list of $k$ submodels. During one iteration, the best $k$ variables are chosen for each of the $k$ submodels, yielding $k^2$ submodels (augmentation step). From the resulting list of $k^2$ submodels, the best $k$ are kept for the next iteration (reduction step). Various experiments are conducted on both real and artificially generated datasets to assess the three proposed algorithms. The results are presented and discussed.
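The following Python sketch illustrates one possible reading of the three strategies; it is not the authors' implementation. Several details the abstract leaves open are assumptions: significance is measured by a partial F-statistic against a hypothetical entry threshold `f_in=4.0`; SEL-k adds all $k$ selected variables at once; the "best $k$" submodels in SHUTTLE-k's reduction step are those with the smallest residual sum of squares (RSS); and every method returns the visited submodel with the lowest AIC. All names (`rss`, `aic`, `ranked`, `sel_k`, `tree_k`, `shuttle_k`, `demo_data`) are hypothetical. Least squares is solved through the normal equations in plain Python to keep the block self-contained, which is adequate only for small, well-conditioned toy problems.

```python
import math
import random

def rss(X, y, subset):
    """RSS of the OLS fit of y on an intercept plus the columns of X listed in `subset`."""
    n = len(y)
    cols = [[1.0] * n] + [[row[j] for row in X] for j in sorted(subset)]
    p = len(cols)
    # Normal equations A b = c, solved by Gaussian elimination with partial pivoting.
    A = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(p)] for i in range(p)]
    c = [sum(u * v for u, v in zip(cols[i], y)) for i in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv], c[k], c[piv] = A[piv], A[k], c[piv], c[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for j in range(k, p):
                A[r][j] -= f * A[k][j]
            c[r] -= f * c[k]
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (c[k] - sum(A[k][j] * b[j] for j in range(k + 1, p))) / A[k][k]
    return sum((y[i] - sum(b[j] * cols[j][i] for j in range(p))) ** 2 for i in range(n))

def aic(X, y, subset):
    """Gaussian AIC up to an additive constant (assumed final selection criterion)."""
    n = len(y)
    return n * math.log(max(rss(X, y, subset), 1e-300) / n) + 2 * (len(subset) + 1)

def ranked(X, y, subset, f_in):
    """Variables outside `subset` with partial F > f_in, most significant first."""
    n, base, scored = len(y), rss(X, y, subset), []
    df = n - len(subset) - 2  # residual df after adding one variable (with intercept)
    if df <= 0:
        return []
    for j in range(len(X[0])):
        if j not in subset:
            new = rss(X, y, subset | {j})
            f = (base - new) / (new / df) if new > 1e-12 else math.inf
            if f > f_in:
                scored.append((f, j))
    return [j for _, j in sorted(scored, reverse=True)]

def sel_k(X, y, k, f_in=4.0):
    """SEL-k: forward selection that adds the k best significant variables per step."""
    subset, visited = frozenset(), [frozenset()]
    while True:
        best = ranked(X, y, subset, f_in)[:k]
        if not best:  # no significant variable left, or all variables used
            break
        subset = subset | frozenset(best)
        visited.append(subset)
    return min(visited, key=lambda s: aic(X, y, s))

def tree_k(X, y, k, f_in=4.0):
    """TREE-k: open one branch for each of the k most significant variables."""
    visited = set()
    def explore(subset):
        visited.add(subset)
        for j in ranked(X, y, subset, f_in)[:k]:  # an empty list ends the branch
            child = subset | {j}
            if child not in visited:  # same subset reached in another order: skip re-exploring
                explore(child)
    explore(frozenset())
    return min(visited, key=lambda s: aic(X, y, s))

def shuttle_k(X, y, k, f_in=4.0):
    """SHUTTLE-k: keep k submodels; augment to at most k*k, then reduce back to k."""
    beam, visited = [frozenset()], {frozenset()}
    while True:
        # Augmentation: extend each kept submodel by each of its k best variables.
        pool = {s | {j} for s in beam for j in ranked(X, y, s, f_in)[:k]}
        if not pool:
            break
        visited |= pool
        # Reduction: all pool members have the same size, so rank them by RSS.
        beam = sorted(pool, key=lambda s: rss(X, y, s))[:k]
    return min(visited, key=lambda s: aic(X, y, s))

def demo_data(seed=0, n=60, m=6):
    """Hypothetical toy data in which only x0 and x1 enter the true model."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(n)]
    y = [3 * r[0] - 2 * r[1] + rng.gauss(0, 0.5) for r in X]
    return X, y

if __name__ == "__main__":
    X, y = demo_data()
    for name, method in [("SEL-k", sel_k), ("TREE-k", tree_k), ("SHUTTLE-k", shuttle_k)]:
        print(name, sorted(method(X, y, k=2)))
```

With $k = 1$ all three functions reduce to ordinary forward selection and visit the same path of submodels. For $k > 1$, this version of TREE-k visits a superset of that path, so its best AIC can only match or improve on forward selection, at the cost of potentially exponentially many submodels; SHUTTLE-k instead retains only $k$ submodels per iteration and generates at most $k^2$ candidates from them.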