Title: Boosting with random selection of weak learners for variable selection in high-dimensional biomedical data
Authors: Christian Staerk - University of Bonn (Germany) [presenting]
Andreas Mayr - University of Bonn (Germany)
Abstract: Statistical boosting is a promising alternative to popular regularization methods such as the Lasso for modelling high-dimensional biomedical data with many possible explanatory variables: early stopping of the algorithm leads to implicit regularization and variable selection, enhancing the interpretability of the final models. Traditionally, the class of possible weak learners is fixed across all boosting iterations and consists of simple learners that each include only one explanatory variable. Furthermore, the number of boosting iterations is typically chosen by optimizing the predictive performance of the resulting models, which often yields models that include unnecessarily many noise variables. We propose modifications of $L_2$Boost for variable selection in high-dimensional models that aim to address these potential issues. The modifications are based on an adaptive random selection of different classes of weak learners in each boosting iteration. The considered classes include weak learners with several variables, so that multiple coefficients can be updated in a single iteration. In addition, the proposed modifications of $L_2$Boost can impose an automatic stopping of the algorithm, leading to a reduced number of selected noise variables. The new approach is illustrated via simulations and a biomedical real-data example.
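For orientation, the traditional setting described in the abstract (componentwise $L_2$Boosting, where each weak learner is a simple linear fit on a single variable and the number of iterations is fixed in advance) can be sketched roughly as below. This is only a minimal illustration of the classical baseline, not the proposed modification with random selection of weak-learner classes or automatic stopping. The function name `l2boost`, the step length `nu`, the iteration count, and the simulated data are all assumptions made for this example.

```python
import numpy as np


def l2boost(X, y, n_iter=50, nu=0.1):
    """Classical componentwise L2Boosting (illustrative sketch only)."""
    n, p = X.shape
    # Centre the columns so that each weak learner needs no intercept of its own.
    col_means = X.mean(axis=0)
    Xc = X - col_means
    offset = y.mean()
    resid = y - offset                    # residuals of the offset-only model
    coef = np.zeros(p)
    col_ss = (Xc ** 2).sum(axis=0)        # squared norm of each centred column
    for _ in range(n_iter):
        # Least-squares slope of the current residuals on each single variable.
        slopes = Xc.T @ resid / col_ss
        # Reduction in residual sum of squares achieved by each candidate learner.
        gain = slopes ** 2 * col_ss
        j = int(np.argmax(gain))          # best-fitting single-variable learner
        coef[j] += nu * slopes[j]         # shrunken update of one coefficient
        resid -= nu * slopes[j] * Xc[:, j]
    # Express the intercept on the scale of the original (uncentred) columns.
    intercept = offset - col_means @ coef
    return intercept, coef


# Simulated high-dimensional data (hypothetical): 3 signal variables, 497 noise variables.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.normal(scale=0.5, size=n)

intercept, coef = l2boost(X, y, n_iter=50, nu=0.1)
selected = np.flatnonzero(coef)           # variables picked in at least one iteration
print("selected variables:", selected)
```

Because only one coefficient is updated per iteration, stopping after `n_iter` steps bounds the number of selected variables by `n_iter`. In this sketch the stopping iteration is simply fixed by hand, whereas in practice it is usually tuned for predictive performance, which is the step the abstract identifies as prone to admitting noise variables.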