A0294
Title: Enhanced variable selection for boosting sparser and less complex models in distributional regression
Authors: Annika Stroemer - University of Bonn (Germany)
Nadja Klein - Karlsruhe Institute of Technology (Germany)
Christian Staerk - IUF - Leibniz Research Institute for Environmental Medicine (Germany)
Guillermo Briseno Sanchez - TU Dortmund University (Germany)
Andreas Mayr - University of Bonn (Germany) [presenting]
Abstract: Variable selection is already a challenge in classical regression, and it becomes even more pressing in distributional regression, where we model several parameters of the conditional distribution. An automated approach to deal with this is statistical boosting: these algorithms select the most informative variables while fitting the corresponding models. Unfortunately, in many practical applications with a low to medium number of predictor variables, they tend to produce false positives. This does not necessarily harm prediction accuracy, as the falsely selected variables often have only a negligible impact on the final model, but it hinders the interpretation of the models. As the general aim of statistical modelling is to use models that are as complex as necessary but as simple as possible, we investigate different approaches to enhance the variable selection properties of boosting, focusing on probing, stability selection, and a recent deselection approach. We will illustrate the effect of these approaches on variable selection and prediction accuracy with different model classes, including copula regression for multivariate distributional regression.
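Illustration: to make the probing idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes plain component-wise L2 boosting for a single mean parameter rather than a full distributional model, and it assumes that probing means adding a permuted copy ("probe") of every predictor and stopping as soon as a probe is selected. The function names, the step length nu and the simulated data are hypothetical choices made for illustration only.

```python
import random


def center(values):
    """Return the values with their mean subtracted."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def boost_with_probing(columns, y, nu=0.1, max_iter=500, seed=0):
    """Component-wise L2 boosting, stopped when the first probe is selected.

    columns: list of predictor columns (each a list of floats).
    Returns (selected indices, coefficients on the centered predictors).
    """
    rng = random.Random(seed)
    p = len(columns)
    cols = [center(c) for c in columns]
    # Assumption: one probe per predictor, built by permuting its values,
    # so a probe keeps the marginal distribution but carries no signal.
    for j in range(p):
        probe = cols[j][:]
        rng.shuffle(probe)
        cols.append(probe)

    resid = center(y)
    coef = [0.0] * p
    for _ in range(max_iter):
        best_j, best_beta, best_gain = None, 0.0, -1.0
        for j, c in enumerate(cols):
            ss = sum(v * v for v in c)
            if ss == 0.0:
                continue  # constant column, cannot be fitted
            dot = sum(v * r for v, r in zip(c, resid))
            gain = dot * dot / ss  # reduction in residual sum of squares
            if gain > best_gain:
                best_j, best_beta, best_gain = j, dot / ss, gain
        # Stop once the best-fitting component is a probe.
        if best_j is None or best_j >= p:
            break
        coef[best_j] += nu * best_beta
        resid = [r - nu * best_beta * v for r, v in zip(resid, cols[best_j])]

    selected = [j for j in range(p) if coef[j] != 0.0]
    return selected, coef


# Simulated example: only the first of five predictors is informative.
data_rng = random.Random(1)
n, p = 200, 5
X = [[data_rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [2.0 * X[0][i] + data_rng.gauss(0, 0.1) for i in range(n)]
selected, coef = boost_with_probing(X, y)
print("selected predictors:", selected)
```

In this sketch the probes replace a tuned stopping iteration: a predictor counts as selected only if it entered the model before any pure-noise variable did, which is intended to yield sparser models than stopping chosen for prediction accuracy alone.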