CMStatistics 2019
B0521
Title: Enhanced variable selection for boosting distributional regression
Authors: Andreas Mayr - University of Bonn (Germany) [presenting]
Annika Stroemer - University of Bonn (Germany)
Leonie Weinhold - University of Bonn (Germany)
Christian Staerk - RWTH Aachen University (Germany)
Abstract: Component-wise gradient boosting is an alternative way to fit distributional regression models. Boosting performs data-driven variable selection and works for high-dimensional data, while the resulting additive model is just as interpretable as one fitted via classical inference schemes. Although the algorithm is very flexible and relatively easy to extend, in some practical applications it tends to select too many variables, including false positives. This seems to occur particularly for rather low-dimensional data ($p<n$). To address this, we analyse different approaches that either de-select variables or stop the algorithm before it starts selecting non-informative ones. We illustrate these approaches with a recent analysis of the health-related quality of life of patients with chronic kidney disease, using boosting to select the most informative predictors when fitting a distributional beta regression model.
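
To make the selection mechanism concrete, the following is a minimal sketch of component-wise gradient boosting, not the authors' implementation. It is deliberately simplified: it boosts only the mean under a squared-error loss rather than all parameters of a distributional (e.g. beta) regression model, and the function name `componentwise_l2_boost`, the step length `nu`, and the simulated data are assumptions made for illustration. It shows how the number of boosting steps controls which variables enter the model, which is why early stopping acts as variable selection.

```python
import numpy as np


def componentwise_l2_boost(X, y, n_steps=100, nu=0.1):
    """Component-wise gradient boosting with squared-error loss (illustrative sketch)."""
    n, p = X.shape
    # Centre the columns so each base-learner is a simple linear fit without intercept.
    Xc = X - X.mean(axis=0)
    intercept = y.mean()
    coef = np.zeros(p)
    fit = np.full(n, intercept)
    col_ss = (Xc ** 2).sum(axis=0)
    selected = []
    for _ in range(n_steps):
        # Negative gradient of the squared-error loss is the current residual.
        resid = y - fit
        # Least-squares slope of the residual on each predictor separately.
        beta = Xc.T @ resid / col_ss
        # Residual sum of squares of each single-predictor base-learner.
        rss = ((resid[:, None] - Xc * beta) ** 2).sum(axis=0)
        # Update only the best-fitting component, shrunk by the step length nu.
        j = int(np.argmin(rss))
        coef[j] += nu * beta[j]
        fit += nu * beta[j] * Xc[:, j]
        selected.append(j)
    return intercept, coef, selected


# Simulated low-dimensional data (p < n): only the first two predictors are informative.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

_, coef_early, sel_early = componentwise_l2_boost(X, y, n_steps=20)
_, coef_late, sel_late = componentwise_l2_boost(X, y, n_steps=500)

print("selected after 20 steps: ", sorted(set(sel_early)))
print("selected after 500 steps:", sorted(set(sel_late)))
```

Because the boosting path is deterministic, the variables selected after 20 steps are always a subset of those selected after 500 steps; running longer can only add variables, never remove them. In practice the stopping iteration would be chosen by cross-validation or a similar criterion rather than fixed by hand as it is here.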