Title: A random forest approach for modeling bounded outcomes
Authors: Leonie Weinhold - University of Bonn (Germany)
Matthias Schmid - University of Bonn (Germany)
Richard Mitchell - USEPA Office of Wetlands (United States)
Kelly Maloney - US Environmental Protection Agency (United States)
Marvin Wright - Leibniz Institute for Prevention Research and Epidemiology - BIPS (Germany)
Moritz Berger - University of Bonn (Germany) [presenting]
Abstract: Observational studies frequently involve bounded outcome variables, for example relative frequency measures restricted to the unit interval $(0, 1)$. Beta regression is a flexible approach for relating a bounded outcome to a set of explanatory variables. Parametric beta regression models usually assume that the effects of the explanatory variables on the outcome are linear. In many applications, however, this assumption is too restrictive, for example when higher-order interactions between the explanatory variables are present. Furthermore, parametric models may not be applicable to high-dimensional data, for example when the number of explanatory variables exceeds the number of observations. To address these issues, we propose a random forest approach tailored to the modeling of bounded outcome variables. In contrast to classical random forest algorithms for continuous outcomes, which use the mean squared error as the splitting criterion, we propose to use the likelihood of the beta distribution for tree building. In each iteration of the tree-building algorithm, one chooses the combination of explanatory variable and split point that maximizes the log-likelihood function of the beta distribution, with the parameter estimates derived directly from the nodes of the currently built tree. The method is implemented in the R package ranger.
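
As a rough illustration of the splitting rule described in the abstract, the following Python sketch searches for the variable and split point that maximize the summed beta log-likelihood of the two child nodes. It is not the ranger implementation. The function names (`beta_loglik`, `node_loglik`, `best_split`) and the `min_node_size` parameter are hypothetical, and two modeling choices are assumptions made for the example: the beta distribution is parametrized by mean $\mu$ and precision $\phi$ (shape parameters $a = \mu\phi$, $b = (1 - \mu)\phi$), and each node gets its own estimates, with $\mu$ taken as the node mean and $\phi$ obtained by the method of moments from the node variance.

```python
import math


def beta_loglik(y, mu, phi):
    """Log-likelihood of values y in (0, 1) under Beta(mu * phi, (1 - mu) * phi)."""
    a, b = mu * phi, (1.0 - mu) * phi
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return sum(
        log_norm + (a - 1.0) * math.log(v) + (b - 1.0) * math.log(1.0 - v)
        for v in y
    )


def node_loglik(y):
    """Estimate mu and phi from one node (assumed: method of moments), then score it."""
    n = len(y)
    mu = sum(y) / n
    var = sum((v - mu) ** 2 for v in y) / n
    if var <= 0.0:
        # Degenerate node (all values equal): the likelihood is unbounded,
        # so this sketch simply refuses to score such a node.
        return -math.inf
    phi = mu * (1.0 - mu) / var - 1.0  # from Var(Y) = mu * (1 - mu) / (1 + phi)
    return beta_loglik(y, mu, phi)


def best_split(X, y, min_node_size=5):
    """Return (variable index, threshold, log-likelihood) of the best binary split.

    X is a list of rows (one list of explanatory values per observation),
    y is a list of outcomes strictly inside (0, 1).
    """
    best = (None, None, -math.inf)
    for j in range(len(X[0])):
        order = sorted(range(len(y)), key=lambda i: X[i][j])
        xs = [X[i][j] for i in order]
        ys = [y[i] for i in order]
        for k in range(min_node_size, len(ys) - min_node_size + 1):
            if xs[k - 1] == xs[k]:
                continue  # tied values cannot be separated by a threshold
            ll = node_loglik(ys[:k]) + node_loglik(ys[k:])
            if ll > best[2]:
                best = (j, 0.5 * (xs[k - 1] + xs[k]), ll)
    return best
```

Calling `best_split(X, y)` on a toy data set in which the outcome jumps from about 0.2 to about 0.8 at some value of the first variable should select that variable with a threshold at the jump. A full forest would repeat this search recursively on bootstrap samples with a random subset of candidate variables at each node, which the sketch omits.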