CMStatistics 2022
B1509
Title: An interval-valued random forests model
Authors: Paul Gaona Partida - Utah State University (United States) [presenting]
Chih-Ching Yeh - University of Utah (United States)
Yan Sun - Utah State University (United States)
Adele Cutler - Utah State University (United States)
Abstract: Analyzing soft interval data for uncertainty quantification has attracted much attention recently. Within this context, regression methods for interval data have been extensively studied. Most existing work focuses on linear models, yet many practical problems are nonlinear, so developing nonlinear regression tools for interval data is crucial. An interval-valued random forests model is proposed whose splitting criterion is variance reduction defined through an $L_2$-type metric in the Banach space of compact intervals. The model simultaneously considers the centers and ranges of the interval data as well as their possible interactions. Unlike most linear models, which require additional constraints to ensure mathematical coherence, the proposed random forests model estimates the regression function nonparametrically, so the predicted interval length is naturally nonnegative without any constraints. Simulation studies show that the new method outperforms typical existing regression methods on various linear, semi-linear, and nonlinear data archetypes and under different error measures. A real-data example analyzing the price ranges of the Dow Jones Industrial Average index and its component stocks demonstrates the method's applicability.
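
The splitting idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact metric, so the code below assumes one common $L_2$-type choice, the squared distance between two intervals taken as the squared difference of centers plus the squared difference of radii (half-lengths), with no center-range interaction term. The function names (`interval_variance`, `variance_reduction`, `best_split`, `leaf_prediction`) are hypothetical and are not taken from the authors' implementation. Under this assumption, the within-node variance is the variance of the centers plus the variance of the radii, and a leaf predicts the interval with the mean center and mean radius, which is why the predicted length cannot be negative.

```python
import numpy as np

def interval_variance(centers, radii):
    """Mean squared distance to the mean interval under the ASSUMED metric:
    d^2(A, B) = (center_A - center_B)^2 + (radius_A - radius_B)^2."""
    if len(centers) == 0:
        return 0.0
    return float(np.var(centers) + np.var(radii))

def variance_reduction(x, centers, radii, threshold):
    """Drop in size-weighted interval variance from splitting on x <= threshold."""
    left = x <= threshold
    right = ~left
    parent = interval_variance(centers, radii)
    children = (
        left.sum() * interval_variance(centers[left], radii[left])
        + right.sum() * interval_variance(centers[right], radii[right])
    ) / len(x)
    return parent - children

def best_split(x, centers, radii):
    """Scan midpoints between sorted unique x values; return (threshold, gain),
    or None when x has a single unique value and no split is possible."""
    xs = np.unique(x)
    if len(xs) < 2:
        return None
    candidates = (xs[:-1] + xs[1:]) / 2.0
    gains = [variance_reduction(x, centers, radii, t) for t in candidates]
    i = int(np.argmax(gains))
    return float(candidates[i]), float(gains[i])

def leaf_prediction(centers, radii):
    """Predicted interval (lower, upper) from the mean center and mean radius.
    A mean of nonnegative radii is nonnegative, so lower <= upper always."""
    c, r = float(np.mean(centers)), float(np.mean(radii))
    return c - r, c + r

# Toy data (made up for illustration): one scalar predictor, four intervals.
x = np.array([1.0, 2.0, 3.0, 4.0])
centers = np.array([0.0, 0.0, 10.0, 10.0])
radii = np.array([1.0, 1.0, 3.0, 3.0])

threshold, gain = best_split(x, centers, radii)
lower, upper = leaf_prediction(centers[x > threshold], radii[x > threshold])
```

A full forest would repeat this search over bootstrap samples and random feature subsets and average the leaf intervals across trees; those steps are omitted here.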