CMStatistics 2023
B0391
Title: Area-level small area estimation with random forests
Authors: Sylvia Harmening - Otto-Friedrich-Universitaet Bamberg (Germany) [presenting]
Marina Runge - Freie Universitaet Berlin (Germany)
Timo Schmid - Otto-Friedrich-Universitaet Bamberg (Germany)
Abstract: Interactions among explanatory variables, and nonlinear relationships between them and the dependent variable, are present in many data applications. An approach is presented that combines a small area estimation model with tree-based methods to provide a solution when only area-level data are available. In particular, the linear regression synthetic part of the Fay-Herriot model is replaced by a random forest to link survey data with related administrative information or data from other sources. The random forest accounts for possible interactions and nonlinear relationships, and its properties indirectly provide automatic variable selection and robustness to outliers. To obtain point estimates for an indicator of interest, the familiar structure of the Fay-Herriot estimator is retained. Estimation is carried out with an expectation-maximization algorithm. To quantify the uncertainty of the point estimator, a nonparametric bootstrap method for estimating the mean squared error is presented. Model-based simulations are carried out to evaluate the accuracy and precision of the proposed estimator and its uncertainty measure. The methodology is also demonstrated in an illustrative application.
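The "familiar structure of the Fay-Herriot estimator" mentioned in the abstract can be illustrated with a minimal sketch. In the standard Fay-Herriot model, each area's estimate is a weighted average of the direct survey estimate and a synthetic model prediction, with weight gamma_d = sigma_u^2 / (sigma_u^2 + sigma_ed^2). The sketch below assumes that the proposed estimator keeps exactly this weighting and simply takes the synthetic predictions from a random forest; the abstract does not confirm that detail. The function name `fh_type_estimates` and all parameter names are hypothetical, the synthetic predictions are passed in as plain numbers rather than produced by a fitted forest, and the random-effect variance `sigma_u2` is treated as a given input instead of being estimated by the expectation-maximization algorithm.

```python
from typing import List, Sequence


def fh_type_estimates(
    direct: Sequence[float],
    synthetic: Sequence[float],
    sampling_var: Sequence[float],
    sigma_u2: float,
) -> List[float]:
    """Combine direct and synthetic estimates with Fay-Herriot shrinkage weights.

    direct        direct survey estimate per area
    synthetic     model prediction per area (assumed here to come from
                  a random forest fitted elsewhere; hypothetical input)
    sampling_var  known sampling variance of each direct estimate (> 0)
    sigma_u2      variance of the area-level random effect (>= 0),
                  assumed to be estimated beforehand
    """
    if sigma_u2 < 0:
        raise ValueError("sigma_u2 must be non-negative")
    if not (len(direct) == len(synthetic) == len(sampling_var)):
        raise ValueError("all inputs must have one value per area")

    estimates = []
    for y_d, f_d, v_d in zip(direct, synthetic, sampling_var):
        if v_d <= 0:
            raise ValueError("sampling variances must be positive")
        # Shrinkage weight: close to 1 when the direct estimate is precise,
        # close to 0 when it is noisy relative to the random-effect variance.
        gamma_d = sigma_u2 / (sigma_u2 + v_d)
        estimates.append(gamma_d * y_d + (1.0 - gamma_d) * f_d)
    return estimates
```

Under these assumptions, an area with a noisy direct estimate (large sampling variance) is pulled toward the model prediction, while an area with a precise direct estimate stays close to it. When `sigma_u2` is zero, every estimate equals the synthetic prediction.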