COMPSTAT 2024
A0168
Title: Local machine learning for data giants
Authors: Michael Scholz - University of Klagenfurt (Austria) [presenting]
Stefan Sperlich - University of Geneva (Switzerland)
Gilles Cattani - University of Geneva (Switzerland)
Abstract: Classical nonparametric estimation is the natural link between Breiman's two cultures, namely "traditional regression methods" and "pure prediction algorithms". We borrow ideas from local smoothers and from efficient implementation to combine good practices of both cultures into a practical tool for the statistical analysis of large data problems, be it estimation, prediction or attribution. Estimation and prediction are particularly successful when local adaptiveness is allowed. Further, while distributed databases are typically considered a bane, data localization can turn them into a boon. Similarly, since most of the problems with divide-and-conquer algorithms are rooted in the paradigm of a global parameter set, they disappear under localization, and the selection of an optimal subsample size merges with that of optimal bandwidths, which we additionally allow to be local. Moreover, model and variable selection are possible, and sometimes even necessary, when staying local. For each step and subprocedure, we look for the most efficient implementation to keep the procedure fast. A simulation study provides the proof of concept and the computational details. An application to ocean warming illustrates the practical use of such a tool.
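The abstract does not spell out the authors' algorithm. As a rough illustration only, the sketch below shows the general idea of localization under simplifying assumptions: the data are split into blocks along x (standing in for distributed storage), each block fits its own Nadaraya-Watson (local constant) smoother using only its own observations, and each block picks its own bandwidth by leave-one-out cross-validation from a fixed grid. The Gaussian kernel, the split point, the bandwidth grid, the toy data and all function names (`fit_local_blocks`, `predict`, etc.) are hypothetical choices for this example, not the authors' method.

```python
import math
import random
from bisect import bisect_right


def gaussian_kernel(u):
    """Gaussian kernel, up to a constant that cancels in the ratio below."""
    return math.exp(-0.5 * u * u)


def nw_estimate(x0, xs, ys, h, skip=None):
    """Nadaraya-Watson (local constant) estimate at x0 with bandwidth h.

    If `skip` is an index, that observation is left out (used for LOO-CV).
    """
    num = 0.0
    den = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i == skip:
            continue
        w = gaussian_kernel((x - x0) / h)
        num += w * y
        den += w
    return num / den if den > 0.0 else math.nan


def loo_cv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error of the smoother on one block."""
    total = 0.0
    for i in range(len(xs)):
        pred = nw_estimate(xs[i], xs, ys, h, skip=i)
        if math.isnan(pred):
            return math.inf  # bandwidth too small: all kernel weights underflowed
        total += (ys[i] - pred) ** 2
    return total / len(xs)


def fit_local_blocks(xs, ys, edges, grid):
    """Split the data at `edges` into blocks and pick one bandwidth per block.

    Each block only uses its own observations, so blocks could live on
    different machines; no global parameter is shared between them.
    """
    blocks = [([], []) for _ in range(len(edges) + 1)]
    for x, y in zip(xs, ys):
        bx, by = blocks[bisect_right(edges, x)]
        bx.append(x)
        by.append(y)
    fitted = []
    for bx, by in blocks:
        if len(bx) < 2:
            raise ValueError("every block needs at least two observations")
        h_best = min(grid, key=lambda h: loo_cv_score(bx, by, h))
        fitted.append((bx, by, h_best))
    return fitted


def predict(x0, edges, fitted):
    """Route x0 to its block and evaluate that block's local smoother."""
    bx, by, h = fitted[bisect_right(edges, x0)]
    return nw_estimate(x0, bx, by, h)


# Hypothetical toy data: flat on [0, 0.5), wiggly on [0.5, 1].
def true_f(x):
    return 0.0 if x < 0.5 else math.sin(20.0 * x)


rng = random.Random(0)
xs = [rng.uniform(0.0, 1.0) for _ in range(300)]
ys = [true_f(x) + rng.gauss(0.0, 0.1) for x in xs]

edges = [0.5]                               # one split point -> two blocks
grid = [0.005, 0.01, 0.02, 0.05, 0.1, 0.2]  # candidate bandwidths
fitted = fit_local_blocks(xs, ys, edges, grid)

for k, (bx, _, h) in enumerate(fitted):
    print(f"block {k}: n={len(bx)}, bandwidth={h}")
print(predict(0.25, edges, fitted), predict(0.8, edges, fitted))
```

On the wiggly block, cross-validation has to choose a small bandwidth to follow the oscillations, whereas the flat block carries no such constraint; each block makes this choice independently. The sketch keeps the partition fixed, whereas the abstract also lets the subsample size be selected jointly with the bandwidths, and it omits the model and variable selection step.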