B0301
Title: Local machine learning for data giants
Authors: Gilles Cattani - University of Geneva (Switzerland) [presenting]
Stefan Sperlich - University of Geneva (Switzerland)
Michael Scholz - University of Klagenfurt (Austria)
Abstract: Classical nonparametric estimation is the natural link between two cultures: 'traditional regression methods' and 'pure prediction algorithms'. We borrow ideas from local smoothers and efficient implementation to combine good practices of both cultures into a practical tool for the statistical analysis of large data problems, whether estimation, prediction, or attribution. Estimation and prediction are particularly successful when allowing for local adaptiveness. Further, while distributed databases are typically considered a bane, data localization can turn them into a boon. Similarly, since most of the problems with divide-and-conquer algorithms are rooted in the paradigm of facing a global parameter set, they disappear under localization. Also, the selection of an optimal subsample size merges with the problem of finding optimal bandwidths, which, moreover, we allow to be local as well. Finally, model and variable selection can be done locally and sometimes even becomes necessary. For each step and subprocedure, we look for the most efficient implementation to keep the procedure fast. A proof of concept and computational details are given in a simulation study. An empirical application to data giants illustrates the practical use of such a tool.
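The following is a minimal sketch, not the authors' implementation, of the core idea the abstract describes: at each query point only a local subsample enters a kernel-weighted fit, and the bandwidth is chosen locally (here, illustratively, as the distance to the k-th nearest neighbour), so the choice of subsample size and the choice of bandwidth become one decision. All function names, the Epanechnikov kernel, and the nearest-neighbour bandwidth rule are assumptions for illustration.

```python
import numpy as np

def local_linear_at(x, y, x0, h):
    """Local linear (kernel-weighted least squares) estimate at x0 -- illustrative only."""
    u = (x - x0) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)  # Epanechnikov weights (assumed kernel)
    keep = w > 0                       # data localization: distant points drop out of the fit
    X = np.column_stack([np.ones(keep.sum()), x[keep] - x0])
    sw = np.sqrt(w[keep])
    beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y[keep], rcond=None)
    return beta[0]                     # intercept = fitted value at x0

def local_predict(x, y, x_query, k=200):
    """Predict at each query point with a local, k-nearest-neighbour bandwidth (assumed rule)."""
    preds = np.empty(len(x_query))
    for i, x0 in enumerate(x_query):
        h = np.sort(np.abs(x - x0))[min(k, len(x) - 1)]  # local bandwidth = subsample size choice
        preds[i] = local_linear_at(x, y, x0, h)
    return preds

# Illustrative usage on simulated data
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 10_000)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(10_000)
print(local_predict(x, y, np.array([0.25, 0.5, 0.75])))
```

Because each prediction touches only the observations inside the local window, the same sketch extends naturally to distributed data: a partition that holds no points near the query point contributes nothing, which is one reading of the abstract's claim that localization turns distributed storage from a bane into a boon.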