View Submission - EcoSta2022
A0362
Title: Bayesian data selection
Authors: Jeff Miller - Harvard University (United States) [presenting]
Eli Weinstein - Harvard University (United States)
Abstract: Insights into complex, high-dimensional data can be obtained by discovering features of the data that match or do not match a model of interest. To formalize this task, we introduce the "data selection" problem: finding a lower-dimensional statistic, such as a subset of variables, that is well fit by a given parametric model of interest. A fully Bayesian approach to data selection would be to parametrically model the value of the statistic, nonparametrically model the remaining "background" components of the data, and perform standard Bayesian model selection for the choice of statistic. However, fitting a nonparametric model to high-dimensional data tends to be highly inefficient, both statistically and computationally. We propose a novel score for performing both data selection and model selection, the "Stein volume criterion", which takes the form of a generalized marginal likelihood with a kernelized Stein discrepancy in place of the Kullback-Leibler divergence. The Stein volume criterion does not require one to fit or even specify a nonparametric background model, making it straightforward to compute; in many cases it is as simple as fitting the parametric model of interest with an alternative objective function.
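
As a rough illustration of the kernelized Stein discrepancy (KSD) mentioned in the abstract, the sketch below computes a standard U-statistic estimate of the squared KSD between a sample and a one-dimensional Gaussian model. It is not the Stein volume criterion itself: the function name `ksd_u_statistic`, the Gaussian model, the RBF kernel, and the bandwidth `h` are all assumptions chosen for this example and are not taken from the abstract.

```python
import numpy as np


def ksd_u_statistic(x, mu, sigma, h=1.0):
    """U-statistic estimate of the squared KSD between a 1-D sample `x`
    and the Gaussian model N(mu, sigma^2), using an RBF kernel.

    Illustrative only: the kernel, bandwidth, and model are assumptions.
    """
    x = np.asarray(x, dtype=float)
    n = x.size

    # Score of the model: d/dx log N(x; mu, sigma^2).
    s = -(x - mu) / sigma**2

    # Pairwise differences d[i, j] = x_i - x_j and RBF kernel values.
    d = x[:, None] - x[None, :]
    k = np.exp(-d**2 / (2.0 * h**2))

    # Kernel derivatives with respect to the first and second argument.
    dk_dx = -d / h**2 * k
    dk_dy = d / h**2 * k
    d2k_dxdy = (1.0 / h**2 - d**2 / h**4) * k

    # Stein kernel u(x_i, x_j) for the Langevin Stein operator.
    u = (
        s[:, None] * s[None, :] * k
        + s[:, None] * dk_dy
        + s[None, :] * dk_dx
        + d2k_dxdy
    )

    # U-statistic: average over off-diagonal pairs only.
    np.fill_diagonal(u, 0.0)
    return u.sum() / (n * (n - 1))


# Synthetic data drawn from N(0, 1), for illustration only.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=500)

ksd_matched = ksd_u_statistic(sample, mu=0.0, sigma=1.0)
ksd_mismatched = ksd_u_statistic(sample, mu=2.0, sigma=1.0)
print(ksd_matched, ksd_mismatched)
```

The estimate needs only the score of the parametric model and the data, so no density for the data-generating process is specified. For a sample drawn from the model, the value should be close to zero, while a shifted model should give a clearly larger value. Minimizing such a discrepancy over the model parameters is one way to read "fitting the parametric model of interest with an alternative objective function".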