CMStatistics 2023
B0647
Title: Bayesian coresets
Authors: Trevor Campbell - University of British Columbia (Canada) [presenting]
Abstract: Bayesian inference provides a coherent approach to learning from data and uncertainty assessment in complex, expressive statistical models. However, exact inference algorithms need to evaluate the full model joint probability many times, which is expensive in the large-data regime. Bayesian coresets address this problem by replacing the full dataset with a small, weighted, representative subset of data. Although the methodology is sound in principle, efficiently constructing such a coreset in practice remains a significant challenge. Existing methods tend to be slow and complicated to implement, require a secondary inference step after coreset construction, and do not enable model selection. A new method, sparse Hamiltonian flows, is introduced that addresses all of these challenges. The method involves first subsampling the data uniformly, and then optimizing a Hamiltonian flow that is parametrized by coreset weights and includes occasional momentum quasi-refreshment steps. Theoretical results are presented, demonstrating that the method enables an exponential compression of the dataset in representative models. Real and synthetic experiments demonstrate that sparse Hamiltonian flows provide significantly more accurate posterior approximations than competing coreset constructions.
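As a rough illustration of the coreset idea only, the sketch below replaces a full-data log-likelihood with a weighted sum over a uniform subsample. It does not implement sparse Hamiltonian flows: the weights are fixed at N/M rather than optimized, and the Gaussian model, the function names (`log_lik`, `uniform_coreset`, `coreset_log_lik`), and the sizes are assumptions chosen for the example, not taken from the talk.

```python
import numpy as np

def log_lik(theta, x):
    """Per-observation log-likelihood of x under N(theta, 1) (assumed toy model)."""
    return -0.5 * (x - theta) ** 2 - 0.5 * np.log(2.0 * np.pi)

def uniform_coreset(n, m, rng):
    """Pick m of n indices uniformly without replacement; weight each by n / m."""
    idx = rng.choice(n, size=m, replace=False)
    weights = np.full(m, n / m)  # unoptimized starting weights (assumption)
    return idx, weights

def coreset_log_lik(theta, x, idx, weights):
    """Weighted sum of log-likelihoods over the coreset points only."""
    return float(np.sum(weights * log_lik(theta, x[idx])))

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=10_000)   # synthetic data
idx, weights = uniform_coreset(len(x), 100, rng)  # 100-point coreset

full = float(np.sum(log_lik(0.5, x)))             # touches all 10,000 points
approx = coreset_log_lik(0.5, x, idx, weights)    # touches only 100 points
```

Each evaluation of `coreset_log_lik` costs M likelihood terms instead of N, which is where the savings in the large-data regime would come from. A method that optimizes the weights would replace the fixed N/M values in `uniform_coreset`.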