View Submission - CFE-CMStatistics 2025
A0891
Title: Federated variational inference for Bayesian mixture models
Authors: Paul Kirk - University of Cambridge (United Kingdom) [presenting]
Jackie Rao - University of Cambridge (United Kingdom)
Abstract: A one-shot, unsupervised federated learning approach is presented for Bayesian model-based clustering of large-scale binary and categorical datasets, motivated by the need to identify patient clusters in privacy-sensitive electronic health record (EHR) data. A principled divide-and-conquer inference procedure is introduced, using variational inference with local merge and delete moves within batches of the data in parallel, followed by global merge moves across batches to find global clustering structures. It is shown that these merge moves require only summaries of the data in each batch, enabling federated learning across local nodes without requiring the full dataset to be shared. Empirical results on simulated and benchmark datasets demonstrate that the method performs well relative to comparator clustering algorithms. The practical utility of the method is validated by applying it to a large-scale British primary care EHR dataset to identify clusters of individuals with common patterns of co-occurring conditions (multimorbidity).
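To illustrate why a merge move across batches can be evaluated from summaries alone, the following is a minimal sketch under simplifying assumptions; it is not the authors' algorithm. It assumes binary features, hard cluster assignments within each node, and independent Beta(a, b) priors on each feature's success probability. It scores a candidate merge with a Beta-Bernoulli log marginal likelihood instead of a variational objective, and it omits any prior term on the partition or mixture weights. All function names and the toy data are hypothetical.

```python
from math import lgamma


def log_beta(a, b):
    """Log of the Beta function, via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def summarise(rows):
    """Summary a node would share for one local cluster:
    (number of rows, per-feature count of ones). Raw rows stay local."""
    n = len(rows)
    ones = [sum(col) for col in zip(*rows)]
    return n, ones


def log_marginal(summary, a=1.0, b=1.0):
    """Beta-Bernoulli log marginal likelihood of a cluster,
    computed from the summary only (assumed model, not from the abstract)."""
    n, ones = summary
    return sum(log_beta(a + s, b + n - s) - log_beta(a, b) for s in ones)


def merge(s1, s2):
    """Summaries are additive, so merging two clusters just adds counts."""
    (n1, o1), (n2, o2) = s1, s2
    return n1 + n2, [x + y for x, y in zip(o1, o2)]


def merge_gain(s1, s2, a=1.0, b=1.0):
    """Change in log marginal likelihood if the two clusters are merged.
    A positive value favours the merge under this simplified score."""
    merged = log_marginal(merge(s1, s2), a, b)
    return merged - log_marginal(s1, a, b) - log_marginal(s2, a, b)


# Toy data: one local cluster on node A, two local clusters on node B.
node_a = summarise([[1, 1, 0], [1, 1, 0], [1, 0, 0]])
node_b_similar = summarise([[1, 1, 0], [1, 1, 0]])
node_b_different = summarise([[0, 0, 1], [0, 0, 1], [0, 1, 1]])

print(merge_gain(node_a, node_b_similar))    # positive: merge favoured
print(merge_gain(node_a, node_b_different))  # negative: keep separate
```

In this sketch only the counts returned by `summarise` would leave each node, which mirrors the abstract's point that global merge moves do not require the full dataset to be shared. A categorical version would replace the per-feature count of ones with per-category counts and the Beta prior with a Dirichlet prior.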