View Submission - CFE-CMStatistics 2025
A1042
Title: A Bayesian method for learning mixture models of non-parametric components
Authors: Yun Wei - University of Texas at Dallas (United States) [presenting]
Long Nguyen - University of Michigan (United States)
Yilei Zhang - University of Michigan (United States)
Aritra Guha - ATT Data Science and AI Research (United States)
Abstract: Mixture models are widely used to model heterogeneous subpopulations in data. Mixture models of parametric components (e.g., Gaussian mixture models) have been thoroughly studied on both the statistical and algorithmic fronts. However, given the increasing complexity of large-scale data, parametric assumptions such as Gaussianity are often unrealistic, and there is very little literature on learning mixture models of non-parametric components. To fill this gap, the identifiability issue in mixture models of non-parametric components is first addressed. Building on this, a framework is established that uses a mixture of Dirichlet processes to learn such models, and an efficient MCMC algorithm is developed to implement the method. The method can learn each component density without having to solve for the mixing measure, thus providing a sample-efficient framework for learning subpopulation properties from data. The component density estimator is also shown to have a posterior contraction rate of almost polynomial order, a significant improvement over the logarithmic convergence rate of solving for the mixing measure. This substantiates the sample efficiency and applicability of the method in learning non-parametric component densities.
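
For readers unfamiliar with the Dirichlet process building block mentioned in the abstract, the following minimal sketch draws one random discrete measure from a Dirichlet process using the standard truncated stick-breaking construction. It is not the authors' framework or their MCMC algorithm. The function names, the truncation level, the concentration value, and the standard normal base measure are all assumptions chosen for illustration.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process with concentration alpha."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to any atom
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # V_k ~ Beta(1, alpha)
        weights.append(remaining * v)    # break off a fraction v of what is left
        remaining *= 1.0 - v
    weights.append(remaining)  # final atom absorbs the leftover mass (truncation)
    return weights


def sample_dp_draw(alpha, truncation, rng):
    """Draw one random discrete measure: weights plus atom locations.

    Assumption: the base measure is N(0, 1); the abstract does not specify one.
    """
    weights = stick_breaking_weights(alpha, truncation, rng)
    atoms = [rng.gauss(0.0, 1.0) for _ in range(truncation)]
    return weights, atoms


# Hypothetical settings, chosen only for illustration.
rng = random.Random(0)
weights, atoms = sample_dp_draw(alpha=2.0, truncation=50, rng=rng)
```

A smaller `alpha` concentrates mass on the first few atoms, while a larger `alpha` spreads it over many. The truncation level only controls how much tail mass is lumped into the last atom.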