CMStatistics 2021
B0663
Title: How many data clusters are in the Galaxy data set: Bayesian cluster analysis in action
Authors: Bettina Gruen - WU Vienna University of Economics and Business (Austria) [presenting]
Gertraud Malsiner-Walli - WU Vienna University of Economics and Business (Austria)
Sylvia Fruehwirth-Schnatter - WU Vienna University of Economics and Business (Austria)
Abstract: In model-based clustering, the Galaxy data set is often used as a benchmark to study the performance of different modeling approaches. Results reported for the Galaxy data set using different Bayesian approaches have raised concerns, because the prior assumptions imposed remained rather obscure while playing a major role in the results obtained and the conclusions drawn. We address these concerns by shedding light on how the specified priors influence the estimated number of clusters. We perform a sensitivity analysis of different prior specifications for the mixture of finite mixtures model, i.e., the mixture model that includes a prior on the number of components. We use an extensive set of prior specifications in a full factorial design and assess their impact on the estimated number of clusters for the Galaxy data set. The results highlight the interaction effects of the prior specifications and provide insights into which prior specifications are recommended to obtain a sparse clustering solution. A simulation study with artificial data provides further empirical evidence to support the recommendations. A clear understanding of the impact of the prior specifications removes a barrier to the use of Bayesian methods, namely the complexity of selecting suitable priors.