B0857
Title: Group number identification in random projection ensemble model-based clustering
Authors: Laura Anderlucci - University of Bologna (Italy) [presenting]
Angela Montanari - University of Bologna (Italy)
Abstract: A novel procedure is proposed for model-based clustering of high-dimensional data, based on random projection ensembles. Specifically, a Gaussian mixture model is fitted to each of a number of random projections of the high-dimensional data, and a subset of the resulting solutions is selected according to the Bayesian Information Criterion (BIC); the multiple `base' results are then aggregated via consensus to obtain the final partition. The proposed algorithm is a general tool for model-based clustering of high-dimensional data: we explore its behaviour in detail only within the Gaussian mixture model framework, but many other distributions can in principle be used. The procedure is derived under the assumption that the number of clusters $G$ is fixed and known. In real-life applications, however, there may be no insight into the `true' number of homogeneous groups, and this information has to be inferred from the data. Here, we suggest some model selection procedures for choosing the number of clusters when such information is not available.
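The selection and aggregation steps described above can be illustrated with a minimal, dependency-free sketch. It assumes the base Gaussian mixture fits on the projected data have already been obtained elsewhere, each summarised as a (BIC, labels) pair, and that larger BIC is better (the mclust convention; with scikit-learn's `GaussianMixture.bic`, where smaller is better, the sort would be reversed). The consensus step shown here, a co-association matrix followed by average-linkage merging down to $G$ groups, is one common consensus technique chosen for illustration and is not necessarily the aggregation used by the authors. All function names are hypothetical.

```python
from itertools import combinations


def select_by_bic(solutions, n_keep):
    """Keep the labels of the n_keep base solutions with the largest BIC.

    Assumes BIC = 2*loglik - npar*log(n) (larger is better).
    Each solution is a (bic, labels) pair.
    """
    ranked = sorted(solutions, key=lambda s: s[0], reverse=True)
    return [labels for _, labels in ranked[:n_keep]]


def co_association(partitions):
    """Fraction of partitions in which each pair of points shares a cluster.

    Depends only on which points are grouped together, so the arbitrary
    cluster numbering of each base solution does not matter.
    """
    n = len(partitions[0])
    c = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    c[i][j] += 1.0
    m = len(partitions)
    return [[v / m for v in row] for row in c]


def average_linkage(sim, n_groups):
    """Repeatedly merge the two clusters with the highest average
    pairwise similarity until n_groups clusters remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > n_groups:
        best_pair, best_score = None, -1.0
        for a, b in combinations(range(len(clusters)), 2):
            score = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
            score /= len(clusters[a]) * len(clusters[b])
            if score > best_score:
                best_pair, best_score = (a, b), score
        a, b = best_pair  # a < b, so deleting index b leaves a in place
        clusters[a].extend(clusters[b])
        del clusters[b]
    labels = [0] * len(sim)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


def consensus_partition(solutions, n_keep, n_groups):
    """Select base solutions by BIC, then aggregate them into one partition."""
    kept = select_by_bic(solutions, n_keep)
    return average_linkage(co_association(kept), n_groups)


# Usage: six points, four candidate base solutions as (BIC, labels).
solutions = [
    (-100.0, [0, 0, 0, 1, 1, 1]),
    (-105.0, [1, 1, 1, 0, 0, 0]),  # same grouping, clusters numbered differently
    (-110.0, [0, 0, 1, 1, 1, 1]),
    (-500.0, [0, 1, 0, 1, 0, 1]),  # lowest BIC, discarded when n_keep=3
]
print(consensus_partition(solutions, n_keep=3, n_groups=2))  # [0, 0, 0, 1, 1, 1]
```

Working with the co-association matrix rather than the raw labels sidesteps label switching: the first two base solutions describe the same grouping with swapped cluster numbers and contribute identically to the consensus.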