Title: Comparing EM to a greedy search algorithm to optimize ICL for mixture models
Authors: Arthur White - Trinity College Dublin (Ireland) [presenting]
Gilles Celeux - INRIA (France)
Jason Wyse - Trinity College Dublin (Ireland)
Abstract: The integrated complete-data likelihood (ICL) is a popular criterion in model-based clustering for choosing the number of clusters in a finite mixture model. Typically, the ICL is computed using a BIC-like approximation, which depends on maximum likelihood estimates found using the expectation-maximisation (EM) algorithm. An alternative approach calculates the exact ICL in closed form within a Bayesian framework. A greedy search (GS) algorithm is then used to allocate observations to clusters so as to maximise the ICL directly and hence obtain an optimal clustering solution. This approach can be used to search the model space and cluster the data simultaneously. To better understand the properties of the GS method, we conducted an extensive simulation study comparing its performance to that of the standard EM approach in terms of the number of clusters selected, clustering accuracy, and computational cost. The performance of the methods on real data is also discussed.
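As an illustration of the greedy search idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a mixture of independent Bernoulli variables with a symmetric Dirichlet(alpha) prior on the mixing weights and a Beta(a, b) prior on each success probability, so that the exact ICL, log p(X, z), is available in closed form. The function names (`log_beta`, `exact_icl`, `greedy_search`), the fixed upper bound `k_max`, the random initialisation, and the sweep schedule are all hypothetical choices made for this example.

```python
import random
from math import lgamma


def log_beta(p, q):
    """Log of the Beta function B(p, q)."""
    return lgamma(p) + lgamma(q) - lgamma(p + q)


def exact_icl(X, z, k_max, alpha=1.0, a=1.0, b=1.0):
    """Exact ICL, log p(X, z), for a Bernoulli mixture (assumed model)."""
    n, d = len(X), len(X[0])
    counts = [0] * k_max                      # cluster sizes
    ones = [[0] * d for _ in range(k_max)]    # per-cluster counts of 1s
    for row, k in zip(X, z):
        counts[k] += 1
        for j, x in enumerate(row):
            ones[k][j] += x
    # log p(z): mixing weights integrated out under the Dirichlet prior.
    log_pz = lgamma(k_max * alpha) - lgamma(n + k_max * alpha)
    log_pz += sum(lgamma(c + alpha) - lgamma(alpha) for c in counts)
    # log p(X | z): success probabilities integrated out under Beta priors.
    log_px = sum(
        log_beta(a + ones[k][j], b + counts[k] - ones[k][j]) - log_beta(a, b)
        for k in range(k_max)
        for j in range(d)
    )
    return log_pz + log_px


def greedy_search(X, k_max, n_sweeps=20, seed=0):
    """Move one observation at a time to the cluster that most increases ICL."""
    rng = random.Random(seed)
    n = len(X)
    z = [rng.randrange(k_max) for _ in range(n)]  # random start (assumption)
    best = exact_icl(X, z, k_max)
    for _ in range(n_sweeps):
        moved = False
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            keep = z[i]
            for k in range(k_max):
                if k == keep:
                    continue
                z[i] = k
                # Full recomputation keeps the sketch short; updating only
                # the two affected clusters would be much cheaper.
                score = exact_icl(X, z, k_max)
                if score > best + 1e-10:
                    best, keep, moved = score, k, True
            z[i] = keep
        if not moved:  # no single move improves the ICL: local optimum
            break
    return z, best
```

Calling `greedy_search(X, k_max=4)` on a list of 0/1 rows returns the allocation `z` and its ICL value. In this sketch, clusters are allowed to empty out during the search, so `len(set(z))` gives the number of non-empty clusters, which is how the search over the number of clusters and the clustering itself happen together. Because each accepted move strictly increases the ICL, the search stops at a local optimum that may depend on the starting allocation.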