COMPSTAT 2016
A0300
Title: Finding the number of groups in model-based clustering via constrained likelihoods
Authors: Luis Angel Garcia-Escudero - Universidad de Valladolid (Spain) [presenting]
Andrea Cerioli - University of Parma (Italy)
Agustin Mayo-Iscar - Universidad de Valladolid (Spain)
Marco Riani - University of Parma (Italy)
Abstract: One of the most difficult problems in clustering is choosing the number of clusters $k$. In model-based clustering, it is quite common to use complexity-penalized likelihoods, where the penalty term takes into account the number of free parameters. For instance, the BIC and ICL criteria can be used, depending on whether mixture or classification likelihoods are considered. Unfortunately, these likelihoods are unbounded. This can be solved by imposing appropriate constraints on the clusters' scatter matrices, which also prevents traditional algorithms from being trapped in (spurious) local maxima. Constraining the maximal ratio between the eigenvalues of the scatter matrices to be smaller than a constant $c \geq 1$ has been proposed. Developing the associated penalized likelihood criteria requires taking into account the higher model complexity that a higher $c$ entails. Clustering should not be seen as a fully automatic task, and the user has to play an active role by specifying, in some way, the desired type of partition. This specification can be done by fixing $c$ according to the clustering application. A fully automated procedure, leading to a small, ranked list of optimal $(k,c)$ pairs, will be presented. An extension to robust clustering will also be outlined.
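The eigenvalue-ratio constraint mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the ratio is taken between the largest and the smallest eigenvalue pooled over all clusters' scatter matrices, and that the bound is non-strict (ratio at most $c$); the function names `eigenvalue_ratio` and `satisfies_constraint` are hypothetical and do not come from the authors' software. The sketch only checks the constraint for given matrices; it does not implement the constrained estimation or the penalized criteria themselves.

```python
import numpy as np


def eigenvalue_ratio(scatter_matrices):
    """Largest eigenvalue divided by smallest eigenvalue, pooled over all clusters.

    Assumption: the constraint compares eigenvalues across every cluster's
    scatter matrix, not within each matrix separately.
    """
    # eigvalsh is appropriate because scatter matrices are symmetric.
    eigenvalues = np.concatenate(
        [np.linalg.eigvalsh(np.asarray(s, dtype=float)) for s in scatter_matrices]
    )
    return float(eigenvalues.max() / eigenvalues.min())


def satisfies_constraint(scatter_matrices, c):
    """Return True if the pooled eigenvalue ratio does not exceed c (c >= 1)."""
    if c < 1:
        raise ValueError("c must be at least 1")
    return eigenvalue_ratio(scatter_matrices) <= c


# Two illustrative (made-up) diagonal scatter matrices.
scatters = [np.diag([1.0, 4.0]), np.diag([2.0, 8.0])]
ratio = eigenvalue_ratio(scatters)

# With c = 1 only equal spherical scatter matrices are admissible.
spherical = [np.eye(2), np.eye(2)]
spherical_ok = satisfies_constraint(spherical, 1.0)
```

Under these assumptions, a small $c$ forces clusters of similar size and shape, while a large $c$ leaves the scatter matrices nearly unrestricted, which is why a higher $c$ corresponds to higher model complexity.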