A0248
Title: Clustering with diversity: A promising approach with the determinantal point process
Authors: Serge Vicente - McGill University (Canada) [presenting]
Abstract: Random restarts of a clustering algorithm produce many partitions that can be aggregated into a consensus clustering. Such ensemble methods are recognized as more robust than single clustering algorithms. However, most current approaches generate the initial sets of center points by sampling uniformly at random, which can fail both to ensure diversity and to obtain good coverage of all data facets. We propose using determinantal point processes (DPPs) for the random restarts of clustering algorithms that start from initial sets of center points, such as k-medoids or k-means. DPPs favor diversity among the center points of an initial set, so that sets of similar points are less likely to be generated than sets of very distinct points. Extensive simulations show that DPPs ensure diversity and obtain good coverage of all data facets, two key properties behind their good performance. Simulations with artificial datasets and applications to real datasets show that determinantal consensus clustering outperforms consensus clustering based on uniform random sampling of center points. Compared with uniform sampling of initial points, the use of DPPs yields final clustering configurations with higher and less dispersed quality scores. DPPs are thus a promising approach for improving clustering results.
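The diversity property described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the Gaussian similarity kernel, the `bandwidth` parameter, the toy data, and the brute-force enumeration of subsets are all assumptions made for illustration, and the function names are hypothetical. The sketch relies only on the defining property of a fixed-size DPP, namely that a set S of k points is drawn with probability proportional to det(L_S), where L_S is the similarity kernel restricted to S.

```python
import itertools
import numpy as np

def gaussian_kernel(X, bandwidth=1.0):
    """Similarity kernel L[i, j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2)).
    The kernel choice is an assumption; the abstract does not name one."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def kdpp_subset_probabilities(L, k):
    """Exact fixed-size DPP probabilities by enumeration.
    P(S) is proportional to det(L_S). Feasible only for small n."""
    n = L.shape[0]
    subsets = list(itertools.combinations(range(n), k))
    weights = np.array([np.linalg.det(L[np.ix_(s, s)]) for s in subsets])
    weights = np.clip(weights, 0.0, None)  # guard against negative round-off
    return subsets, weights / weights.sum()

def sample_initial_centers(X, k, rng, bandwidth=1.0):
    """Draw the indices of k initial center points from the k-DPP."""
    L = gaussian_kernel(X, bandwidth)
    subsets, probs = kdpp_subset_probabilities(L, k)
    return subsets[rng.choice(len(subsets), p=probs)]

# Toy data (hypothetical): two tight groups of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

subsets, probs = kdpp_subset_probabilities(gaussian_kernel(X), k=2)
prob = dict(zip(subsets, probs))

# Probability that the two centers fall in different groups.
cross_group = sum(p for s, p in prob.items() if (s[0] < 3) != (s[1] < 3))

rng = np.random.default_rng(0)
centers = sample_initial_centers(X, k=2, rng=rng)
```

In this toy setting, a pair of nearly identical points such as (0, 1) has a near-zero determinant, so almost all of the probability mass goes to pairs with one center in each group. Under uniform sampling, only 9 of the 15 possible pairs would split across the groups. For realistic dataset sizes, enumeration is infeasible and a dedicated DPP sampling algorithm would be needed.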