CFE 2019 - CMStatistics
B1154
Title: Robust clustering algorithm based on Median-of-Means statistics
Authors: Edouard Genetay - CREST-ENSAI (France) [presenting]
Adrien Saumard - CREST-ENSAI (France)
Camille Saumard - artfact lumenAI (France)
Abstract: Classical clustering methods such as K-means lack robustness to outliers. We propose a robust version of K-means based on median-of-means statistics, a strategy that has recently been emphasized for efficient robust machine learning. The algorithm is iterative, in a Lloyd-type fashion. We propose an efficient initialization and show empirically that the iterations converge rapidly. The algorithm clearly outperforms K-means on corrupted or heavy-tailed data and is competitive with other robust approaches such as K-median. As an additional outcome, the algorithm detects outliers.
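To make the idea concrete, the following is a minimal Python sketch of one way a median-of-means, Lloyd-type iteration could look. It is not the authors' implementation, and the abstract does not specify these details, so all of the following are assumptions: the function name `mom_kmeans`, selecting the block whose empirical K-means risk is the median over random disjoint blocks, updating centers from that block only, passing initial centers in explicitly (the proposed initialization is not described here), and the distance-based outlier flag with its threshold of 10 times the median distance.

```python
import numpy as np


def mom_kmeans(X, init_centers, n_blocks, n_iter=20, seed=0):
    """Hypothetical sketch: Lloyd-type updates driven by the median-risk block."""
    rng = np.random.default_rng(seed)
    centers = np.array(init_centers, dtype=float)  # copy, so the input is untouched
    n, k = X.shape[0], centers.shape[0]
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        nearest = d2.min(axis=1)
        # Split the data into disjoint random blocks and compute each block's
        # empirical K-means risk (mean squared distance to the nearest center).
        blocks = np.array_split(rng.permutation(n), n_blocks)
        risks = np.array([nearest[b].mean() for b in blocks])
        # Keep the block whose risk is the median of the block risks.
        median_block = blocks[np.argsort(risks)[n_blocks // 2]]
        # Lloyd-style centroid update restricted to the median block.
        for j in range(k):
            members = median_block[labels[median_block] == j]
            if members.size > 0:
                centers[j] = X[members].mean(axis=0)
    # Final assignment and distance of each point to its nearest center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1), np.sqrt(d2.min(axis=1))


# Synthetic example: two bounded clusters plus five gross outliers.
data_rng = np.random.default_rng(1)
inliers = np.vstack([
    data_rng.uniform(-1, 1, size=(50, 2)),
    data_rng.uniform(9, 11, size=(50, 2)),
])
outliers = np.full((5, 2), 1000.0)
X = np.vstack([inliers, outliers])

centers, labels, dist = mom_kmeans(X, init_centers=X[[0, 50]], n_blocks=21)

# Assumed heuristic: flag points far from their center relative to the median.
flagged = dist > 10 * np.median(dist)
```

With 21 blocks and only 5 outliers, at most 5 blocks can be contaminated, so the median-risk block in this example contains inliers only and the outliers never pull the centers. In general the number of blocks would need to exceed twice the number of outliers for that argument to hold.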