View Submission - EcoSta2024
A0977
Title: Wasserstein k-centers clustering for distributional data
Authors: Ryo Okano - The University of Tokyo (Japan) [presenting]
Masaaki Imaizumi - The University of Tokyo (Japan)
Abstract: Distributional data arise when each data point can be regarded as a probability distribution, and their analysis is attracting increasing attention in statistics. Because the space of probability distributions does not have a vector space structure, distributional data cannot be analyzed using existing methods devised for Euclidean functional data. In particular, cluster analysis of distributional data is still under development. Adopting the Wasserstein metric, a novel clustering method for distributional data on the real line is proposed. The method follows the k-centers clustering approach for functional data, which accounts for differences between clusters in both the mean and the modes of variation. The notions of the Fréchet mean and geodesic principal component analysis are employed in the Wasserstein space to define the mean and the modes-of-variation structures of clusters of distributional data. These structures are used in a reclassification step to predict the cluster membership of each distribution based on a non-parametric random-effect model. Through a simulation study and a real-data application, the proposed distributional clustering method is demonstrated to improve cluster quality compared to conventional clustering algorithms.
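
For readers unfamiliar with the Wasserstein geometry on the real line, the following is a minimal sketch of the two basic ingredients the abstract refers to. It relies on a standard fact: for distributions on the real line, the 2-Wasserstein distance equals the L2 distance between quantile functions, and the Fréchet mean has the pointwise average quantile function. The sketch is not the authors' method. It is a plain mean-only k-means-style loop in Wasserstein space, and it omits the geodesic principal component analysis and the random-effect reclassification step. All function names (`quantile_repr`, `wasserstein2`, `frechet_mean`, `wasserstein_kmeans`) and parameters are hypothetical, and each distribution is assumed to be represented by its quantile values on a common uniform grid.

```python
import numpy as np


def quantile_repr(samples, grid):
    """Empirical quantile function of a 1-D sample, evaluated on a grid in (0, 1)."""
    return np.quantile(np.asarray(samples, dtype=float), grid)


def wasserstein2(q1, q2):
    """Approximate 2-Wasserstein distance from quantile values on a uniform grid."""
    q1 = np.asarray(q1, dtype=float)
    q2 = np.asarray(q2, dtype=float)
    return float(np.sqrt(np.mean((q1 - q2) ** 2)))


def frechet_mean(quantiles):
    """Wasserstein Frechet mean on the real line: average the quantile functions."""
    return np.mean(np.asarray(quantiles, dtype=float), axis=0)


def wasserstein_kmeans(quantiles, k, n_iter=50, seed=0):
    """Mean-only clustering in Wasserstein space (simplified illustration).

    Assumption: this is a Lloyd-style loop using Frechet means as centers.
    It does not model cluster-specific modes of variation.
    """
    rng = np.random.default_rng(seed)
    Q = np.asarray(quantiles, dtype=float)
    # Initialise centers with k distinct input distributions.
    centers = Q[rng.choice(len(Q), size=k, replace=False)]
    labels = np.full(len(Q), -1)
    for _ in range(n_iter):
        # Squared W2 distance from every distribution to every center.
        d2 = ((Q[:, None, :] - centers[None, :, :]) ** 2).mean(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update each non-empty cluster's center to its Frechet mean.
        for j in range(k):
            members = Q[labels == j]
            if len(members) > 0:
                centers[j] = frechet_mean(members)
    return labels, centers
```

In practice one would first convert each observed sample to `quantile_repr(sample, grid)` with a shared grid such as `np.linspace(0.01, 0.99, 99)`, then pass the stacked rows to `wasserstein_kmeans`. A method along the lines described in the abstract would additionally fit modes of variation within each cluster and use them when reassigning distributions.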