View Submission - EcoSta2024
A0762
Title: High-dimensional clustering via latent semiparametric mixture models
Authors: Boxiang Wang - University of Iowa (United States) [presenting]
Abstract: Cluster analysis is a fundamental task in machine learning. Several clustering algorithms have been extended to handle high-dimensional data by incorporating a sparsity constraint when estimating a Gaussian mixture model. Although this makes neat theoretical analysis possible, this approach is arguably restrictive for many applications. A novel latent variable transformation mixture model is introduced for clustering, in which a mixture of Gaussians is assumed after some unknown monotone data transformation. A new clustering algorithm named CESME is developed for high-dimensional clustering under the assumption that the optimal clustering admits a sparsity structure. The use of an unspecified transformation makes the model far more flexible than the classical mixture of Gaussians. On the other hand, the transformation also brings several technical challenges to model estimation as well as to the theoretical analysis of CESME. A comprehensive analysis of CESME is presented, including identifiability, initialization, algorithmic convergence, and statistical guarantees on clustering. Leveraging such a transition, a data-adaptive procedure is developed that substantially improves the computational efficiency of CESME. Extensive numerical studies and real data analyses show that CESME outperforms existing high-dimensional clustering algorithms.
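The model class described in the abstract can be illustrated with a small simulation. This is only a minimal sketch, not the CESME algorithm: the two-cluster setup, the sparse mean shift, the particular monotone maps (exponential and cube), and all function and parameter names are assumptions chosen for illustration. It draws latent data from a Gaussian mixture, applies coordinate-wise increasing transformations, and checks that the within-coordinate ranks are unchanged.

```python
import numpy as np

def simulate_latent_mixture(n=200, p=50, s=5, shift=3.0, seed=0):
    """Draw a two-component Gaussian mixture, then apply monotone maps.

    Only the first `s` coordinates carry cluster signal (sparsity assumption).
    All names and values are hypothetical, chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)       # latent cluster labels (0 or 1)
    mu = np.zeros(p)
    mu[:s] = shift                            # sparse mean difference
    # Latent Gaussian mixture: identity covariance, mean 0 or mu per cluster.
    z = rng.standard_normal((n, p)) + np.outer(labels, mu)
    # Strictly increasing coordinate-wise maps, treated as unknown in the model.
    x = np.empty_like(z)
    x[:, 0::2] = np.exp(z[:, 0::2])           # even columns: exponential
    x[:, 1::2] = z[:, 1::2] ** 3              # odd columns: cube
    return z, x, labels

def column_ranks(a):
    """Rank each column separately (0 = smallest value in that column)."""
    return np.argsort(np.argsort(a, axis=0), axis=0)

z, x, labels = simulate_latent_mixture()
# Increasing maps preserve the ordering within each coordinate.
ranks_match = bool(np.array_equal(column_ranks(z), column_ranks(x)))
```

In this sketch the observed data `x` are far from Gaussian, yet each coordinate has the same ranks as the latent `z`, which suggests why order-based information remains usable when the transformation is left unspecified.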