COMPSTAT 2023
A0196
Title: Tandem clustering with ICS
Authors: Andreas Alfons - Erasmus University Rotterdam (Netherlands)
Aurore Archimbaud - Erasmus University Rotterdam (Netherlands)
Klaus Nordhausen - University of Jyvaskyla (Finland) [presenting]
Anne Ruiz-Gazen - Toulouse School of Economics (France)
Abstract: Tandem clustering is a well-known technique for identifying clusters in high-dimensional or noisy data. It is a sequential approach: the dimension of the data is reduced first, and the clustering is then performed on the reduced data. The most common variant, based on principal component analysis (PCA), has been criticized for focusing only on maximizing inertia, which does not necessarily preserve the structure of interest for clustering. We therefore suggest a new tandem clustering approach based on invariant coordinate selection (ICS). This multivariate method is designed to identify the structure of the data by jointly diagonalizing two scatter matrices. More specifically, theoretical results show that, under certain elliptical mixture models, the first and/or last components carry the information about the clustering structure. Two challenges must be addressed: choosing the pair of scatter matrices and choosing the components to keep. For clustering purposes, we suggest that the best scatter pairs consist of one matrix that captures the within-cluster structure and another that captures the global structure. To this end, the local shape or pairwise scatters prove to be good choices for estimating the within-cluster structure. The performance of ICS as a dimension reduction method is evaluated to determine its ability to preserve the cluster structure of the data.
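The two-step idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. As assumptions made only for illustration, it uses the classical COV-COV4 scatter pair (not the local shape or pairwise scatters recommended above), a plain Lloyd-type k-means, and a fixed rule that keeps the last component. The function names `cov4`, `ics_components`, `kmeans` and `tandem_ics` are hypothetical, as is the toy data set.

```python
import numpy as np
from scipy.linalg import eigh


def cov4(X):
    """Fourth-moment scatter matrix, scaled by 1/(p + 2) for consistency
    with the covariance matrix under a normal model."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(Xc, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", Xc, S_inv, Xc)  # squared Mahalanobis distances
    return (Xc * d2[:, None]).T @ Xc / (n * (p + 2))


def ics_components(X):
    """Jointly diagonalize S1 = COV and S2 = COV4 (assumed scatter pair).
    Returns the generalized eigenvalues (descending) and the invariant coordinates."""
    Xc = X - X.mean(axis=0)
    S1 = np.cov(Xc, rowvar=False)
    S2 = cov4(X)
    # Generalized eigenproblem S2 b = rho S1 b, normalized so that B' S1 B = I.
    rho, B = eigh(S2, S1)
    order = np.argsort(rho)[::-1]
    return rho[order], Xc @ B[:, order]


def kmeans(Z, k, n_iter=50, seed=0):
    """Plain Lloyd iterations with random initial centers (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


def tandem_ics(X, k, n_first=0, n_last=1):
    """Step 1: ICS, keeping the first n_first and last n_last components.
    Step 2: k-means on the retained components."""
    rho, Z = ics_components(X)
    p = Z.shape[1]
    keep = list(range(n_first)) + list(range(p - n_last, p))
    return kmeans(Z[:, keep], k), rho


# Toy data (assumption): two balanced Gaussian clusters separated along one
# direction, three noise directions with larger variance, then a random
# affine mixing of the variables.
rng = np.random.default_rng(42)
n, p = 1000, 4
truth = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[:, 0] += np.where(truth == 0, -3.0, 3.0)
X[:, 1:] *= 5.0
X = X @ rng.normal(size=(p, p))

labels, rho = tandem_ics(X, k=2, n_first=0, n_last=1)
# Agreement with the true grouping, up to label switching.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print(rho.round(2), round(agreement, 3))
```

In this toy setting the clusters are balanced, so with COV-COV4 the cluster direction has low kurtosis and is expected to appear in the last component. With a small minority cluster it would instead be expected in the first component, which is why `n_first` and `n_last` are left as parameters. Choosing them in practice is one of the two challenges named in the abstract.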