CMStatistics 2023
B0652
Title: Quantifying variable importance in cluster analysis
Authors: Christian Hennig - University of Bologna (Italy) [presenting]
Keefe Murphy - Maynooth University (Ireland)
Abstract: Quantifying variable importance in cluster analysis is useful for interpreting and understanding how the variables affect a clustering, and potentially also for variable selection. For general clustering methods, variable importance can be measured by comparing a clustering based on all variables with a clustering in which one variable has been left out or permuted. These two approaches are compared regarding their ability to distinguish meaningful variables from noise variables. A potential concern when clustering mixed continuous/categorical data is that certain methods may be unduly dominated by either the continuous or the categorical variables. This concern is addressed by comparing methods such as latent class model-based clustering, distance-based clustering using Gower's distance with various weighting/standardisation schemes, and KAMILA with respect to the relative importance they give to the continuous and categorical variables, using a comprehensive simulation study and real data.
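The leave-out and permutation ideas can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the abstract: only continuous variables on their raw scale, plain k-means with a deterministic farthest-first initialisation as a stand-in clustering method (not one of the methods compared in the talk), and 1 - adjusted Rand index (ARI) as the measure of how much the clustering changes. All function names are hypothetical.

```python
import random
from collections import Counter
from math import comb


def kmeans(X, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    def d2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Start from the first row, then repeatedly add the row farthest
    # from all centres chosen so far.
    centres = [list(X[0])]
    while len(centres) < k:
        far = max(X, key=lambda x: min(d2(x, c) for c in centres))
        centres.append(list(far))

    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: d2(x, centres[c])) for x in X]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def adjusted_rand_index(a, b):
    """ARI between two label vectors (1.0 means identical partitions)."""
    pairs = lambda counts: sum(comb(c, 2) for c in counts)
    sum_ij = pairs(Counter(zip(a, b)).values())
    sum_a = pairs(Counter(a).values())
    sum_b = pairs(Counter(b).values())
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both partitions identical and trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def drop_column(X, j):
    return [row[:j] + row[j + 1:] for row in X]


def permute_column(X, j, rng):
    col = [row[j] for row in X]
    rng.shuffle(col)
    return [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]


def importance(X, j, k, method="leave_out", n_repeats=5, seed=0):
    """1 - ARI between the all-variable clustering and the perturbed one."""
    X = [list(row) for row in X]
    full = kmeans(X, k)
    if method == "leave_out":
        return 1.0 - adjusted_rand_index(full, kmeans(drop_column(X, j), k))
    if method == "permute":
        # Average over several random permutations of column j.
        rng = random.Random(seed)
        scores = [1.0 - adjusted_rand_index(full, kmeans(permute_column(X, j, rng), k))
                  for _ in range(n_repeats)]
        return sum(scores) / n_repeats
    raise ValueError(f"unknown method: {method!r}")
```

Here a variable that drives the cluster structure gets a score near (or slightly above) 1, while a pure noise variable scores near 0. For mixed continuous/categorical data, `kmeans` would be replaced by a method such as Gower-distance clustering or KAMILA; the `importance` wrapper does not depend on which clustering method is plugged in.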