CMStatistics 2023
B0226
Title: Imputation strategies for clustering mixed-type data with missing values
Authors: Rabea Aschenbruck - Stralsund University of Applied Sciences (Germany) [presenting]
Gero Szepannek - Stralsund University of Applied Sciences (Germany)
Adalbert Wilhelm - Constructor University Bremen gGmbH (Germany)
Abstract: Incomplete data sets with different data types are difficult to handle, but they occur regularly in practical clustering tasks. Therefore, two procedures for clustering mixed-type data with missing values are derived and analyzed in a simulation study with respect to the factors of partition, prototypes, imputed values, and cluster assignment. Both approaches are based on the k-prototypes algorithm (an extension of k-means), which is one of the most common clustering methods for mixed-type data (i.e., numerical and categorical variables). For k-means clustering of incomplete data, the k-POD algorithm has recently been proposed, which imputes missing values with the values of the associated cluster centre. An adaptation of the latter is derived, and a cluster aggregation strategy after multiple imputation is additionally presented. It turns out that even a simplified and time-saving variant of the presented method can compete with multiple imputation and subsequent pooling.
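The idea of imputing missing entries with the values of the assigned cluster prototype can be sketched in a few lines of NumPy. This is a minimal illustration only, not the authors' implementation: the function name `kpod_kprototypes`, the weight `lam` between the numerical and categorical distance parts, the encoding of missing categories as `-1`, and the mean/mode initial fill are all assumptions made for this example.

```python
import numpy as np

def column_mode(col):
    # Most frequent value in a 1-D array of non-negative integer codes.
    return np.bincount(col).argmax()

def kpod_kprototypes(X_num, X_cat, k, lam=1.0, n_iter=20, seed=0):
    """k-POD-style sketch on top of a simple k-prototypes loop.

    X_num: float array, missing entries are NaN.
    X_cat: int array of category codes, missing entries are -1 (assumption).
    """
    rng = np.random.default_rng(seed)
    miss_num = np.isnan(X_num)
    miss_cat = X_cat < 0
    num, cat = X_num.copy(), X_cat.copy()

    # Initial fill: column means (numerical) and column modes (categorical).
    col_means = np.nanmean(X_num, axis=0)
    num[miss_num] = np.take(col_means, np.where(miss_num)[1])
    for j in range(cat.shape[1]):
        cat[miss_cat[:, j], j] = column_mode(X_cat[~miss_cat[:, j], j])

    # Initialise prototypes from k distinct rows of the filled data.
    idx = rng.choice(len(num), size=k, replace=False)
    proto_num, proto_cat = num[idx].copy(), cat[idx].copy()

    for _ in range(n_iter):
        # Mixed distance: squared Euclidean + lam * number of mismatches.
        d_num = ((num[:, None, :] - proto_num[None, :, :]) ** 2).sum(axis=2)
        d_cat = (cat[:, None, :] != proto_cat[None, :, :]).sum(axis=2)
        labels = (d_num + lam * d_cat).argmin(axis=1)

        # Update prototypes: means for numerical, modes for categorical.
        for c in range(k):
            members = labels == c
            if not members.any():
                continue  # keep the old prototype for an empty cluster
            proto_num[c] = num[members].mean(axis=0)
            for j in range(cat.shape[1]):
                proto_cat[c, j] = column_mode(cat[members, j])

        # k-POD step: overwrite only the originally missing entries
        # with the values of the assigned prototype.
        num[miss_num] = proto_num[labels][miss_num]
        cat[miss_cat] = proto_cat[labels][miss_cat]

    return labels, proto_num, proto_cat, num, cat

# Toy data: two numerical variables and one categorical variable.
X_num = np.array([[1.0, 2.0], [1.2, np.nan], [0.9, 2.1],
                  [8.0, 9.0], [np.nan, 9.2], [8.1, 8.9]])
X_cat = np.array([[0], [0], [-1], [1], [1], [1]])
labels, proto_num, proto_cat, num, cat = kpod_kprototypes(X_num, X_cat, k=2)
```

Observed entries are never changed; only the originally missing cells are refreshed in each iteration, so the imputed values and the partition are updated together. A multiple-imputation alternative would instead cluster several separately imputed copies of the data and aggregate the resulting partitions, which this sketch does not cover.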