CMStatistics 2023
B1101
Title: Initialization strategies for clustering mixed-type data with the k-prototypes algorithm
Authors: Adalbert Wilhelm - Constructor University Bremen gGmbH (Germany) [presenting]
Rabea Aschenbruck - Stralsund University of Applied Sciences (Germany)
Gero Szepannek - Stralsund University of Applied Sciences (Germany)
Abstract: The k-prototypes algorithm is one of the most popular partitioning clustering algorithms for mixed-type data. Because it is iterative, it may converge to a local optimum rather than the global one, so the resulting cluster partition can depend strongly on the initialization. In general, there are two ways to improve the initialization: one is to determine specific initial cluster prototypes, and the other is to repeat the algorithm with different randomly chosen initial objects. Different numbers of algorithm repetitions are analyzed and compared. It is shown that an appropriate number of repetitions improves the clustering algorithm's target criterion while keeping the computation time manageable.
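
The repetition strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`k_prototypes_once`, `k_prototypes`), the parameters `gamma` and `n_start`, the toy data, and the choice of distance (squared Euclidean on numeric variables plus `gamma` times the number of categorical mismatches) are all assumptions made for illustration. Each run starts from randomly chosen initial objects, and the run with the lowest value of the target criterion is kept.

```python
import random
from collections import Counter

def distance(x, proto, gamma):
    """Mixed distance (assumed form): squared Euclidean + gamma * mismatches."""
    (x_num, x_cat), (p_num, p_cat) = x, proto
    num = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    cat = sum(a != b for a, b in zip(x_cat, p_cat))
    return num + gamma * cat

def update_prototype(members):
    """Prototype = per-variable mean (numeric) and mode (categorical)."""
    n = len(members)
    means = tuple(sum(m[0][j] for m in members) / n
                  for j in range(len(members[0][0])))
    modes = tuple(Counter(m[1][j] for m in members).most_common(1)[0][0]
                  for j in range(len(members[0][1])))
    return (means, modes)

def k_prototypes_once(data, k, gamma, rng, max_iter=100):
    """One run from k randomly chosen initial objects; returns (cost, labels)."""
    prototypes = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: distance(x, prototypes[c], gamma))
                      for x in data]
        if new_labels == labels:  # assignments stable: a local optimum
            break
        labels = new_labels
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous prototype
                prototypes[c] = update_prototype(members)
    cost = sum(distance(x, prototypes[lab], gamma) for x, lab in zip(data, labels))
    return cost, labels

def k_prototypes(data, k, gamma=1.0, n_start=10, seed=0):
    """Repeat the algorithm n_start times and keep the lowest-cost run."""
    rng = random.Random(seed)
    best, history = None, []
    for _ in range(n_start):
        result = k_prototypes_once(data, k, gamma, rng)
        if best is None or result[0] < best[0]:
            best = result
        history.append(best[0])  # best cost found after each repetition
    return best, history

# Hypothetical toy data: each object is (numeric values, categorical values).
data = [
    ((1.0, 2.0), ("a",)), ((1.2, 1.8), ("a",)), ((0.8, 2.1), ("a",)),
    ((5.0, 6.0), ("b",)), ((5.2, 5.9), ("b",)), ((4.9, 6.1), ("b",)),
    ((9.0, 1.0), ("c",)), ((9.1, 0.8), ("a",)), ((8.8, 1.2), ("c",)),
]

(best_cost, best_labels), history = k_prototypes(data, k=3, n_start=10)
print(best_cost, history)
```

The `history` list records the best cost found after each repetition, so it never increases. Plotting it against the number of repetitions is one simple way to judge how many restarts are worth their runtime on a given data set.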