A1542
Title: A deterministic information bottleneck method for clustering mixed-type data
Authors: Efthymios Costa - Imperial College London (United Kingdom) [presenting]
Ioanna Papatsouma - Imperial College London (United Kingdom)
Angelos Markos - Democritus University of Thrace (Greece)
Abstract: A plethora of algorithms for cluster analysis have been developed in recent years, but most focus solely on continuous data and are not suitable for mixed-type data sets, that is, data sets consisting of both continuous and categorical variables. The clustering techniques that have been proposed to deal with this heterogeneity treat all categorical variables as being of the same type (either nominal or ordinal), and many fail to take into account that certain variables may be completely unrelated to the cluster structure. An information-theoretic approach for clustering mixed-type data is presented, based on the deterministic variant of the information bottleneck algorithm. The proposed method treats different variable types separately and seeks to optimally compress the data into clusters while retaining relevant information about the underlying structure. Furthermore, the selection of hyperparameters associated with this method gives the user the flexibility to incorporate feature selection within the algorithm. The performance of the approach is compared to that of three well-established clustering methods (KAMILA, K-Prototypes, and partitioning around medoids with Gower's dissimilarity) on simulated and real-world data sets. The results demonstrate that the proposed approach represents a competitive alternative to conventional clustering techniques under specific conditions.