A1109
Title: Model-based clustering of categorical data based on the Hamming distance
Authors: Lucia Paci - Università Cattolica del Sacro Cuore (Italy)
Raffaele Argiento - Università degli Studi di Bergamo (Italy) [presenting]
Edoardo Filippi-Mazzola - Università della Svizzera Italiana (Switzerland)
Abstract: A model-based approach is proposed for clustering categorical data without a natural ordering. The method uses the Hamming distance to define a family of probability mass functions for modelling the data. These functions serve as kernels in a finite mixture model with an unknown number of components. Conjugate Bayesian inference is developed for the parameters of the Hamming distribution model. The mixture is embedded in a Bayesian nonparametric framework, and a trans-dimensional blocked Gibbs sampler is introduced to carry out full Bayesian inference on the number of clusters, their structure, and the group-specific parameters. This approach simplifies computation compared to traditional reversible jump algorithms. The proposed model includes a parsimonious latent class model as a special case when the number of components is fixed. Model performance is evaluated through simulation studies and benchmark datasets, demonstrating improvements in clustering accuracy over existing methods.
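As a rough illustration of how a Hamming-distance kernel can define a probability mass function, the sketch below assumes one plausible form, p(x | c, sigma) proportional to exp(-d_H(x, c) / sigma), with a centre c and a single scale sigma shared across attributes. This parameterisation, and the names hamming_distance, hamming_pmf, and total_mass, are assumptions made for illustration and are not taken from the abstract; the authors' model may differ (for example, by using attribute-specific scales).

```python
import itertools
import math


def hamming_distance(x, c):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(c):
        raise ValueError("sequences must have the same length")
    return sum(xj != cj for xj, cj in zip(x, c))


def hamming_pmf(x, center, sigma, n_levels):
    """Assumed kernel: p(x) proportional to exp(-d_H(x, center) / sigma).

    n_levels[j] is the number of categories of attribute j. Because the
    Hamming distance is a sum over attributes, the normalising constant
    factorises into one term per attribute.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    w = math.exp(-1.0 / sigma)  # weight given to a single mismatch
    prob = 1.0
    for xj, cj, mj in zip(x, center, n_levels):
        norm = 1.0 + (mj - 1) * w  # one match plus (mj - 1) mismatches
        prob *= (1.0 if xj == cj else w) / norm
    return prob


def total_mass(center, sigma, n_levels):
    """Sum the pmf over the whole support (sanity check: should be 1)."""
    support = itertools.product(*[range(m) for m in n_levels])
    return sum(hamming_pmf(x, center, sigma, n_levels) for x in support)
```

Under this assumed form, the centre is the mode of the distribution, and smaller values of sigma concentrate more mass on observations that match the centre in most attributes.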