View Submission - CFE-CMStatistics 2025
A0972
Title: Model-based clustering of categorical data based on the Hamming distance
Authors: Lucia Paci - Università Cattolica del Sacro Cuore (Italy) [presenting]
Raffaele Argiento - Università degli Studi di Bergamo (Italy)
Abstract: A model-based approach is developed for clustering categorical data with no natural ordering. The proposed method exploits the Hamming distance to define a family of probability mass functions to model the data. The elements of this family are then used as kernels of a finite mixture model with an unknown number of components. Conjugate Bayesian inference is derived for the parameters of the Hamming distribution model. The mixture is framed in a Bayesian nonparametric setting, and a trans-dimensional blocked Gibbs sampler is developed to provide full Bayesian inference on the number of clusters, their structure, and the group-specific parameters, which simplifies computation relative to customary reversible jump algorithms. Extensions that relax the assumption of independence among the variables within each cluster are discussed. Model performance is assessed via a simulation study and reference datasets, showing improvements in clustering recovery over existing approaches.
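To make the idea of a Hamming-distance kernel concrete, the sketch below builds one plausible probability mass function of this kind. The abstract does not give the formula, so the exact form is an assumption: each variable is treated independently, and a category that differs from the cluster center is down-weighted by exp(-1/sigma_j). The names `hamming_pmf`, `center`, `sigma`, and `n_categories` are illustrative and not taken from the authors' work.

```python
import math
from itertools import product


def hamming_distance(x, c):
    """Number of positions at which two categorical vectors differ."""
    return sum(xi != ci for xi, ci in zip(x, c))


def hamming_pmf(x, center, sigma, n_categories):
    """Assumed Hamming-type pmf with independent variables.

    For variable j, a mismatch with center[j] gets weight exp(-1/sigma[j])
    and a match gets weight 1; the weights are normalized over the
    n_categories[j] possible values. This form is an assumption made
    for illustration, not the formula stated in the abstract.
    """
    prob = 1.0
    for xj, cj, sj, mj in zip(x, center, sigma, n_categories):
        weight = math.exp(-1.0 / sj)        # mismatch weight, in (0, 1)
        norm = 1.0 + (mj - 1) * weight      # one match plus (mj - 1) mismatches
        prob *= (1.0 if xj == cj else weight) / norm
    return prob


# Hypothetical example: three variables with 3, 2, and 4 categories.
n_categories = [3, 2, 4]
center = (0, 1, 2)
sigma = [0.5, 1.0, 2.0]

# Enumerate the full support to check normalization and locate the mode.
support = list(product(*(range(m) for m in n_categories)))
total = sum(hamming_pmf(x, center, sigma, n_categories) for x in support)
mode = max(support, key=lambda x: hamming_pmf(x, center, sigma, n_categories))
```

Under this assumed form the probabilities sum to one over the support and the mode coincides with the center, while smaller values of `sigma[j]` concentrate more mass on the center's category for variable j. A mixture model would combine several such kernels, each with its own center and scale.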