View Submission - EcoSta2023
A1030
Title: Model-based clustering for categorical data via Hamming distance
Authors:
Raffaele Argiento - Università degli Studi di Bergamo (Italy) [presenting]
Lucia Paci - Università Cattolica del Sacro Cuore (Italy)
Edoardo Filippi-Mazzola - Università della Svizzera italiana (Switzerland)
Abstract: A model-based approach is introduced for clustering categorical data with no natural ordering. The proposed method exploits the Hamming distance to define a family of probability mass functions for categorical data. The members of this family serve as the kernels of a finite mixture model with an unknown number of components. Fully Bayesian inference is carried out with a trans-dimensional blocked Gibbs sampler, which simplifies computation relative to the customary reversible-jump algorithm. Model performance is assessed in a simulation study, which shows improved clustering recovery over existing approaches. Finally, the method is illustrated with an application to reference datasets.
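
As an illustration of how a Hamming distance can induce a probability mass function on unordered categorical vectors, the following is a minimal sketch. The abstract does not state the functional form of the kernel, so the form used here, p(x | center, sigma) proportional to exp(-d_H(x, center) / sigma), is an assumption, as are the names `hamming_distance`, `hamming_kernel_pmf`, `center`, `sigma`, and `n_categories`. The sketch covers only a single kernel; it does not implement the mixture model or the trans-dimensional blocked Gibbs sampler.

```python
import math
from itertools import product


def hamming_distance(x, c):
    """Count the positions at which two equal-length category vectors differ."""
    if len(x) != len(c):
        raise ValueError("x and c must have the same length")
    return sum(xi != ci for xi, ci in zip(x, c))


def hamming_kernel_pmf(x, center, sigma, n_categories):
    """Assumed kernel: p(x | center, sigma) proportional to exp(-d_H(x, center) / sigma).

    n_categories[j] is the number of categories of attribute j (hypothetical
    parameter name). Because the Hamming distance is a sum over attributes,
    the normalizing constant factorizes: attribute j contributes weight 1 for
    the matching category and exp(-1 / sigma) for each of the other
    n_categories[j] - 1 categories.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    w = math.exp(-1.0 / sigma)
    log_norm = sum(math.log(1.0 + (m - 1) * w) for m in n_categories)
    return math.exp(-hamming_distance(x, center) / sigma - log_norm)


# Usage: three attributes with 2, 3, and 4 categories, coded 0..m-1.
n_categories = (2, 3, 4)
center = (0, 1, 2)
support = list(product(*(range(m) for m in n_categories)))
total = sum(hamming_kernel_pmf(x, center, 0.7, n_categories) for x in support)
print(f"sum over support: {total:.6f}")
```

Under this assumed form, the mass is largest at `center` and decays with the number of mismatching attributes, while `sigma` controls how quickly it decays. No ordering of the categories is used anywhere, only equality checks.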