View Submission - EcoSta2022
A1003
Title: Clustering categorical data via Hamming distance
Authors: Raffaele Argiento - Università degli Studi di Bergamo (Italy) [presenting]
Lucia Paci - Università Cattolica del Sacro Cuore (Italy)
Edoardo Filippi-Mazzola - Università della Svizzera italiana (Switzerland)
Abstract: Clustering methods have typically been applied to continuous data. However, in many modern applications, data consist of multiple categorical variables with no natural ordering. In the heuristic framework, the problem of clustering such data is tackled by introducing suitable distances. We develop a model-based approach for clustering categorical data on a nominal scale. The approach is based on a mixture of distributions defined via the Hamming distance between categorical vectors. Maximum likelihood inference is performed through an expectation-maximization algorithm. A simulation study illustrates the proposed approach.
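
To make the ingredients of the abstract concrete, the following is a minimal Python sketch of the Hamming distance between categorical vectors and of the E-step of an expectation-maximization algorithm for a mixture built on it. The abstract does not state the form of the mixture components, so the kernel below is an assumption: each attribute contributes a weight of exp(-1/sigma) when it differs from the cluster center and 1 when it matches, normalized over that attribute's categories. The function names (`hamming_distance`, `log_kernel`, `e_step`) and the single scale `sigma` per cluster are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def hamming_distance(x, c):
    """Number of attributes on which two categorical vectors disagree."""
    return int(np.sum(np.asarray(x) != np.asarray(c)))


def log_kernel(X, center, sigma, n_levels):
    """Row-wise log-density of an ASSUMED Hamming-type kernel.

    Attribute j contributes exp(-[x_j != c_j] / sigma), normalized over its
    n_levels[j] categories. This form is illustrative only; the abstract
    does not specify the component distribution.
    """
    X = np.asarray(X)
    mismatches = (X != np.asarray(center)).sum(axis=1)
    # Per attribute: one matching category plus (m_j - 1) mismatching ones.
    log_norm = np.sum(np.log1p((np.asarray(n_levels) - 1) * np.exp(-1.0 / sigma)))
    return -mismatches / sigma - log_norm


def e_step(X, centers, sigmas, weights, n_levels):
    """Posterior cluster-membership probabilities (responsibilities)."""
    log_post = np.column_stack([
        np.log(w) + log_kernel(X, c, s, n_levels)
        for c, s, w in zip(centers, sigmas, weights)
    ])
    # Subtract the row maximum before exponentiating for numerical stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)
```

Under this assumed kernel, a small `sigma` concentrates a component's mass on its center, while a large `sigma` spreads it almost uniformly over all category combinations. An M-step would then update the mixture weights, centers, and scales from the responsibilities; it is omitted here because its form depends on the component distribution actually used.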