View Submission - COMPSTAT2023
A0287
Title: Association-based distances for categorical and mixed-type data
Authors: Alfonso Iodice D'Enza - Università di Napoli Federico II (Italy) [presenting]
Michel van de Velden - Erasmus University Rotterdam (Netherlands)
Carlo Cavicchia - Erasmus University Rotterdam (Netherlands)
Angelos Markos - Democritus University of Thrace (Greece)
Abstract: Several statistical methods are based on distances, that is, on quantifying the differences among the values observed on a set of attributes. The definition of distance is not unique, as it depends on the attributes describing the observations and on the problem at hand. Distances between continuous observations result from the aggregation of attribute-wise differences, and attribute correlations may or may not be taken into account. For categorical observations, simple mismatch counting aside, the definition of distance/dissimilarity is less intuitive, nor is it unique: several distance measures have been proposed, and choosing one is subjective, more so in unsupervised learning. In the mixed-data case, distance computation requires further choices, mostly to balance the impact of the continuous and categorical attributes. An association-based distance for categorical and mixed data is proposed that takes into account the categorical/categorical and continuous/categorical attribute relations.
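For orientation only, the baseline notions mentioned in the abstract can be sketched in a few lines of Python. This is not the association-based distance proposed by the authors, whose definition the abstract does not give. It only illustrates aggregation of attribute-wise differences for continuous data, mismatch counting for categorical data, and one possible way (a Gower-style weighted average, assumed here) of balancing the two in the mixed case. All function names, the `ranges` argument and the `weight` parameter are hypothetical.

```python
import math


def euclidean(x, y):
    """Aggregate attribute-wise squared differences; correlations are ignored."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def simple_matching(x, y):
    """Fraction of categorical attributes on which two observations disagree."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def mixed_distance(x_num, y_num, x_cat, y_cat, ranges, weight=0.5):
    """Gower-style sketch (an assumption, not the proposed method).

    Continuous part: mean absolute difference, each scaled by the
    attribute's range. Categorical part: simple matching.
    `weight` balances the continuous and categorical contributions.
    """
    num = sum(abs(a - b) / r for a, b, r in zip(x_num, y_num, ranges)) / len(x_num)
    cat = simple_matching(x_cat, y_cat)
    return weight * num + (1 - weight) * cat


if __name__ == "__main__":
    print(euclidean((0.0, 0.0), (3.0, 4.0)))
    print(simple_matching(("red", "small", "round"), ("red", "large", "round")))
    print(mixed_distance([1.0, 10.0], [2.0, 10.0], ["a", "b"], ["a", "c"],
                         ranges=[2.0, 5.0]))
```

Note that both the mismatch count and the weighted average treat every attribute independently. Relations among attributes play no role here, which is the gap an association-based distance is meant to address.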