CMStatistics 2023
B0767
Title: Mixed variables distances
Authors: Michel van de Velden - Erasmus University Rotterdam (Netherlands) [presenting]
Alfonso Iodice D'Enza - Università di Napoli Federico II (Italy)
Angelos Markos - Democritus University of Thrace (Greece)
Carlo Cavicchia - Erasmus University Rotterdam (Netherlands)
Abstract: Gower's general coefficient of similarity provides an elegant and simple way to measure similarity between observations measured on multiple variables of different types: numerical, binary, ordinal or categorical. Data containing variables of different types are referred to as mixed-variable data. Although alternative proposals allow distance calculations in mixed-variable contexts, Gower's proposal remains popular. However, Gower's coefficient is typically used with "basic" settings, even though the original paper allows considerable flexibility in implementation. This flexibility is exploited here, and alternatives are proposed that overcome some shortcomings of the default implementation. In particular, using a very general framework for implementing distances for categorical data, a highly adaptable dissimilarity measure for mixed variables is proposed that can easily be implemented and customized.
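
For orientation, the "basic" setting mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the measure proposed by the authors: it assumes range-scaled absolute differences for numerical variables, simple matching for categorical variables, and a weighted average across variables. The function name `gower_dissimilarity` and its parameters are hypothetical, and the sketch assumes no missing values and at least one variable of each type.

```python
import numpy as np

def gower_dissimilarity(numeric, categorical, weights=None):
    """Pairwise Gower dissimilarity under basic settings (illustrative sketch).

    numeric:     (n, p_num) array-like of numerical variables
    categorical: (n, p_cat) array-like of category labels
    weights:     optional per-variable weights, numerical variables first
    """
    numeric = np.asarray(numeric, dtype=float)
    categorical = np.asarray(categorical)

    # Encode each categorical column as integer codes so that
    # comparisons reduce to integer (in)equality.
    codes = np.column_stack(
        [np.unique(col, return_inverse=True)[1] for col in categorical.T]
    )

    n_vars = numeric.shape[1] + codes.shape[1]
    if weights is None:
        weights = np.ones(n_vars)  # assumption: equal weights by default
    weights = np.asarray(weights, dtype=float)

    # Numerical variables: absolute difference scaled by the observed range.
    ranges = numeric.max(axis=0) - numeric.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant columns contribute zero distance
    d_num = np.abs(numeric[:, None, :] - numeric[None, :, :]) / ranges

    # Categorical variables: simple matching (0 if equal, 1 otherwise).
    d_cat = (codes[:, None, :] != codes[None, :, :]).astype(float)

    # Weighted average of the per-variable dissimilarities.
    d_all = np.concatenate([d_num, d_cat], axis=2)
    return d_all @ weights / weights.sum()


if __name__ == "__main__":
    # Three observations, one numerical and one categorical variable.
    D = gower_dissimilarity([[1.0], [3.0], [5.0]], [["a"], ["a"], ["b"]])
    print(D)
```

In this sketch the per-variable contributions (`d_num`, `d_cat`) and the `weights` are the places where alternatives to the defaults could be substituted, for example a different scaling for numerical variables or a different categorical distance.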