B0637
Title: Handling missing data in clustering using multiple imputation
Authors: Vincent Audigier - Conservatoire National des Arts et Metiers (France) [presenting]
Ndeye Niang - Conservatoire National des Arts et Metiers (France)
Abstract: Multiple imputation techniques are often used to address missing data in statistical analysis. It is shown how multiple imputation can be used to handle missing values in the context of clustering. To this end, a novel imputation method, FCS-homo, is presented, together with a pooling method for the set of partitions obtained from the imputed data sets. The proposed methodology is evaluated in a simulation study against state-of-the-art methods. The first case treated is that of observations generated from a Gaussian mixture model with randomly missing values. Experiments are then carried out on various real data sets where the distribution of the variables is unknown. These first results suggest that multiple imputation is an efficient way of handling missing data in clustering, especially when the data distribution is unknown.
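The general workflow described in the abstract (impute several times, cluster each completed data set, pool the resulting partitions) can be illustrated with a minimal sketch. It is not the FCS-homo method or the authors' pooling rule, neither of which is specified here. As stand-in assumptions, missing entries (`None`) are filled by random draws from the observed values of the same variable, each completed data set is clustered with a basic k-means, and partitions are pooled by thresholding a co-association matrix. All function names are hypothetical.

```python
import random

def impute_once(rows, rng):
    """Assumed stand-in imputation: replace each None by a random draw
    from the observed values of the same column."""
    observed = [[v for v in col if v is not None] for col in zip(*rows)]
    return [[v if v is not None else rng.choice(observed[j])
             for j, v in enumerate(row)] for row in rows]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, n_iter=20):
    """Basic Lloyd's algorithm; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels

def co_association(partitions):
    """Share of partitions in which observations i and j share a cluster."""
    n, m = len(partitions[0]), len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
            for i in range(n)]

def pool(partitions, threshold=0.5):
    """Assumed pooling rule: link i and j when they co-cluster in more than
    `threshold` of the partitions, then return the connected components."""
    co = co_association(partitions)
    n = len(co)
    labels, current = [-1] * n, 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def cluster_with_multiple_imputation(rows, k, n_imputations=10, seed=0):
    """Impute n_imputations times, cluster each completed data set, pool."""
    rng = random.Random(seed)
    partitions = [kmeans(impute_once(rows, rng), k, rng)
                  for _ in range(n_imputations)]
    return pool(partitions)

# Toy data: two groups of observations, with some entries missing (None).
data = [[0.1, 0.2], [0.0, None], [0.2, 0.1], [None, 0.0],
        [5.0, 5.1], [5.2, None], [4.9, 5.0], [None, 5.2]]
consensus = cluster_with_multiple_imputation(data, k=2)
print(consensus)
```

Working with co-association rather than raw labels makes the pooling step insensitive to label switching between partitions. With this simple thresholding rule the number of pooled clusters is not forced to equal `k`.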