COMPSTAT 2024
A0471
Title: The hierarchical clustering-based method powered by the bootstrap approach for multiple imputations in categorical data
Authors: Jaroslav Hornicek - Prague University of Economics and Business (Czech Republic) [presenting]
Zdenek Sulc - Prague University of Economics and Business (Czech Republic)
Hana Rezankova - Prague University of Economics and Business (Czech Republic)
Jana Cibulkova - Prague University of Economics and Business (Czech Republic)
Abstract: The imputation of missing values in nominal variables is an important yet underexplored area of research. A novel imputation method based on agglomerative hierarchical clustering is proposed. The method clusters the objects in a dataset using modified similarity measures that can handle objects with missing values, and then imputes the missing values based on the resulting hierarchical structure. To extend the approach to multiple imputation, a bootstrap procedure was incorporated. Two sets of simulated data were then generated: one with values missing not at random (MNAR) and another with values missing at random (MAR). Using various evaluation criteria, the imputation results on these sets were compared with those obtained by multiple imputation using MICE (multivariate imputation by chained equations) and the EM (expectation-maximization) algorithm. To increase computational speed, the proposed techniques were implemented with tools such as the Rcpp package, which integrates C++ into the R environment. The novel approach performs comparably to established algorithms, with the additional advantage of being nonparametric. The results also showed that the proportion of missing values and the number of categories in the variables have a significant influence on imputation quality, whereas the strength of association between variables has a moderate impact.
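The Python sketch below illustrates the general idea of clustering-based imputation for nominal data. It is not the authors' implementation (which uses R and Rcpp), and several choices are assumptions made only for illustration: simple matching over jointly observed variables stands in for the paper's modified similarity measures, clusters are merged by average linkage, and each missing cell is filled with the most frequent observed category in the first (smallest) merged cluster that contains a donor. The bootstrap step is also a hypothetical variant: each imputation clusters the original rows together with a bootstrap resample of the rows, so donor pools and merge order differ between imputations. All function names are hypothetical.

```python
import random
from collections import Counter

MISSING = None  # marker for a missing nominal value (assumption)


def similarity(a, b):
    """Simple matching coefficient over variables observed in both objects.

    Stand-in for the modified similarity measures in the abstract; returns 0.0
    when the two objects share no observed variable (an assumption).
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not MISSING and y is not MISSING]
    if not shared:
        return 0.0
    return sum(x == y for x, y in shared) / len(shared)


def hierarchical_impute(data):
    """Average-linkage agglomerative clustering; after each merge, every missing
    cell in the merged cluster that has an observed donor in that cluster is
    filled with the cluster's most frequent observed category."""
    n, p = len(data), len(data[0])
    result = [list(row) for row in data]  # copy, so `data` is not mutated
    pending = {(i, j) for i in range(n) for j in range(p) if data[i][j] is MISSING}
    sim = [[similarity(data[i], data[k]) for k in range(n)] for i in range(n)]

    def linkage(c1, c2):
        # Average linkage: mean pairwise similarity between the two clusters.
        return sum(sim[i][k] for i in c1 for k in c2) / (len(c1) * len(c2))

    def fill(cluster):
        for i in cluster:
            for j in range(p):
                if (i, j) in pending:
                    donors = [data[k][j] for k in cluster if data[k][j] is not MISSING]
                    if donors:
                        result[i][j] = Counter(donors).most_common(1)[0][0]
                        pending.discard((i, j))

    clusters = [[i] for i in range(n)]
    # Merge the most similar pair until everything is imputed or one cluster remains.
    while len(clusters) > 1 and pending:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = linkage(clusters[a], clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        merged = clusters[pair[0]] + clusters[pair[1]]
        clusters = [c for k, c in enumerate(clusters) if k not in pair] + [merged]
        fill(merged)
    return result


def bootstrap_multiple_imputation(data, m=5, seed=0):
    """Return m completed copies of `data`. Each copy clusters the original rows
    together with a bootstrap resample of the rows acting as extra donors
    (an illustrative assumption, not the paper's exact procedure)."""
    rng = random.Random(seed)
    n = len(data)
    completed = []
    for _ in range(m):
        boot = [data[rng.randrange(n)] for _ in range(n)]
        # Keep only the imputed versions of the original n rows.
        completed.append(hierarchical_impute(data + boot)[:n])
    return completed


if __name__ == "__main__":
    data = [
        ["a", "x", "p"],
        ["a", "x", None],
        ["b", "y", "q"],
        ["b", "y", "q"],
        ["b", None, "q"],
    ]
    print(hierarchical_impute(data))
    # [['a', 'x', 'p'], ['a', 'x', 'p'], ['b', 'y', 'q'], ['b', 'y', 'q'], ['b', 'y', 'q']]
    for completed in bootstrap_multiple_imputation(data, m=3, seed=1):
        print(completed)
```

In the usage example, row 2 merges first with row 1 (identical on the jointly observed variables) and takes `"p"`, while row 5 joins the cluster of rows 3 and 4 and takes `"y"`. The pairwise search is cubic in the number of objects, which is one reason a compiled implementation (such as the Rcpp route mentioned in the abstract) pays off on realistic data sizes.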