Title: Single imputation by data depth
Authors: Pavlo Mozharovskyi - Centre Henri Lebesgue (France) [presenting]
Julie Josse - Agrocampus Ouest (France)
Francois Husson - Agrocampus Rennes (France)
Abstract: Single imputation is an appropriate way to handle missing data when one simply needs to complete a single data set, when no inference is required, when the applied statistical method is computationally too demanding for multiple data sets, or when only a few values are missing but one seeks an alternative to listwise deletion. The presented methodology for single imputation of missing values borrows its idea from data depth, a measure of centrality defined for an arbitrary point of the space with respect to a probability distribution or a data cloud. It consists in iteratively maximizing the depth of each observation with missing values, and can be employed with any properly defined statistical depth function. Because it captures the underlying data topology, the procedure is distribution-free, imputes values close to the data, retains predictive ability, in contrast to local methods (nearest neighbor imputation, random forest), and has attractive robustness and asymptotic properties under elliptical symmetry. It is shown that a particular case, using the Mahalanobis depth, is directly connected to well-known treatments for the multivariate normal model, such as iterated regression or regularized PCA. Simulation and real data studies contrast the procedure with existing popular alternatives.
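
As an illustration of the Mahalanobis special case mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the Mahalanobis depth D(x) = 1 / (1 + (x - mu)' S^{-1} (x - mu)), with mu and S taken as the plain sample mean and covariance of the currently completed data. For fixed mu and S, maximizing this depth over the missing coordinates of a row has a closed form, namely the regression of the missing coordinates on the observed ones, which is why the iteration resembles iterated regression. The function names, the column-mean initialization, and the stopping rule are hypothetical choices made for this example; robust estimates of mu and S, or other depth functions, are not covered.

```python
import numpy as np


def mahalanobis_depth(x, mu, cov):
    """Mahalanobis depth of point x with respect to location mu and scatter cov."""
    d = x - mu
    return 1.0 / (1.0 + d @ np.linalg.solve(cov, d))


def impute_by_mahalanobis_depth(X, n_iter=50, tol=1e-8):
    """Illustrative single imputation: iteratively place each incomplete row
    at the point of maximal Mahalanobis depth given its observed coordinates.

    Assumptions of this sketch: NaN marks a missing value, every column has
    at least one observed value, and the sample covariance is non-singular.
    """
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    Xc = X.copy()

    # Hypothetical starting point: fill missing cells with column means.
    col_means = np.nanmean(X, axis=0)
    Xc[miss] = np.take(col_means, np.where(miss)[1])

    incomplete_rows = np.where(miss.any(axis=1))[0]
    for _ in range(n_iter):
        mu = Xc.mean(axis=0)
        cov = np.cov(Xc, rowvar=False)
        X_new = Xc.copy()
        for i in incomplete_rows:
            m = miss[i]
            o = ~m
            if not o.any():
                # Nothing observed: the deepest point is the center itself.
                X_new[i, m] = mu[m]
                continue
            # Depth maximizer for fixed mu, cov:
            # x_m = mu_m + S_mo S_oo^{-1} (x_o - mu_o)
            coef = np.linalg.solve(cov[np.ix_(o, o)], Xc[i, o] - mu[o])
            X_new[i, m] = mu[m] + cov[np.ix_(m, o)] @ coef
        change = np.max(np.abs(X_new - Xc))
        Xc = X_new
        if change < tol:
            break
    return Xc
```

Calling `impute_by_mahalanobis_depth(X)` on a two-dimensional array with `np.nan` in the missing cells returns a completed copy in which the observed entries are left unchanged; `mahalanobis_depth` can then be used to inspect how central an imputed row is with respect to the completed data.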