Title: An optimal nearest neighbor hot deck imputation based on a b-matching problem
Authors: Jan Pablo Burgard - Trier University (Germany)
Sven de Vries - Trier University (Germany)
Ulf Friedrich - Trier University (Germany)
Dennis Kreber - Trier University (Germany) [presenting]
Abstract: Almost all population surveys suffer from missing responses, which prevents the direct application of estimation methods that require complete data sets. In official statistics, single imputation methods are usually applied to create one complete and coherent data set. A prominent single imputation variant is nearest neighbor hot deck imputation, which replaces the missing values of a non-respondent with the observed values of the closest donor under a given distance, e.g. the Gower distance. Assigning the same donor repeatedly to different non-respondents may distort the distribution of the data. To avoid this problem, the proposed method allows limiting the maximum number of donations per unit. Subject to this limit, the sum of all distances between imputation pairs (a non-respondent and its assigned donor) is minimized globally. This leads to a maximum weighted b-matching problem that is solved exactly by a combinatorial optimization procedure. The proposed method is compared to existing single imputation methods in a large-scale Monte Carlo simulation based on the Amelia dataset. The estimation of a total is studied under different missing-data patterns for a variety of variables, e.g. income. Variance estimation is performed via bootstrap. The Monte Carlo simulation indicates that the imposed restriction on the reuse of donors may lead to a lower bias and variance.
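The donor-limited assignment described in the abstract can be illustrated with a small sketch. This is not the authors' procedure: it assumes a single numeric auxiliary variable with the absolute difference as distance (instead of the Gower distance), and it solves the problem by copying each donor as many times as it may donate and running a textbook Hungarian algorithm on the enlarged cost matrix. The names `hungarian`, `limited_hot_deck`, `max_donations`, and the toy data are hypothetical.

```python
def hungarian(cost):
    """Min-cost assignment of every row to a distinct column (rows <= columns)."""
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (m + 1)   # column potentials (1-based)
    p = [0] * (m + 1)     # p[j] = row matched to column j, 0 if column is free
    way = [0] * (m + 1)   # predecessor column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:      # reached a free column: augment
                break
        while j0 != 0:          # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [None] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def limited_hot_deck(x_recipients, x_donors, max_donations):
    """Return one donor index per recipient, minimizing the total distance
    while using each donor at most max_donations times."""
    if not x_recipients:
        return []
    if len(x_recipients) > len(x_donors) * max_donations:
        raise ValueError("not enough donor capacity for all recipients")
    # Column j * max_donations + k is the k-th copy of donor j.
    cost = [
        [abs(xr - xd) for xd in x_donors for _ in range(max_donations)]
        for xr in x_recipients
    ]
    return [col // max_donations for col in hungarian(cost)]


# Toy data (assumed): auxiliary variable x observed for everyone,
# target variable y observed for donors only.
x_donors = [1.0, 2.0, 10.0]
y_donors = [100.0, 200.0, 1000.0]
x_recipients = [1.1, 1.2, 1.3, 9.0]

donors = limited_hot_deck(x_recipients, x_donors, max_donations=2)
imputed = [y_donors[j] for j in donors]
print(donors, imputed)
```

In this toy example a plain nearest neighbor rule would assign the first donor to three recipients. With `max_donations=2`, one of those recipients is moved to its second-closest donor, chosen so that the total distance over all pairs stays as small as possible. Replicating donors enlarges the cost matrix by the donation limit, so this sketch is only practical for small data sets.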