Title: Imputation and post-selection inference in models with missing data
Authors: Karen Messer - University of California, San Diego (United States) [presenting]
Abstract: It is common to encounter missing data among potential predictor variables in the setting of model selection. For example, we recently attempted to improve the commonly used risk prediction model for colorectal cancer screening, using pooled data from seven different large prospective studies. However, several important potential predictors were missing for more than half of the subjects. Multiple imputation can effectively address missing data, and there are effective methods to incorporate variable selection into inference. However, there is no consensus on appropriate methods to address both issues simultaneously. We compare three approaches to such post-imputation-selection inference: a multiple-imputation approach; a single imputation-selection followed by bootstrap percentile confidence intervals; and a new bootstrap model-averaging approach. The 'Rubin's Rules' multiple imputation estimator can have severe undercoverage, and is not recommended. The imputation-selection estimator with bootstrap percentile confidence intervals works well. The bootstrap-model-averaged estimator, with the 'Efron's Rules' estimated variance, may be preferred if the true effect sizes are moderate.
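
For readers unfamiliar with the pooling step named above, the following is a minimal sketch of the standard Rubin's Rules combination of m imputed-data estimates of one coefficient. It is not the authors' code: the function name `rubin_pool` and the example numbers are hypothetical, and the sketch ignores the variable-selection step, which is the setting in which the abstract reports undercoverage.

```python
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool one coefficient across m imputed data sets (Rubin's Rules).

    estimates: point estimates, one per imputed data set (hypothetical input)
    variances: squared standard errors, one per imputed data set
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Illustrative numbers only, not taken from the study.
pooled, total_var = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The total variance adds the between-imputation spread, inflated by (1 + 1/m), to the average within-imputation variance. It does not account for the model being chosen by selection on the same data.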