Title: Handling missing data with multiple imputation using LC models to investigate predictors of HPV infection
Authors: Roberto Di Mari - University of Rome Tor Vergata (Italy) [presenting]
Jlenia Caccetta - Sapienza University of Rome (Italy)
Maura Mezzetti - University of Rome Tor Vergata (Italy)
Abstract: A statistical method is proposed to investigate whether various risk factors can predict HPV infection despite missing data. The data come from a cohort of 864 women, with information on HPV risk level and 67 explanatory variables. Every record has at least one item non-response, and 27% of all values are missing. Missing data are a very common issue in medical applications and can be handled through multiple imputation (MI): once the missing values are imputed, the analysis can be carried out with standard techniques, without extra effort to interpret results. Because saturated log-linear imputation models are infeasible when the number of covariates is large, the proposal is to approximate the conditional distribution of the missing data given the observed data using latent class analysis (LCA). With a sufficiently large number of mixture components, the imputation model captures complex associations between the variables; in addition, covariates are included to further improve its accuracy. To reflect the uncertainty about the model parameters, a non-parametric bootstrap is implemented. This is the first time that MI with LCA is used in a medical application, and the results obtained are in line with pathophysiology and the literature on HPV.
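The imputation scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that all variables are categorical with the same number of levels, coded 0..L-1, with -1 marking a missing value, and it omits the covariates that the abstract includes in the imputation model. The function names (`fit_lc`, `posterior`, `impute`, `multiple_impute`) are hypothetical. For each of m imputations, a latent class model is fitted by EM to a bootstrap resample of the data, and the missing entries are then drawn from the fitted class-conditional distributions.

```python
import numpy as np

def posterior(X, pi, theta):
    """Class membership probabilities given only the observed entries of X."""
    n, p = X.shape
    logp = np.tile(np.log(pi), (n, 1))                    # shape (n, K)
    for j in range(p):
        obs = X[:, j] >= 0                                # -1 marks missing
        logp[obs] += np.log(theta[:, j, :][:, X[obs, j]]).T
    logp -= logp.max(axis=1, keepdims=True)               # numerical stability
    post = np.exp(logp)
    return post / post.sum(axis=1, keepdims=True)

def fit_lc(X, n_classes, n_levels, n_iter=50, rng=None):
    """EM for a latent class model, skipping missing entries in the likelihood."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)
    theta = rng.dirichlet(np.ones(n_levels), size=(n_classes, p))  # (K, p, L)
    for _ in range(n_iter):
        post = posterior(X, pi, theta)                    # E-step
        pi = post.sum(axis=0) + 1e-3                      # M-step, lightly smoothed
        pi /= pi.sum()
        for j in range(p):
            obs = X[:, j] >= 0
            onehot = np.eye(n_levels)[X[obs, j]]          # (n_obs, L)
            counts = post[obs].T @ onehot + 1e-3          # (K, L)
            theta[:, j, :] = counts / counts.sum(axis=1, keepdims=True)
    return pi, theta

def impute(X, pi, theta, rng):
    """Draw a class for each record, then draw its missing items given that class."""
    post = posterior(X, pi, theta)
    n_classes, _, n_levels = theta.shape
    X_completed = X.copy()
    for i in range(X.shape[0]):
        k = rng.choice(n_classes, p=post[i])
        for j in np.where(X[i] < 0)[0]:
            X_completed[i, j] = rng.choice(n_levels, p=theta[k, j])
    return X_completed

def multiple_impute(X, n_classes, n_levels, m=5, seed=0):
    """Return m completed data sets; each uses parameters fitted on a bootstrap resample."""
    rng = np.random.default_rng(seed)
    completed = []
    for _ in range(m):
        boot = X[rng.integers(0, len(X), size=len(X))]    # non-parametric bootstrap
        pi, theta = fit_lc(boot, n_classes, n_levels, rng=rng)
        completed.append(impute(X, pi, theta, rng))
    return completed
```

Refitting the model on a fresh bootstrap resample for every imputation is what carries parameter uncertainty into the completed data sets. Each completed data set can then be analysed with standard techniques.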