CMStatistics 2015
B1099
Title: Assessing good fitting models in nonparametric disclosure risk estimation
Authors: Silvia Polettini - Sapienza Universita di Roma (Italy) [presenting]
Cinzia Carota - University of Turin (Italy)
Maurizio Filippone - University of Glasgow (United Kingdom)
Abstract: We report ongoing research on identifying models with good predictive performance for disclosure risk estimation. Disclosure risk is usually estimated through parametric models on contingency tables of key variables, that is, variables that permit re-identification of sampled individuals. A Poisson model has been proposed whose rates are explained by a log-linear mixed model with Dirichlet process (DP) random effects; the random effects account for lack of fit, so that good-fitting models with a simple fixed-effects structure can be identified. Given the size of the contingency tables involved, the severe sparsity issues, and the complexity of the proposed Bayesian approach, formal model selection is challenging. The nonparametric main-effects model is a starting point for identifying an "optimal" model; in contrast to parametric models, sensitivity to model specification is low, and adding a single two-way interaction, no matter which, often leads to satisfactory performance. The resulting significant reduction of the space of models to be examined makes the issue of selection bias less important. We consider criteria for assessing a model's predictive performance. We investigate the effectiveness of such criteria when estimating disclosure risk in applications to real data, and focus on the role of nonparametric random effects in reducing selection bias.
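To make the quantities in the abstract concrete, the following is a minimal sketch of how per-cell Poisson rates translate into disclosure risk for sample uniques. It is not the authors' estimator. It assumes population cell counts F_k ~ Poisson(lambda_k) and Bernoulli sampling with a known fraction pi, under which F_k - f_k given f_k is Poisson((1 - pi) * lambda_k). The function names, the toy records, and the rate values are all hypothetical; in the approach described above the rates would come from the fitted log-linear mixed model with DP random effects rather than being fixed by hand.

```python
import math
from collections import Counter


def sample_cell_counts(records):
    """Cross-classify sampled records by their tuple of key variables."""
    return Counter(tuple(r) for r in records)


def unique_risk(lam, pi):
    """P(F_k = 1 | f_k = 1) when F_k ~ Poisson(lam) and the sampling fraction is pi."""
    return math.exp(-(1.0 - pi) * lam)


def expected_inverse_size(lam, pi):
    """E[1 / F_k | f_k = 1] under the same Poisson and Bernoulli-sampling assumptions."""
    mu = (1.0 - pi) * lam
    if mu == 0.0:
        return 1.0  # full enumeration or zero rate: the sample unique is the whole cell
    return (1.0 - math.exp(-mu)) / mu


def global_risks(counts, rates, pi):
    """Sum the two per-cell risks over sample uniques (cells with f_k = 1)."""
    uniques = [cell for cell, f in counts.items() if f == 1]
    tau1 = sum(unique_risk(rates[cell], pi) for cell in uniques)
    tau2 = sum(expected_inverse_size(rates[cell], pi) for cell in uniques)
    return tau1, tau2


# Toy sample: key variables are (age band, sex, region). Purely illustrative.
records = [
    ("30-39", "F", "North"),
    ("30-39", "F", "North"),
    ("40-49", "M", "South"),
    ("20-29", "F", "East"),
]
counts = sample_cell_counts(records)

# Hypothetical population rates per cell; stand-ins for model-based estimates.
rates = {
    ("30-39", "F", "North"): 5.0,
    ("40-49", "M", "South"): 0.5,
    ("20-29", "F", "East"): 2.0,
}

tau1, tau2 = global_risks(counts, rates, pi=0.1)
print(f"expected population uniques among sample uniques: {tau1:.3f}")
print(f"expected correct matches among sample uniques:    {tau2:.3f}")
```

Because both measures depend on the data only through the rates, any lack of fit in the model for lambda_k carries straight into the risk estimates, which is why the choice of model matters here.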