Title: Optimal nonparametric disclosure risk assessment
Authors: Francesca Panero - University of Oxford (United Kingdom) [presenting]
Stefano Favaro - University of Torino and Collegio Carlo Alberto (Italy)
Federico Camerlenghi - University of Milano-Bicocca and Collegio Carlo Alberto (Italy)
Zacharie Naulet - University of Toronto (Canada)
Abstract: A novel nonparametric estimator is presented for the number of individuals that are unique in a sample and also unique in the population, a classical measure of disclosure risk for microdata files. The estimator is easy to derive, scales to massive datasets, and admits an empirical Bayes interpretation. We prove that it is nearly optimal by showing that its limit of predictability, in terms of vanishing normalized mean squared error, asymptotically matches the maximum that any nonparametric estimator can achieve. In particular, for a sample of size $n$ and a population of size $n+\lambda n$ with $\lambda>0$, we show that our estimator is optimal for $\lambda$ growing no faster than the logarithm of $n$. This result answers a long-standing question about the feasibility of nonparametric estimation of this quantity under the sole assumption of the Poisson abundance model.
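To make the target quantity concrete, the following is a minimal Python sketch that counts the records that are unique in a sample and also unique in the population. It is not the estimator described in the abstract: it assumes full access to the population, which the estimator by definition does not have, and so it only illustrates what is being estimated. The function name `count_sample_and_population_uniques` and the toy data are hypothetical, chosen for illustration.

```python
from collections import Counter


def count_sample_and_population_uniques(population, sample_indices):
    """Count categories with frequency 1 in the sample AND frequency 1 in the population.

    population: list of category labels, one per individual (assumed fully known here).
    sample_indices: positions of the sampled individuals within `population`.
    """
    population_counts = Counter(population)
    sample_counts = Counter(population[i] for i in sample_indices)
    return sum(
        1
        for label, count in sample_counts.items()
        if count == 1 and population_counts[label] == 1
    )


# Toy example: 7 individuals in the population, 4 of them sampled.
toy_population = ["a", "a", "b", "c", "d", "d", "e"]
toy_sample = [0, 2, 3, 5]  # sampled labels: a, b, c, d (all unique in the sample)
result = count_sample_and_population_uniques(toy_population, toy_sample)
```

In this toy case every sampled label is unique in the sample, but only `b` and `c` are also unique in the population, so those are the records that would carry disclosure risk under this measure.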