CFE 2019 - CMStatistics
B1065
Title: Scaling Bayesian probabilistic record linkage with post-hoc blocking
Authors: Jared Murray - University of Texas at Austin (United States) [presenting]
Abstract: Probabilistic record linkage (PRL) is the process of determining which records in two databases correspond to the same underlying entity in the absence of a unique identifier. Bayesian solutions to this problem provide a powerful mechanism for propagating the uncertainty in the links between records (via the posterior distribution). However, computational considerations severely limit the practical applicability of existing Bayesian approaches. We propose a new computational approach yielding a restricted MCMC algorithm that samples from an approximate posterior distribution. Our advances make it possible to efficiently perform Bayesian PRL for large problems. We demonstrate the methods on a subset of an OCR'd data set, the California Great Registers, a collection of 57 million voter registrations from 1900 to 1968 that comprises the only panel data set of party registration collected before the advent of scientific surveys.
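As a rough illustration of what "probabilistic" linkage means for a single pair of records, the sketch below scores a candidate pair in the classical Fellegi-Sunter style: each field contributes a log likelihood ratio depending on whether the two records agree on it, and a prior match rate turns the total into a match probability. This is not the method of the talk. It treats each pair in isolation, with no one-to-one constraint, no MCMC, and no blocking. The field names, the m/u probabilities, the prior, and the function names are all hypothetical values chosen for the example.

```python
import math

# Hypothetical per-field probabilities (illustrative only, not from the abstract):
# M_PROBS[f] = P(field f agrees | the two records are a true match)
# U_PROBS[f] = P(field f agrees | the two records are not a match)
M_PROBS = {"last_name": 0.95, "first_name": 0.90, "birth_year": 0.85}
U_PROBS = {"last_name": 0.01, "first_name": 0.05, "birth_year": 0.02}


def agreement_vector(rec_a, rec_b, fields):
    """Map each field to True if the two records agree exactly on it."""
    return {f: rec_a.get(f) == rec_b.get(f) for f in fields}


def log_likelihood_ratio(agree, m_probs, u_probs):
    """Sum per-field log likelihood ratios, assuming fields are
    conditionally independent given match status."""
    total = 0.0
    for field, agrees in agree.items():
        m, u = m_probs[field], u_probs[field]
        if agrees:
            total += math.log(m / u)
        else:
            total += math.log((1.0 - m) / (1.0 - u))
    return total


def match_probability(agree, m_probs, u_probs, prior):
    """Posterior probability that the pair is a match, given a prior
    match rate `prior` (an assumed input, strictly between 0 and 1)."""
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += log_likelihood_ratio(agree, m_probs, u_probs)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Two made-up voter records that agree on every compared field.
record_a = {"last_name": "Alvarez", "first_name": "Maria", "birth_year": 1881}
record_b = {"last_name": "Alvarez", "first_name": "Maria", "birth_year": 1881}
# A third made-up record that disagrees with record_a on every field.
record_c = {"last_name": "Okafor", "first_name": "James", "birth_year": 1902}

fields = list(M_PROBS)
p_same = match_probability(
    agreement_vector(record_a, record_b, fields), M_PROBS, U_PROBS, prior=0.001
)
p_diff = match_probability(
    agreement_vector(record_a, record_c, fields), M_PROBS, U_PROBS, prior=0.001
)
```

Scoring every pair this way costs one comparison per pair of records across the two databases, which gives a sense of why computation becomes the limiting factor at the scale of tens of millions of records.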