CMStatistics 2019
B0914
Title: Posterior prototyping for Bayesian entity resolution
Authors: Andee Kaplan - Colorado State University (United States) [presenting]
Brenda Betancourt - NORC at the University of Chicago (United States)
Rebecca Steorts - Duke University (United States)
Abstract: Entity resolution (also called record linkage or de-duplication) is the process of merging noisy databases to remove duplicate entities, often in the absence of a unique identifier. One major challenge with linked data is identifying the most representative record to pass to an inferential or predictive task, referred to here as the downstream task. To bridge the gap between entity resolution and the downstream task, we propose four methods, collectively called prototyping, for choosing a representative record from linked data. The result is a representative data set that can be passed on to the downstream task. To illustrate the proposed methodology, we first perform Bayesian entity resolution, in which the error can be propagated through to the downstream task. Second, we evaluate the proposed prototyping methods. Third, we consider the downstream task of linear regression. The methodology is illustrated and evaluated on five entity resolution data sets.