Title: Posterior prototyping for Bayesian entity resolution
Authors: Andee Kaplan - Colorado State University (United States) [presenting]
Brenda Betancourt
Rebecca Steorts - Duke University (United States)
Abstract: Entity resolution (record linkage or de-duplication) is the process of merging noisy databases to remove duplicate entities, often in the absence of a unique identifier. One major challenge with linked data is identifying the most representative record to pass to a downstream inferential or predictive task. To bridge the gap between entity resolution and the downstream task, we propose four prototyping methods for choosing a representative record from linked data. The result is a representative data set that can be passed on to the downstream task. To illustrate the proposed methodology, we first perform Bayesian entity resolution, where the error can be propagated through to the downstream task. Second, we evaluate the proposed prototyping methods. Third, we consider linear regression as the downstream task. The proposed methodology is illustrated and evaluated on five entity resolution data sets.