EcoSta 2023
A0630
Title: Adaptive prior distributions for record linkage tasks
Authors: Brenda Betancourt - NORC at the University of Chicago (United States) [presenting]
Abstract: In database management, record linkage aims to identify multiple records that correspond to the same individual. Record linkage can be treated as a clustering problem in which one or more noisy database records are associated with a unique latent entity. In contrast to traditional clustering applications, a large number of clusters, each containing only a few observations, is expected. Hence, two new classes of prior distributions, based on exchangeable sequences of clusters and on allelic partitions, are proposed for the small-cluster setting of record linkage. The proposed priors allow information about the cluster size distribution to be incorporated at different scales, and they naturally enforce sublinear growth of the maximum cluster size with the number of records, known as the microclustering property. In addition, a set of novel microclustering conditions is introduced to impose further constraints on the cluster sizes a priori. The performance of the proposed classes of priors is evaluated using simulated data and official statistics data sets.
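The microclustering property says that if M_n is the size of the largest cluster among n records, then M_n / n -> 0 in probability as n grows. Standard clustering priors such as the Dirichlet process (Chinese restaurant process, CRP) do not have this property. The minimal Python sketch below illustrates the contrast by simulation. It is not an implementation of the proposed priors: the CRP stands in for a "traditional" clustering prior, and `bounded_cluster_sizes` is a hypothetical toy model whose cluster sizes never exceed a fixed cap. All function names and parameter values (alpha = 1, cap of 5) are illustrative assumptions.

```python
import random
import statistics


def crp_cluster_sizes(n, alpha, rng):
    """Sample cluster sizes of a Chinese restaurant process partition of n records."""
    sizes = []
    for i in range(n):
        # Record i starts a new cluster with probability alpha / (i + alpha);
        # otherwise it joins an existing cluster with probability proportional to its size.
        if rng.random() < alpha / (i + alpha):
            sizes.append(1)
        else:
            r = rng.random() * i  # i records are already assigned
            acc = 0
            for k, s in enumerate(sizes):
                acc += s
                if r < acc:
                    sizes[k] += 1
                    break
    return sizes


def bounded_cluster_sizes(n, max_size, rng):
    """Hypothetical toy model: fill n records with clusters whose sizes are drawn
    uniformly from 1..max_size (the last cluster is truncated to fit)."""
    sizes = []
    remaining = n
    while remaining > 0:
        s = min(rng.randint(1, max_size), remaining)
        sizes.append(s)
        remaining -= s
    return sizes


def mean_max_fraction(sampler, n, reps=20, seed=0):
    """Average of M_n / n over independent draws, where M_n is the largest cluster size."""
    rng = random.Random(seed)
    return statistics.mean(max(sampler(n, rng)) / n for _ in range(reps))


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        crp = mean_max_fraction(lambda m, r: crp_cluster_sizes(m, alpha=1.0, rng=r), n)
        toy = mean_max_fraction(lambda m, r: bounded_cluster_sizes(m, max_size=5, rng=r), n)
        print(f"n={n:>6}  CRP M_n/n={crp:.3f}  bounded M_n/n={toy:.4f}")
```

As n increases, one would expect the CRP ratio to stay roughly constant (its largest cluster grows linearly in n), while the toy model's ratio shrinks like 5/n. That gap is the behavior the microclustering property rules out for record linkage priors.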