A1285
Title: Genealogical application of record linkage for black Americans in the Antebellum South
Authors: Hannah Butler - Colorado State University (United States) [presenting]
Andee Kaplan - Colorado State University (United States)
Abstract: Record linkage connects records that refer to the same entity across multiple data sources. When reliable identifying information is unavailable, probabilistic record linkage is used to estimate the probability that two records refer to the same entity. An entity recognized by alternate information in different contexts can appear as multiple distinct records within or between data sources. An alias is the occurrence of one or more duplicate records for an entity within a file that arises not from error but from a known alternative piece of information. Aliases are separate parts of the story and can provide richer data for linking records. However, data containing aliases require a more careful approach to statistical inference. Existing record linkage methodologies may rely on pre- or post-hoc processing to avoid or remove conflicting links caused by aliases. Doing so loses potentially valuable information or impairs the ability to quantify uncertainty. A fully Bayesian approach to record linkage is proposed that expands the existing methodology to account for and leverage known aliases of entities within the data files to be linked. This approach also allows for uncertainty quantification and requires no post-hoc processing of the estimated links. The performance of this approach is demonstrated in simulation, and the model is applied to two sources of data from Freedom-Seekers in the Antebellum South.