View Submission - EcoSta2022
A0613
Title: Data augmentation using aggregate statistics from big data and survey: 2nd order delta-method and bootstrap inference
Authors: Ryung Kim - Albert Einstein College of Medicine (United States) [presenting]
Abstract: It is often useful to analyze large but potentially biased big data jointly with smaller gold-standard surveys. Health surveys with higher quality and standardized measurements can benefit from augmentation with electronic health records (EHR) originally collected for administrative and billing purposes. We recently showed the efficiency of an estimator that pools aggregate statistics from two sources. However, it remains unknown how to perform statistical inference based on the estimator. We develop two methods for statistical inference based on the Mosteller estimator that correct for bias and skewness: the second-order delta method, and a modified version of the bias-corrected and accelerated bootstrap approach. The methods are based on aggregate statistics obtained from two sources, one of which is potentially biased. In the numerical study, these methods provide valid coverage rates while the naive plug-in method and the first-order delta method do not. Finally, the methods are demonstrated with two databases in South Korea: the Korea National Health and Nutrition Examination Survey and the National Health Insurance Service Sample Cohort. The prevalence of uncontrolled diabetes in the senior population was typically estimated to be lower in the EHR database of health examinations than in the gold-standard health survey. The proposed confidence intervals were almost always shorter than the interval based solely on the health survey.
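As a rough illustration of what a second-order delta-method correction looks like when only aggregate statistics are available, the sketch below bias-corrects a smooth function of a sample mean. It is a generic textbook example, not the estimator or the inference procedure of the abstract: the function name second_order_delta, the choice g = log, and the input numbers are all assumptions made for illustration, and the skewness correction mentioned in the abstract is not attempted here.

```python
import math

def second_order_delta(g, g1, g2, mean, var, n):
    """Bias-corrected estimate and standard error of g(sample mean).

    Uses only aggregate statistics: the sample mean, the sample variance,
    and the sample size. g1 and g2 are the first and second derivatives of g.
    (Hypothetical helper for illustration only.)
    """
    var_of_mean = var / n
    plug_in = g(mean)
    # Second-order Taylor term: E[g(Xbar)] ~ g(mu) + 0.5 * g''(mu) * Var(Xbar)
    bias = 0.5 * g2(mean) * var_of_mean
    # First-order variance: Var[g(Xbar)] ~ g'(mu)^2 * Var(Xbar)
    se = abs(g1(mean)) * math.sqrt(var_of_mean)
    return plug_in - bias, se

# Illustrative inputs (made up): mean 2.0, variance 4.0, n = 100, g = log.
corrected, se = second_order_delta(
    g=math.log,
    g1=lambda x: 1.0 / x,
    g2=lambda x: -1.0 / x**2,
    mean=2.0,
    var=4.0,
    n=100,
)
print(corrected, se)
```

With these inputs the plug-in value log(2) is shifted upward by 0.005, because the second derivative of log is negative and the plug-in estimate is therefore biased downward to second order. A first-order delta method would report the same standard error but leave that bias uncorrected.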