CMStatistics 2023
B1040
Title: Efficient validation designs to support error-corrected analyses of EHR data
Authors: Pamela Shaw - Kaiser Permanente Washington Health Research Institute (United States) [presenting]
Bryan Shepherd - Vanderbilt University Medical Center (United States)
Jasper Yang - University of Washington (United States)
Thomas Lumley - University of Auckland (New Zealand)
Abstract: Large epidemiologic studies often rely on error-prone data sources, such as routine electronic health record (EHR) data that were not collected for research purposes. Errors in even a single covariate can bias multiple regression coefficients, including those of precisely measured variables. Error-prone outcome variables can be an additional source of bias, particularly when the error is related to other regression variables. Validating a subsample of records is a practical way to learn about the nature of the errors, and this information can then inform statistical adjustment methods that avoid error-induced bias in study analyses. Design-based estimation methods are attractive in settings where errors in multiple variables may be too complex to model reliably. The efficiency of these estimators can be improved by sampling more informative subjects into the validation subset. Strategies are presented to improve the efficiency of design-based estimators, including generalized raking, multi-wave sampling, and strategies that can accommodate multiple outcomes of interest. Concepts are demonstrated with numerical studies and an application to real data.
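To make the idea of generalized raking concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simulated cohort in which an error-prone value `x_star` is known for everyone and the true value `x_true` is observed only in a validation subsample drawn with known probabilities. The function `rake_weights`, the variable names, the sampling rule, and the simulated data are all hypothetical, chosen only for illustration. The design weights are adjusted by an exponential factor so that the weighted validation sample reproduces the cohort size and the cohort total of `x_star`.

```python
import math
import random


def rake_weights(design_w, aux, totals, n_iter=50, tol=1e-9):
    """Return weights d_i * exp(l0 + l1 * aux_i) calibrated so that
    sum(w) == totals[0] and sum(w * aux) == totals[1] (Newton's method)."""
    l0 = l1 = 0.0
    for _ in range(n_iter):
        w = [d * math.exp(l0 + l1 * a) for d, a in zip(design_w, aux)]
        s0 = sum(w)
        s1 = sum(wi * a for wi, a in zip(w, aux))
        s2 = sum(wi * a * a for wi, a in zip(w, aux))
        # Residuals of the two calibration equations.
        r0 = totals[0] - s0
        r1 = totals[1] - s1
        if max(abs(r0), abs(r1)) < tol * totals[0]:
            break
        # Newton step: solve the 2x2 system [[s0, s1], [s1, s2]] * step = r.
        det = s0 * s2 - s1 * s1
        l0 += (s2 * r0 - s1 * r1) / det
        l1 += (s0 * r1 - s1 * r0) / det
    return [d * math.exp(l0 + l1 * a) for d, a in zip(design_w, aux)]


# Hypothetical simulated cohort: x_star is an error-prone version of x_true.
random.seed(1)
N = 2000
x_true = [random.gauss(0, 1) for _ in range(N)]
x_star = [x + random.gauss(0, 0.5) for x in x_true]

# Assumed sampling rule: records with extreme error-prone values are more
# likely to be validated; the inclusion probabilities are known by design.
pi = [min(1.0, 0.1 + 0.1 * abs(xs)) for xs in x_star]
sampled = [i for i in range(N) if random.random() < pi[i]]

d = [1.0 / pi[i] for i in sampled]      # design (inverse-probability) weights
aux_s = [x_star[i] for i in sampled]    # auxiliary variable in the subsample
x_val = [x_true[i] for i in sampled]    # validated values

# Calibration targets are known from the full cohort.
totals = (N, sum(x_star))
w = rake_weights(d, aux_s, totals)

# Design-weighted and raked estimates of the mean of the true variable.
ht_mean = sum(di * x for di, x in zip(d, x_val)) / N
raked_mean = sum(wi * x for wi, x in zip(w, x_val)) / N
print("design-weighted:", ht_mean, "raked:", raked_mean)
```

The raked weights stay positive because of the exponential adjustment, and they use the error-prone values from the whole cohort as auxiliary information. In a regression analysis the auxiliary variables would typically be derived from a model fitted to the error-prone data rather than the raw `x_star`; that refinement is outside this sketch.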