A1041
Title: Design and analysis of a multi-wave two-phase study to address data errors in a multinational HIV research network
Authors: Bryan E Shepherd - Vanderbilt University Medical Center (United States) [presenting]
Abstract: Routinely collected observational data from clinical and laboratory encounters are commonly used for HIV/AIDS research, but there are concerns about the quality of these data. The experience of designing and carrying out a multi-wave validation study in over a dozen sites across Latin America and East Africa is described. The aim was to estimate the incidence of, and risk factors for, Kaposi sarcoma (KS) among people living with HIV. The original error-prone dataset had data on over 257,000 patients, approximately 2,300 (<1%) of whom had KS. A two-wave validation sample of approximately 1,000 records was designed. Optimal sampling designs that minimize the variance of the resulting estimators require information that is typically not available before the data are validated. Hence, approximately 500 records were validated in a first sampling wave, and this information was used to optimize the design of the second sampling wave. Finally, the analyses combined the two waves of validation data collected in the subset of 1,000 records with the error-prone data available on the full cohort, using generalized raking and multiple imputation techniques to efficiently account for the errors in the original data.
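
The two-wave idea can be illustrated with a minimal sketch. It assumes a stratified design and uses classical Neyman allocation (sample size proportional to stratum size times stratum standard deviation) as a stand-in for the optimal design; the abstract does not state which allocation rule the study used. The function names, the strata, and all numbers in the usage note are hypothetical.

```python
from typing import List


def largest_remainder(weights: List[float], total: int) -> List[int]:
    """Split `total` into non-negative integers proportional to `weights`."""
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must have a positive sum")
    raw = [total * w / weight_sum for w in weights]
    alloc = [int(r) for r in raw]  # floor of each share
    # Hand the leftover units to the strata with the largest fractional parts.
    leftover = total - sum(alloc)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


def neyman_allocation(N: List[int], S: List[float], n: int) -> List[int]:
    """Neyman allocation: n_h proportional to N_h * S_h (assumed rule)."""
    return largest_remainder([Nh * Sh for Nh, Sh in zip(N, S)], n)


def second_wave_allocation(
    N: List[int], S_hat: List[float], n_wave1: List[int], n_total: int
) -> List[int]:
    """Allocate wave 2 so the combined sample moves toward the Neyman target.

    S_hat holds stratum standard deviations estimated from wave 1.
    Strata already sampled beyond their target receive nothing in wave 2.
    """
    remaining = n_total - sum(n_wave1)
    if remaining <= 0:
        return [0] * len(N)
    target = neyman_allocation(N, S_hat, n_total)
    shortfall = [max(t - a, 0) for t, a in zip(target, n_wave1)]
    return largest_remainder(shortfall, remaining)
```

For example, with two hypothetical strata of sizes `[254700, 2300]`, a first wave of 250 records per stratum, and wave-1 standard deviation estimates `S_hat`, calling `second_wave_allocation([254700, 2300], S_hat, [250, 250], 1000)` returns the number of additional records to validate in each stratum. This sketch ignores the cap of a stratum's population size and the multi-parameter targets that a real design would need to handle.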