CFE 2019
View Submission - CMStatistics
Title: Electronic health record phenotyping using anchor-positive and unlabeled patients
Authors: Lingjiao Zhang - University of Pennsylvania (United States)
Xiruo Ding - University of Washington (United States)
Yanyuan Ma - The Pennsylvania State University (United States)
Naveen Muthu - Children's Hospital of Philadelphia (United States)
Imran Ajmal - University of Pennsylvania (United States)
Jason H Moore - University of Pennsylvania (United States)
Daniel Herman - University of Pennsylvania (United States)
Jinbo Chen - University of Pennsylvania (United States) [presenting]
Abstract: Phenotyping patients in electronic health records (EHRs) has conventionally relied on algorithms learned from labeled cases and controls. Assigning labels requires manual medical chart review and is therefore labor-intensive. We developed a phenotyping method for settings in which identifying gold-standard controls is prohibitive, so that a validation set is not available. The method relies on a random subset of cases, which can be specified using an expert-derived anchor variable that has an excellent positive predictive value and a sensitivity that is independent of predictors. We adopt a maximum likelihood approach that efficiently leverages data from the anchor-labeled cases and unlabeled patients to develop logistic regression phenotyping models, and we propose novel statistical methods for internally assessing model calibration and predictive performance measures. Once clinical experts identify an anchor variable that is scalable and transferable to different practices, the approach should facilitate the development of scalable, transferable, and practice-specific phenotyping models. By phenotyping two cardiovascular conditions in Penn Medicine EHRs, we demonstrate that the proposed method enables accurate semi-automated EHR phenotyping with minimal manual labeling; it is therefore expected to greatly facilitate EHR clinical decision support and research.
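The abstract does not spell out the likelihood, so the following is only a minimal sketch of one way the stated assumptions could be written down, not the authors' implementation. It assumes a perfect positive predictive value, P(S=1 | Y=0, X) = 0, and a constant anchor sensitivity, P(S=1 | Y=1, X) = c, so that P(S=1 | X) = c * expit(beta0 + beta'x). The function names, the parameter `c`, and the convention that `beta[0]` is the intercept are all hypothetical choices made for illustration.

```python
import math


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_predictor(beta, x):
    """beta[0] is the intercept; beta[1:] pairs with the predictors in x."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def anchor_log_likelihood(beta, c, X, s):
    """Log-likelihood of the anchor indicators s given predictors X.

    Assumed model (illustrative, not taken from the abstract):
      P(S=1 | X) = c * expit(beta0 + beta'x), with 0 < c < 1.
    """
    total = 0.0
    for x_i, s_i in zip(X, s):
        q = c * expit(linear_predictor(beta, x_i))  # P(S=1 | X = x_i)
        total += math.log(q) if s_i == 1 else math.log(1.0 - q)
    return total


def case_probability(beta, c, x, s_i):
    """P(Y=1 | X=x, S=s_i) under the same assumed model.

    Anchor-positive patients are cases by assumption; for unlabeled
    patients Bayes' rule gives (1 - c) * p / (1 - c * p).
    """
    if s_i == 1:
        return 1.0
    p = expit(linear_predictor(beta, x))
    return (1.0 - c) * p / (1.0 - c * p)
```

Under these assumptions, maximizing `anchor_log_likelihood` over `beta` and `c` with any general-purpose optimizer would yield a logistic phenotyping model fitted without labeled controls.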