Title: Enabling phenotypic big data with PheNorm
Authors: Sheng Yu - Tsinghua University (China) [presenting]
Tianxi Cai - Harvard School of Public Health (United States)
Abstract: EHR-based phenotyping infers whether a patient has a disease from the information in their electronic health records (EHR). Building a phenotyping algorithm from a set of predictive features usually requires a human-annotated training set with gold-standard disease status labels. The time required for annotation and feature curation severely limits high-throughput phenotyping. While previous studies have successfully automated feature curation, annotation remains a major bottleneck. We present PheNorm, a phenotyping algorithm that does not require expert-labeled samples for training. PheNorm transforms predictive features, such as the number of ICD-9 codes or mentions of the target phenotype, to resemble a normal mixture distribution. The transformed features are then denoised and combined into a score for accurate disease classification. We validated the accuracy of PheNorm with four phenotypes: coronary artery disease, rheumatoid arthritis, Crohn's disease, and ulcerative colitis. The AUC of the PheNorm score reached 0.90, 0.94, 0.95, and 0.94 for the four phenotypes, respectively. These values were comparable to the accuracy of supervised algorithms trained with sample sizes of 100 to 300, with no statistically significant difference.
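The label-free scoring idea described above can be illustrated with a minimal sketch on synthetic data. This is not the PheNorm implementation: the abstract does not specify the transformation, so the log transform, the adjustment by note count (a stand-in for healthcare utilization), and the fixed exponent `alpha` are all assumptions made for illustration, and the denoising step is omitted. The function names `simulate_patients`, `phenorm_like_score`, and `auc` are hypothetical. Labels are used only to evaluate the score, never to build it.

```python
import math
import random


def simulate_patients(n=2000, prevalence=0.3, seed=0):
    """Synthetic cohort of (label, note_count, icd_count, mention_count) rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < prevalence else 0
        notes = 1 + int(rng.expovariate(1 / 20))  # assumed utilization measure
        rate = 0.5 if y else 0.02  # assumed per-note chance of a code or mention
        icd = sum(rng.random() < rate for _ in range(notes))
        nlp = sum(rng.random() < rate for _ in range(notes))
        rows.append((y, notes, icd, nlp))
    return rows


def standardize(values):
    """Center and scale a list of numbers to mean 0 and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    sd = math.sqrt(var) or 1.0
    return [(v - mean) / sd for v in values]


def phenorm_like_score(rows, alpha=1.0):
    """Unsupervised score: log counts adjusted for utilization, then averaged.

    alpha is fixed here for simplicity (an assumption of this sketch).
    """
    icd = [math.log1p(r[2]) - alpha * math.log1p(r[1]) for r in rows]
    nlp = [math.log1p(r[3]) - alpha * math.log1p(r[1]) for r in rows]
    return [(a + b) / 2 for a, b in zip(standardize(icd), standardize(nlp))]


def auc(scores, labels):
    """Area under the ROC curve via the rank-sum statistic, with tied ranks averaged."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = len(pairs) - n_pos
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    cohort = simulate_patients()
    score = phenorm_like_score(cohort)
    print("AUC on synthetic cohort:", round(auc(score, [r[0] for r in cohort]), 3))
```

On this synthetic cohort, the adjusted log counts separate into two groups without any labels being used to construct the score, which is the property the abstract relies on. The resulting AUC reflects only the simulation settings and says nothing about performance on real EHR data.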