Title: A Neyman-Pearson approach to feature ranking
Authors: Xin Tong - University of Southern California (United States) [presenting]
Abstract: Binary classification problems arise frequently in biomedical applications, such as cancer diagnosis using gene expression data. An important question in both basic research and clinical applications is which genes have the highest predictive power for a given type of cancer, because these genes are possible cancer driver genes that may serve as treatment targets and/or biomarkers that may improve diagnostic accuracy. Cancer diagnosis is a binary classification problem in which the two types of misclassification error do not have the same priority, because misclassifying a diseased patient as healthy and misclassifying a healthy patient as diseased have very different consequences. Motivated by cancer diagnosis, we propose NP-Rank, a feature ranking method under the Neyman-Pearson (NP) paradigm. NP-Rank ranks features by their type II errors (the less severe type of misclassification error) while their type I errors (the more severe type) are controlled under a user-specified threshold with high probability. NP-Rank has desirable theoretical guarantees when used with density plug-in classifiers. Extensive numerical studies show that NP-Rank, used with popular classification methods such as logistic regression, outperforms traditional ranking methods under the classical paradigm.
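
To make the ranking criterion concrete, the following is a minimal sketch of how features might be ranked marginally by type II error under high-probability type I error control. It is not the NP-Rank procedure itself, whose details the abstract does not give. The sketch assumes that class 0 is the class whose misclassification is more severe, that each feature is scored on its own with a Gaussian density plug-in log-ratio, and that the threshold is an order statistic of held-out class-0 scores chosen through a binomial tail bound. The names `np_order_index`, `gaussian_log_ratio`, and `np_rank_sketch`, and the defaults `alpha` and `delta`, are hypothetical.

```python
import math
import numpy as np

def np_order_index(n, alpha, delta):
    """Smallest 1-based rank k such that thresholding at the k-th smallest of
    n held-out class-0 scores gives P(type I error > alpha) <= delta."""
    tail, best = 0.0, None
    for k in range(n, 0, -1):
        # Accumulate sum_{j=k}^{n} C(n, j) (1 - alpha)^j alpha^(n - j).
        tail += math.comb(n, k) * (1 - alpha) ** k * alpha ** (n - k)
        if tail > delta:
            break
        best = k
    if best is None:
        raise ValueError("too few class-0 samples for this (alpha, delta)")
    return best

def gaussian_log_ratio(x, x0, x1):
    """Plug-in log density ratio log f1(x) - log f0(x), assuming each class
    is Gaussian in this single feature (an assumption of this sketch)."""
    m0, s0 = x0.mean(), x0.std() + 1e-12
    m1, s1 = x1.mean(), x1.std() + 1e-12
    return (-np.log(s1) - 0.5 * ((x - m1) / s1) ** 2) - (
        -np.log(s0) - 0.5 * ((x - m0) / s0) ** 2)

def np_rank_sketch(X0, X1, alpha=0.05, delta=0.05, seed=0):
    """Rank features by estimated type II error, one feature at a time.
    X0: class-0 samples (rows), X1: class-1 samples (rows)."""
    rng = np.random.default_rng(seed)
    i0, i1 = rng.permutation(len(X0)), rng.permutation(len(X1))
    h0, h1 = len(X0) // 2, len(X1) // 2
    fit0, thr0 = X0[i0[:h0]], X0[i0[h0:]]    # fit scores / choose threshold
    fit1, eval1 = X1[i1[:h1]], X1[i1[h1:]]   # fit scores / estimate type II
    k = np_order_index(len(thr0), alpha, delta)
    type2 = np.empty(X0.shape[1])
    for j in range(X0.shape[1]):
        score = lambda x: gaussian_log_ratio(x, fit0[:, j], fit1[:, j])
        threshold = np.sort(score(thr0[:, j]))[k - 1]
        # Predict class 1 when score > threshold; a type II error is a
        # class-1 sample that is predicted as class 0.
        type2[j] = np.mean(score(eval1[:, j]) <= threshold)
    return np.argsort(type2), type2  # feature indices, best first
```

Calling `np_rank_sketch(X0, X1)` returns the feature indices ordered from lowest to highest estimated type II error, together with the per-feature estimates. Splitting the class-0 sample keeps the threshold data independent of the data used to fit the score, which is what the binomial tail bound relies on.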