A1069
Title: Feature screening with conditional rank utility for big-data classification
Authors: Chen Xu - University of Ottawa (Canada) [presenting]
Abstract: Feature screening is a commonly used strategy for eliminating irrelevant features in high-dimensional classification. For big datasets with both high dimensionality and a huge sample size, conventional screening methods become computationally costly or even infeasible. A novel screening utility, the Conditional Rank Utility (CRU), is introduced, and a distributed feature screening procedure for big-data classification is proposed. CRU effectively quantifies the relevance of a numerical feature to the categorical response. Because CRU is constructed from the ratio of the mean conditional rank to the mean unconditional rank of a feature, it is robust to model misspecification and to outliers. Structurally, CRU can be expressed as a simple function of a few component parameters, each of which can be estimated distributively from the data segments using a natural unbiased estimator. Under mild conditions, it is shown that the distributed estimator of CRU is fully efficient in terms of the probability convergence bound and the mean squared error rate, and that the corresponding distributed screening procedure enjoys the sure screening and ranking properties. Extensive numerical examples support the promising performance of CRU-based screening.
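
The rank-ratio idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact CRU formula or the authors' distributed estimator, so everything below is an assumption made for illustration: the function names `rank_ratio_utility` and `distributed_screen` are hypothetical, the utility is taken to be the largest deviation from 1 of the ratio between a class's mean rank and the overall mean rank, and the distributed step simply averages per-segment utilities rather than combining component parameters as the abstract describes.

```python
import numpy as np


def rank_ratio_utility(x, y):
    """Hypothetical stand-in for CRU (not the authors' definition).

    For each class k, take the mean rank of x among observations with
    y == k, divide by the overall mean rank (n + 1) / 2, and return the
    largest absolute deviation of that ratio from 1.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    # Ranks 1..n; ties are broken by position (an assumption of this sketch).
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable") + 1
    overall_mean_rank = (n + 1) / 2
    deviations = [
        abs(ranks[y == k].mean() / overall_mean_rank - 1.0)
        for k in np.unique(y)
    ]
    return float(max(deviations))


def distributed_screen(X, y, n_segments, top_d):
    """Average per-segment utilities and keep the top_d features.

    Simple averaging over segments is an assumed aggregation rule.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    n, p = X.shape
    segments = np.array_split(np.arange(n), n_segments)
    scores = np.zeros(p)
    for idx in segments:
        # Each segment could be processed on a separate worker.
        for j in range(p):
            scores[j] += rank_ratio_utility(X[idx, j], y[idx])
    scores /= n_segments
    keep = np.argsort(scores)[::-1][:top_d]  # indices, highest score first
    return keep, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 2000, 20
    y = rng.integers(0, 2, size=n)
    X = rng.standard_normal((n, p))
    X[:, 0] += 2.0 * y  # only feature 0 depends on the class label
    keep, scores = distributed_screen(X, y, n_segments=4, top_d=3)
    print("retained features:", keep)
```

Because the score depends on the feature only through its ranks, a monotone transformation of a feature or a few extreme values leaves it essentially unchanged, which is the kind of robustness the abstract attributes to a rank-based utility.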