A0759
Title: Selective inference for correlation thresholding
Authors: Arkajyoti Saha - University of California, Irvine (United States) [presenting]
Daniela Witten - University of Washington (United States)
Jacob Bien - University of Southern California (United States)
Abstract: Testing whether a set of Gaussian variables, selected from the data, is independent of the remaining variables is considered. It is assumed that this set is selected via a very simple approach that is commonly used across scientific disciplines: a set of variables is selected for which the correlation with every variable outside the set falls below some threshold. Unlike other settings in selective inference, failure to account for the selection step leads, in this setting, to excessively conservative (as opposed to anti-conservative) results. The proposed test properly accounts for the fact that the set of variables is selected from the data and thus is not overly conservative. The test is developed by conditioning on the event that the selection procedure resulted in the set of variables in question. To achieve computational tractability, a new characterization of the conditioning event is developed in terms of the canonical correlation between the groups of random variables. In simulation studies and in the analysis of gene co-expression networks, it is shown that the approach has much higher power than a naive approach that ignores the effect of selection. A potential extension to testing independence of groups of variables selected through feature screening in a high-dimensional setup is also discussed.
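
The selection step described in the abstract can be illustrated with a minimal sketch. It shows only the thresholding rule, not the proposed selective test. The sketch assumes that sets are formed as connected components of the graph linking two variables whenever their absolute sample correlation exceeds the threshold; the abstract does not confirm that construction. The function name `threshold_groups`, the threshold `c`, and the simulated data are all hypothetical.

```python
import numpy as np


def threshold_groups(X, c):
    """Split the columns of X into groups such that every absolute sample
    correlation between variables in different groups is at most c.

    Assumption: groups are the connected components of the graph with an
    edge between variables i and j whenever |r_ij| > c.
    """
    R = np.corrcoef(X, rowvar=False)      # p x p sample correlation matrix
    p = R.shape[0]
    adj = np.abs(R) > c                   # thresholded adjacency matrix
    labels = np.full(p, -1)               # -1 marks "not yet visited"
    n_groups = 0
    for start in range(p):
        if labels[start] >= 0:
            continue
        # Depth-first search collects one connected component.
        labels[start] = n_groups
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adj[i]):
                if labels[j] < 0:
                    labels[j] = n_groups
                    stack.append(j)
        n_groups += 1
    groups = [np.flatnonzero(labels == k) for k in range(n_groups)]
    return groups, R


# Simulated example: variable 1 is strongly correlated with variable 0,
# while the remaining variables are independent noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
X[:, 1] += 2.0 * X[:, 0]
groups, R = threshold_groups(X, c=0.5)
```

Each returned group satisfies the selection rule by construction. Because of this, the observed correlations between a selected group and the remaining variables are constrained to be small, which is why a test that ignores the selection step is conservative in this setting.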