CMStatistics 2022: Start Registration
View Submission - CMStatistics
B1721
Title: A distribution-free independence test for high dimension data Authors:  Zhanrui Cai - Iowa State University (United States) [presenting]
Abstract: Test of independence is of fundamental importance in modern data analysis, with broad applications in variable selection, graphical models, and causal inference. When the data is high dimensional and the potential dependence signal is sparse, independence testing becomes very challenging without distributional or structural assumptions. We propose a general framework for independence testing by first fitting a classifier that distinguishes the joint and product distributions, and then testing the significance of the fitted classifier. This framework allows us to borrow the strength of the most advanced classification algorithms developed from the modern machine learning community, making it applicable to high-dimensional, complex data. By combining a sample split and a fixed permutation, our test statistic has a universal, fixed Gaussian null distribution that is independent of the underlying data distribution. Extensive simulations demonstrate the advantages of the newly proposed test compared with existing methods. We further apply the new test to a genetic dataset, where the high dimensionality makes existing methods hard to apply.