A0400
Title: OT rank tests for heterogeneous non-parametric two-sample testing
Authors: Ritwik Sadhu - Cornell University (United States) [presenting]
Nilanjan Chakraborty - Missouri University of Science and Technology (United States)
Trambak Banerjee - University of Southern California (United States)
Abstract: In many modern inference tasks, the data-generating process contains inherent heterogeneity in its source distributions, often resulting in multi-modality of the composite dataset on which statistical procedures must be performed. Traditional parametric methods tailored towards uni-modal data are inefficient when applied to such data, while non-parametric methods (including multivariate non-parametric methods) still hold some chance of success, at least for traditional inference questions such as testing equality of two distributions or independence of paired observations. However, data containing distributional heterogeneity opens up the chance to ask more sophisticated inference questions, such as testing for the presence of an entirely new component distribution in a subset of the data. For this testing problem, henceforth called the remodeling problem, a new test statistic is proposed, constructed using optimal transport (OT)-based multivariate ranks, which allows for asymptotically consistent testing with a correct asymptotic level under the assumption of well-separated component distributions in the null. To the best of the authors' knowledge, this is the first statistic in the literature for this problem with the aforesaid guarantees.