View Submission - EcoSta2023
A0913
Title: Distance-based run tests from complex high-dimensional data
Authors: Debashis Mondal - Washington University in St Louis (United States) [presenting]
Arpita Mukherjee - Meta (United States)
Abstract: Distance-based two-sample run tests, and their computation, are discussed for analyzing complex, high-dimensional data that arise from compositions, trees, graphs, or networks. The distances considered are all non-Euclidean. They may be non-metric dissimilarities that do not satisfy the triangle inequality, or may even take only discrete values, but they all arise from conditionally positive definite kernels. Examples include the Bray-Curtis dissimilarity, the UniFrac metric, the Aitchison distance, graph kernels, spectral distances, and other distances based on optimal transport problems. The run test is constructed by counting runs along the shortest Hamiltonian paths/loops through the data points. These run tests are shown to be exact, distribution-free, and consistent as the dimension of the data points goes to infinity while the total number of data points remains fixed. By extending previous work, asymptotic results are also provided for the case where the number of data points goes to infinity. The method is illustrated through a simulation study with the Ewens sampling formula and a suite of dissimilarity measures. Two applications are further presented: one concerns checking homogeneity across the forest dynamics plot of Barro Colorado Island, Panama, and the other analyzes 16S microbial community data. The work is supported by an NSF grant from the Division of Mathematical Sciences.
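The construction described in the abstract (pool the two samples, find the shortest Hamiltonian path under a chosen dissimilarity, count label runs along it) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a brute-force path search that is only feasible for a handful of points, uses the Bray-Curtis dissimilarity as one example from the list above, and assumes the classical Wald-Wolfowitz null distribution for the number of runs with a lower-tail p-value. All function names are hypothetical.

```python
from itertools import permutations
from math import comb


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative count vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def shortest_hamiltonian_path(dist):
    """Brute-force shortest Hamiltonian path (assumption: tiny n only)."""
    n = len(dist)
    best_order, best_len = None, float("inf")
    for order in permutations(range(n)):
        if order[0] > order[-1]:
            continue  # a path and its reverse have the same length
        length = sum(dist[order[i]][order[i + 1]] for i in range(n - 1))
        if length < best_len:
            best_order, best_len = order, length
    return list(best_order)


def count_runs(labels):
    """Number of maximal blocks of identical consecutive labels."""
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))


def runs_pmf(m, n, r):
    """Wald-Wolfowitz P(R = r) for m and n labels (m, n >= 1)."""
    if r < 2 or r > m + n:
        return 0.0
    total = comb(m + n, m)
    if r % 2 == 0:
        k = r // 2
        return 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1) / total
    k = (r - 1) // 2
    return (comb(m - 1, k) * comb(n - 1, k - 1)
            + comb(m - 1, k - 1) * comb(n - 1, k)) / total


def run_test(sample_x, sample_y, dissimilarity=bray_curtis):
    """Return (number of runs, lower-tail p-value) for two samples."""
    pooled = list(sample_x) + list(sample_y)
    labels = [0] * len(sample_x) + [1] * len(sample_y)
    dist = [[dissimilarity(a, b) for b in pooled] for a in pooled]
    order = shortest_hamiltonian_path(dist)
    r = count_runs([labels[i] for i in order])
    # Few runs suggest the samples separate along the path.
    m, n = len(sample_x), len(sample_y)
    p_value = sum(runs_pmf(m, n, k) for k in range(2, r + 1))
    return r, p_value


# Hypothetical toy data: two clearly separated groups of count vectors.
x = [(10, 1, 1), (9, 2, 1), (11, 1, 2)]
y = [(1, 10, 9), (2, 9, 10), (1, 11, 8)]
runs, p = run_test(x, y)
```

In this sketch the p-value depends on the data only through the run count, which is what makes a test of this kind distribution-free under the null hypothesis. For realistic sample sizes the brute-force search would need to be replaced by a heuristic or exact travelling-salesman-type solver.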