Title: Identifying differential distributions using the 2-Wasserstein distance, with application to single-cell RNA-sequencing
Authors: Roman Schefzik - German Cancer Research Center (DKFZ) (Germany) [presenting]
Abstract: A common statistical task is to test whether a distribution differs between two conditions. To address this, the use of the 2-Wasserstein distance is reviewed, placing scattered results from the literature into a unified context that is useful in various applications. Specifically, the major causes of differences between distributions can be identified by decomposing the 2-Wasserstein distance into location, size and shape deviations. Moreover, two two-sample tests based on the 2-Wasserstein distance are presented: first, a semi-parametric, permutation-based test with a generalized Pareto distribution approximation, and second, a test based on asymptotic theory. Simulations using normal distribution models confirm the validity and usefulness of the findings. In an application, the concepts are adapted to detecting differential gene expression distributions in data from single-cell RNA-sequencing, a recent breakthrough technology in biology that provides information from multiple individual cells. In particular, the adapted approach tests for differential proportions of zero expression using logistic regression, and for differences in non-zero expression using the semi-parametric 2-Wasserstein distance-based test. The competitiveness of the approach is confirmed in a real-data case study, in which known marker genes and biological patterns are re-identified and additional insights are gained. The methods are implemented in the R package waddR.
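
Illustration: the location, size and shape decomposition mentioned in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the waddR implementation. It assumes two one-dimensional samples of equal size, for which the squared empirical 2-Wasserstein distance is the mean squared difference of the sorted values and splits exactly into (mean difference)^2 + (standard deviation difference)^2 + 2 * sd_x * sd_y * (1 - rho), where rho is the correlation of the sorted samples. The function names and the plain permutation p-value are hypothetical illustrations; in particular, the generalized Pareto approximation of the semi-parametric test is not included.

```python
import random
from statistics import fmean, pstdev


def wasserstein2_squared_decomposition(x, y):
    """Squared empirical 2-Wasserstein distance between two equal-size
    1D samples, split into location, size and shape terms."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("this sketch needs two samples of equal size >= 2")
    xs, ys = sorted(x), sorted(y)
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)  # population standard deviations
    # Covariance of the matched order statistics (quantile-quantile pairs).
    cov = fmean([(a - mx) * (b - my) for a, b in zip(xs, ys)])
    return {
        "total": fmean([(a - b) ** 2 for a, b in zip(xs, ys)]),
        "location": (mx - my) ** 2,
        "size": (sx - sy) ** 2,
        # Equals 2 * sx * sy * (1 - rho), written without dividing by sx * sy.
        "shape": 2.0 * (sx * sy - cov),
    }


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Plain permutation p-value for the squared distance.
    Assumption: no generalized Pareto tail approximation is applied here."""
    rng = random.Random(seed)
    observed = wasserstein2_squared_decomposition(x, y)["total"]
    pooled = list(x) + list(y)
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the two conditions
        stat = wasserstein2_squared_decomposition(pooled[:n], pooled[n:])["total"]
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    a = [rng.gauss(0.0, 1.0) for _ in range(50)]
    b = [rng.gauss(1.0, 2.0) for _ in range(50)]
    parts = wasserstein2_squared_decomposition(a, b)
    print(parts)
    print("permutation p-value:", permutation_pvalue(a, b))
```

In this sketch the three terms add up to the total, so their shares indicate whether two samples differ mainly in mean, in spread, or in shape beyond mean and spread. With a plain permutation test the smallest attainable p-value is 1 / (n_perm + 1), which is presumably one motivation for a tail approximation such as the one named in the abstract.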