CMStatistics 2021
B1416
Title: A pseudo-metric between probability distributions based on depth-trimmed regions
Authors: Guillaume Staerman - Inria, Universite Paris-Saclay (France) [presenting]
Pavlo Mozharovskyi - Telecom Paris, Institut Polytechnique de Paris (France)
Abstract: The design of a metric between probability distributions is a longstanding problem motivated by numerous applications in machine learning. Focusing on continuous probability distributions on a Euclidean space, we introduce a novel pseudo-metric between probability distributions by leveraging the extension of univariate quantiles to multivariate spaces. Data depth is a nonparametric statistical tool that measures the centrality of any element x with respect to (w.r.t.) a probability distribution or a data set. It is a natural median-oriented extension of the cumulative distribution function (cdf) to the multivariate case. Thus, its upper-level sets, the depth-trimmed regions, give rise to a definition of multivariate quantiles. The new pseudo-metric relies on the average of the Hausdorff distance between the depth-based quantile regions w.r.t. each distribution. After discussing the properties of this pseudo-metric inherited from data depth, we provide conditions under which it defines a distance. Interestingly, the derived non-asymptotic bound shows that, in contrast to the widely used Wasserstein distance, the proposed pseudo-metric does not suffer from the curse of dimensionality. Robustness, an appealing feature of this pseudo-metric, is studied through the finite-sample breakdown point. Moreover, we propose an efficient approximation method with linear time complexity w.r.t. the size of the data set and its dimension.
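To make the construction concrete, the following is a minimal sketch of one way such a quantity could be approximated; it is not taken from the abstract and should not be read as the authors' algorithm. It assumes halfspace (Tukey) depth, approximates the support function of each depth-trimmed region in direction u by a directional quantile of the projected sample, and uses the fact that the Hausdorff distance between two compact convex sets equals the largest gap between their support functions over unit directions. The function name `depth_region_distance` and the parameters `n_dirs`, `n_levels`, and `eps` are hypothetical.

```python
import numpy as np


def depth_region_distance(X, Y, n_dirs=200, n_levels=20, eps=0.1, seed=0):
    """Sketch: average over depth levels of an approximate Hausdorff
    distance between halfspace-depth regions of two samples.

    Assumptions (not from the source): halfspace depth, random unit
    directions, and directional quantiles as a stand-in for the
    support functions of the trimmed regions.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    d = X.shape[1]

    # Random directions on the unit sphere.
    U = rng.standard_normal((n_dirs, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)

    # Depth levels alpha in [eps, 0.5); halfspace depth never exceeds 1/2
    # for continuous distributions, so deeper levels are skipped.
    levels = np.linspace(eps, 0.5, n_levels, endpoint=False)

    # Project both samples onto every direction: shape (n_points, n_dirs).
    PX = X @ U.T
    PY = Y @ U.T

    # Directional (1 - alpha)-quantiles: shape (n_levels, n_dirs).
    qX = np.quantile(PX, 1.0 - levels, axis=0)
    qY = np.quantile(PY, 1.0 - levels, axis=0)

    # Per level: largest support-function gap over the sampled directions,
    # then average over levels.
    hausdorff_per_level = np.abs(qX - qY).max(axis=1)
    return float(hausdorff_per_level.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 3))
    Y = rng.standard_normal((500, 3)) + np.array([1.0, 0.0, 0.0])
    print(depth_region_distance(X, Y))
```

For a fixed number of directions and levels, the projection step costs time proportional to the number of points times the dimension, which is in the spirit of the linear complexity mentioned in the abstract. Sampling only finitely many directions can underestimate the true Hausdorff distance, so the result is best read as a rough approximation.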