B1954
Title: Kernel-based extension of the halfspace depth
Authors: Arturo Castellanos - Telecom Paris (France) [presenting]
Pavlo Mozharovskyi - LTCI, Telecom Paris, Institut Polytechnique de Paris (France)
Florence d'Alché-Buc - Telecom Paris (France)
Hicham Janati - Telecom Paris (France)
Abstract: Introduced by John W. Tukey in 1975, data depth is a statistical function that measures the centrality of an observation with respect to a distribution or a dataset in multivariate space. In particular, the halfspace depth exploits the geometry of the data, is non-parametric and robust, and is used in a variety of tasks as a generalisation of quantiles to higher dimensions. Despite its desirable statistical properties, the halfspace depth is often criticised, in particular in the machine learning community, for its inability to handle various types of data, its high computational cost, and its difficulty in reflecting the multimodality of distributions. To improve on these aspects and to make data depth computable for further types of data in a generic way, we propose an extension of the halfspace depth based on radial-basis kernels. We show that the proposed depth notion not only satisfies desirable finite-sample and asymptotic properties, but is also able to handle multimodal data and can be optimised with fast techniques such as gradient descent. Finally, the properties of this new depth are confirmed by simulations and real-data studies, including anomaly detection and rank tests.
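For orientation, the classical (non-kernel) halfspace depth of a point x is the smallest fraction of the data contained in any closed halfspace whose boundary passes through x. The sketch below is not the kernel-based extension proposed in the talk. It is only a minimal random-direction approximation of the classical Tukey depth. The function name `halfspace_depth_approx` and its parameters are hypothetical and chosen for illustration. Because only finitely many directions are tried, the result is an upper bound on the exact depth.

```python
import numpy as np


def halfspace_depth_approx(x, data, n_directions=1000, seed=0):
    """Approximate the classical Tukey halfspace depth of x w.r.t. the rows of data.

    Illustrative sketch only: the minimum is taken over randomly drawn
    directions, so the returned value can only overestimate the exact depth.
    """
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)

    # Draw random unit directions (normalised Gaussian vectors).
    u = rng.standard_normal((n_directions, data.shape[1]))
    u /= np.linalg.norm(u, axis=1, keepdims=True)

    # Signed projections of the data, centred at x, onto every direction.
    proj = (data - x) @ u.T  # shape: (n_samples, n_directions)

    # Fraction of points in each closed halfspace, for u and for -u.
    frac_pos = (proj >= 0).mean(axis=0)
    frac_neg = (proj <= 0).mean(axis=0)

    # Depth is the smallest such fraction over all tried directions.
    return float(min(frac_pos.min(), frac_neg.min()))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    sample = rng.standard_normal((500, 2))
    print("depth at centre:", halfspace_depth_approx([0.0, 0.0], sample))
    print("depth far away: ", halfspace_depth_approx([5.0, 5.0], sample))
```

A point near the centre of the cloud receives a depth close to 1/2, while a point far outside it receives a depth near 0. This ordering is what makes depth usable for anomaly detection and rank-based procedures.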