A0530
Title: RKHS-based projection depths
Authors: Arturo Castellanos - Telecom Paris (France) [presenting]
Pavlo Mozharovskyi - Telecom Paris, Institut Polytechnique de Paris (France)
Florence d'Alche-Buc - Telecom Paris (France)
Abstract: Data depth is a statistical function that measures the centrality of an observation with respect to a distribution or a data set in a multivariate space. By exploiting the geometry of the data, the depth function is fully non-parametric, robust, and affine invariant, and it is used in a variety of tasks as a generalisation of quantiles to higher dimensions. Despite its desirable statistical properties, data depth is often criticised, particularly in the machine learning community, for its inability to handle diverse types of data, its high computational cost, and its difficulty in reflecting multimodality of the distribution. To improve on these aspects and to unlock data depth computation for further types of data in a generic way, data depth is defined here in a reproducing kernel Hilbert space (RKHS) after asymmetrising its univariate constituent. Further, because the RKHS is so rich, the search over directions should be restricted to a properly chosen subspace. This approach makes it possible not only to better handle data with multimodal or non-convex support, but also to run an optimisation routine for depth computation. The appealing properties of this new class of depths are confirmed by a simulation study and a real-data benchmark.