CFE-CMStatistics 2024
A0701
Title: Halfspace depth as a classification loss: A machine learning viewpoint on statistical data depth
Authors: Arturo Castellanos - Telecom Paris (France) [presenting]
Pavlo Mozharovskyi - LTCI, Telecom Paris, Institut Polytechnique de Paris (France)
Hicham Janati - Telecom Paris (France)
Abstract: Data depth is a score function that quantifies how deep a point lies inside a distribution (or a data set); its applications include multivariate analysis, anomaly detection, classification, and statistical testing. Historically, the first notion of data depth was the halfspace depth introduced by John W. Tukey, which generalises the notion of quantile to the multivariate setting. Taking a different angle from the quantile point of view, it is shown that halfspace depth can also be regarded as the minimum loss over a set of classifiers for a specific labelling of the observations. A natural extension is to replace this set with other classes of classifiers that are well known in the machine learning literature, such as support vector machines or neural networks. Properties such as statistical convergence and the speed of the optimization programs are then inherited from the literature on those classifiers. How theory can guide the choice of hyperparameters to obtain the most sensible results is discussed, with supporting simulations and experiments on data.
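The classification reading of halfspace depth can be illustrated with a minimal sketch. The abstract does not spell out the labelling, so the one used here is an assumption: every observation is given the label -1, and the classifiers are the linear rules z -> sign(u . (z - x)) whose boundary passes through the query point x. Under that assumption, the 0-1 loss of a rule is the fraction of observations falling in the closed halfspace {z : u . (z - x) >= 0}, and minimising over u gives the empirical halfspace depth of x. The function name `halfspace_depth`, its parameters, and the random search over directions are all hypothetical choices made for illustration; random search only yields an upper bound on the exact depth, and this is not presented as the authors' algorithm.

```python
import random


def halfspace_depth(x, data, n_directions=2000, seed=0):
    """Approximate the empirical halfspace (Tukey) depth of x in data.

    Assumed labelling (not taken from the abstract): all observations
    carry the label -1, so a linear rule z -> sign(u . (z - x)) errs on
    every observation it places on the non-negative side.
    """
    rng = random.Random(seed)
    dim = len(x)
    best_loss = 1.0
    for _ in range(n_directions):
        # Random direction; normalising is unnecessary because only the
        # sign of the projection matters.
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # 0-1 loss of the rule: share of observations predicted +1.
        errors = sum(
            1
            for z in data
            if sum(ui * (zi - xi) for ui, zi, xi in zip(u, z, x)) >= 0.0
        )
        best_loss = min(best_loss, errors / len(data))
    # Minimum loss over the sampled rules: an upper bound on the depth.
    return best_loss


if __name__ == "__main__":
    square = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
    print(halfspace_depth((0.0, 0.0), square))    # centre of the square
    print(halfspace_depth((10.0, 10.0), square))  # far outside the data
```

In this reading, replacing the linear rules with another family of classifiers (for example kernel machines or neural networks, as the abstract suggests) would amount to changing the set over which the minimum is taken; that extension is not implemented in the sketch.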