CMStatistics 2020
B1167
Title: Fair learning: An optimal transport-based approach
Authors: Paula Gordaliza - Basque Center for Applied Mathematics (Spain) [presenting]
Eustasio del Barrio - Universidad de Valladolid (Spain)
Jean-Michel Loubes - University of Toulouse (France)
Fabrice Gamboa - University of Toulouse (France)
Laurent Risser - University of Toulouse (France)
Philippe Besse - Institut de Mathematiques de Toulouse (France)
Abstract: The spread of applications based on ML models in everyday life and the professional world has been accompanied by concerns about the ethical issues that may arise from the adoption of these technologies. First, we motivate the fairness problem by presenting some comprehensive results from the analysis of disparate impact on a real dataset. We show that making ML models fair can be particularly challenging, especially when the training observations contain bias. We then review the mathematics of fairness in ML, with some novel contributions to the analysis of the price for fairness in regression and classification. We recast the links between fairness and predictability in terms of probability metrics. We analyze repair methods based on mapping conditional distributions to the Wasserstein barycenter and propose a random repair. Second, we consider the asymptotic theory of the empirical transportation cost. We provide a CLT for the Wasserstein distance between two empirical distributions of different sizes, for observations on $\mathbb{R}$. In the case $p>1$, the assumptions are sharp in terms of moments and smoothness. We prove results dealing with the choice of centering constants. We provide a consistent estimate of the asymptotic variance, which makes it possible to build two-sample tests and confidence intervals to certify the similarity between two distributions. These are used to assess a new criterion of dataset fairness in classification.
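To make the quantities named in the abstract concrete, here is a minimal Python sketch for observations on the real line. It is an illustration under stated assumptions, not the authors' implementation: the function names (`disparate_impact`, `wasserstein_p_1d`, `barycenter_repair_1d`) are hypothetical, the protected attribute is assumed to be binary with `s == 0` marking the protected group, and the repair shown is a plain (total) quantile-based mapping to the one-dimensional Wasserstein-2 barycenter rather than the random repair proposed in the talk. The CLT, variance estimate, and tests mentioned above are not reproduced here.

```python
import numpy as np


def disparate_impact(y_pred, s):
    """Ratio P(pred = 1 | S = 0) / P(pred = 1 | S = 1).

    Assumption: binary predictions in {0, 1} and a binary protected
    attribute, with S = 0 taken as the protected group.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    s = np.asarray(s)
    return y_pred[s == 0].mean() / y_pred[s == 1].mean()


def wasserstein_p_1d(x, y, p=2, grid_size=1000):
    """Approximate W_p between two empirical distributions on R.

    On the real line, W_p^p equals the integral over (0, 1) of
    |F^{-1}(u) - G^{-1}(u)|^p, approximated here on a midpoint grid.
    The two samples may have different sizes.
    """
    u = (np.arange(grid_size) + 0.5) / grid_size
    qx = np.quantile(np.asarray(x, dtype=float), u)
    qy = np.quantile(np.asarray(y, dtype=float), u)
    return float(np.mean(np.abs(qx - qy) ** p) ** (1.0 / p))


def barycenter_repair_1d(x, s):
    """Map each group's values onto the W_2 barycenter of the two groups.

    In one dimension, the barycenter's quantile function is the weighted
    average of the group quantile functions (weights = group proportions).
    Each point is sent to the barycenter quantile at its within-group rank.
    """
    x = np.asarray(x, dtype=float)
    s = np.asarray(s)
    groups = (0, 1)
    weights = {g: np.mean(s == g) for g in groups}
    repaired = np.empty_like(x)
    for g in groups:
        xg = x[s == g]
        # Within-group rank of each point, rescaled to lie in (0, 1).
        ranks = (np.argsort(np.argsort(xg)) + 0.5) / len(xg)
        # Barycenter quantile function evaluated at those ranks.
        repaired[s == g] = sum(
            weights[h] * np.quantile(x[s == h], ranks) for h in groups
        )
    return repaired


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic feature whose distribution differs between the two groups.
    s = np.concatenate([np.zeros(300, dtype=int), np.ones(500, dtype=int)])
    x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(2.0, 1.0, 500)])
    x_rep = barycenter_repair_1d(x, s)
    print("W2 before repair:", wasserstein_p_1d(x[s == 0], x[s == 1]))
    print("W2 after repair: ", wasserstein_p_1d(x_rep[s == 0], x_rep[s == 1]))
```

After the repair, the two conditional distributions of the feature nearly coincide, so a classifier trained on the repaired feature cannot easily recover the protected attribute from it. The cost is lost predictive information, which is the trade-off the abstract calls the price for fairness.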