CMStatistics 2023
B0953
Title: Outlier detection in regression via mixed-integer optimization
Authors: Andres Gomez - University of Southern California (United States) [presenting]
Abstract: Common statistical techniques fail if the data used to train the model are corrupted by gross errors or outliers. In fact, a single outlier can cause an estimator to incur arbitrarily large errors. Several robust estimators proposed in the statistical literature automatically detect and discard outliers before fitting a model to the remaining data. Unfortunately, the resulting training problem is NP-hard and challenging to solve, even with modern optimization techniques. Practitioners therefore typically resort to heuristics, which have inferior statistical properties and may produce low-quality solutions unless stringent assumptions are made on the data-generating process. Recent results on mixed-integer optimization techniques for detecting outliers in regression problems are discussed. In particular, conic formulations are proposed that are at least two orders of magnitude faster than the natural big-M formulations recently proposed in the literature. The resulting methods deliver solutions that are significantly better than those of existing heuristic methods.
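As a small illustration of the "detect and discard" idea, the sketch below fits a line to eight points, one of which is a gross error, first by ordinary least squares and then by exhaustively trying every choice of one point to discard. This is not the conic or big-M formulation from the talk: the trimmed-fit criterion, the function names `ols_fit` and `trimmed_fit`, and the toy data are all assumptions made for illustration. The exhaustive search grows combinatorially with the number of points, which hints at why the exact problem is hard and why mixed-integer formulations are of interest.

```python
from itertools import combinations


def ols_fit(points):
    """Closed-form simple linear regression y = a + b*x.

    Returns (intercept, slope, sum of squared residuals).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in points)
    return intercept, slope, sse


def trimmed_fit(points, n_outliers):
    """Hypothetical helper: try every subset of size n - n_outliers and
    keep the one whose least-squares fit has the smallest residual sum.

    Returns (intercept, slope, sse, indices of discarded points).
    """
    n = len(points)
    best = None
    for keep in combinations(range(n), n - n_outliers):
        intercept, slope, sse = ols_fit([points[i] for i in keep])
        if best is None or sse < best[2]:
            best = (intercept, slope, sse, set(range(n)) - set(keep))
    return best


# Toy data (assumed): y = 1 + 2x exactly, except one gross error at x = 5.
data = [(float(x), 1.0 + 2.0 * x) for x in range(8)]
data[5] = (5.0, 100.0)

a_ols, b_ols, _ = ols_fit(data)
a_rob, b_rob, sse_rob, discarded = trimmed_fit(data, n_outliers=1)

print("OLS slope with outlier:", round(b_ols, 3))
print("Trimmed slope:", round(b_rob, 3), "discarded indices:", discarded)
```

On this toy data the single corrupted point pulls the ordinary least-squares slope far from 2, while the trimmed fit discards that point and recovers the underlying line.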