View Submission - COMPSTAT2024
A0154
Title: A sequential method to search for multiple outliers in multivariate data
Authors: Trijya Singh - Le Moyne College, Syracuse, NY (United States) [presenting]
Abstract: Standard multivariate methods such as principal component analysis and discriminant analysis rely on the sample mean vector and covariance matrix, both of which can be strongly affected by even a few outliers. Detecting outliers in multivariate data is difficult because classical methods based on Mahalanobis distances may identify scattered outliers well but perform poorly when multiple outliers are clustered. Methods based on robust Mahalanobis distances also perform poorly when the fraction of contamination is high, and they can be computationally expensive. A method for detecting multiple outliers in multivariate data is proposed that tests for outliers sequentially and uses the leave-one-out approach at many stages. The proposed method is applied to a well-known data set, and it is shown that it is marginally better to first obtain a clean sample for estimating the mean vector and covariance matrix and then apply classically efficient methods than to use inefficient robust rules for estimation and subsequent outlier detection.
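
For readers unfamiliar with the two ingredients named in the abstract, the following is a minimal sketch of classical squared Mahalanobis distances and their leave-one-out counterparts. It is not the author's sequential procedure, which the abstract does not describe in detail. The function names, the simulated data, and the planted outlier at (8, 8) are all assumptions made for illustration.

```python
import numpy as np

def mahalanobis_sq(X, mean, cov):
    """Squared Mahalanobis distance of each row of X from `mean` under `cov`."""
    diff = X - mean
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

def leave_one_out_distances(X):
    """Squared distance of each point from the mean and covariance
    estimated with that point removed (hypothetical helper)."""
    n = X.shape[0]
    d2 = np.empty(n)
    for i in range(n):
        rest = np.delete(X, i, axis=0)        # sample without observation i
        mean = rest.mean(axis=0)
        cov = np.cov(rest, rowvar=False)      # unbiased sample covariance
        d2[i] = mahalanobis_sq(X[i:i + 1], mean, cov)[0]
    return d2

# Simulated data (an assumption): 50 standard-normal points plus one planted outlier.
rng = np.random.default_rng(0)
clean = rng.normal(size=(50, 2))
X = np.vstack([clean, [[8.0, 8.0]]])

# Classical distances use estimates that include the outlier itself.
classical = mahalanobis_sq(X, X.mean(axis=0), np.cov(X, rowvar=False))
# Leave-one-out distances exclude each point from its own estimates.
loo = leave_one_out_distances(X)

print("classical d^2 of planted point:", classical[-1])
print("leave-one-out d^2 of planted point:", loo[-1])
```

Because the planted point inflates the covariance estimate it is measured against, its classical distance is smaller than its leave-one-out distance. With a single outlier the two rankings agree; the clustered-outlier case that the abstract targets is harder, since removing one point still leaves the rest of the cluster in the estimates.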