B1576
Title: Model-based clustering with cellwise outliers and missing data
Authors: Giorgia Zaccaria - University of Milano-Bicocca (Italy) [presenting]
Francesca Greselin - University of Milano-Bicocca (Italy)
Agustin Mayo-Iscar - Universidad de Valladolid (Spain)
Abstract: In real-world applications, data are often affected by missing values and outliers. In the model-based clustering literature, outliers are typically treated as entirely contaminated cases (row-wise outliers). However, especially in high-dimensional settings, it is more reasonable to assume that only specific cells of the data matrix are contaminated, while the remaining cells in the affected rows still carry useful information that should be retained. We introduce a model-based clustering methodology that handles both missing data and cellwise outliers. Parameters are estimated with an alternating expectation-conditional maximization (AECM) algorithm that includes a concentration step for detecting contaminated cells. The performance of the proposal is illustrated through applications to synthetic and real data sets.
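As a rough illustration of what a concentration step for contaminated cells could look like, the Python sketch below assumes the current Gaussian mixture parameters (weights, means, covariance matrices) are already available from the preceding E and CM steps. It assigns each row to its most likely component using only its observed cells, scores every observed cell by its conditional Gaussian log-density given the other observed cells in that row, and then, for each variable, flags the fraction alpha of observed cells with the lowest scores. The function names, the trimming level alpha and the hard-assignment rule are hypothetical simplifications, not the authors' exact procedure. Missing cells (NaN) are never flagged.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    # Log-density of a multivariate normal N(mean, cov) evaluated at vector x.
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))

def conditional_cell_logpdf(row, j, obs, mean, cov):
    # Log-density of cell row[j] given the other observed cells of the same row.
    others = [k for k in obs if k != j]
    if not others:
        # No other observed cell: fall back to the marginal density of variable j.
        return gaussian_logpdf(row[[j]], mean[[j]], cov[np.ix_([j], [j])])
    s_oo = cov[np.ix_(others, others)]
    s_jo = cov[j, others]
    w = np.linalg.solve(s_oo, s_jo)                 # Sigma_oo^{-1} Sigma_oj
    cond_mean = mean[j] + w @ (row[others] - mean[others])
    cond_var = cov[j, j] - s_jo @ w                 # Schur complement
    return -0.5 * (np.log(2 * np.pi * cond_var) + (row[j] - cond_mean) ** 2 / cond_var)

def concentration_step(X, weights, means, covs, alpha):
    """Hypothetical cellwise C-step: returns a boolean mask of flagged cells."""
    n, p = X.shape
    observed = ~np.isnan(X)

    # 1) Hard-assign each row to its most likely component (observed cells only).
    labels = np.empty(n, dtype=int)
    for i in range(n):
        obs = np.flatnonzero(observed[i])
        if obs.size == 0:
            labels[i] = int(np.argmax(weights))
            continue
        scores = [np.log(w) + gaussian_logpdf(X[i, obs], m[obs], c[np.ix_(obs, obs)])
                  for w, m, c in zip(weights, means, covs)]
        labels[i] = int(np.argmax(scores))

    # 2) Score each observed cell; missing cells get +inf so they are never trimmed.
    score = np.full((n, p), np.inf)
    for i in range(n):
        obs = np.flatnonzero(observed[i])
        k = labels[i]
        for j in obs:
            score[i, j] = conditional_cell_logpdf(X[i], j, obs, means[k], covs[k])

    # 3) For each variable, flag the floor(alpha * n_obs) lowest-scoring cells.
    flagged = np.zeros((n, p), dtype=bool)
    for j in range(p):
        h = int(np.floor(alpha * observed[:, j].sum()))
        if h > 0:
            flagged[np.argsort(score[:, j])[:h], j] = True
    return flagged
```

Because each cell is scored conditionally on the rest of its row, a value that is unremarkable on its own scale but inconsistent with strongly correlated variables can still be flagged. In an iterative scheme one could set the flagged cells to NaN, re-estimate the parameters treating them as missing, and repeat until the flagged set stabilises; the abstract does not specify how the authors combine these steps, so this loop is only one plausible arrangement.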