Title: Robust variable selection for model-based learning in presence of adulteration
Authors: Andrea Cappozzo - University of Milano Bicocca (Italy) [presenting]
Francesca Greselin - University of Milano Bicocca (Italy)
Thomas Brendan Murphy - University College Dublin (Ireland)
Abstract: The problem of identifying the most discriminating features in supervised learning has been extensively investigated in recent years, and several methods for variable selection in model-based classification have been proposed. Surprisingly, the impact of outliers and wrongly labelled units on the determination of relevant predictors has received far less attention, and almost no dedicated methodologies are available in the literature. We introduce two robust variable selection approaches: one embeds a robust classifier within a greedy-forward procedure, while the other is based on the theory of maximum likelihood estimation and irrelevance. The former recasts feature identification as a model selection problem, whereas the latter regards the relevant subset as a model parameter to be estimated. An experiment on synthetic data highlights the benefits of the proposed methods compared with non-robust solutions. Finally, the methods are applied to a high-dimensional classification problem involving contaminated spectroscopic data.
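To make the idea of embedding a robust classifier in a greedy-forward search concrete, here is a minimal sketch under explicit assumptions. It is not the authors' method. The robust classifier is swapped in for illustration: a nearest-centroid rule with coordinate-wise median centroids plus impartial trimming, which discards the fraction `alpha` of units farthest from their own class centroid so that outliers and mislabelled units do not drive the choice. The selection criterion is likewise hypothetical: accuracy on the retained units, whereas a model-based procedure would typically rely on a likelihood-based criterion such as BIC. The names `trimmed_score`, `greedy_forward`, `alpha` and `tol` are all hypothetical.

```python
from statistics import median


def median_centroids(X, y, features):
    """Coordinate-wise median of each class on the chosen features (robust to outliers)."""
    return {
        lab: [median(x[j] for x, l in zip(X, y) if l == lab) for j in features]
        for lab in sorted(set(y))
    }


def sq_dist(x, centroid, features):
    """Squared Euclidean distance computed on the selected features only."""
    return sum((x[j] - c) ** 2 for j, c in zip(features, centroid))


def trimmed_score(X, y, features, alpha):
    """Hypothetical criterion: fit median centroids, trim the alpha fraction of units
    farthest from their own class centroid, return accuracy on the retained units."""
    cents = median_centroids(X, y, features)
    own = [sq_dist(x, cents[lab], features) for x, lab in zip(X, y)]
    n_keep = len(X) - int(alpha * len(X))
    keep = sorted(range(len(X)), key=lambda i: own[i])[:n_keep]
    correct = 0
    for i in keep:
        # Assign unit i to the class with the closest median centroid.
        pred = min(cents, key=lambda lab: sq_dist(X[i], cents[lab], features))
        correct += pred == y[i]
    return correct / n_keep


def greedy_forward(X, y, alpha=0.1, tol=1e-3):
    """Add, one at a time, the variable that most improves the trimmed score;
    stop when the best improvement does not exceed tol."""
    selected, remaining, best = [], list(range(len(X[0]))), 0.0
    while remaining:
        scores = {j: trimmed_score(X, y, selected + [j], alpha) for j in remaining}
        j_best = max(remaining, key=lambda j: scores[j])
        if scores[j_best] - best <= tol:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected, best


# Toy data: variable 0 separates the classes, variable 1 is pure noise,
# and the last unit is a mislabelled one (it sits with class 1 but is labelled 0).
X = [[0.0, 1.0], [0.5, 3.0], [-0.5, 2.0], [0.2, 0.0], [-0.2, 4.0],
     [5.0, 1.0], [5.5, 3.0], [4.5, 2.0], [5.2, 0.0], [4.8, 4.0],
     [5.1, 2.0]]
y = [0] * 5 + [1] * 5 + [0]

result = greedy_forward(X, y, alpha=0.1)
print(result)  # ([0], 1.0)
```

On this toy data the trimming step discards the mislabelled unit, variable 0 is selected first, and adding the noise variable brings no gain, so the search stops. The stopping rule and trimming level are the main design choices; a faithful implementation would replace both the classifier and the score with their model-based, robustly estimated counterparts.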