CMStatistics 2023
B1951
Title: Robust classification under sparse adversarial attacks for vision applications
Authors: Payam Delgosha - UIUC (United States) [presenting]
Abstract: It is well known that machine learning models are vulnerable to small but cleverly designed adversarial perturbations that can cause misclassification. To have interpretable machine learning for scientific applications, it is crucial to make learning algorithms robust against such perturbations. While there has been major progress in designing attacks and defenses for various adversarial settings, many fundamental and theoretical problems are yet to be resolved. We consider classification in the presence of $L_0$-bounded adversarial perturbations, also known as sparse attacks. This setting differs significantly from other $L_p$-adversarial settings with $p\ge 1$, since the $L_0$-ball is non-convex and highly non-smooth. We discuss the fundamental limits of robustness in the presence of sparse attacks, together with a proposed algorithm that achieves them. Motivated by the theoretical success of this algorithm, we discuss how to incorporate truncation as a new component into a neural network architecture, and we verify the robustness of the proposed architecture against sparse attacks through several experiments in the vision domain. Finally, we investigate the generalization properties and sample complexity of adversarial training in this setting.
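For concreteness, the sparse threat model can be written as follows (a standard formalization; the sparsity budget $k$ is our notation, not the abstract's). The adversary may replace an input $x$ by any point of the $L_0$-ball
$$
B_0(x, k) = \{\, x' : \|x' - x\|_0 \le k \,\}, \qquad \|v\|_0 = |\{\, i : v_i \neq 0 \,\}|,
$$
i.e., it may arbitrarily modify up to $k$ coordinates of $x$. Since $B_0(x, k)$ is a union of $k$-dimensional axis-aligned affine subspaces through $x$, it is non-convex, which is the source of the difficulty the abstract notes relative to the $L_p$, $p\ge 1$, settings.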
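The abstract does not spell out how truncation enters the architecture. Purely as a minimal sketch of the general idea (the layer name TruncatedSum, the use of PyTorch, and the rule of discarding the $k$ largest-magnitude coordinatewise contributions are our assumptions, not the authors' design), a truncation component might look like:

import torch
import torch.nn as nn

class TruncatedSum(nn.Module):
    """Illustrative truncation layer (a sketch, not the authors' exact design).

    Given per-coordinate contributions z_1, ..., z_d, discard the k entries
    with the largest magnitude before summing. An L0-bounded adversary
    controls at most k coordinates, so the k most extreme contributions are
    the ones it could have planted; dropping them caps its influence at the
    level of the surviving (clean) magnitudes.
    """

    def __init__(self, k: int):
        super().__init__()
        self.k = k

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, d) per-coordinate scores; keep the d - k smallest |z_i|.
        keep = z.shape[-1] - self.k
        _, idx = torch.topk(z.abs(), keep, dim=-1, largest=False)
        return z.gather(-1, idx).sum(dim=-1)

# Usage: replace the inner product <w, x> of a linear classifier by a
# truncated version that ignores the k most extreme coordinatewise products.
d, k = 784, 10                 # illustrative sizes (assumed)
w = torch.randn(d)             # weights of a linear classifier
x = torch.randn(32, d)         # a batch of inputs
trunc = TruncatedSum(k)
scores = trunc(w * x)          # truncated surrogate for x @ w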