Title: Interpreting deep neural networks through variable importance
Authors: Jonathan Ish-Horowicz - Imperial College London (United Kingdom) [presenting]
Seth Flaxman - Imperial College London (United Kingdom)
Sarah Filippi - Imperial College London (United Kingdom)
Abstract: While the success of deep neural networks is well-established across a variety of domains, the ability to explain and interpret these methods is limited. Unlike previously proposed local methods, which try to explain particular classification decisions, we focus on global interpretability and ask a universally applicable, and surprisingly understudied, question: given a trained model, which features are the most important? In the context of neural networks, a feature is rarely important on its own, so our strategy is specifically designed to leverage partial covariance structures and incorporate variable interactions into our proposed feature ranking. The methodological contributions are three-fold. First, we propose a novel effect size analogue for the problem of global interpretability, which is appropriate for applications with highly collinear predictors (ubiquitous in computer vision). Second, we extend the recently proposed "RelATive cEntrality" (RATE) measure to the Bayesian deep learning setting. RATE applies an information-theoretic criterion to the posterior distribution of effect sizes to assess feature significance. Unlike competing methods, our method has no tuning parameters to select and no costly randomization steps. Overall, we show state-of-the-art results applying our framework to several application areas, including computer vision, genetics, natural language processing, and social science.
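The abstract describes RATE as applying an information-theoretic criterion to the posterior distribution of effect sizes. The sketch below illustrates the general idea under a simplifying assumption (not the authors' implementation): if the posterior over effect sizes is approximated as multivariate Gaussian, each feature's centrality can be measured as the KL divergence between the posterior over the remaining effect sizes before and after conditioning that feature's effect size to zero, normalised so the scores sum to one. The function names (`gaussian_kl`, `rate_scores`) are illustrative only.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ) between two multivariate Gaussians."""
    k = len(mu0)
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff
                  - k + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def rate_scores(mu, cov):
    """Sketch of a RATE-style feature ranking, assuming a Gaussian posterior
    N(mu, cov) over effect sizes. For each feature j, compare the posterior of
    the remaining effect sizes with and without conditioning on beta_j = 0,
    then normalise the resulting KL divergences to sum to one."""
    p = len(mu)
    kld = np.zeros(p)
    for j in range(p):
        rest = np.delete(np.arange(p), j)
        mu_r = mu[rest]
        cov_rr = cov[np.ix_(rest, rest)]
        cov_rj = cov[rest, j]
        # Standard Gaussian conditioning on beta_j = 0
        mu_cond = mu_r + cov_rj * (0.0 - mu[j]) / cov[j, j]
        cov_cond = cov_rr - np.outer(cov_rj, cov_rj) / cov[j, j]
        kld[j] = gaussian_kl(mu_cond, cov_cond, mu_r, cov_rr)
    return kld / kld.sum()
```

Note that this measure is only informative when the posterior covariance has off-diagonal structure: with independent effect sizes, conditioning on one feature leaves the others unchanged, which is consistent with the abstract's emphasis on leveraging partial covariance structure among collinear predictors.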