View Submission - COMPSTAT2024
A0387
Title: Shapley values for regression models with interactions
Authors: Mark van de Wiel - Amsterdam University Medical Centers (Netherlands) [presenting]
Abstract: In machine learning, Shapley values are popular variable importance metrics because they offer a unique combination of properties, such as efficiency and linearity, that enhances their use and interpretability. For regression models with interactions, quantifying variable importance is not trivial, as a variable's contribution is spread over multiple terms. The use of Shapley values in this context is argued for, and a computationally efficient formula is derived to compute them for such models. Importantly, it is shown that, when suitable shrinkage is applied, appropriate inference is available via credible intervals. The Shapley values are illustrated in a large epidemiological study. First, a regression model with two-way interactions is shown to outperform both a regression model with main effects only and a random forest in terms of prediction, in a setting with sample size $n=1000$, $p=14$ variables and $q=85$ two-way interactions. Hence, the model is a good candidate for further use. Then, the Shapley values and their uncertainties illustrate how variable importance differs across individuals due to the interaction terms. A visualization shows how each Shapley value decomposes into contributions of the main effect and the interactions, which allows the relative importance of these effects to be assessed. All in all, the aim is to show that Shapley values are a useful addition to the statistician's toolbox for interpreting non-trivial regression models.
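To make the decomposition into main-effect and interaction contributions concrete, the following is a minimal sketch, not the formula derived in the talk. It assumes a simple zero-baseline value function (variables outside a coalition are set to 0), under which each two-way interaction term is split equally between its two variables. The names `predict`, `shapley_brute_force`, `shapley_closed_form`, `beta` and `gamma` are hypothetical and chosen for illustration only.

```python
from itertools import combinations
from math import factorial

import numpy as np


def predict(x, beta, gamma):
    """Linear predictor with main effects and two-way interactions.

    beta:  shape (p,) main-effect coefficients.
    gamma: shape (p, p), strictly upper triangular; gamma[j, k] is the
           coefficient of x[j] * x[k] for j < k.
    """
    return float(beta @ x + x @ gamma @ x)


def shapley_brute_force(x, beta, gamma):
    """Exact Shapley values by enumerating all coalitions (small p only).

    Assumption: the value of a coalition S is the prediction with the
    variables outside S set to a zero baseline.
    """
    p = len(x)

    def value(subset):
        z = np.array([x[k] if k in subset else 0.0 for k in range(p)])
        return predict(z, beta, gamma)

    phi = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                phi[j] += weight * (value(subset + (j,)) - value(subset))
    return phi


def shapley_closed_form(x, beta, gamma):
    """Closed form under the same zero-baseline assumption.

    Returns the main-effect part and the interaction part separately;
    their sum is the Shapley value of each variable.
    """
    main = beta * x
    sym = gamma + gamma.T              # symmetric interaction matrix, zero diagonal
    inter = 0.5 * x * (sym @ x)        # each interaction term is shared equally
    return main, inter


# Small illustrative example with simulated coefficients (not study data).
rng = np.random.default_rng(0)
p = 4
x = rng.normal(size=p)
beta = rng.normal(size=p)
gamma = np.triu(rng.normal(size=(p, p)), k=1)

main, inter = shapley_closed_form(x, beta, gamma)
phi = shapley_brute_force(x, beta, gamma)
print("main-effect part:", main)
print("interaction part:", inter)
print("max abs difference from brute force:", np.abs(phi - (main + inter)).max())
```

Under this assumption the closed form costs on the order of $p^2$ operations per individual, whereas the enumeration grows exponentially in $p$. Other value functions, for example ones based on conditional expectations, generally lead to different expressions.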