A0285
Title: Sparse additive contextual bandits: A nonparametric approach for online decision-making with high-dimensional covariates
Authors: Wenjia Wang - HKUST (GZ) (China) [presenting]
Abstract: Personalized services are fundamental in the contemporary digital landscape, and their online decision-making is often framed as a contextual bandit problem, which is closely related to sequential design. Modern applications present two significant challenges for this framework: high-dimensional covariates and the need for nonparametric models that accurately capture the complex relationships between rewards and covariates. A new contextual bandit algorithm, based on a sparse additive reward model, is proposed to address both challenges. The statistical properties of the doubly penalized method applied to random regions are then derived, incorporating new analyses under bandit feedback. It is proven that the cumulative regret of the algorithm is sublinear in the time horizon $T$ and grows linearly in $\log(d)$, where $d$ is the covariate dimension. To the best of the authors' knowledge, this is the first regret bound with polylogarithmic growth in $d$ for nonparametric contextual bandits with high-dimensional covariates. Extensive numerical experiments show that the algorithm outperforms existing algorithms in high-dimensional settings.
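For orientation, a sparse additive reward model and the cumulative regret can be written as below. This is a minimal sketch in assumed notation, not necessarily the authors' formulation: the arm index $k$, chosen arm $a_t$, covariate vector $x_t$, noise $\varepsilon_t$, active set $S_k$, and component functions $f_{k,j}$ are hypothetical symbols introduced here for illustration; only $T$ and $d$ appear in the abstract.

$$
r_t = f_{a_t}(x_t) + \varepsilon_t, \qquad f_k(x) = \sum_{j \in S_k} f_{k,j}(x_j), \qquad S_k \subseteq \{1, \dots, d\}, \quad |S_k| \ll d,
$$
$$
R_T = \sum_{t=1}^{T} \Bigl( \max_{k} f_k(x_t) - f_{a_t}(x_t) \Bigr).
$$

In this notation, the abstract's claim is that $R_T$ is sublinear in $T$, that is $R_T = o(T)$, with the dimension entering only through $\log(d)$; the exact rate is not stated in the abstract.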