A0488
Title: Semiparametric contextual bandits with sufficient dimension reduction
Authors: Sakshi Arya - Case Western Reserve University (United States) [presenting]
Abstract: A novel semiparametric framework is introduced for batched contextual multi-armed bandits that leverages a single-index regression model to flexibly capture relationships between covariates and arm rewards. The proposed algorithm, batched single-index dynamic binning and successive arm elimination (BIDS), combines dynamic binning based on the estimated single-index direction with a successive arm elimination strategy. This approach accommodates both the setting where a pilot index direction is known and the setting where it must be estimated from data. For both cases, theoretical regret guarantees are derived, and it is shown that, when the single-index direction is estimated with sufficient accuracy, BIDS achieves minimax-optimal regret rates comparable to those of nonparametric bandits with a one-dimensional covariate, thereby circumventing the curse of dimensionality. Extensive experiments on simulated and real-world datasets demonstrate that BIDS outperforms existing nonparametric batched bandit methods in both sample efficiency and empirical performance, establishing the practical value of leveraging single-index structures in batched decision-making.
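Illustrative sketch (not the BIDS implementation): the abstract's two ingredients, binning along a single-index direction and successive arm elimination within each bin, can be illustrated under strong simplifying assumptions. The sketch below assumes a known pilot direction, fixed equal-width bins in place of dynamic binning, two arms with made-up linear mean rewards, uniform sampling among active arms, and a generic confidence radius of sqrt(2 log(T) / n). All function names and parameters are hypothetical.

```python
import numpy as np


def project_to_bins(X, beta, lo, hi, n_bins):
    """Project covariates onto the unit-normalized direction beta and
    assign each point to one of n_bins equal-width bins on [lo, hi]."""
    beta = np.asarray(beta, dtype=float)
    beta = beta / np.linalg.norm(beta)
    z = np.asarray(X, dtype=float) @ beta
    raw = np.floor((z - lo) / (hi - lo) * n_bins)
    return np.clip(raw, 0, n_bins - 1).astype(int)


def eliminate_arms(sums, counts, active, horizon):
    """One successive-elimination step for a single bin.

    An active arm is dropped when its upper confidence bound falls below
    the largest lower confidence bound among active arms with data.
    The radius sqrt(2 log(horizon) / n) is an assumed, generic choice.
    """
    keep = active.copy()
    idx = np.flatnonzero(active & (counts > 0))
    if idx.size == 0:
        return keep
    means = sums[idx] / counts[idx]
    radius = np.sqrt(2.0 * np.log(horizon) / counts[idx])
    best_lcb = np.max(means - radius)
    keep[idx[means + radius < best_lcb]] = False
    return keep


def run_batched_sketch(n_rounds=2000, n_batches=4, n_bins=4, d=3, seed=0):
    """Toy batched loop: the active-arm sets change only at batch ends."""
    rng = np.random.default_rng(seed)
    beta = np.ones(d)          # assumed known pilot direction
    hi = np.sqrt(d)            # max of the projection for X in [0, 1]^d
    n_arms = 2
    sums = np.zeros((n_bins, n_arms))
    counts = np.zeros((n_bins, n_arms))
    active = np.ones((n_bins, n_arms), dtype=bool)
    batch_size = n_rounds // n_batches
    for _ in range(n_batches):
        X = rng.uniform(size=(batch_size, d))
        bins = project_to_bins(X, beta, 0.0, hi, n_bins)
        u = (X @ beta) / d     # index rescaled to [0, 1]
        for b, ui in zip(bins, u):
            # Pull uniformly among arms still active in this bin.
            arm = rng.choice(np.flatnonzero(active[b]))
            # Made-up mean rewards: arm 0 is better when u > 0.5.
            mean = ui if arm == 0 else 1.0 - ui
            reward = mean + 0.1 * rng.standard_normal()
            sums[b, arm] += reward
            counts[b, arm] += 1
        # Update the policy only after the whole batch is observed.
        for b in range(n_bins):
            active[b] = eliminate_arms(sums[b], counts[b], active[b], n_rounds)
    return active
```

Because all binning happens on the scalar projection, the number of bins does not grow with the covariate dimension d, which is the intuition behind the one-dimensional regret rates mentioned in the abstract.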