A0186
Title: Smooth contextual bandits: Bridging the parametric and nondifferentiable regret regimes
Authors: Yichun Hu - Cornell University, Johnson Graduate School of Management (United States) [presenting]
Abstract: A nonparametric contextual bandit problem is studied in which the expected reward functions belong to a Hölder class with smoothness parameter $\beta$. This setting is shown to interpolate between two extremes that were previously studied in isolation: nondifferentiable bandits ($\beta \le 1$), where rate-optimal regret is achieved by running separate noncontextual bandits in different context regions, and parametric-response bandits ($\beta = \infty$), where rate-optimal regret can be achieved with minimal or no exploration because of infinite extrapolatability. A novel algorithm is developed that carefully adjusts to all smoothness settings, and its regret is proven to be rate-optimal by establishing matching upper and lower bounds, recovering the existing results at the two extremes. In this sense, the work bridges the gap between the existing literature on parametric and nondifferentiable contextual bandit problems, and between bandit algorithms that rely exclusively on global or exclusively on local information, shedding light on the crucial interplay of complexity and regret in contextual bandits.
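As an illustration of the binning idea the abstract mentions for the nondifferentiable regime ($\beta \le 1$), the sketch below partitions a one-dimensional context space into bins and runs an independent UCB bandit in each bin. This is not the paper's algorithm; the bin count, horizon, noise level, and reward functions are illustrative assumptions chosen only to make the example runnable.

```python
# Minimal sketch (assumptions, not the paper's method): binned UCB for a
# 1-D contextual bandit, i.e. separate noncontextual bandits per context bin.
import numpy as np

rng = np.random.default_rng(0)

T = 20_000        # horizon (illustrative)
K = 2             # number of arms (illustrative)
n_bins = 16       # context bins; a real choice would depend on beta and T

# Hypothetical smooth expected-reward functions on the unit interval.
def mean_reward(arm, x):
    return 0.5 + 0.3 * np.sin(2 * np.pi * x + arm)

counts = np.zeros((n_bins, K))
sums = np.zeros((n_bins, K))

regret = 0.0
for t in range(1, T + 1):
    x = rng.random()                       # observe context in [0, 1)
    b = min(int(x * n_bins), n_bins - 1)   # locate its bin

    if counts[b].min() == 0:
        # Pull each arm once in this bin before using UCB indices.
        a = int(counts[b].argmin())
    else:
        means = sums[b] / counts[b]
        bonus = np.sqrt(2 * np.log(t) / counts[b])
        a = int(np.argmax(means + bonus))

    r = mean_reward(a, x) + 0.1 * rng.standard_normal()  # noisy reward
    counts[b, a] += 1
    sums[b, a] += r

    # Instantaneous regret against the best arm for this context.
    regret += max(mean_reward(k, x) for k in range(K)) - mean_reward(a, x)

print(f"cumulative regret over {T} rounds: {regret:.1f}")
```

Because each bin learns only from its own local data, this approach cannot exploit global smoothness when $\beta > 1$, which is the gap the abstract's proposed algorithm addresses by adapting to all smoothness levels.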