B0644
Title: Kernel epsilon-greedy strategy for contextual bandits
Authors: Sakshi Arya - Case Western Reserve University (United States) [presenting]
Bharath Sriperumbudur - Pennsylvania State University (United States)
Abstract: Contextual bandit algorithms are popular for sequential decision-making in many practical applications, ranging from online advertisement recommendation to mobile health. The goal in such problems is to maximize the cumulative reward over time for a set of choices/arms while taking covariate (or contextual) information into account. Epsilon-greedy is a popular heuristic for the multi-armed bandit problem; however, it is among the less studied algorithms theoretically in the presence of contextual information. The epsilon-greedy strategy is studied in the nonparametric bandit setting, i.e., when no parametric form is assumed for the reward functions. The expected rewards are instead modelled as arbitrary linear functions of the contexts' images in a specified reproducing kernel Hilbert space (RKHS). A kernel epsilon-greedy algorithm is proposed, and convergence rates are established for both the estimation error and the cumulative regret; these rates are closely tied to the intrinsic dimensionality of the RKHS. When restricted to a finite-dimensional RKHS, the rates closely match the optimal rates for linear contextual bandits.
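To illustrate the kind of strategy the abstract describes, below is a minimal sketch of a kernel epsilon-greedy contextual bandit: with probability epsilon an arm is chosen uniformly at random (exploration), and otherwise the arm with the highest kernel ridge regression estimate of the reward at the current context is chosen (exploitation). The Gaussian (RBF) kernel, the ridge parameter, the fixed exploration probability, and all names below are illustrative assumptions, not the authors' exact specification; in practice the exploration probability is often decayed over time rather than held constant.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KernelEpsGreedy:
    """Illustrative kernel epsilon-greedy bandit (hypothetical names/parameters)."""

    def __init__(self, n_arms, eps=0.1, ridge=1.0, gamma=1.0):
        self.n_arms, self.eps, self.ridge, self.gamma = n_arms, eps, ridge, gamma
        self.X = [[] for _ in range(n_arms)]   # contexts observed per arm
        self.y = [[] for _ in range(n_arms)]   # rewards observed per arm

    def _predict(self, arm, x):
        """Kernel ridge regression estimate of the arm's expected reward at context x."""
        if not self.X[arm]:
            return 0.0
        Xa, ya = np.asarray(self.X[arm]), np.asarray(self.y[arm])
        K = rbf_kernel(Xa, Xa, self.gamma)
        alpha = np.linalg.solve(K + self.ridge * np.eye(len(ya)), ya)
        k_x = rbf_kernel(Xa, x[None, :], self.gamma).ravel()
        return float(k_x @ alpha)

    def select(self, x, rng):
        """Explore uniformly with probability eps; otherwise act greedily."""
        if rng.random() < self.eps:
            return int(rng.integers(self.n_arms))
        return int(np.argmax([self._predict(a, x) for a in range(self.n_arms)]))

    def update(self, arm, x, reward):
        """Record the observed (context, reward) pair for the chosen arm."""
        self.X[arm].append(x)
        self.y[arm].append(reward)

# Example interaction loop with a toy environment (purely for illustration).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bandit = KernelEpsGreedy(n_arms=3)
    for t in range(200):
        x = rng.normal(size=2)                       # observe a context
        arm = bandit.select(x, rng)                  # choose an arm
        reward = np.sin(x[0]) * (arm + 1) + rng.normal(scale=0.1)  # toy reward
        bandit.update(arm, x, reward)                # update the arm's data
```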