View Submission - EcoSta2024
A0938
Title: Kernelized epsilon-greedy algorithm for nonparametric bandits with covariates
Authors: Sakshi Arya - Case Western Reserve University (United States) [presenting]
Abstract: Multi-armed bandit algorithms are popular for sequential decision-making in many practical applications, ranging from online advertisement recommendations to mobile health. The goal in such problems is to maximize the cumulative reward over time for a set of choices/arms while taking covariate (or contextual) information into account. Epsilon-greedy is a popular heuristic for the multi-armed bandit problem; however, it is among the less theoretically studied algorithms in the presence of contextual information. The epsilon-greedy strategy is studied in nonparametric bandits, i.e., when no parametric form is assumed for the reward functions. The relationship between the covariates and the expected rewards is assumed to be modeled by arbitrary linear functions of the contexts' images in a specified reproducing kernel Hilbert space (RKHS). A kernel epsilon-greedy algorithm is proposed, and its convergence rates for estimation and cumulative regret are established under only a boundedness assumption on the errors. The rates closely match the optimal rates for linear contextual bandits when restricted to a finite-dimensional RKHS. The results are then compared with existing algorithms, such as kernel upper confidence bound (UCB), on synthetic data.
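The abstract's recipe, kernel ridge regression estimates of each arm's reward function combined with a decaying exploration probability, can be sketched in a few lines. The code below is a minimal illustration, not the paper's implementation: the RBF kernel, the regularization parameter `lam`, and the 1/t exploration schedule are all illustrative assumptions, and the class and method names are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X and Z.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KernelEpsGreedy:
    """Illustrative kernel epsilon-greedy contextual bandit: each arm's
    mean-reward function is estimated by kernel ridge regression in an
    RKHS; with probability eps_t the arm is drawn uniformly at random,
    otherwise the arm with the highest predicted reward is played."""

    def __init__(self, n_arms, lam=0.1, gamma=1.0):
        self.n_arms, self.lam, self.gamma = n_arms, lam, gamma
        self.X = [[] for _ in range(n_arms)]  # contexts observed per arm
        self.y = [[] for _ in range(n_arms)]  # rewards observed per arm

    def predict(self, a, x):
        # Kernel ridge regression estimate of arm a's reward at context x.
        if not self.X[a]:
            return 0.0
        Xa, ya = np.array(self.X[a]), np.array(self.y[a])
        K = rbf_kernel(Xa, Xa, self.gamma)
        alpha = np.linalg.solve(K + self.lam * np.eye(len(ya)), ya)
        return float(rbf_kernel(x[None, :], Xa, self.gamma)[0] @ alpha)

    def select(self, x, t, rng):
        # Decaying exploration probability (eps_t ~ 1/t is one common choice).
        eps = min(1.0, 1.0 / max(t, 1))
        if rng.random() < eps:
            return int(rng.integers(self.n_arms))
        return int(np.argmax([self.predict(a, x) for a in range(self.n_arms)]))

    def update(self, a, x, r):
        self.X[a].append(x)
        self.y[a].append(r)
```

Note that refitting the kernel ridge estimate from scratch at each step costs O(n^3) in the number of observations per arm; rank-one updates of the regularized kernel matrix inverse are a standard way to amortize this in practice.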