View Submission

B0386

Title: Thompson sampling with discrete prior Authors: Wei Zheng - University of Tennessee (United States) [presenting]
Xueru Zhang - Purdue University (United States)
Lan Gao - University of Tennessee Knoxville (United States)
Abstract: Thompson sampling is a popular algorithm for multi-armed bandit problems, but its Bayesian posterior update can be computationally expensive for complex reward distributions. Recently, prior discretization has been proposed to address this issue. A new prior discretization method is proposed that guarantees the same regret rate without requiring the unreasonable assumption that the true value of the parameter is one of the discrete points. Additionally, a modified posterior update approach is introduced that further improves the performance of discrete prior Thompson sampling. It is proven that the accumulated regret has a $O(log(T))$ convergence rate with high probability. In addition, numerical experiments are conducted to validate the theoretical analysis and demonstrate that the proposed algorithm outperforms both the standard discrete prior method and the Laplace approximation approach for the continuous prior.