EcoSta 2023
A0275
Title: STEEL: Singularity-aware reinforcement learning
Authors: Zhengling Qi - The George Washington University (United States) [presenting]
Abstract: Batch reinforcement learning (RL) aims to find an optimal policy in a dynamic environment that maximizes the expected total reward by leveraging pre-collected data. A fundamental challenge in this task is the distributional mismatch between the batch data-generating process and the distribution induced by target policies. Nearly all existing algorithms assume that the distribution induced by target policies is absolutely continuous with respect to the data distribution, so that the batch data can be used to calibrate target policies via a change of measure. However, the absolute continuity assumption can be violated in practice, especially when the state-action space is large or continuous. A new batch RL algorithm is proposed that does not require absolute continuity, in the setting of an infinite-horizon Markov decision process with continuous states and actions. The algorithm is motivated by a new error analysis of off-policy evaluation, in which maximum mean discrepancy, together with distributionally robust optimization, is used to characterize the off-policy evaluation error caused by possible singularity and to enable model extrapolation. By leveraging the idea of pessimism and under some mild conditions, a finite-sample regret guarantee is derived for the proposed algorithm without imposing absolute continuity.
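The maximum mean discrepancy (MMD) mentioned in the abstract compares two distributions through kernel mean embeddings and remains well defined even when neither distribution is absolutely continuous with respect to the other. The sketch below is a minimal, self-contained illustration of an empirical (biased, V-statistic) squared-MMD estimate with a Gaussian kernel; it is not the STEEL algorithm itself, and the function names and bandwidth choice are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, bandwidth=1.0):
    # Gaussian (RBF) kernel matrix between rows of X and rows of Y.
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd_squared(X, Y, bandwidth=1.0):
    # Biased (V-statistic) estimate of the squared MMD between the
    # empirical distributions of samples X and Y.
    Kxx = rbf_kernel(X, X, bandwidth)
    Kyy = rbf_kernel(Y, Y, bandwidth)
    Kxy = rbf_kernel(X, Y, bandwidth)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
# Two samples from the same distribution vs. a mean-shifted one:
same = mmd_squared(rng.normal(0.0, 1.0, (200, 2)), rng.normal(0.0, 1.0, (200, 2)))
diff = mmd_squared(rng.normal(0.0, 1.0, (200, 2)), rng.normal(3.0, 1.0, (200, 2)))
print(same < diff)  # the mean-shifted pair yields a larger discrepancy
```

In the abstract's setting, a quantity of this form is used (together with distributionally robust optimization) to bound the off-policy evaluation error that arises when the target-policy distribution is singular with respect to the data distribution.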