A0441

Title: Value enhancement of reinforcement learning via efficient and robust trust region optimization
Authors: Fan Zhou - Shanghai University of Finance and Economics (China) [presenting]
Abstract: Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy that maximizes the cumulative reward in sequential decision-making. Most existing methods are developed for online settings, where data are easy to collect or simulate. Motivated by high-stakes domains such as mobile health studies, where data are limited and pre-collected, we study offline reinforcement learning methods. To use such datasets efficiently for policy optimization, a novel value enhancement method is proposed that improves the performance of a given initial policy computed by existing state-of-the-art RL algorithms. Specifically, when the initial policy is not consistent, the method outputs a policy whose value is no worse than, and often better than, that of the initial policy. When the initial policy is consistent, under some mild conditions, the method yields a policy whose value converges to the optimal value at a faster rate than that of the initial policy, achieving the desired value enhancement property. The proposed method is generally applicable to any parametrized policy that belongs to a pre-specified function class (e.g., deep neural networks). Extensive numerical studies demonstrate the superior performance of the method.