EcoSta 2023
A1270
Title: Understanding the difficulty of achieving dynamic optimality in time inconsistent problems
Authors: Jingxiang Tang - Nanyang Technological University (Singapore) [presenting]
Abstract: The variance of cumulative rewards arises naturally as part of the decision-making criterion in many important reinforcement learning (RL) applications, such as portfolio and resource allocation. The time inconsistency induced by such a criterion makes the search for a globally optimal policy difficult. Many proposals have been made to resolve this, with episodic policy gradient (EPG) being one popular method. This paper highlights the difficulties of actually attaining global optimality with EPG and introduces an alternative notion of optimality, the subgame perfect equilibrium (SPE), which is achievable in RL. Both optimality types are empirically evaluated on portfolio optimization and optimal execution problems in finance. Our results suggest that there are instances where EPG does not learn the desired globally optimal policy, while SPE provides a better solution.
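As a minimal illustration of the kind of criterion the abstract describes, the sketch below runs a REINFORCE-style episodic policy gradient on a mean-variance objective J(theta) = E[G] - lam * Var(G). The two-armed bandit environment (a noisy arm with reward N(1, 1) versus a sure reward of 0.8), the penalty weight lam, and all step sizes are illustrative assumptions, not the authors' experimental setup.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def train_epg(lam=0.5, lr=0.1, batch=1024, iters=500, seed=0):
    # Hypothetical toy problem: arm 0 pays N(1, 1), arm 1 pays 0.8 surely.
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)                      # policy logits
    for _ in range(iters):
        p = softmax(theta)
        a = rng.choice(2, size=batch, p=p)   # sample one-step episodes
        g = np.where(a == 0, rng.normal(1.0, 1.0, batch), 0.8)  # returns
        score = np.eye(2)[a] - p             # grad of log pi(a) for softmax
        g_mean = (score * g[:, None]).mean(axis=0)        # estimates grad E[G]
        g_sq = (score * (g ** 2)[:, None]).mean(axis=0)   # estimates grad E[G^2]
        mu = g.mean()
        # grad Var(G) = grad E[G^2] - 2 E[G] grad E[G]
        grad = g_mean - lam * (g_sq - 2.0 * mu * g_mean)
        theta += lr * grad                   # ascend the mean-variance objective
    return softmax(theta)

probs = train_epg()
```

With lam = 0.5, the sure arm has objective 0.8 while the noisy arm has 1.0 - 0.5 * 1.0 = 0.5, so the learned policy should favor the sure arm.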