
A0634

Title: A robust test for the stationarity assumption in sequential decision making: Towards better policy under batch learning
Authors: Zhenke Wu - University of Michigan (United States) [presenting]
Abstract: Reinforcement learning (RL) is a powerful technique that allows an autonomous agent to learn an optimal policy that maximizes the expected cumulative reward (the return). The optimality of many RL algorithms relies on the stationarity assumption, which requires the state transition and reward functions to be time-invariant. In real-world applications such as robotics control, health care, and digital marketing, however, these functions often drift over extended periods, so policies learned under the stationarity assumption can be sub-optimal. A doubly robust procedure is proposed for testing the stationarity assumption and detecting change points in offline RL settings, e.g., using data collected from a completed sequentially randomized trial. The proposed test is robust to model misspecification and effectively controls the type-I error while achieving high statistical power, especially in high-dimensional settings. Simulations and a real-world mobile health intervention example illustrate the advantages of the method in detecting change points and optimizing long-term rewards in high-dimensional, non-stationary environments.
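To make "testing stationarity and detecting a change point" concrete, here is a toy sketch on a single scalar reward sequence. It is NOT the doubly robust procedure described above, which handles states, actions, and model misspecification. It swaps in a plain CUSUM mean-shift statistic with a permutation p-value, and it assumes the rewards are exchangeable (e.g., i.i.d.) under the null of no change, an assumption that generally fails for RL trajectories. The function names `cusum_statistic` and `permutation_test` are hypothetical.

```python
import random
from typing import Sequence, Tuple


def cusum_statistic(rewards: Sequence[float]) -> Tuple[float, int]:
    """Max standardized CUSUM statistic and the split index that attains it.

    For each candidate split k (1 <= k < T), compares the mean reward before
    and after k, weighted by sqrt(k * (T - k) / T).
    """
    T = len(rewards)
    if T < 2:
        raise ValueError("need at least two observations")
    total = sum(rewards)
    best_stat, best_k = -1.0, 1
    prefix = 0.0
    for k in range(1, T):
        prefix += rewards[k - 1]
        mean_before = prefix / k
        mean_after = (total - prefix) / (T - k)
        stat = (k * (T - k) / T) ** 0.5 * abs(mean_before - mean_after)
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_stat, best_k


def permutation_test(rewards: Sequence[float], n_perm: int = 500,
                     seed: int = 0) -> Tuple[int, float]:
    """Estimated change point and permutation p-value for H0: no mean shift.

    Assumption (hypothetical toy setting): rewards are exchangeable under H0,
    so shuffling them simulates the null distribution of the statistic.
    """
    rng = random.Random(seed)
    observed, k_hat = cusum_statistic(rewards)
    shuffled = list(rewards)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        stat, _ = cusum_statistic(shuffled)
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return k_hat, p_value


if __name__ == "__main__":
    # Simulated rewards whose mean jumps from 0 to 2 halfway through.
    rng = random.Random(1)
    rewards = ([rng.gauss(0.0, 1.0) for _ in range(100)]
               + [rng.gauss(2.0, 1.0) for _ in range(100)])
    k_hat, p = permutation_test(rewards, n_perm=200)
    print(f"estimated change point: {k_hat}, p-value: {p:.4f}")
```

The permutation null is used here only because it is simple. In sequential decision making the rewards depend on the states and actions, which is why the abstract's procedure models the transition and reward functions instead of shuffling raw rewards.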