A1275
Title: Post-episodic reinforcement learning inference
Authors: Ruohan Zhan - Hong Kong University of Science and Technology (Hong Kong) [presenting]
Vasilis Syrgkanis - Stanford (United States)
Abstract: Estimation and inference with data collected by episodic reinforcement learning (RL) algorithms are considered, i.e., adaptive experimentation algorithms that at each period (aka episode) sequentially interact multiple times with a single treated unit. The goal is to evaluate counterfactual adaptive policies after data collection and to estimate structural parameters such as dynamic treatment effects, which can be used for credit assignment (e.g., what was the effect of the first-period action on the final outcome?). Such parameters of interest can be framed as solutions to moment equations, but not as minimizers of a population loss function, leading to Z-estimation approaches in the case of static data. However, such estimators fail to be asymptotically normal when the data are collected adaptively. A re-weighted Z-estimation approach is proposed, with carefully designed adaptive weights that stabilize the episode-varying estimation variance arising from the nonstationary policies that episodic RL algorithms typically invoke. Proper weighting schemes are identified that restore the consistency and asymptotic normality of the re-weighted Z-estimators, which allows for hypothesis testing and the construction of reliable confidence regions for the target parameters of interest. Primary applications include dynamic treatment effect estimation and dynamic off-policy evaluation.
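To make the weighting idea concrete, here is a rough sketch in notation of our own choosing (the abstract does not spell out the weight construction). If the target parameter theta solves the per-episode moment condition E[m(Z_t; theta)] = 0, where Z_t denotes the data from episode t, a re-weighted Z-estimator solves

\frac{1}{T}\sum_{t=1}^{T} \hat{w}_t \, m(Z_t; \hat{\theta}) = 0, \qquad \hat{w}_t \propto \hat{\sigma}_t^{-1},

where \hat{\sigma}_t estimates the standard deviation of the moment under the policy active in episode t; standardizing each episode's score by its own scale is what allows a martingale central limit theorem to apply despite the nonstationary policy.

The short Python sketch below illustrates the variance-stabilizing effect in the simplest adaptive setting: a one-period treatment effect estimated from inverse-propensity-weighted moments whose variance blows up as an adaptive policy concentrates on one arm. The inverse-standard-deviation weight rule and all names here are illustrative assumptions, not the authors' exact construction.

import numpy as np

rng = np.random.default_rng(0)
T, theta_true = 5000, 1.0

# Episode-varying propensities: an adaptive policy that gradually
# concentrates on the treatment arm (nonstationarity across episodes).
e = 0.5 * np.exp(-np.arange(T) / 2000.0) + 0.05

A = rng.binomial(1, e)                    # adaptive treatment assignment
Y = theta_true * A + rng.normal(size=T)   # outcomes

# IPW moment: psi_t is unbiased for theta in every episode, but its
# variance grows as e_t shrinks, which breaks the usual CLT for the
# unweighted average.
psi = A * Y / e - (1 - A) * Y / (1 - e)

# Variance-stabilizing weights: inverse of a proxy for the per-episode
# standard deviation of the moment (an assumed, illustrative rule).
sigma = np.sqrt(1.0 / e + 1.0 / (1.0 - e))
w = 1.0 / sigma

# theta_rw solves the weighted moment equation sum_t w_t (psi_t - theta) = 0.
theta_naive = psi.mean()
theta_rw = np.sum(w * psi) / np.sum(w)
se_rw = np.sqrt(np.sum(w**2 * (psi - theta_rw) ** 2)) / np.sum(w)

print(f"naive Z-estimate:       {theta_naive:.3f}")
print(f"re-weighted Z-estimate: {theta_rw:.3f} (se {se_rw:.3f})")

Because the weighted score sum behaves like a martingale with episode-wise standardized increments, the re-weighted estimate supports the normal-approximation interval theta_rw +/- 1.96 * se_rw, which is the kind of inference the plain estimator does not deliver under adaptive collection.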