EcoSta 2024 - Submission A0333
Title: Policy learning with continuous actions under unmeasured confounding
Authors: Yuhan Li - University of Illinois Urbana-Champaign (United States)
Eugene Han - University of Illinois Urbana-Champaign (United States)
Wenzhuo Zhou - University of California Irvine (United States)
Zhengling Qi - The George Washington University (United States)
Yifan Cui - Zhejiang University (China)
Ruoqing Zhu - University of Illinois Urbana-Champaign (United States) [presenting]
Abstract: In reinforcement learning for personalized medicine, unmeasured confounding often hinders the optimization of treatment policies, particularly in offline settings. Most existing methods focus on off-policy evaluation (OPE) and are generally not directly suited to learning optimal policies. For example, the common assumption that the behavior policy depends solely on unobserved state variables is often violated in real-world medical scenarios. A novel identification framework is introduced to estimate policy values accurately. This is achieved by identifying a set of variables that are not involved in policy determination but can potentially affect the reward. By appropriately constructing bridge functions, an optimal policy is learned based on observed states, thereby enabling practical implementation. The framework additionally tackles the dose-finding problem in personalized medicine by allowing a continuous action space. The asymptotic properties of the proposed estimators are also explored under suitable conditions. The method is applied to data from a German study of romantic relationships.
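A minimal sketch of the bridge-function idea described above, not the authors' implementation: a simulated toy dataset has an unmeasured confounder U, an action-side proxy Z, and a reward-side proxy W that affects the reward but is not used by the behavior policy (the role of the variables "not involved in policy determination"). A linear outcome bridge h(W, a, X) is fit from the conditional moment restriction E[Y - h(W, A, X) | Z, A, X] = 0 via a just-identified instrumental-variables step, and a candidate continuous dose rule is then evaluated by plugging it into the fitted bridge. All variable names, the linear sieve, and the data-generating process are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Toy confounded data (illustrative only).
U = rng.normal(size=n)                     # unmeasured confounder
X = rng.normal(size=n)                     # observed state
Z = U + rng.normal(scale=0.5, size=n)      # action-side proxy (assumed)
W = U + rng.normal(scale=0.5, size=n)      # reward-side proxy: affects the
                                           # reward, unused by behavior policy
A = 0.5 * X + 0.8 * U + rng.normal(scale=0.5, size=n)  # continuous dose
Y = 1.0 + A - 0.5 * A**2 + X + U + rng.normal(scale=0.5, size=n)

# Stage 1: fit a linear outcome bridge h(W, a, X) = theta' phi(W, a, X)
# from E[Y - h(W, A, X) | Z, A, X] = 0, solved as a just-identified IV step.
def phi(w, a, x):
    return np.column_stack([np.ones_like(a), a, a**2, x, w])

def psi(z, a, x):
    return np.column_stack([np.ones_like(a), a, a**2, x, z])

Phi, Psi = phi(W, A, X), psi(Z, A, X)
theta = np.linalg.solve(Psi.T @ Phi, Psi.T @ Y)

# Stage 2: plug a candidate deterministic dose rule d(X) into the fitted
# bridge; under the identification E[Y(a)] = E[h(W, a, X)], the sample
# average estimates the rule's value.
def policy_value(d):
    return (phi(W, d(X), X) @ theta).mean()

v_hat = policy_value(lambda x: np.clip(0.5 + 0.3 * x, -2.0, 2.0))
print(f"estimated value of candidate dose rule: {v_hat:.3f}")

In this toy design the valid bridge is h(W, a, X) = 1 + a - 0.5 a^2 + X + W, so the IV step should recover coefficients close to (1, 1, -0.5, 1, 1); a practical implementation of the framework would presumably replace the linear sieve with a flexible function class and optimize the dose rule over that class rather than evaluating a single fixed rule.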