A0787
Title: Imputation-based Q-learning with regression trees for censored survival data
Authors: Youngjoo Cho - Konkuk University (Korea, South) [presenting]
Xue Yang - University of Pittsburgh (United States)
Abdus Wahed - University of Pittsburgh (United States)
Yu Cheng - University of Pittsburgh (United States)
Abstract: Dynamic treatment regimes (DTRs) are sets of decision rules that guide individualized, time-varying treatments in multistage therapy. For right-censored survival data, many methods have been proposed to determine optimal DTRs that maximize the expected overall survival time. Q-learning is a commonly used and straightforward reinforcement learning algorithm for this purpose, but it is well known to be sensitive to model misspecification. To address this issue, tree-assisted imputation-based Q-learning (TAI Q-learning) is proposed. In this method, doubly robust survival trees and tree ensembles are used to estimate the optimal treatment rules at each stage, while multivariate imputation by chained equations (MICE) is employed to predict the optimal survival time. Extensive simulation studies compare traditional Cox proportional hazards (Cox PH) and accelerated failure time (AFT) models with nonparametric tree-based methods in the optimization step, and hot-deck multiple imputation with MICE in the imputation step. The simulation results show that MICE is easier to implement and often outperforms hot-deck imputation. Moreover, in multilevel treatment scenarios, tree-based methods outperform standard Cox PH and AFT models in estimating optimal treatment rules.
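The backward-induction idea behind Q-learning for a two-stage regime can be illustrated with a minimal sketch. This is not the TAI Q-learning procedure itself. It assumes fully observed (uncensored) survival times and discrete covariates, replaces the survival trees with simple cell means, and replaces the MICE imputation step with a plug-in prediction of the survival time under the best stage-2 treatment. All names (`cell_means`, `best_action`, `fit_two_stage`, the keys `x1`, `a1`, `x2`, `a2`, `time`) and the toy data are hypothetical.

```python
from collections import defaultdict


def cell_means(rows, key, value):
    """Average value(row) within each cell defined by key(row)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        k = key(r)
        sums[k] += value(r)
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


def best_action(q, state, actions):
    """Return (action, value) maximizing the fitted Q-function at a state."""
    candidates = ((a, q[(state, a)]) for a in actions if (state, a) in q)
    return max(candidates, key=lambda pair: pair[1])


def fit_two_stage(rows, actions=(0, 1)):
    """Two-stage Q-learning by backward induction (toy, uncensored data)."""
    # Stage 2: Q2(x2, a2) is the mean observed survival time in each cell.
    q2 = cell_means(rows, key=lambda r: (r["x2"], r["a2"]),
                    value=lambda r: r["time"])
    # Pseudo-outcome: predicted survival time had the best stage-2
    # treatment been given (a plug-in stand-in for the imputation step).
    pseudo = [dict(r, opt_time=best_action(q2, r["x2"], actions)[1])
              for r in rows]
    # Stage 1: Q1(x1, a1) is the mean pseudo-outcome in each cell.
    q1 = cell_means(pseudo, key=lambda r: (r["x1"], r["a1"]),
                    value=lambda r: r["opt_time"])
    return q1, q2


# Hypothetical toy data: one baseline stratum, two treatments per stage.
rows = [
    {"x1": 0, "a1": 0, "x2": 0, "a2": 0, "time": 2.0},
    {"x1": 0, "a1": 0, "x2": 0, "a2": 1, "time": 4.0},
    {"x1": 0, "a1": 1, "x2": 1, "a2": 0, "time": 6.0},
    {"x1": 0, "a1": 1, "x2": 1, "a2": 1, "time": 3.0},
]

q1, q2 = fit_two_stage(rows)
stage1_rule = best_action(q1, 0, (0, 1))  # best first-stage treatment for x1 = 0
```

In this toy data the stage-1 comparison uses the pseudo-outcomes rather than the observed times, so each first-stage treatment is judged by what would be achieved if the second stage were then treated optimally. Handling right censoring, which is the focus of the abstract, would require replacing the cell means with a censoring-aware estimator.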