A0615
Title: Sequential knockoffs for variable selection in reinforcement learning
Authors: Jin Zhu - London School of Economics and Political Science (United Kingdom) [presenting]
Abstract: In real-world applications of reinforcement learning, it is often challenging to obtain a state representation that is parsimonious and satisfies the Markov property without prior knowledge. Consequently, it is common practice to construct a state larger than necessary, e.g., by concatenating measurements over contiguous time points. However, needlessly increasing the dimension of the state can slow learning and obfuscate the learned policy. The notion of a minimal sufficient state for a Markov decision process (MDP) is introduced as the smallest subvector of the original state under which the process remains an MDP and retains the reward function of the original process. A novel SEEK algorithm is proposed that estimates the minimal sufficient state in a system with high-dimensional, complex, nonlinear dynamics. In large samples, the proposed method achieves selection consistency. As the method is agnostic to the reinforcement learning algorithm being applied, it benefits downstream tasks such as policy learning. Empirical experiments verify the theoretical results and show that the proposed approach outperforms several competing methods in terms of variable selection accuracy and regret.
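The knockoff idea underlying the title can be illustrated in a toy supervised setting: each feature is paired with a synthetic "knockoff" copy that carries no signal, and a feature is selected only when its importance clearly exceeds that of its knockoff. The sketch below is purely illustrative and is not the SEEK algorithm; the data-generating setup, the marginal-correlation importance statistic, and all parameter values are assumptions chosen for the example. For mutually independent features with known marginals, an independent copy is a valid model-X knockoff, which keeps the construction simple.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (illustrative, not from the abstract): p independent
# standard-normal features, of which indices 0, 1, 2 drive the response.
n, p, q = 500, 10, 0.5  # sample size, dimension, lax target FDR level for the toy
true_support = {0, 1, 2}

X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[list(true_support)] = 2.0
y = X @ beta + rng.standard_normal(n)

# Independent features with known marginals: an independent Gaussian copy
# is a valid model-X knockoff.
X_knock = rng.standard_normal((n, p))

# Knockoff statistic W_j: marginal-correlation importance of feature j
# minus that of its knockoff.  Large positive W_j suggests a real signal.
W = np.abs(X.T @ y) - np.abs(X_knock.T @ y)

# Knockoff+ threshold: smallest t with (1 + #{W_j <= -t}) / #{W_j >= t} <= q.
threshold = np.inf
for t in np.sort(np.abs(W[W != 0])):
    if (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= q:
        threshold = t
        break

selected = set(np.flatnonzero(W >= threshold))
print(sorted(selected))  # should contain the true support {0, 1, 2}
```

The sequential flavor referenced in the title extends this one-shot filter by screening state components in turn while conditioning on those already kept, so that the retained subvector preserves the Markov property and the reward function.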