A1425

Title: Consistent order determination of Markov decision process
Authors: Chuyun Ye - Beijing Normal University (China) [presenting]
Ruoqing Zhu - University of Illinois at Urbana-Champaign (United States)
Lixing Zhu - Beijing Normal University (China)
Abstract: Reinforcement learning (RL) is built on the Markov decision process (MDP), which fundamentally relies on the Markov property. However, many real-world systems exhibit extended temporal dependencies and call for higher-order Markov models rather than the usual first-order assumption. This work addresses the problem of consistently estimating the order of such Markov processes, for which traditional sequential testing methods are limited in both sensitivity and consistency. A novel two-stage estimation procedure is introduced. First, a function is defined that precisely characterizes the k-th order Markov assumption, guaranteeing sensitivity to all violations. Second, a signal statistic is constructed that consistently identifies the true order by exploiting a distinct pattern of minimizers. The approach yields a consistent estimator and can be implemented efficiently. Furthermore, the characteristic curve pattern of the signal statistic lends itself to visual inspection, which could simplify order determination in practical applications. The effectiveness of the method is validated through simulations and a real-world dataset, representing a significant step toward accurately modeling and applying RL in systems with complex temporal dependencies.