[Question] Termination with timeout breaks the Markov assumption #3353
Replies: 1 comment
Thank you for posting this. Terminating episodes with a timeout can break the Markov assumption if the time limit is not provided as part of the agent's observation. In tasks like navigation, enforcing a timeout without exposing the remaining time in the observation space means the agent is acting in a partially observable Markov decision process (POMDP) rather than a true Markov decision process (MDP).

Why Timeouts Break the Markov Assumption
The Markov property requires that all information necessary to predict the next state (and reward) is fully captured in the current state representation. When an environment terminates due to a fixed step limit, but the agent's observation does not include how close it is to that limit, latent state information (the remaining timesteps) influences state transitions in a way the agent cannot infer from its observations. This hidden variable leads to state aliasing: observations that look identical to the agent have different true next-state distributions depending on how much time remains.

Navigation and Timeout Terminations
In navigation tasks, a timeout might represent the agent failing to reach its goal within a certain number of steps. If the agent does not know how many steps are left before forced termination, it cannot condition its actions on this crucial piece of environment state. For example, it cannot distinguish whether it has one step left (requiring urgency) or many steps left (allowing a more exploratory policy).

Proper Handling to Preserve Markov Property
To maintain the Markov structure:
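One way to expose the time limit, as the reply suggests, is to append the remaining time to the observation. The sketch below is a minimal, library-free illustration and not Isaac Lab code: `ToyCorridorEnv`, `RemainingTimeObservation`, `goal`, and `max_steps` are hypothetical names invented for this example, and the separate `terminated` / `truncated` flags are an assumption borrowed from the Gymnasium convention.

```python
from typing import List, Tuple


class ToyCorridorEnv:
    """Hypothetical 1-D navigation task: walk from cell 0 to cell `goal`."""

    def __init__(self, goal: int = 5, max_steps: int = 8) -> None:
        self.goal = goal
        self.max_steps = max_steps
        self.pos = 0
        self.t = 0

    def reset(self) -> List[float]:
        self.pos = 0
        self.t = 0
        return [float(self.pos)]

    def step(self, action: int) -> Tuple[List[float], float, bool, bool]:
        # action is -1 (left) or +1 (right); position is clamped to [0, goal].
        self.pos = max(0, min(self.goal, self.pos + action))
        self.t += 1
        terminated = self.pos == self.goal                      # goal reached
        truncated = (not terminated) and self.t >= self.max_steps  # timeout
        reward = 1.0 if terminated else 0.0
        return [float(self.pos)], reward, terminated, truncated


class RemainingTimeObservation:
    """Hypothetical wrapper: appends the fraction of the episode still left."""

    def __init__(self, env: ToyCorridorEnv) -> None:
        self.env = env

    def _augment(self, obs: List[float]) -> List[float]:
        remaining = 1.0 - self.env.t / self.env.max_steps
        return obs + [remaining]

    def reset(self) -> List[float]:
        return self._augment(self.env.reset())

    def step(self, action: int) -> Tuple[List[float], float, bool, bool]:
        obs, reward, terminated, truncated = self.env.step(action)
        return self._augment(obs), reward, terminated, truncated


env = RemainingTimeObservation(ToyCorridorEnv(goal=5, max_steps=8))
start = env.reset()                 # position 0, full episode remaining
env.step(+1)                        # move right
back, _, _, _ = env.step(-1)        # move left, back at position 0
print(start, back)                  # same position, different remaining time
```

Without the wrapper, the two observations at position 0 would be identical even though one is two steps closer to the timeout, which is the state aliasing described above. With the extra entry, the policy can tell them apart.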
Key Takeaways
This is a great topic for our Discussions section. I will move your post there in case you need further help, and so that others can follow up.
Question
I'm confused about whether the Markov assumption is broken in some tasks (e.g. navigation) that add a timeout to the termination terms but do not provide sufficient time limit information in the observation terms.