3.2 RL Environment
The environment, registered as ctc-executioner-v0, is a child class of gym.Env in which the step and reset methods are implemented in order to simulate order executions.
This section first provides an overview of the environment and then describes each component and its functionality.
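The following minimal sketch illustrates this structure, assuming the classic OpenAI gym API. The observation and action spaces, episode length, and internal logic shown here are placeholders chosen for illustration, not the project's actual implementation.

```python
# Minimal sketch of a gym.Env subclass in the spirit of
# ctc-executioner-v0; spaces and internals are illustrative
# placeholders, not the actual implementation.
import gym
import numpy as np
from gym import spaces

class ExecutionEnv(gym.Env):
    """Simulates the execution of an order against historical data."""

    def __init__(self):
        # Hypothetical spaces: a discrete set of limit price levels
        # as actions, and a feature vector derived from the order
        # book as the observation.
        self.action_space = spaces.Discrete(21)
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(10,), dtype=np.float32)

    def reset(self):
        # Start a new execution episode and return the initial
        # observation (here a dummy feature vector).
        self._t = 0
        return np.zeros(10, dtype=np.float32)

    def step(self, action):
        # Forward the chosen action to the match engine, collect the
        # resulting fills, and compute the reward (e.g. execution cost).
        self._t += 1
        obs = np.zeros(10, dtype=np.float32)
        reward = 0.0
        done = self._t >= 100
        return obs, reward, done, {}
```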

The environment covers the entire order execution process, so an agent using it does not have to be aware of the inner workings and can regard the execution process as a black box.
Upon initialization, an order book and a match engine are provided. The order book is the essential core that implicitly defines the state space and the outcome of each execution. All other components, including the match engine, are therefore abstractions and mechanisms used to construct an environment in which order placement can be investigated and learned.
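To make the relationship between these two components concrete, the stubs below show one plausible wiring. The class names, signatures, and the simplified matching logic are assumptions made for this sketch, not the project's actual classes.

```python
# Illustrative stand-ins for the two components provided at
# initialization; names and signatures are assumptions for this
# sketch, not the project's actual classes.

class Orderbook:
    """Sequence of historical order book states, indexed by time."""
    def __init__(self, states):
        # Each state: list of (price, size) ask levels, best ask first.
        self.states = states

    def state_at(self, t):
        return self.states[t]

class MatchEngine:
    """Matches a buy order against the ask side of the book."""
    def __init__(self, orderbook):
        self.orderbook = orderbook

    def match(self, limit, qty, t):
        # Consume ask levels at or below the limit price until the
        # requested quantity is filled or no further level qualifies.
        fills, remaining = [], qty
        for price, size in self.orderbook.state_at(t):
            if price > limit or remaining <= 0:
                break
            traded = min(size, remaining)
            fills.append((price, traded))
            remaining -= traded
        return fills, remaining
```

For example, matching a buy order with limit 100.2 and quantity 2.0 against a book state `[(100.0, 1.0), (100.5, 2.0)]` would fill 1.0 at 100.0 and leave 1.0 unfilled, since the second level lies above the limit.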
During the execution process, which is initiated by an agent, a memory stores the ongoing execution and updates its values as the agent proceeds through its epochs. In the current implementation, only one execution can be stored in this memory, so the environment supports only one agent at a time; multiple agents would cause race conditions.
Unlike in most traditional reinforcement learning environments, each step taken by the agent leads to a complete change of the state space. Consider a chess environment, where the state is the board together with its pieces. After every move taken by the agent, the board looks exactly the same except for the piece moved in that step. This process continues until the agent either wins or loses the game, at which point the state is reset to the very same configuration as at the beginning of the previous epoch. In the execution environment, however, the state will likely never be the same.
Any agent compatible with the OpenAI gym.Env interface is able to make use of this environment.
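A minimal interaction loop might then look as follows, assuming the environment has been registered so that gym knows the id ctc-executioner-v0. The random policy merely stands in for a learning agent, and the reward details are placeholders.

```python
# Minimal interaction loop using the classic OpenAI gym API;
# a random policy stands in for a learning agent here.
import gym

env = gym.make('ctc-executioner-v0')

for epoch in range(10):
    state = env.reset()
    done = False
    total_reward = 0.0
    while not done:
        action = env.action_space.sample()  # random placeholder policy
        state, reward, done, info = env.step(action)
        total_reward += reward
    print('epoch {}: reward {:.4f}'.format(epoch, total_reward))
```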