I'm curious how the scenarios used in your ICLR paper were constructed. Also, is a collision with an obstacle, such as a wall of the Maze, not regarded as an unsafe state? And can the agents observe the state of each scenario, for example the shape, size, and location of the walls?