This will require modifying the environment class, because the reward computation currently has no access to the position within an episode.
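One possible shape for that change, as a minimal sketch only: the environment keeps a step counter and passes it to the reward function. The names here (`StepAwareEnv`, `reward_fn`, `late_bonus_reward`, the `t` argument) are hypothetical and are not taken from the actual environment class, which may track state and rewards differently.

```python
from typing import Callable, Tuple


class StepAwareEnv:
    """Hypothetical toy environment that tracks the step index within an episode."""

    def __init__(self, reward_fn: Callable[[int, int, int], float], max_steps: int = 5):
        self.reward_fn = reward_fn  # called as reward_fn(state, action, t)
        self.max_steps = max_steps
        self.t = 0       # position (step index) within the current episode
        self.state = 0   # toy state: running sum of actions

    def reset(self) -> int:
        # A new episode starts at position 0.
        self.t = 0
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.state += action
        # The reward function now receives the current position in the episode.
        reward = self.reward_fn(self.state, action, self.t)
        self.t += 1
        done = self.t >= self.max_steps
        return self.state, reward, done


def late_bonus_reward(state: int, action: int, t: int) -> float:
    # Illustrative assumption: the same action is worth more later in the episode.
    return float(action) * (t + 1)
```

With this layout, existing reward functions that ignore the position only need to accept one extra argument, while position-dependent rewards can read `t` directly.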