The CartPole-v0 environment, solved using the REINFORCE algorithm.

Framework used: PyTorch

Demo

You can find an example of the trained agent here.
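
For readers new to REINFORCE, the step that turns an episode's rewards into learning targets can be sketched as below. This is a minimal illustration, not the code from this repository: the function name `discounted_returns` and the default `gamma=0.99` are assumptions chosen for the example. It is written in plain Python so it runs without PyTorch installed.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Compute G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for each step t.

    `rewards` holds the per-step rewards of one finished episode.
    `gamma` is the discount factor (0.99 is an assumed value, not taken from this repo).
    """
    returns: List[float] = []
    running = 0.0
    # Walk the episode backwards so each return reuses the one that follows it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    # The loop produced returns from last step to first; restore time order.
    returns.reverse()
    return returns


if __name__ == "__main__":
    # CartPole gives a reward of 1.0 for every step the pole stays up.
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))
```

In a PyTorch implementation, these returns would typically weight the log-probabilities of the actions taken, giving a loss of the form `-(log_prob * G_t).sum()` over the episode. How this repository computes or normalizes its returns may differ.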