I am running the script here, but even after 500 episodes it does not converge. You can see the graph I get below:

In contrast, this is the supposedly converged graph from the repo:

Can you please advise why this is the case? I did not change any parameters; I just ran the script as is.