-
Hey 👋 From the logs it looks like your agent improves within individual episodes, but the average score is not trending upward across training; it keeps collapsing back down, so training does not seem to converge. This is often a matter of hyperparameter tuning (clip ratio, learning rate, batch size, reward scaling, etc.). Have you seen the burn-rl examples from another user? They could be a useful reference.
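To make the knobs mentioned above concrete, here is a minimal plain-Rust sketch of PPO's clipped surrogate objective for a single transition, together with a hypothetical config struct. The values shown (clip ratio 0.2, learning rate 3e-4, etc.) are common defaults, not values taken from the linked repository, and all names are illustrative assumptions rather than Burn or minimalRL-rs APIs.

```rust
/// Hypothetical PPO hyperparameters with commonly used default values.
/// These are illustrative starting points, not values from the linked repo.
struct PpoConfig {
    learning_rate: f64, // step size for the optimizer, e.g. 3e-4
    clip_ratio: f64,    // epsilon in the clipped objective, e.g. 0.2
    gamma: f64,         // discount factor, e.g. 0.99
    gae_lambda: f64,    // GAE smoothing parameter, e.g. 0.95
    batch_size: usize,  // minibatch size per update, e.g. 64
}

impl Default for PpoConfig {
    fn default() -> Self {
        Self {
            learning_rate: 3e-4,
            clip_ratio: 0.2,
            gamma: 0.99,
            gae_lambda: 0.95,
            batch_size: 64,
        }
    }
}

/// Clipped surrogate loss for a single transition:
/// L = -min(r * A, clamp(r, 1 - eps, 1 + eps) * A),
/// where r = exp(log_prob_new - log_prob_old) and A is the advantage.
fn ppo_clip_loss(log_prob_new: f64, log_prob_old: f64, advantage: f64, clip_ratio: f64) -> f64 {
    let ratio = (log_prob_new - log_prob_old).exp();
    let unclipped = ratio * advantage;
    let clipped = ratio.clamp(1.0 - clip_ratio, 1.0 + clip_ratio) * advantage;
    -unclipped.min(clipped)
}

fn main() {
    let cfg = PpoConfig::default();
    // Example: the new policy is slightly more likely to take an advantageous action.
    let loss = ppo_clip_loss(-0.9, -1.0, 0.5, cfg.clip_ratio);
    println!("clip loss: {loss:.4} (lr = {}, clip = {})", cfg.learning_rate, cfg.clip_ratio);
}
```

Tuning usually starts from defaults like these and adjusts one knob at a time while watching the average return; a clip ratio or learning rate that is too large is a common cause of scores that rise briefly and then collapse.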
-
Hi team,
I am trying to implement a collection of basic RL algorithms in Rust with Burn. The PPO algorithm, and possibly the other two I have implemented, do not appear to be learning during training: the scores barely change across episodes.
Here is the PPO implementation: https://github.com/AspadaX/minimalRL-rs/blob/main/src/ppo.rs
I am new to RL, so I may have missed something critical in the Burn docs and code. Can you spot anything wrong with my training code?
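In case it helps with reading the scores, here is a minimal plain-Rust sketch (no Burn) of tracking a moving average of episode returns, which is the curve that should trend upward if the agent is learning. The helper names are hypothetical and not taken from the repository.

```rust
use std::collections::VecDeque;

/// Rolling window of recent episode returns; a hypothetical monitoring
/// helper, not part of the linked repository.
struct ReturnTracker {
    window: VecDeque<f64>,
    capacity: usize,
}

impl ReturnTracker {
    fn new(capacity: usize) -> Self {
        Self { window: VecDeque::with_capacity(capacity), capacity }
    }

    /// Record the total (undiscounted) return of a finished episode.
    fn push(&mut self, episode_return: f64) {
        if self.window.len() == self.capacity {
            self.window.pop_front();
        }
        self.window.push_back(episode_return);
    }

    /// Mean return over the window; if training works, this should rise
    /// over time rather than oscillating around its starting value.
    fn mean(&self) -> f64 {
        if self.window.is_empty() {
            return 0.0;
        }
        self.window.iter().sum::<f64>() / self.window.len() as f64
    }
}

fn main() {
    let mut tracker = ReturnTracker::new(20);
    for episode in 0..100 {
        // Placeholder for a real rollout; substitute the actual episode return.
        let episode_return = episode as f64 * 0.1;
        tracker.push(episode_return);
        if episode % 20 == 0 {
            println!("episode {episode}: mean return over last 20 = {:.2}", tracker.mean());
        }
    }
}
```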