-
Hey 👋 From the logs it looks like your agent improves within individual episodes, but the average score is not trending upward across training; it keeps collapsing back down, so training does not seem to converge. This is often a matter of hyperparameter tuning (clip ratio, learning rate, batch size, reward scaling, etc.). Have you seen the burn-rl examples from another user? They could be a useful reference.
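To make the knobs mentioned above concrete, here is a minimal plain-Rust sketch of PPO's clipped surrogate objective for a single transition, together with a hypothetical config struct. The values shown (clip ratio 0.2, learning rate 3e-4, etc.) are common defaults, not values taken from the linked repository, and all names are illustrative assumptions rather than Burn or minimalRL-rs APIs.

```rust
/// Hypothetical PPO hyperparameters with commonly used default values.
/// These are illustrative starting points, not values from the linked repo.
struct PpoConfig {
    learning_rate: f64, // step size for the optimizer, e.g. 3e-4
    clip_ratio: f64,    // epsilon in the clipped objective, e.g. 0.2
    gamma: f64,         // discount factor, e.g. 0.99
    gae_lambda: f64,    // GAE smoothing parameter, e.g. 0.95
    batch_size: usize,  // minibatch size per update, e.g. 64
}

impl Default for PpoConfig {
    fn default() -> Self {
        Self {
            learning_rate: 3e-4,
            clip_ratio: 0.2,
            gamma: 0.99,
            gae_lambda: 0.95,
            batch_size: 64,
        }
    }
}

/// Clipped surrogate loss for a single transition:
/// L = -min(r * A, clamp(r, 1 - eps, 1 + eps) * A),
/// where r = exp(log_prob_new - log_prob_old) and A is the advantage.
fn ppo_clip_loss(log_prob_new: f64, log_prob_old: f64, advantage: f64, clip_ratio: f64) -> f64 {
    let ratio = (log_prob_new - log_prob_old).exp();
    let unclipped = ratio * advantage;
    let clipped = ratio.clamp(1.0 - clip_ratio, 1.0 + clip_ratio) * advantage;
    -unclipped.min(clipped)
}

fn main() {
    let cfg = PpoConfig::default();
    // Example: the new policy is slightly more likely to take an advantageous action.
    let loss = ppo_clip_loss(-0.9, -1.0, 0.5, cfg.clip_ratio);
    println!("clip loss: {loss:.4} (lr = {}, clip = {})", cfg.learning_rate, cfg.clip_ratio);
}
```

Tuning usually starts from defaults like these and adjusts one knob at a time while watching the average return; a clip ratio or learning rate that is too large is a common cause of scores that rise briefly and then collapse.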
-
Hi team,
I am trying to implement a collection of basic RL algorithms in Rust with Burn. The PPO algorithm, and possibly the other two I have implemented, do not appear to be learning during training: the scores barely change across episodes.
Here is the PPO implementation: https://github.com/AspadaX/minimalRL-rs/blob/main/src/ppo.rs
I am new to RL, so I may have missed something critical in the Burn docs and code. Can you spot anything wrong with my training code?
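In case it helps with reading the scores, here is a minimal plain-Rust sketch (no Burn) of tracking a moving average of episode returns, which is the curve that should trend upward if the agent is learning. The helper names are hypothetical and not taken from the repository.

```rust
use std::collections::VecDeque;

/// Rolling window of recent episode returns; a hypothetical monitoring
/// helper, not part of the linked repository.
struct ReturnTracker {
    window: VecDeque<f64>,
    capacity: usize,
}

impl ReturnTracker {
    fn new(capacity: usize) -> Self {
        Self { window: VecDeque::with_capacity(capacity), capacity }
    }

    /// Record the total (undiscounted) return of a finished episode.
    fn push(&mut self, episode_return: f64) {
        if self.window.len() == self.capacity {
            self.window.pop_front();
        }
        self.window.push_back(episode_return);
    }

    /// Mean return over the window; if training works, this should rise
    /// over time rather than oscillating around its starting value.
    fn mean(&self) -> f64 {
        if self.window.is_empty() {
            return 0.0;
        }
        self.window.iter().sum::<f64>() / self.window.len() as f64
    }
}

fn main() {
    let mut tracker = ReturnTracker::new(20);
    for episode in 0..100 {
        // Placeholder for a real rollout; substitute the actual episode return.
        let episode_return = episode as f64 * 0.1;
        tracker.push(episode_return);
        if episode % 20 == 0 {
            println!("episode {episode}: mean return over last 20 = {:.2}", tracker.mean());
        }
    }
}
```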