Performance difference of skrl compared to sb3 and rsl_rl #416
Unanswered
glmzsemanur
asked this question in
Q&A
Replies: 0 comments
Hi everyone,
I am relatively new to Isaac Lab and skrl, so I apologize if I've missed something fundamental. I am currently working on a robotics project at my university and have encountered a consistent performance divergence based on the hardware used.
Environment:
Task: Isaac-Velocity-Flat-Unitree-A1-v0
Library: skrl (PPO)
Hardware 1: RTX 5090 (Lab machine) - Works perfectly (Walks).
Hardware 2: RTX 5070 Ti (Personal machine) - Converges to "standing still."
OS: Ubuntu 22.04
Isaac Sim 5.1.0
The Issue:
Using identical configurations (the original task, no modifications), seeds, and environment counts (4096), the agent on the RTX 5090 learns a stable gait. On my RTX 5070 Ti, however, the agent consistently falls into a local minimum where it prefers to stand still. I checked the training process: the robot is able to take actions, but it takes fewer and fewer of them as training progresses.
Key Observations:
Cross-Library Check: On the same 5070 Ti machine, rsl_rl and SB3 both successfully train walking policies for this task. (both with isaaclab_tasks and with robotlab_tasks)
Inference Check: I loaded the weights trained on the 5090 onto the 5070 Ti machine, and the robot walks perfectly.
Attempted Fixes: I have tried adjusting hyperparameters, but the behavior persists on the 5070 Ti. When I changed the seed from 42 (original) to 40 (arbitrary), the reward doubled and some runs were able to learn to walk. But I don't think changing the seed is a robust solution.
Has anyone else experienced this hardware-dependent convergence with skrl?
UPDATE: I have run several trainings with different seeds on both computers. It turns out that on each computer different seeds result in different rewards, and matching the seeds across the two computers does not give the same results. Essentially, it is a matter of luck to find a good seed for the specific computer. My question is, why? Why is skrl so dependent on the seed, while sb3 produces almost identical agents when trained with the same set of seeds?
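One possible contributing factor to the cross-machine mismatch (an assumption, not something confirmed for this setup) is that floating-point addition is not associative. Different GPUs may reduce the same numbers in a different order, so two runs with the same seed can drift apart by tiny rounding differences that then compound over training. A minimal sketch in plain Python, with no GPU or skrl involved:

```python
# Floating-point addition is not associative: the grouping (i.e. the
# order in which a reduction combines its terms) changes the result.
# This only illustrates the numerical effect; it does not show that
# this is what happens inside skrl or Isaac Lab.

left_to_right = (0.1 + 0.2) + 0.3   # one reduction order
right_to_left = 0.1 + (0.2 + 0.3)   # another reduction order

difference = abs(left_to_right - right_to_left)

print(left_to_right == right_to_left)  # False
print(difference)                      # tiny, on the order of 1e-16
```

This would explain why identical seeds do not reproduce bit-for-bit across the two machines, but not by itself why one library is more sensitive to the seed than another.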
In the image below, you can see the effect of the seed on the result. The upper group learns to walk, while the lower group prefers to stand still.


The results are from my personal computer (RTX 5070 Ti), but they were quite similar on the RTX 5090 as well.
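To put a number on the seed dependence instead of judging it from curves, the final rewards can be summarized per run group. The sketch below uses only the Python standard library. The group names and reward values are hypothetical placeholders, not my measured results, and would need to be replaced with the logged final rewards.

```python
from statistics import mean, stdev

# PLACEHOLDER values for illustration only (not measured results).
# Keys are seeds, values are the final mean episode reward of that run.
final_rewards = {
    "group_a": {40: 20.0, 41: 2.0, 42: 10.0},
    "group_b": {40: 18.0, 41: 18.5, 42: 17.5},
}


def summarize(rewards_by_seed):
    """Return mean, sample standard deviation, and range over seeds."""
    values = list(rewards_by_seed.values())
    return {
        "mean": mean(values),
        "std": stdev(values),  # sample std, needs at least 2 seeds
        "min": min(values),
        "max": max(values),
    }


summary = {name: summarize(runs) for name, runs in final_rewards.items()}

for name, stats in summary.items():
    print(
        f"{name}: mean={stats['mean']:.2f} std={stats['std']:.2f} "
        f"range=[{stats['min']:.2f}, {stats['max']:.2f}]"
    )
```

A large standard deviation relative to the mean in one group and a small one in the other would support the impression that one setup is much more seed-sensitive than the other.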