Performance difference of skrl compared to sb3 and rsl_rl #416

glmzsemanur · 2026-02-25T12:13:16Z


Hi everyone,

I am relatively new to Isaac Lab and skrl, so I apologize if I've missed something fundamental. I am currently working on a robotics project at my university and have encountered a consistent performance divergence based on the hardware used.

Environment:

Task: Isaac-Velocity-Flat-Unitree-A1-v0

Library: skrl (PPO)

Hardware 1: RTX 5090 (Lab machine) - Works perfectly (Walks).
Hardware 2: RTX 5070 Ti (Personal machine) - Converges to "standing still."

OS: Ubuntu 22.04
Isaac Sim 5.1.0

The Issue:
Using identical configurations (the original task, with no modifications), seeds, and environment counts (4096), the agent on the RTX 5090 learns a stable gait. However, on my RTX 5070 Ti, the agent consistently falls into a local minimum where it prefers to stand still. I checked the training process: the robot is able to take actions, but increasingly prefers not to as training progresses.

Key Observations:

Cross-Library Check: On the same 5070 Ti machine, rsl_rl and SB3 both successfully train walking policies for this task. (both with isaaclab_tasks and with robotlab_tasks)
Inference Check: I loaded the weights trained on the 5090 onto the 5070 Ti machine, and the robot walks perfectly.
Attempted Fixes: I have tried adjusting hyperparameters, but the behavior persists on the 5070 Ti. When I changed the seed from 42 (the original) to 40 (arbitrary), the reward doubled and some runs were able to learn to walk. However, I don't think changing the seed is a robust solution.

Has anyone else experienced this hardware-dependent convergence with skrl?

UPDATE: I have run several training runs with different seeds on both computers. It turns out that on each computer different seeds result in different rewards, and matching the seeds across the two computers does not give the same results. Essentially, it is a matter of luck to find a good seed for the specific computer. My question is: why? Why is skrl so dependent on the seed, while SB3 produces almost identical agents when trained with the same set of seeds?
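One possible contributor to same-seed runs diverging across GPUs (an assumption, not a confirmed diagnosis for skrl) is that floating-point addition is not associative. Parallel reductions that accumulate values in a different order can differ in the last bits, and a long training loop can amplify such differences. A minimal Python sketch of the effect, independent of skrl and Isaac Lab (the helper `sum_in_order` is hypothetical and only for illustration):

```python
def sum_in_order(values, order):
    """Accumulate values in the given index order using float addition."""
    total = 0.0
    for index in order:
        total += values[index]
    return total


values = [0.1, 0.2, 0.3]

# Same numbers, two accumulation orders (as two GPUs might schedule a reduction).
left_to_right = sum_in_order(values, [0, 1, 2])  # (0.1 + 0.2) + 0.3
right_to_left = sum_in_order(values, [2, 1, 0])  # (0.3 + 0.2) + 0.1

print(left_to_right == right_to_left)        # False
print(abs(left_to_right - right_to_left))    # tiny, but not zero
```

This only shows that identical seeds do not guarantee bit-identical arithmetic on different hardware; it does not explain why one library would be more sensitive to that than another.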

In the image below, you can see the effect of the seed on the result. The upper group learns to walk, while the lower group prefers to stay still.

The results are from my personal computer (RTX 5070 Ti), but they were quite similar on the RTX 5090 as well.
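The split into a walking group and a standing group can be summarized per seed sweep rather than judged from a single seed. The sketch below is a hypothetical helper (`summarize_seed_sweep`, the `walk_threshold` parameter, and all numbers are assumptions for illustration, not measured results) that reports which seeds cleared a reward threshold and how spread out the final rewards are:

```python
from statistics import mean, stdev


def summarize_seed_sweep(final_rewards, walk_threshold):
    """Group seeds by whether their final reward reached walk_threshold."""
    if not final_rewards:
        raise ValueError("final_rewards must contain at least one run")
    walking = sorted(s for s, r in final_rewards.items() if r >= walk_threshold)
    standing = sorted(s for s, r in final_rewards.items() if r < walk_threshold)
    rewards = list(final_rewards.values())
    return {
        "walking_seeds": walking,
        "standing_seeds": standing,
        "success_rate": len(walking) / len(rewards),
        "reward_mean": mean(rewards),
        # Sample standard deviation needs at least two runs.
        "reward_std": stdev(rewards) if len(rewards) > 1 else 0.0,
    }


# Placeholder values for illustration only, NOT results from this post.
example_rewards = {1: 20.0, 2: 5.0, 3: 4.0, 4: 22.0}
summary = summarize_seed_sweep(example_rewards, walk_threshold=10.0)
print(summary)
```

Comparing `success_rate` and `reward_std` between libraries over the same set of seeds would make the seed sensitivity described above easier to quantify than a single plot.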
