Strange speed differences of reinforcement learning in MuJoCo on different types of GPUs and on different numbers of GPUs. #20797
Unanswered · NEUQer-xing asked this question in Q&A
Hello everyone,
I've been running reinforcement learning experiments in MuJoCo using JAX, and I've encountered some puzzling results that I hope the community can help clarify. I tested four setups: a single A100 GPU, 8 A100 GPUs, a single 4090 GPU, and multiple 4090 GPUs.
Here's a summary of my findings:
Surprisingly, training speed did not scale linearly with the number of GPUs, nor did it improve in proportion to the cards' raw performance. In fact, training on a single 4090 was faster than on multiple 4090s.
This is counterintuitive and differs significantly from what I expected: that adding GPUs would speed up training roughly proportionally, and that the more powerful 4090s would outperform the A100s in all scenarios.
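To frame what I mean by "not scaling": one explanation I've seen is that per-device compute shrinks as GPUs are added, while per-step gradient synchronization cost grows, so beyond some point adding GPUs makes each step slower. A toy cost model (all constants here are made up for illustration, not measured) shows the shape:

```python
# Toy strong-scaling model (hypothetical numbers, not measurements):
# per-step time = compute split across GPUs + growing sync/communication cost.
def step_time_ms(n_gpus, compute_ms=10.0, comm_ms_per_link=2.0):
    compute = compute_ms / n_gpus           # ideal compute scaling
    comm = comm_ms_per_link * (n_gpus - 1)  # gradient all-reduce overhead
    return compute + comm

for n in (1, 2, 4, 8):
    print(f"{n} GPU(s): {step_time_ms(n):.2f} ms/step")
```

With these made-up constants, 2 GPUs beat 1, but 8 GPUs come out slower than 1, which matches the shape of what I'm seeing. Consumer 4090s also have no NVLink, so their per-link communication cost should be higher than the A100s', which might explain why the multi-4090 case suffers most.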
I would really appreciate it if someone could shed some light on why this might be happening. Is there something specific about JAX's implementation, or could this be related to how MuJoCo interacts with different GPU architectures? Any insights or similar experiences would be very helpful.
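In case it helps with reproducing or ruling out a measurement artifact: JAX dispatches work asynchronously, so timing a jitted step without blocking on the result measures dispatch time rather than compute. Here is roughly how I'd time a step (the function and matrix size are just placeholders for a real training step):

```python
import time

import jax
import jax.numpy as jnp


@jax.jit
def step(x):
    # Placeholder for one training step; any jitted computation works here.
    return (x @ x).sum()


x = jnp.ones((1024, 1024))
step(x).block_until_ready()  # warm-up call so compile time is excluded

t0 = time.perf_counter()
step(x).block_until_ready()  # block, or we'd only measure async dispatch
elapsed = time.perf_counter() - t0
print(f"one step: {elapsed * 1e3:.3f} ms")
```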