How to implement parallel SGD with multiple GPUs? #6844
Unanswered · shenzebang asked this question in Q&A · Replies: 0 comments
Hi guys,
I am wondering how we can implement parallel SGD efficiently over multiple GPUs. If there is only a single GPU, I know that we could simply use vmap to parallelize every SGD step (let us say we are running $v$ SGDs on a single GPU). Now suppose that I want to run $v * p$ SGDs over $p$ GPUs. How can I do that efficiently? Do I compose pmap with vmap? This strategy seems to be a bit complicated, especially if we need to be able to adapt to different $p$.
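One way the composition could look, as a minimal sketch: `vmap` vectorizes the $v$ instances on each device, and `pmap` shards the result over the $p$ devices. The quadratic `loss`, `sgd_step`, and the shapes below are hypothetical placeholders, not from the original question; `jax.local_device_count()` lets the same code adapt to different $p$.

```python
import jax
import jax.numpy as jnp

# Hypothetical quadratic loss standing in for any per-instance objective.
def loss(w, x):
    return jnp.sum((w - x) ** 2)

# One SGD step for a single problem instance.
def sgd_step(w, x, lr=0.1):
    return w - lr * jax.grad(loss)(w, x)

p = jax.local_device_count()   # adapts to however many devices are present
v = 4                          # SGD instances per device (assumed value)
d = 3                          # parameter dimension (assumed value)

w = jnp.zeros((p, v, d))       # leading axis is the pmap (device) axis
x = jnp.ones((p, v, d))

# vmap handles the v instances on each device; pmap shards over p devices.
step = jax.pmap(jax.vmap(sgd_step))
w = step(w, x)                 # shape (p, v, d)
```

The key convention is that the arrays carry a leading axis of size `p` for `pmap` and a second axis of size `v` for `vmap`, so the same composed function works for any device count.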
I was also thinking about using pmap to simulate a Map-Reduce scheme. However, since pmap automatically jits the input function, a Python loop over SGD steps gets unrolled during tracing, which leads to extremely long compile times.
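One way around the unrolling, sketched under the same assumed quadratic loss as above: keep the step loop inside the pmapped function as a `jax.lax.fori_loop`, which compiles once regardless of the number of steps instead of being unrolled trace-time step by step.

```python
import jax
import jax.numpy as jnp

# Placeholder objective; any per-instance loss would work the same way.
def loss(w, x):
    return jnp.sum((w - x) ** 2)

def sgd_step(w, x, lr=0.1):
    return w - lr * jax.grad(loss)(w, x)

# The whole training loop lives inside the pmapped function as a
# lax.fori_loop, so pmap's implicit jit compiles one rolled loop
# rather than num_steps unrolled copies of the step.
def run_sgd(w, x):
    num_steps = 100
    body = lambda i, w: jax.vmap(sgd_step)(w, x)
    return jax.lax.fori_loop(0, num_steps, body, w)

p = jax.local_device_count()
w0 = jnp.zeros((p, 4, 3))            # (devices, instances, params), assumed sizes
x = jnp.ones((p, 4, 3))

w_final = jax.pmap(run_sgd)(w0, x)   # converges toward x on this toy loss
```

`jax.lax.scan` would serve equally well if per-step outputs (e.g. a loss trace) need to be collected.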
Best,
Zebang