Do I need to introduce "dummy" batches when using PMAP? #7303

shahbuland · 2021-07-16T10:02:20Z

I'm trying to do gradient accumulation over microbatches with pmap, mapping over the microbatch axis so that each device computes the gradient for one microbatch. Say there are 8 microbatches of size 4 and I have 5 devices. Spreading the first 5 microbatches across the 5 devices is simple, but once those gradients are computed I still need gradients for the 3 microbatches left over. The most obvious solution is to pad with 2 dummy microbatches so that every device gets something, then ignore the gradients returned by the devices that received dummy data. Is that the right approach, or is there something in JAX for cases like this that I should use instead?
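For concreteness, here is a minimal sketch of the padding idea described above. The linear model, `loss_fn`, `padded_grad_sum`, and the random data are all hypothetical stand-ins, not code from the original post. Dummy microbatches are filled with zeros and their gradients are multiplied by a 0/1 mask so they drop out of the sum. The device count is read from `jax.local_device_count()`, so the sketch runs on a single device too (in which case no padding is needed).

```python
import jax
import jax.numpy as jnp


def loss_fn(w, x, y):
    # Hypothetical per-microbatch loss: summed squared error of a linear model.
    return jnp.sum((x @ w - y) ** 2)


def padded_grad_sum(w, xs, ys, n_dev):
    """Sum gradients over all microbatches, padding the last round with dummies."""
    n_micro = xs.shape[0]
    n_pad = (-n_micro) % n_dev  # dummy microbatches needed to fill the last round
    # mask is 1.0 for real microbatches and 0.0 for dummy ones
    mask = jnp.concatenate([jnp.ones(n_micro), jnp.zeros(n_pad)])
    xs = jnp.concatenate([xs, jnp.zeros((n_pad,) + xs.shape[1:])])
    ys = jnp.concatenate([ys, jnp.zeros((n_pad,) + ys.shape[1:])])

    def masked_grad(w, x, y, m):
        # Each device computes one microbatch gradient, zeroed out if it is a dummy.
        return m * jax.grad(loss_fn)(w, x, y)

    # w is broadcast to every device; x, y, and the mask are split along axis 0.
    p_grad = jax.pmap(masked_grad, in_axes=(None, 0, 0, 0))

    total = jnp.zeros_like(w)
    for start in range(0, n_micro + n_pad, n_dev):
        sl = slice(start, start + n_dev)
        per_device = p_grad(w, xs[sl], ys[sl], mask[sl])
        total = total + jnp.sum(per_device, axis=0)
    return total


n_dev = jax.local_device_count()
kx, ky = jax.random.split(jax.random.PRNGKey(0))
xs = jax.random.normal(kx, (8, 4, 3))  # 8 microbatches of size 4, 3 features (assumed)
ys = jax.random.normal(ky, (8, 4))
w = jnp.ones(3)

total = padded_grad_sum(w, xs, ys, n_dev)
```

With 8 microbatches and 5 devices, `n_pad` works out to 2, matching the numbers in the question. The mask keeps the dummy gradients out of the accumulated result without any special-casing after the call.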

lamflokas · 2021-07-18T14:20:34Z

Per the pmap documentation, when the mapped (batching) axis is smaller than the device count, pmap automatically uses a subset of the devices. There is no need to pad for pmap to work.
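As a rough illustration of this answer, the leftover microbatches can be passed to the same pmapped function as a shorter chunk, with no dummy data. This is only a sketch: `loss_fn`, `chunked_grad_sum`, and the random data are hypothetical, and the device count comes from `jax.local_device_count()` so the code also runs on one device.

```python
import jax
import jax.numpy as jnp


def loss_fn(w, x, y):
    # Hypothetical per-microbatch loss: summed squared error of a linear model.
    return jnp.sum((x @ w - y) ** 2)


def chunked_grad_sum(w, xs, ys, n_dev):
    """Sum gradients over all microbatches, letting the last chunk be smaller."""
    # w is broadcast to every device; x and y are split along axis 0.
    p_grad = jax.pmap(jax.grad(loss_fn), in_axes=(None, 0, 0))

    total = jnp.zeros_like(w)
    for start in range(0, xs.shape[0], n_dev):
        # The final chunk may hold fewer than n_dev microbatches;
        # pmap then runs on only that many devices.
        x_chunk = xs[start:start + n_dev]
        y_chunk = ys[start:start + n_dev]
        total = total + jnp.sum(p_grad(w, x_chunk, y_chunk), axis=0)
    return total


n_dev = jax.local_device_count()
kx, ky = jax.random.split(jax.random.PRNGKey(0))
xs = jax.random.normal(kx, (8, 4, 3))  # 8 microbatches of size 4, 3 features (assumed)
ys = jax.random.normal(ky, (8, 4))
w = jnp.ones(3)

total = chunked_grad_sum(w, xs, ys, n_dev)
```

With 5 devices this produces one chunk of 5 and one of 3. A likely trade-off is that the smaller chunk has a different leading-axis size and so probably triggers a second compilation of the pmapped function, which padding would avoid.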
