How should the number of steps be set relative to the amount of data to process when using DDP and multiple GPUs? #11192
-
Hi, I'm a newbie with pytorch-lightning. I want to process 100,000 records, so I set max_steps to 25,000 when using 4 GPUs. Is it correct that the number of steps specified when using multiple GPUs needs to be divided by the number of GPUs, compared with the expected number of steps? (My pytorch-lightning version is 1.5.4.)
Replies: 1 comment 1 reply
-
hey @kash203
with DDP, if batch_size is 32 with 4 GPUs, then the effective batch size is actually 32*4. See the docs here.
now in your case, there are 100_000 samples. Considering batch_size as 1, a single training step with 4 GPUs means 4 calls to the dataloader, so 4 samples are covered in a single step. Thus your max_steps here should be 25_000, as you stated.
In case you want to process all the samples only once, you can just set max_epochs=1.
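For reference, here is a minimal sketch of that setup (assuming PyTorch Lightning 1.5.x; `MyModel` and `train_loader` are hypothetical placeholders, not from the original thread):

```python
import pytorch_lightning as pl

# 100_000 samples, batch_size=1 per GPU, 4 GPUs -> 4 samples per step -> 25_000 steps
trainer = pl.Trainer(
    gpus=4,
    strategy="ddp",
    max_steps=25_000,
)

# Equivalent if you just want a single pass over the whole dataset:
# trainer = pl.Trainer(gpus=4, strategy="ddp", max_epochs=1)

# trainer.fit(MyModel(), train_loader)  # hypothetical LightningModule and DataLoader
```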
max_steps is generally used for sequential learning tasks where the data is created iteratively.
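As an illustration of that case, a hedged sketch using a streaming `IterableDataset` (the dataset class and tensor shapes are made up for the example), where `max_steps` bounds training because there is no fixed epoch length:

```python
import torch
from torch.utils.data import IterableDataset, DataLoader

class StreamingDataset(IterableDataset):
    """Hypothetical dataset that generates samples on the fly (no fixed length)."""
    def __iter__(self):
        while True:
            yield torch.randn(10), torch.randint(0, 2, (1,))

loader = DataLoader(StreamingDataset(), batch_size=1)

# With data like this, max_epochs has no natural meaning; max_steps decides when to stop:
# trainer = pl.Trainer(gpus=4, strategy="ddp", max_steps=25_000)
# trainer.fit(MyModel(), loader)
```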