
How should the number of steps be set relative to the amount of processed data when using DDP and multiple GPUs? #11192


hey @kash203

with DDP, if batch_size is 32 and you train on 4 GPUs, the effective batch size is actually 32 * 4, because each GPU processes its own batch of 32 in every step. See the docs here.

Now, in your case there are 100_000 samples. With batch_size set to 1, a single training step on 4 GPUs means 4 dataloader calls, so 4 samples are covered per step. Thus max_steps here should be 25_000, as you stated.
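The arithmetic above can be sketched in a few lines. This is only an illustration of the calculation, using the numbers from this discussion (100_000 samples, batch_size of 1, 4 GPUs); the variable names are made up:

```python
# Numbers from the discussion above.
num_samples = 100_000
batch_size_per_gpu = 1
num_gpus = 4

# Under DDP, each GPU consumes its own batch every step,
# so one step covers batch_size_per_gpu * num_gpus samples.
effective_batch_size = batch_size_per_gpu * num_gpus

# Steps needed to see every sample exactly once.
max_steps = num_samples // effective_batch_size
print(max_steps)  # → 25000
```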

In case you want to process all the samples exactly once, you can simply set max_epochs=1 instead. max_steps is generally used for sequential learning tasks where the data is created iteratively.
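As a sketch, the two equivalent Trainer configurations for this scenario would look like the following. This is a configuration fragment, not a full training script; it assumes a `model` and `dataloader` already exist:

```python
import pytorch_lightning as pl

# Option 1: let the Trainer stop after one pass over the data.
trainer = pl.Trainer(
    max_epochs=1,          # process every sample exactly once
    accelerator="gpu",
    devices=4,
    strategy="ddp",
)

# Option 2: cap by steps instead. With 100_000 samples, batch_size=1,
# and 4 GPUs, one epoch is 25_000 steps, so this is equivalent.
trainer = pl.Trainer(
    max_steps=25_000,
    accelerator="gpu",
    devices=4,
    strategy="ddp",
)
```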
