
What happens if the sampler of the training dataloader has a varying size for each epoch? #16652


Hi @juliendenize

Is it ok to have a varying number of batches at each epoch without using iterable datasets?

  • It is OK when training on a single device / single GPU.
  • It is NOT OK with DDP in general: your training loop will fall out of sync and eventually hang.
  • It is OK with DDP if you can guarantee that each process/GPU has the same epoch length. If the length changes from epoch N to N + 1, it has to change the same way in all processes (see the sketch after this list for one way to guarantee this).
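
To make the third case concrete, here is a minimal sketch (not Lightning's own code) of a sampler whose length varies per epoch but is derived deterministically from `(seed, epoch)`, so every DDP rank computes the same length. The class and parameter names (`VaryingLengthSampler`, `base_size`, `seed`) are illustrative assumptions.

```python
import torch
from torch.utils.data import Sampler


class VaryingLengthSampler(Sampler):
    """Sampler whose number of samples changes every epoch.

    The per-epoch size is a deterministic function of (seed, epoch), so every
    process that calls set_epoch() with the same value computes the same
    length, keeping DDP ranks in step.
    """

    def __init__(self, data_source, base_size=1000, seed=0):
        self.data_source = data_source
        self.base_size = base_size  # assumed lower bound on the epoch size
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # Call at the start of every epoch (same pattern as DistributedSampler).
        self.epoch = epoch

    def _epoch_size(self):
        # Deterministic in (seed, epoch): identical on every rank.
        g = torch.Generator().manual_seed(self.seed + self.epoch)
        extra = int(torch.randint(0, 100, (1,), generator=g))
        return min(self.base_size + extra, len(self.data_source))

    def __len__(self):
        return self._epoch_size()

    def __iter__(self):
        g = torch.Generator().manual_seed(self.seed + self.epoch)
        perm = torch.randperm(len(self.data_source), generator=g)
        return iter(perm[: self._epoch_size()].tolist())
```

The key point is that nothing rank-specific (and nothing non-deterministic) feeds into `__len__`, so all processes agree on the number of batches in every epoch.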

If you aren't intending to train with DDP, you should be good. If you do, then since you have a custom sampler, you will have to make your sampler distributed (let me know if you need details on this).
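
As a starting point, here is one hedged sketch of what "making the sampler distributed" could look like: every rank builds the same epoch-dependent index list and then keeps only its own disjoint shard, so the per-epoch length stays identical across processes. The class name and the even-division logic are assumptions, not a Lightning built-in.

```python
import torch
import torch.distributed as dist
from torch.utils.data import Sampler


class DistributedVaryingLengthSampler(Sampler):
    """Distributed variant: the same epoch-dependent index list is built on
    every rank, and each rank then keeps only its own shard."""

    def __init__(self, data_source, base_size=1000, seed=0,
                 num_replicas=None, rank=None):
        if num_replicas is None:
            num_replicas = dist.get_world_size() if dist.is_initialized() else 1
        if rank is None:
            rank = dist.get_rank() if dist.is_initialized() else 0
        self.data_source = data_source
        self.base_size = base_size
        self.seed = seed
        self.num_replicas = num_replicas
        self.rank = rank
        self.epoch = 0

    def set_epoch(self, epoch):
        self.epoch = epoch

    def _epoch_indices(self):
        # Built identically on every rank: depends only on (seed, epoch).
        g = torch.Generator().manual_seed(self.seed + self.epoch)
        size = min(self.base_size + int(torch.randint(0, 100, (1,), generator=g)),
                   len(self.data_source))
        indices = torch.randperm(len(self.data_source), generator=g)[:size].tolist()
        # Drop the tail so the list divides evenly across ranks.
        usable = (len(indices) // self.num_replicas) * self.num_replicas
        return indices[:usable]

    def __len__(self):
        return len(self._epoch_indices()) // self.num_replicas

    def __iter__(self):
        # Strided sharding: rank r takes indices r, r + num_replicas, ...
        return iter(self._epoch_indices()[self.rank :: self.num_replicas])
```

If your Lightning version does not call `set_epoch` on your custom sampler automatically, call it yourself at the start of each epoch (for example from `on_train_epoch_start`).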

What is initialized …
