Understanding batch size with multiple dataloaders #14235
Unanswered
Michael-Geuenich asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment · 4 replies
- This is slightly incorrect: it applies only to the progress bar, not to the actual global step.
I'm currently working on a model with multiple dataloaders of different sizes. My first dataset has 176 samples that are loaded in their entirety in my training loop (`batch_size=176`). My second dataset has ~22,000 samples, though I am using the same batch size of 176. When I implemented this in plain PyTorch, I went through one batch (176 samples) per epoch, which is what I wanted (3 total epochs, three batches sampled, three total steps).

I'm only testing things out at the moment, so I've run this in PL for 3 epochs. I was expecting to run through 3 global steps; however, PL runs through 332 global steps and I don't understand how it arrives at this number. If the end of an epoch is defined as having sampled the entirety of the larger loader, then I would expect to go through 22,000 / 176 * 3 = 375 global steps.
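For reference, here is roughly how I arrive at that number. This is just back-of-the-envelope arithmetic, assuming the default behaviour where the training epoch follows the largest dataloader ("max_size_cycle") and that neither loader drops its last partial batch:

```python
import math

# Sizes from my setup (assuming drop_last=False on both loaders)
small_len, large_len = 176, 22_000
batch_size = 176
epochs = 3

# Plain PyTorch loop over the small dataset only: 1 batch per epoch
steps_plain = math.ceil(small_len / batch_size) * epochs        # 1 * 3 = 3

# Lightning with two train dataloaders and the epoch length tied to
# the largest loader ("max_size_cycle"): the smaller loader is cycled
steps_max_size = math.ceil(large_len / batch_size) * epochs     # 125 * 3 = 375

# If the epoch instead ended with the smallest loader ("min_size")
steps_min_size = math.ceil(small_len / batch_size) * epochs     # 1 * 3 = 3

print(steps_plain, steps_max_size, steps_min_size)              # 3 375 3
```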
Note that I also have a validation step with two dataloaders (datasets with 61 and ~3,000 samples, and a batch size of 61 for both), and I've specified `val_check_interval=1` in my Trainer call. I am aware of this question/answer (https://forums.pytorchlightning.ai/t/weird-number-of-steps-per-epoch/773), which states that global steps = total train steps + total val steps. However, if that is the case, shouldn't the total number of steps PL goes through be even higher than 332 or 375?
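For completeness, here is a stripped-down sketch of the kind of setup I mean. The model, the random tensors, and the `LitModel` class name are placeholders rather than my actual code; only the dataset sizes, batch sizes, epoch count, and `val_check_interval` match the numbers above:

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(10, 1)

    def training_step(self, batch, batch_idx):
        # With multiple train dataloaders returned as a dict, each training
        # batch arrives as a dict keyed like train_dataloader()'s return value.
        x_small, y_small = batch["small"]
        return torch.nn.functional.mse_loss(self.layer(x_small), y_small)

    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        x, y = batch
        self.log("val_loss", torch.nn.functional.mse_loss(self.layer(x), y))

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

    def train_dataloader(self):
        small = TensorDataset(torch.randn(176, 10), torch.randn(176, 1))
        large = TensorDataset(torch.randn(22_000, 10), torch.randn(22_000, 1))
        # Two training dataloaders, both with a batch size of 176
        return {
            "small": DataLoader(small, batch_size=176),
            "large": DataLoader(large, batch_size=176),
        }

    def val_dataloader(self):
        val_a = TensorDataset(torch.randn(61, 10), torch.randn(61, 1))
        val_b = TensorDataset(torch.randn(3_000, 10), torch.randn(3_000, 1))
        # Two validation dataloaders, batch size 61 for both
        return [DataLoader(val_a, batch_size=61), DataLoader(val_b, batch_size=61)]


if __name__ == "__main__":
    # val_check_interval=1 (an int) means "validate after every training batch";
    # a float such as 1.0 would instead mean "once per training epoch".
    trainer = pl.Trainer(max_epochs=3, val_check_interval=1)
    trainer.fit(LitModel())
```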