Training Steps Erroneous #6602
Unanswered
ThiruRJST asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment
Do you mean the numbers displayed in the progress bar? Note that it includes the training and validation steps combined.
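As a rough check of that explanation, here is a minimal sketch of the arithmetic using the split sizes quoted in the question (14905 training images, 3727 validation images, batch size 64). It assumes the dataloaders keep the last partial batch (`drop_last=False`, the PyTorch default) and that the progress bar total is the sum of training and validation batches; the helper `batches_per_epoch` is a hypothetical name used only for illustration.

```python
import math


def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a dataloader yields for one pass over num_samples items."""
    if drop_last:
        # The final partial batch is discarded.
        return num_samples // batch_size
    # The final partial batch is kept, so round up.
    return math.ceil(num_samples / batch_size)


batch_size = 64
train_batches = batches_per_epoch(14905, batch_size)  # training split from the question
val_batches = batches_per_epoch(3727, batch_size)     # validation split from the question
progress_bar_total = train_batches + val_batches

# For comparison: one loader over all 18632 images.
whole_dataset_batches = batches_per_epoch(18632, batch_size)

print(train_batches, val_batches, progress_bar_total, whole_dataset_batches)
# prints: 233 59 292 292
```

Under these assumptions, 233 training batches plus 59 validation batches give the 292 shown in the progress bar. A single loader over all 18632 images would also yield 292 batches, so the total alone cannot tell the two cases apart; checking `len()` of the training dataloader directly would settle it.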
I'm trying to train a ResNet50 model on the FGVC8 dataset, which has 18632 images in total. When I train with this Lightning module, the number of steps per epoch is 292; with a batch size of 64, that multiplies out to 18688 images. I didn't set `max_steps` explicitly. I used `StratifiedKFold` with 5 splits to divide the dataset into training and validation sets, which gives 14905 images for training and 3727 for validation. I'm passing this PyTorch dataset into a dataloader with `batch size 64` and `num_workers 4`.
After splitting, the number of training steps should be 14905/64, which is around 233. But the network trains for 292 steps. Why is this happening? Is the network being trained on the whole dataset? I didn't even perform any augmentations.
Link to my Source code: Resnet50 Training FGVC8
Dataset:
Training Loop:
My Lightning Module: