Different batch sizes and/or number of GPUs results in different test metrics #6859
Unanswered
carsonmclean asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
I have noticed that when running trainer.test(), my experiment produces different test metrics for different batch sizes and/or numbers of GPUs. I am fairly certain that the same input sample produces the exact same logit output from the model regardless of batch size or number of GPUs, so the only remaining code in the pipeline is the evaluation and metrics. That code is as follows:

The metrics printout at the end of the test epoch looks as follows:

I believe what is happening is that the displayed metrics are only being calculated on the final batch of one of the GPUs, rather than across all batches on all GPUs in the test epoch. My understanding from the LightningModule documentation is that calling .log() "automatically reduces the requested metrics across the full epoch". I have tried setting both on_epoch and sync_dist to True, but I still see inconsistencies. This is being run on a single machine with 4 GPUs and the DDP accelerator.

What is the simplest (and proper) way of calculating consistent test metrics for any batch size and number of GPUs?
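For reference, here is a minimal sketch of the kind of test_step logging pattern I am describing. The model, metric name, and number of classes are placeholders rather than my actual code, and the Accuracy(task=...) signature assumes torchmetrics >= 0.11:

```python
import torch.nn.functional as F
import torchmetrics
import pytorch_lightning as pl


class LitClassifier(pl.LightningModule):
    def __init__(self, model, num_classes=10):
        super().__init__()
        self.model = model
        # A torchmetrics metric keeps per-rank state and synchronizes it
        # across GPUs when the value is computed at epoch end.
        self.test_acc = torchmetrics.Accuracy(task="multiclass", num_classes=num_classes)

    def test_step(self, batch, batch_idx):
        x, y = batch
        logits = self.model(x)
        loss = F.cross_entropy(logits, y)

        # Accumulate this batch into the metric; the state is reduced over
        # all batches and all ranks at the end of the test epoch.
        self.test_acc.update(logits, y)

        # Logging the Metric object lets Lightning call compute()/reset()
        # at the right time; on_epoch=True reduces the scalar loss over the
        # epoch, and sync_dist=True also averages it across GPUs.
        self.log("test_acc", self.test_acc, on_epoch=True)
        self.log("test_loss", loss, on_epoch=True, sync_dist=True)
        return loss
```

As far as I understand, the torchmetrics route is the one usually recommended under DDP, because the metric accumulates its state per sample rather than averaging per-batch values, but I would like to confirm whether this is the intended approach.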