Cuda out of memory #11582
Unanswered
dmandair
asked this question in
Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment 9 replies
-
hey @dmandair! the extra computation in
-
I'm getting the dreaded 'CUDA out of memory' error. I originally had batch sizes of ~200 and reduced this to 175, which seemed to fix the issue. The problem is that I'm now intermittently getting the same error at the decreased batch size without changing anything in my code (except for one line in my validation_epoch_end function, bolded below). Am I doing something wrong that can be obviously fixed? These training batch sizes seem small compared to what I've typically been able to handle with other models. My LightningModule is below; the underlying models are essentially a convnet encoder followed by an attention mechanism. I'd love any help with this if possible.
Full error:
RuntimeError: CUDA error: out of memory CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
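As the error text itself suggests, setting CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous so the Python stack trace points at the op that actually failed, which helps confirm whether the OOM really originates where the trace says. A minimal sketch (assumption: the variable must be set before the first CUDA call, typically at the very top of the training script or exported in the shell):

```python
import os

# Must run before any CUDA work (ideally before importing torch) so that
# kernel launches are synchronous and errors surface at the real call site.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# With this set, re-running the script should produce a stack trace that
# lands on the allocation or kernel that actually ran out of memory.
# torch.cuda.memory_summary() can then show per-segment usage at that point.
```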
PyTorch version: 1.10.1
CUDA 11.3
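One common cause of OOM that appears only after touching validation_epoch_end is accumulating live GPU tensors: anything returned from validation_step is collected by Lightning and held until epoch end, so returning full logits or graph-attached losses keeps every batch's memory alive. A minimal sketch of the detach-and-move-to-CPU pattern (function names mirror the Lightning hooks, but the shapes and keys here are illustrative, not the asker's actual module):

```python
import torch

def validation_step(batch_logits: torch.Tensor, batch_loss: torch.Tensor):
    # Returning GPU tensors directly keeps their memory (and any attached
    # autograd graph) alive until validation_epoch_end runs. Detaching and
    # moving to CPU frees the GPU copy as soon as the batch is done.
    return {
        "loss": batch_loss.detach().cpu(),
        "logits": batch_logits.detach().cpu(),
    }

def validation_epoch_end(outputs):
    # Aggregate on CPU; only small tensors remain in memory here.
    return torch.stack([o["loss"] for o in outputs]).mean()
```

If the full logits are not needed at epoch end, returning only scalar metrics from validation_step shrinks the retained memory further.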