CUDA OOM for tensorized neural network #7914
Unanswered
WouterDurnez asked this question in DDP / multi-GPU / multi-node
Hi everyone,
(Copied this from the old PyTorch Lightning forum, since I didn't spot that it was being moved here.)
I'm trying to train a model on my university's HPC, which has plenty of GPUs (each with 32 GB of RAM). I ran it on 2 GPUs, but I'm still getting the dreaded `CUDA out of memory` error (after waiting in the queue for quite a while, annoyingly). My model is a 3D UNet that takes a 4x128x128x128 input, and my batch size is already 1. The problem is that I'm replacing the conv layers with tensor networks to reduce the number of computations, but this (somewhat ironically) blows up my memory demand due to the `unfold` operations I'm using to achieve that. The pattern looks roughly like the sketch below.
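To make the blow-up concrete, here's a minimal sketch (shapes and names are illustrative; my actual tensor-network contraction differs):

```python
import torch

# Input volume: (batch, channels, D, H, W); shapes mirror my setup.
x = torch.randn(1, 4, 128, 128, 128)

k, s = 3, 1  # kernel size and stride (assumptions for illustration)

# One unfold per spatial dimension to get im2col-style patches:
patches = x.unfold(2, k, s).unfold(3, k, s).unfold(4, k, s)
# -> shape (1, 4, 126, 126, 126, 3, 3, 3). The unfolds themselves are
# views, but flattening them for the contraction materializes every
# k**3-voxel neighbourhood, i.e. ~27x the input:
cols = patches.contiguous().view(1, 4, -1, k ** 3)
print(cols.numel() * 4 / 2 ** 30, "GiB in fp32")  # ~0.8 GiB for this layer alone
```

And since autograd holds on to intermediates like this for the backward pass, a few such layers add up quickly.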
These are the parameters I'm using with the trainer:
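Something like this (a simplified sketch; the exact values are placeholders rather than my verbatim settings):

```python
import pytorch_lightning as pl

trainer = pl.Trainer(
    gpus=2,             # two of the cluster's 32 GB GPUs
    accelerator='ddp',  # distributed data parallel across them
    precision=16,       # placeholder: mixed precision to save some memory
    max_epochs=100,     # placeholder
)
trainer.fit(model)      # `model` is the tensorized 3D UNet LightningModule
```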
My question: how can I make better use of the GPU RAM? It should be a combined 64 GB, but the output I'm getting gives me the impression that the demand isn't being distributed properly.
PS: I just now added a `plugins='ddp_sharded'` parameter (having installed fairscale in my venv as well), but I fear that won't be enough. Still in the queue though; will update once it runs.
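For reference, that makes the call (same caveats as the sketch above):

```python
trainer = pl.Trainer(gpus=2, accelerator='ddp', plugins='ddp_sharded')
```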
Replies: 1 comment
-
Update: looks as though the problem is my (triple) use of `unfold`.