I asked this question in #6103 on the MONAI discussions board, and a similar question on the PyTorch forum.
Basically, I want to train a `SwinUNETR` network with an input patch size of (192, 192, 192) (and also perform validation/testing with an inference window size of (192, 192, 192)). It is crucial for me to use this exact patch size. I am working on a single node with 4 GPUs (16 GiB of memory each). I tried wrapping the `SwinUNETR` model with `torch.nn.DataParallel`, `torch.nn.parallel.DistributedDataParallel`, and the more advanced `torch.distributed.fsdp.FullyShardedDataParallel`, but in every case I ran out of CUDA GPU memory.
Is there an efficient implementation of a `SwinUNETR` pipeline with FSDP parallelization? Or is there another way to solve this problem?
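For context, this is roughly the kind of FSDP setup I have been experimenting with (a minimal sketch only; the channel counts, `feature_size`, and wrap policy below are placeholders rather than my exact configuration, and the `SwinUNETR` constructor arguments may differ across MONAI versions). MONAI's `SwinUNETR` exposes `use_checkpoint=True` for activation checkpointing inside the Swin blocks, which can be combined with FSDP's `MixedPrecision` policy to reduce per-GPU memory:

```python
import os
import functools

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy
from monai.networks.nets import SwinUNETR

# launched via torchrun, one process per GPU
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = SwinUNETR(
    img_size=(192, 192, 192),  # required by some MONAI versions; deprecated in newer ones
    in_channels=1,             # placeholder: set to your modality count
    out_channels=2,            # placeholder: set to your label count
    feature_size=48,           # placeholder: larger values cost more memory
    use_checkpoint=True,       # activation checkpointing inside the Swin blocks
).cuda()

model = FSDP(
    model,
    # shard any submodule above ~1M parameters (placeholder threshold)
    auto_wrap_policy=functools.partial(
        size_based_auto_wrap_policy, min_num_params=1_000_000
    ),
    # fp16 compute and communication to cut activation/gradient memory
    mixed_precision=MixedPrecision(
        param_dtype=torch.float16,
        reduce_dtype=torch.float16,
        buffer_dtype=torch.float16,
    ),
    device_id=local_rank,
)
```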
Alternatively, does the input patch size during training/inference matter for the performance of `SwinUNETR`? I know it matters for `UNet`, as can be seen from some of my results shown in #6103 as well as the discussion in #2924. Please let me know.
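On the validation/testing side, the one memory-saving workaround I am aware of is MONAI's sliding-window inferer with CPU stitching: each window runs on the GPU while the stitched full-volume output is accumulated on the CPU. A minimal sketch (assuming `model` and `val_image` are a trained network and a preprocessed `(B, C, H, W, D)` tensor, both placeholders here):

```python
import torch
from monai.inferers import sliding_window_inference

model.eval()
with torch.no_grad():
    prediction = sliding_window_inference(
        inputs=val_image,                # e.g. a (1, 1, H, W, D) tensor
        roi_size=(192, 192, 192),        # the window size I want to keep
        sw_batch_size=1,                 # one window at a time on the GPU
        predictor=model,
        overlap=0.25,
        sw_device=torch.device("cuda"),  # each window is evaluated on the GPU
        device=torch.device("cpu"),      # stitched output lives in host memory
    )
```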