Cuda out of memory on Large-v2 using Tesla T4 #1319

HalukMaestra · 2023-05-05T10:15:39Z


Hey everyone, I am running a Gunicorn server on a Tesla T4 GPU (16 GB of VRAM), and I am getting the following error while the GPU is being initialized on startup:

ERROR:__mp_main__:Error initializing model: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 14.76 GiB total capacity; 4.20 GiB already allocated; 24.75 MiB free; 4.48 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I tried clearing the torch cache with torch.cuda.empty_cache(), and I also tried setting ENV PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:128" in my Dockerfile (I tried the values 512, 256 and 128). None of these solutions seemed to help. What could I be missing here? I am able to run my server with the medium model, but my T4 with 16 GB of VRAM can't seem to hold Whisper large, which requires 12 GB. Does anyone have an idea what the problem might be?

Answered by jongwook


jongwook · 2023-05-05T23:08:48Z

Maintainer

Can you check whether any other processes are occupying the VRAM? You can run nvidia-smi to check the memory usage before starting the server.
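For anyone who wants to script this check, here is a minimal sketch that lists the compute processes nvidia-smi reports. The helper names parse_compute_apps and gpu_processes are made up for illustration, and the sketch assumes the driver's nvidia-smi supports the --query-compute-apps fields used below. It returns an empty list when nvidia-smi is missing or fails.

```python
import shutil
import subprocess


def parse_compute_apps(csv_text):
    """Parse `pid, process_name, used_memory` rows into (pid, name, MiB) tuples."""
    rows = []
    for line in csv_text.splitlines():
        # Skip blank lines and anything that is not a three-column row.
        if line.count(",") < 2:
            continue
        # The process name sits between the first and the last comma.
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        used = used.strip()
        # Some setups report "[N/A]" instead of a number.
        mib = int(used) if used.isdigit() else None
        rows.append((int(pid), name.strip(), mib))
    return rows


def gpu_processes():
    """Return the processes nvidia-smi reports, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for pid, name, mib in gpu_processes():
        print(f"pid={pid} name={name} used={mib} MiB")
```

Running this before starting the server and again after the OOM shows which PIDs hold memory at each point.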

Also, please make sure the model is not getting loaded multiple times, which can happen if load_model() is called multiple times, potentially from different processes (workers) that the gunicorn server may launch.
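The numbers in the error message itself hint at this. As a rough sketch (the helper name gib is made up for illustration), subtracting what this process's PyTorch allocator reserved and what is still free from the total capacity gives the memory held elsewhere. That remainder also includes this process's own CUDA context overhead, so it is an estimate and not an exact figure for other processes.

```python
import re

# Error text copied from the original post.
ERROR = (
    "CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 14.76 GiB total "
    "capacity; 4.20 GiB already allocated; 24.75 MiB free; 4.48 GiB reserved "
    "in total by PyTorch)"
)

GIB_PER_UNIT = {"MiB": 1 / 1024, "GiB": 1.0}


def gib(label, text=ERROR):
    """Return the quantity that precedes `label` in the message, in GiB."""
    match = re.search(r"([\d.]+) (MiB|GiB) " + re.escape(label), text)
    return float(match.group(1)) * GIB_PER_UNIT[match.group(2)]


total = gib("total capacity")
reserved = gib("reserved")
free = gib("free")

# Memory on the device that this process's PyTorch allocator does not hold.
outside = total - reserved - free
print(f"{outside:.2f} GiB held outside this process's PyTorch allocator")
```

Here the failing process had reserved only about 4.5 GiB, while roughly 10 GiB of the card was held outside its allocator, which is consistent with other processes or other workers occupying the GPU.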

2 replies

HalukMaestra May 8, 2023
Author

Hello @jongwook, thank you for your response. I immensely appreciate all the work you guys are doing. My load_model is the first thing that runs on server initialization, and that is where I get the CUDA out of memory error, so I am pretty certain there aren't other applications clogging up the memory. My load_model also runs only once during startup and never again.

jongwook May 8, 2023
Maintainer

Could you check if you're launching multiple gunicorn worker processes, or see if forcing it to use a single worker like below fixes the OOM?

gunicorn [...] --workers 1
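The same setting can be pinned in a gunicorn config file, which is plain Python. This is a minimal sketch, assuming the file is saved as gunicorn.conf.py in the directory gunicorn is started from; the post_fork hook is only there to log each worker that starts, so the number of log lines shows how many processes would each load the model.

```python
# gunicorn.conf.py

workers = 1          # one worker process, so one copy of the model on the GPU
preload_app = False  # gunicorn's default: each worker imports the app itself


def post_fork(server, worker):
    """Server hook that gunicorn calls in each worker right after the fork."""
    server.log.info("Worker spawned (pid: %s)", worker.pid)
```

If more than one "Worker spawned" line shows up in the log, more than one process is loading the model.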

If you can call load_model() from a Jupyter notebook or a python shell without an OOM, it'd be another indication that gunicorn is somehow affecting the memory usage.
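A minimal sketch of that standalone check, assuming the openai-whisper and torch packages are installed and a CUDA device is visible. The function names to_mib and load_and_report are made up for illustration; the function prints free memory before and after loading and returns None when the dependencies or CUDA are missing.

```python
def to_mib(n_bytes):
    """Convert a byte count to MiB."""
    return n_bytes / 1024**2


def load_and_report(name="large-v2"):
    """Load a Whisper model on the GPU and print memory figures around the load."""
    try:
        import torch
        import whisper
    except ImportError as exc:
        print(f"missing dependency: {exc}")
        return None
    if not torch.cuda.is_available():
        print("CUDA is not available in this environment")
        return None

    # mem_get_info() returns (free, total) in bytes for the current device.
    free_before, total = torch.cuda.mem_get_info()
    model = whisper.load_model(name, device="cuda")
    free_after, _ = torch.cuda.mem_get_info()

    print(f"total capacity:       {to_mib(total):.0f} MiB")
    print(f"free before load:     {to_mib(free_before):.0f} MiB")
    print(f"free after load:      {to_mib(free_after):.0f} MiB")
    print(f"allocated by PyTorch: {to_mib(torch.cuda.memory_allocated()):.0f} MiB")
    return model
```

Calling load_and_report() from a Python shell with the server stopped gives a baseline. A low "free before load" figure means something else already holds the memory.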
