Efficiently use NVIDIA A2 (16 GB) GPU for inference #12414
Hello, I have trained a spaCy model on a custom dataset and it works fine for inference (prediction) on CPU. Now I'm looking to upgrade to GPU, but I'm running into GPU memory overload. I have created a Django project that uses this spaCy model and deployed it with gunicorn and nginx; here is my gunicorn config. For every new request to the server, the model loads onto the GPU and memory consumption increases until the GPU is overloaded. How can I overcome this issue?
Replies: 1 comment
This is more a question about web service backend design than a question about spaCy, so we can't be of much help here. The issue is probably that gunicorn starts 12 workers, each of which might load the model on the GPU. Furthermore, depending on when gunicorn forks workers, there may be bad interactions with threading. So you probably want to build something into your application that puts an acceptable upper bound on the number of spaCy models in GPU memory.
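A minimal sketch of one way to do this, assuming the model is loaded inside the Django app (the module name `nlp_singleton.py`, the `get_nlp()` helper, and the model path are all hypothetical): load the pipeline lazily, once per worker process, so each gunicorn worker holds at most one copy of the model in GPU memory.

```python
# nlp_singleton.py -- hypothetical module name; a sketch, not spaCy's API.
import threading

import spacy

_nlp = None
_lock = threading.Lock()


def get_nlp():
    """Return one shared spaCy pipeline per worker process.

    Loading lazily behind a lock means each gunicorn worker puts at most
    one copy of the model on the GPU, instead of one copy per request.
    """
    global _nlp
    if _nlp is None:
        with _lock:
            if _nlp is None:
                spacy.require_gpu()  # or spacy.prefer_gpu() to fall back to CPU
                _nlp = spacy.load("path/to/your/model")  # placeholder path
    return _nlp
```

In the Django view, call `get_nlp()` instead of `spacy.load(...)` so the load happens once per worker. Combined with a lower worker count in the gunicorn config (the values below are assumptions to tune against your model size and the 16 GB card), this puts an upper bound on total GPU memory use:

```python
# gunicorn.conf.py -- illustrative values only; the original config is not shown.
workers = 2          # assumption: 2 model copies fit comfortably in 16 GB
worker_class = "sync"
preload_app = False  # load the model after the fork, in each worker, to avoid
                     # initializing the GPU in the master process before forking
```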