CUDA Out Of Memory error when running inference - NER transformer model #11970
Hello 👋 I've been dealing with CUDA Out Of Memory (OOM) errors when running inference with a fine-tuned NER transformer model on Google Compute Engine (GCE). I'm aware this is not a new issue and there are a lot of discussions here about this problem, but most deal with the error at the training stage, and nothing that I learned from those discussions and implemented in my code has worked so far. Thus, I am hoping for some additional advice.

GCE Virtual Machine:
Docker configuration:
Packages installed via pip:
The model:
A fine-tuned NER model, based on spaCy's …

Inference code:
Inference is done via a python module (…)

What I attempted so far to address the CUDA OOM error:
(1) used the GPU, with memory allocations directed via PyTorch (see the sketch after this list). In …
(2) added the …
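For reference, a minimal sketch of the kind of setup described in attempt (1), assuming spaCy's thinc helpers `set_gpu_allocator` and `require_gpu` are what routes allocations through PyTorch; the model name and texts are placeholders, and the batch size of 2000 is the value mentioned later in the discussion:

```python
# Sketch of attempt (1): run inference on the GPU with memory allocations
# routed through PyTorch (model name and texts are placeholders).
import spacy
from thinc.api import require_gpu, set_gpu_allocator

set_gpu_allocator("pytorch")  # let PyTorch manage the GPU memory pool
require_gpu()                 # fail early if no GPU is available

nlp = spacy.load("my_finetuned_ner_model")  # hypothetical model path

texts = ["Some document text ...", "Another document ..."]
for doc in nlp.pipe(texts, batch_size=2000):  # batch size of 2000, as in the script
    entities = [(ent.text, ent.label_) for ent in doc.ents]
```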
Replies: 1 comment 1 reply
The main setting to adjust in inference is the batch size, either by modifying `nlp.batch_size` or `nlp.pipe(batch_size=)`. See also: #8600

The batch size of `2000` in your script is a lot higher than the default of `64` in `en_core_web_trf`. Our usual default recommendations for `trf` pipelines are `64` or `128`, so I would recommend starting in that range while testing and monitoring the maximum memory usage. If there is still lots of free memory, you can raise the batch size.

The maximum batch size that can run without OOM errors depends a lot on the document lengths, so you may need to take a look at the distribution of text lengths in your input data, because one extremely long text can push an individual batch over the limit. If your text lengths vary a lot, you may want to split long texts for processing to keep the memory usage similar across batches.

You shouldn't need to add manual cache-clearing calls. You should usually let PyTorch handle the memory management automatically. If you're emptying the cache manually, it may slow down processing and it probably isn't addressing the underlying issue that's leading to the OOM error.
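As an illustration, a minimal sketch of the suggested change, assuming the same hypothetical model name as above; it drops the batch size into the recommended 64–128 range and records the peak GPU memory while testing:

```python
import spacy
import torch
from thinc.api import require_gpu, set_gpu_allocator

set_gpu_allocator("pytorch")
require_gpu()

nlp = spacy.load("my_finetuned_ner_model")  # hypothetical model path
texts = ["Some document text ...", "Another document ..."]

# Option 1: set the default batch size on the pipeline object
nlp.batch_size = 64

# Option 2: override the batch size for a single nlp.pipe() call
for doc in nlp.pipe(texts, batch_size=64):
    entities = [(ent.text, ent.label_) for ent in doc.ents]

# Monitor the peak GPU memory while testing; if there is plenty of
# headroom, the batch size can be raised.
print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```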
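And a minimal sketch of the text-splitting idea, assuming a simple character-based chunker; the 5000-character limit is an illustrative choice, not a spaCy default, and in practice you would likely split on sentence or paragraph boundaries so entities are not cut in half:

```python
import spacy

nlp = spacy.load("my_finetuned_ner_model")  # hypothetical model path
texts = ["A very long document ...", "A short one."]

def split_long_text(text, max_chars=5000):
    """Break a long text into fixed-size chunks so that no single batch
    is dominated by one extremely long document (chunk size is illustrative)."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

chunks = [chunk for text in texts for chunk in split_long_text(text)]
for doc in nlp.pipe(chunks, batch_size=64):
    entities = [(ent.text, ent.label_) for ent in doc.ents]
```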