CUDA Runtime Error during Spacy Transformers NER Model Training #13129
Unanswered
iamhimanshu0 asked this question in Help: Coding & Implementations
Replies: 1 comment
-
Please see the links under "I'm getting Out of Memory errors" here: #8226. What does your data look like? What exactly have you tried (with the exact details from your config)? It would probably also be helpful to try a newer version of Python so that you can use newer versions of PyTorch, which may have performance improvements.
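As a starting point, the usual memory levers for transformer training live in the training config. Here's a minimal sketch of overriding them at train time; the keys assume a default spaCy v3 transformer quickstart config, and the values are only placeholders to tune:

```python
# Sketch: reduce GPU memory pressure via config overrides at train time.
# The override keys assume the default transformer quickstart config;
# adjust them to match your own config.cfg.
from spacy.cli.train import train

train(
    "config.cfg",
    output_path="./output",
    use_gpu=0,
    overrides={
        # Fewer padded tokens per batch -> smaller activations on the GPU
        "training.batcher.size": 1000,
        # Accumulate gradients so the effective batch size stays up
        # while the per-step memory cost goes down
        "training.accumulate_gradient": 2,
        # Cap the transformer component's internal batching
        "components.transformer.max_batch_items": 2048,
    },
)
```

Lowering the batcher size while raising accumulate_gradient trades some speed for memory without shrinking the effective batch size.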
-
Hello everyone,
I am currently training a custom NER model on roughly 90k records using spaCy Transformers (en_core_web_trf), and I'm running into an issue where training takes an unusually long time and eventually gets killed with a CUDA runtime error.
Here's a brief overview of my setup:
The error message I'm receiving is:
"RuntimeError: CUDA out of memory. Tried to allocate 114.00 MiB (GPU 0; 14.76 GiB total capacity; 11.19 GiB already allocated; 78.75 MiB free; 12.40 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"
I've tried a few troubleshooting steps, such as checking GPU memory usage, killing other processes that might be holding GPU memory, and adjusting batch sizes, but the issue persists.
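For context, "checking GPU memory usage" here means nvidia-smi plus PyTorch's own allocator counters, along these lines:

```python
import torch

# Snapshot of PyTorch's view of GPU 0 memory, in GiB.
gib = 1024 ** 3
print(f"allocated: {torch.cuda.memory_allocated(0) / gib:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved(0) / gib:.2f} GiB")

# Releases cached, unused blocks back to the driver
# (does not free tensors that are still referenced).
torch.cuda.empty_cache()
```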
I would appreciate any insights or suggestions on how to resolve this issue. Has anyone else encountered this problem and found a solution?
Thank you in advance for your help!