Parameters nlp.batch_size and components.transformer.max_batch_size have no effect on VRAM usage #13615
Unanswered
whranclast asked this question in Help: Coding & Implementations
Hello everyone,
I am currently running into constant GPU OOM issues, as some of the data I train/test with has grown significantly. I've experimented with lowering batch_size and max_batch_size to very low values (4 and 32 respectively), but this had absolutely no effect on VRAM usage. I am training on an NVIDIA A10G with 24 GB of VRAM. The only way I've found to reduce VRAM usage is to shrink my test set significantly.
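For reference, these are the two settings I lowered, shown as a minimal excerpt of the relevant config sections (section names as in the title of this question):

```ini
[nlp]
batch_size = 4

[components.transformer]
max_batch_size = 32
```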
As I've read here #8600, I've also experimented with `max_length`, but that did nothing for me. I've also read an interesting response by @mbru saying that, no matter the config settings, the entire dev corpus will be loaded into VRAM at least once. There must be a workaround for that, as I imagine people are training with much more data than me. My total amount of data is around 60,000 articles, on average 800-1000 characters long, though some are 16,000 characters and others only 200.
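Since `max_length` alone did not help, one workaround I'm considering (a rough sketch I wrote myself, not something from the spaCy docs) is pre-splitting the longest articles into smaller chunks before building the training/dev corpora, so no single example is huge:

```python
def split_text(text, max_chars=2000):
    """Split a long article into chunks of at most max_chars characters,
    preferring to break on whitespace so words stay intact."""
    chunks = []
    while len(text) > max_chars:
        # Cut at the last space before the limit, if there is one.
        cut = text.rfind(" ", 0, max_chars)
        if cut <= 0:
            cut = max_chars
        chunks.append(text[:cut])
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```

The chunk size (2000 characters here) is an arbitrary choice on my part; annotations spanning a chunk boundary would of course need special handling.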
I also tried manually limiting the maximum split size of PyTorch's CUDA allocator with `os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:124"`. What this caused was that after my GPU usage climbed to 20 GB, it dropped to 4 GB and just stayed there; however, if I use the entire test set again, I still run out of memory. I've experimented quite a bit, but unfortunately whatever I change in the config has no effect. I can also say that I am using a fine-tuned transformer based on
transformer = SentenceTransformer('nli-distilroberta-base-v2')
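For completeness, this is how I set the allocator option; as far as I understand, the variable has to be set before torch first initialises CUDA (so before `import torch`, and before importing spacy, which imports torch itself), otherwise it is silently ignored:

```python
import os

# PYTORCH_CUDA_ALLOC_CONF is read when CUDA is first initialised, so this
# assignment must come before the first `import torch` / `import spacy`.
# Setting it after torch has touched the GPU has no effect.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:124"

# import torch  # must come after the line above for the setting to apply
```

Note that this only reduces fragmentation-related OOMs by capping allocator block sizes; it does not lower the total memory the model actually needs.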
Some more information about my system:
- Python version: 3.9.14
- spaCy version: 3.6.0
- Operating system: Ubuntu 22.04
This is the config file I am currently working with: