OOM error for textcat multilabel #9775
-
Hi! I'm training a model with a textcat multilabel pipe on a Kubernetes node that limits memory usage to 16GB. We are consistently adding new training data to our corpus, and recently the memory required to train the model has surpassed 16GB, causing an OOM error. I want to make sure that during training spaCy does not load more data into memory than our node can handle; setting a limit would be best so that, as the dataset grows, we are still able to train with restricted memory. I've attached the current config.cfg; here is what I have tried so far...
None of these changes reduced the memory usage of the train job (when I run it on my local machine, the memory usage shoots up to around 17-18GB during the first training cycle). For more context, the training set size is 808,837, and the spaCy doc files are 66.3MB, 22.2MB, and 22.2MB respectively. So I'm confused why the training job is taking up so much memory and why the steps I took did not reduce it.
Replies: 1 comment
-
By default spaCy loads all the training data into memory in order to shuffle it. If you have too much data for that, you should use a custom data loader, which will let you control how data is loaded into memory.
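Here is a minimal sketch of such a custom corpus reader, assuming the training data can be exported to a JSONL file where each record has a `text` field and a `cats` dict for the textcat_multilabel annotations; the registered name, path, and field names below are illustrative, not taken from the thread:

```python
from typing import Callable, Iterator

import spacy
import srsly
from spacy.language import Language
from spacy.training import Example


@spacy.registry.readers("stream_jsonl_corpus.v1")
def stream_jsonl_corpus(path: str) -> Callable[[Language], Iterator[Example]]:
    """Return a corpus reader that streams examples instead of loading them all at once."""

    def generate_examples(nlp: Language) -> Iterator[Example]:
        # srsly.read_jsonl yields one record at a time, so only a single
        # example is materialized in memory at any point.
        for record in srsly.read_jsonl(path):
            doc = nlp.make_doc(record["text"])
            # For textcat_multilabel the category annotations go under "cats".
            yield Example.from_dict(doc, {"cats": record["cats"]})

    return generate_examples
```

The reader can then be wired into config.cfg by pointing `[corpora.train]` at it (`@readers = "stream_jsonl_corpus.v1"` plus a `path` setting) and passing the file that defines it to training with `--code`, e.g. `python -m spacy train config.cfg --code reader.py`.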