OOM error for textcat multilabel #9775
-
Hi! I'm training a model with a textcat multilabel pipe on a Kubernetes node that limits memory usage to 16GB. We are consistently adding new training data to our corpus, and recently the memory required to train the model has surpassed 16GB, causing an OOM error. I want to make sure that during training spaCy does not load more data into memory than our node can handle; setting a limit would be best so that, as the dataset grows, we are still able to train with restricted memory. I've attached the current config.cfg; here is what I have tried so far...
None of these changes reduced the memory usage of the train job (when I run it on my local machine, the memory usage shoots up to around 17-18GB during the first training cycle). For more context, the training set size is 808,837, and the spaCy doc files are 66.3MB, 22.2MB, and 22.2MB respectively. So I'm confused why the training job is taking up so much memory and why the steps I took did not reduce it.
Replies: 1 comment
-
By default spaCy loads all the training data into memory in order to shuffle it. If you have too much data for that, you should use a custom data loader, which will let you control how data is loaded into memory.
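Here is a minimal sketch of such a custom corpus reader, assuming the training data can be exported to a JSONL file where each record has a `text` field and a `cats` dict for the textcat_multilabel annotations; the registered name, path, and field names below are illustrative, not taken from the thread:

```python
from typing import Callable, Iterator

import spacy
import srsly
from spacy.language import Language
from spacy.training import Example


@spacy.registry.readers("stream_jsonl_corpus.v1")
def stream_jsonl_corpus(path: str) -> Callable[[Language], Iterator[Example]]:
    """Return a corpus reader that streams examples instead of loading them all at once."""

    def generate_examples(nlp: Language) -> Iterator[Example]:
        # srsly.read_jsonl yields one record at a time, so only a single
        # example is materialized in memory at any point.
        for record in srsly.read_jsonl(path):
            doc = nlp.make_doc(record["text"])
            # For textcat_multilabel the category annotations go under "cats".
            yield Example.from_dict(doc, {"cats": record["cats"]})

    return generate_examples
```

The reader can then be wired into config.cfg by pointing `[corpora.train]` at it (`@readers = "stream_jsonl_corpus.v1"` plus a `path` setting) and passing the file that defines it to training with `--code`, e.g. `python -m spacy train config.cfg --code reader.py`.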