We don't have any feature for this, no. You might want to limit the longest input you'll handle, or the number of simultaneous requests, or use a worker queue.
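As a sketch of the first two suggestions: cap the length of each input before it reaches the pipeline, and stream documents in small batches rather than holding them all in memory at once. This assumes a spaCy-style pipeline; the names `MAX_CHARS`, `BATCH_SIZE`, and `process` are illustrative, not part of any actual API.

```python
MAX_CHARS = 10_000   # longest input we're willing to process (illustrative)
BATCH_SIZE = 32      # limit how many docs are in flight at once (illustrative)

def truncate(text: str, limit: int = MAX_CHARS) -> str:
    """Drop anything past the character limit to bound per-document memory."""
    return text[:limit]

def process(nlp, texts):
    """Yield processed docs one batch at a time instead of all at once."""
    safe_texts = (truncate(t) for t in texts)
    # nlp.pipe streams documents in batches, keeping peak memory bounded
    yield from nlp.pipe(safe_texts, batch_size=BATCH_SIZE)
```

For the worker-queue option, the same `process` generator could be driven by a task queue (e.g. one worker per queue) so that only a fixed number of requests hit the model concurrently.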

It looks like the model wrongly calculates the amount of memory available inside the container and tries to allocate too much for its calculations.

As far as I'm aware, the models don't check the total available memory; like most programs, they just use what's available as needed. So the model isn't miscalculating the available memory, and there's no bug here for us to fix: you're simply using more memory than is available.

Answer selected by VMAtm
Labels: perf / memory (Performance: memory use), feat / transformer (Feature: Transformer)
This discussion was converted from issue #9419 on October 21, 2021 07:05.