spacy+gunicorn setup with preload app #10219
-
How to reproduce the behaviour

Here is the code for running my server, server.py:
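The original server.py code block did not survive. As a stand-in, here is a minimal sketch of what such a server might look like; the endpoint name, response shape, and use of Flask with a blank pipeline are assumptions, not the original code:

```python
# Hypothetical reconstruction of server.py: a minimal Flask app that loads
# spaCy once at import time, so gunicorn's --preload can load it in the
# master process before forking workers.
import spacy
from flask import Flask, jsonify, request

app = Flask(__name__)
# spacy.blank("en") keeps the sketch light; the real app presumably loads a
# trained pipeline such as "en_core_web_sm".
nlp = spacy.blank("en")

@app.route("/tokenize", methods=["POST"])
def tokenize():
    """Tokenize the posted text and return the token strings."""
    text = request.get_json(force=True).get("text", "")
    doc = nlp(text)
    return jsonify({"tokens": [t.text for t in doc]})

if __name__ == "__main__":
    app.run(port=8000)
```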
and I run my application using the following command:
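The exact command was not preserved; based on the title, a typical invocation with app preloading would look like this (worker count and port are assumptions):

```shell
# --preload imports server.py (and the spaCy model) in the master process
# before forking, so read-only pages can be shared copy-on-write.
gunicorn --preload --workers 4 --bind 0.0.0.0:8000 server:app
```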
Now I want to test my application with 1 lakh (100,000) sentences using the following code,
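The test script itself is also missing. A hedged sketch of such a load test, using only the standard library; the endpoint URL and payload shape are assumptions matching the server sketch, not the original test.py:

```python
# Hypothetical test.py: POST 100,000 sentences to the running server.
import json
import urllib.request

URL = "http://localhost:8000/tokenize"  # assumed endpoint

def make_sentences(n: int = 100_000) -> list:
    """Generate n dummy sentences standing in for the real test corpus."""
    return [f"This is test sentence number {i}." for i in range(n)]

def post_sentence(text: str) -> dict:
    """Send one sentence to the server and return the parsed JSON reply."""
    req = urllib.request.Request(
        URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    for sentence in make_sentences():
        post_sentence(sentence)
```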
Along with the above code, I also run another script:
After starting the test.py file, I print the info stored in five_workers.txt using the command:
After completing all the sentences, I again print the memory occupied by all the workers using the command `tail -n 5 four_workers.txt`.
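The memory-logging script is missing as well. One stdlib-only way for each worker to record its own peak resident set size (the filename mirrors the one used in the question; on Linux, `ru_maxrss` is reported in kilobytes, on macOS in bytes):

```python
# Sketch: each gunicorn worker appends its PID and peak RSS to a shared
# log file, which can then be inspected with `tail`.
import os
import resource

def log_worker_memory(path: str = "four_workers.txt") -> int:
    """Append this process's peak RSS (KiB on Linux) to `path` and return it."""
    peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    with open(path, "a") as f:
        f.write(f"pid={os.getpid()} peak_rss_kib={peak_kib}\n")
    return peak_kib
```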
But when I run the application with 1 gunicorn worker, the RAM occupied by the application before starting the test.py script is:
and the RAM occupied by the worker after completely processing the sentences is:
For 1 worker, the memory increase is around 22 MB, whereas for 4 workers the memory increase for all workers put together is 64 MB. Can you let me know whether a new word added to the vocab by one worker will be accessible to all the other workers?

Your Environment
Replies: 1 comment
-
I'm not very familiar with gunicorn, but it sounds like each worker is a separate process and is getting a separate copy of the model/vocab, or at least part of it; this Stack Overflow question also suggests that's the case. Since you aren't seeing 4x memory usage, I suspect that some portions of the model that are read-only are being automatically shared, while other parts (probably the vocab) are not being shared.

It's possible to share memory between processes with `mmap` and other techniques, but that requires some work on the implementation side (the spaCy side in this case) and gets very tricky if you're writing to the memory.

You can trivially check whether the vocabs are consistent between workers by having them log whether they contain each incoming token. If you want to save memory in a situation like this, what I would suggest is having a separate spaCy process that calls spaCy with …
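The consistency check suggested above can also be reproduced outside gunicorn with a small fork-based experiment: a string added to the vocab in a forked child process does not appear in the parent, mirroring how workers diverge after fork. This sketch uses a blank pipeline as a stand-in for the real model:

```python
# Fork-based check: a string added to the vocab in a child process is not
# visible in the parent, so each forked worker's vocab is independent.
import multiprocessing as mp

import spacy

# A blank pipeline stands in for the real trained model.
nlp = spacy.blank("en")

def _child(queue):
    """Runs in the forked child: add a marker string to its vocab copy."""
    nlp.vocab.strings.add("zz_new_word_zz")
    queue.put("zz_new_word_zz" in nlp.vocab.strings)

def run_experiment():
    """Return (child_sees_word, parent_sees_word) after the child adds it."""
    ctx = mp.get_context("fork")  # fork so the child inherits `nlp`
    queue = ctx.Queue()
    proc = ctx.Process(target=_child, args=(queue,))
    proc.start()
    child_sees = queue.get()
    proc.join()
    return child_sees, "zz_new_word_zz" in nlp.vocab.strings

if __name__ == "__main__":
    print(run_experiment())  # the child sees the word; the parent does not
```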