spacy+gunicorn setup with preload app #10219

rkoystart · 2022-02-03T18:03:32Z

rkoystart
Feb 3, 2022

How to reproduce the behaviour

Here is the following code for running my server, server.py

from flask import Flask, request, jsonify
import spacy

app = Flask(__name__)
model =  spacy.load("en_core_web_sm")


@app.route("/serve", methods=["POST"])
def serve():
   params =  request.json
   doc =  model(params["text"][0])
   return "success"

and i run my application using the following code

gunicorn server:app --worker 4 --preload

the description of what --preload does - https://docs.gunicorn.org/en/stable/settings.html#preload-app

Now I want to test my application using a 1 lakh sentences using the following code,

test.py

import requests

data = open("1l_test.txt","r").read().splitlines()

for i in data:
    inp["text"] = [i]
    response =  requests.post(url="http://0.0.0.0:8000/serve", json=inp)
#    print(response.status_code, response.text)

along with the above code , i also run another script memory_monitor.py to print the memory_occupied by all the master and child process for every second.

import psutil
pid=37410 # master pid of the gunicorn process
while True:
     with open("four_workers.txt","a") as writer:
          process =  psutil.Process(pid)
          memory = process.memory_info().rss/1000000
          children = process.children(recursive=True)
          writer.write(str(process)+"\t"+str(memory)+"\n")
          for child in children:
              writer.write(str(child)+"\t"+str(child.memory_info().rss/1000000)+"\n")

After starting the test.py file, i print the info stored in five_workers.txt using the command head -n 5 four_workers.txt

psutil.Process(pid=37410, name='gunicorn', status='sleeping', started='22:17:22')       327.155712
psutil.Process(pid=37419, name='gunicorn', status='sleeping', started='22:17:27')       189.648896
psutil.Process(pid=37420, name='gunicorn', status='sleeping', started='22:17:27')       189.661184
psutil.Process(pid=37421, name='gunicorn', status='sleeping', started='22:17:27')       189.72672
psutil.Process(pid=37422, name='gunicorn', status='sleeping', started='22:17:27')       189.72672

and after completing all the sentences I, again print memory occupied by all the workers using the command `tail -n 5 four_workers.txt

psutil.Process(pid=37410, name='gunicorn', status='sleeping', started='22:17:22')       327.155712
psutil.Process(pid=37419, name='gunicorn', status='sleeping', started='22:17:27')       204.197888
psutil.Process(pid=37420, name='gunicorn', status='sleeping', started='22:17:27')       204.476416
psutil.Process(pid=37421, name='gunicorn', status='sleeping', started='22:17:27')       205.033472
psutil.Process(pid=37422, name='gunicorn', status='sleeping', started='22:17:27')       207.95392

But when i run the application with 1 gunicorn worker, before starting the test.py script, the ram occupied by the application is

psutil.Process(pid=39442, name='gunicorn', status='sleeping', started='23:10:48')       327.593984
psutil.Process(pid=39455, name='gunicorn', status='sleeping', started='23:10:52')       189.833216

and ram occupied by the workers after completely processing the sentences is

psutil.Process(pid=39442, name='gunicorn', status='sleeping', started='23:10:48')       327.593984
psutil.Process(pid=39455, name='gunicorn', status='sleeping', started='23:10:52')       211.288064

According to my understanding about --preload command line argument, if there are multiple workers, all the workers will only be using the same spacy model for inference and whenever spacy comes across a new word during inference, a new lexeme object is being created(using malloc for obtaining memory) and the correponding lexeme object is added in vocab. So if the first worker comes across a new word and this new word is added in vocab(additional memory is consumed by the application), and again the second worker comes across the word previously added by first worker, it will not create a new lexeme object ,it will just make use of the lexeme object created by the first worker.

For 1 workers, the memory increase is around 22 mb whereas for 4 workers the memory increase put together for all workers is 64mb.
If all the workers are using the same model for inference(basically accessing the same memory location where the model is stored), then the increase found with 1 gunicorn worker should have been same when compared to 4 gunicorn workers.

Can you let me know whether a new word added in the vocab by one worker will be accessible by all the other workers?

Your Environment

Operating System: ubuntu
Python Version Used: 3.8
spaCy Version Used: 3.1.3
Environment Information:
spaCy version: 3.1.3
- Platform: Linux-5.10.3-051003-generic-x86_64-with-glibc2.10
- Python version: 3.8.8
- Pipelines: en_core_web_sm (3.1.0), es_core_news_sm (3.1.0), en_core_web_md (3.1.0)

Answered by polm

Feb 6, 2022

I'm not very familiar with gunicorn but it sounds like each worker is a separate process and is getting a separate copy of the model/vocab, or at least part of it; this Stack Overflow question also suggests that's the case. Since you aren't seeing 4x memory usage, I suspect that some portions of the model that are read-only are being automatically shared, while other parts (probably the vocab) are not being shared.

It's possible to share memory between processes with mmap and other techniques but that requires some work on the implementation side (spaCy side in this case) and gets very tricky if you're writing to the memory.

You can trivially check if the vocabs are consistent between wor…

View full answer

polm · 2022-02-06T11:12:03Z

polm
Feb 6, 2022

I'm not very familiar with gunicorn but it sounds like each worker is a separate process and is getting a separate copy of the model/vocab, or at least part of it; this Stack Overflow question also suggests that's the case. Since you aren't seeing 4x memory usage, I suspect that some portions of the model that are read-only are being automatically shared, while other parts (probably the vocab) are not being shared.

It's possible to share memory between processes with mmap and other techniques but that requires some work on the implementation side (spaCy side in this case) and gets very tricky if you're writing to the memory.

You can trivially check if the vocabs are consistent between workers by having them log whether they contain each incoming token.

If you want to save memory in a situation like this, what I would suggest is having a separate spaCy process that calls spaCy with nlp.pipe or multiprocessing options and having your workers talk to it.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

spacy+gunicorn setup with preload app #10219

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Uh oh!

spacy+gunicorn setup with preload app #10219

Uh oh!

rkoystart Feb 3, 2022

How to reproduce the behaviour

Your Environment

Replies: 1 comment

Uh oh!

polm Feb 6, 2022

rkoystart
Feb 3, 2022

polm
Feb 6, 2022