I'm not very familiar with gunicorn, but it sounds like each worker is a separate process and is getting its own copy of the model/vocab, or at least part of it; this Stack Overflow question also suggests that's the case. Since you aren't seeing 4x memory usage, I suspect that the read-only portions of the model are being shared automatically, while other parts (probably the vocab) are not.

It's possible to share memory between processes with mmap and other techniques, but that requires some work on the implementation side (the spaCy side, in this case) and gets very tricky if you're writing to the memory.
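To make the mmap idea concrete, here is a minimal sketch using only the Python standard library. It is not how spaCy loads models; it only illustrates the read-only case, where several processes map the same file and the operating system can back those mappings with the same physical pages. The names `share_read_only` and `CHILD_CODE` are hypothetical, and the workers are plain subprocesses rather than gunicorn workers.

```python
import mmap
import os
import subprocess
import sys
import tempfile

# Hypothetical child program: maps the file read-only and prints the sum of its bytes.
CHILD_CODE = """
import mmap, sys
with open(sys.argv[1], "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        print(sum(view[:]))
"""


def share_read_only(payload, n_workers=2):
    """Write payload to a file once, then have each worker process mmap it read-only."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        results = []
        for _ in range(n_workers):
            # Each worker is a separate Python process that maps the same file.
            out = subprocess.run(
                [sys.executable, "-c", CHILD_CODE, path],
                capture_output=True,
                text=True,
                check=True,
            )
            results.append(int(out.stdout.strip()))
        return results
    finally:
        # Remove the backing file once all workers have exited.
        os.unlink(path)


checksums = share_read_only(bytes(range(256)) * 4)
```

Every worker reads identical data without the parent sending it a copy. This stays simple only because nothing writes to the mapping; once workers need to modify shared data, such as a vocab that grows, you would also need locking and a layout that tolerates concurrent updates.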

You can trivially check if the vocabs are consistent between wor…

Answer selected by svlandeg
Labels: third-party (Third-party packages and services), scaling (Scaling, serving and parallelizing spaCy)
This discussion was converted from issue #10206 on February 06, 2022 11:12.