Spacy Inference speed for single text #11554
marzooq-unbxd started this conversation in Help: Best practices
Replies: 1 comment · 2 replies
I was trying to deploy my spaCy NER model as a Flask web app behind gunicorn. I read about improving inference speed by moving from `nlp(text)` to `nlp.pipe([list_of_texts])`, but for that to actually reuse resources, shouldn't `list_of_texts` have a length greater than 1? My service returns NER entities for every single request it receives (one text per call), and I was seeing high latency just from increasing the request rate. Changing `n_process` wouldn't help according to #10087, and changing `n_threads` is no longer an option (it doesn't get around the GIL?).
Is the only way to solve this to reduce the model architecture size?
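For reference, the two calling patterns contrasted above look roughly like this (the model name and texts are placeholders):

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # placeholder model

# One call per request: per-text overhead is paid every time.
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# Batched: nlp.pipe streams many texts through the pipeline together,
# which is where the speedup comes from, but only when there is more than one text.
docs = list(nlp.pipe([
    "Apple is looking at buying a U.K. startup for $1 billion.",
    "San Francisco considers banning sidewalk delivery robots.",
]))
```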
The easiest ways to improve processing speed are to either batch requests (reducing setup/teardown costs) or to do less work (using a smaller architecture). If your API doesn't allow batching requests externally, you can batch them internally. A simple way to do that would be to have a separate thread with a queue that handles batching and calling `nlp.pipe`.
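A minimal sketch of that internal-batching idea, assuming a single process serving concurrent requests that all share one loaded pipeline; the function names, batch size, and wait window below are illustrative choices, not something prescribed in this thread:

```python
import queue
import threading

import spacy

# Placeholder model; substitute whatever pipeline the service actually loads.
nlp = spacy.load("en_core_web_sm")

# Each queued job is (text, done_event, result_slot): the request thread waits
# on done_event and reads its Doc out of result_slot once the worker fills it.
jobs = queue.Queue()


def batch_worker(max_batch=32, wait_s=0.01):
    """Collect the requests that arrive within a short window and run one nlp.pipe call."""
    while True:
        batch = [jobs.get()]  # block until at least one request is waiting
        try:
            while len(batch) < max_batch:
                batch.append(jobs.get(timeout=wait_s))
        except queue.Empty:
            pass  # window closed; process what we have
        texts = [text for text, _, _ in batch]
        for (_, done, slot), doc in zip(batch, nlp.pipe(texts)):
            slot.append(doc)
            done.set()


threading.Thread(target=batch_worker, daemon=True).start()


def extract_entities(text):
    """Call this from the Flask view; it blocks until the shared worker has processed the text."""
    done, slot = threading.Event(), []
    jobs.put((text, done, slot))
    done.wait()
    return [(ent.text, ent.label_) for ent in slot[0].ents]
```

Each request blocks only until the batch containing its text has gone through `nlp.pipe`, so the added latency is bounded by the wait window. Note that this only helps if one process really does hold several requests at once (for example with gunicorn's threaded workers); with plain synchronous workers each process only ever has a single text in hand, and there is nothing to batch.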