NER model inference on Big Data, using a Transformer-based model #10031
-
Let me start by linking the speed FAQ, though it looks like you're already following most of the advice in it.
-
It sounds like you are using a Transformer model on a single GPU, in which case we don't recommend multiprocessing: your GPU memory will fill up too quickly for it to be usable, and the benefit of extra CPU cores is small since most of the computation runs on the GPU anyway. The error is due to a limitation in CUDA and, as the message says, you need to set the multiprocessing start method to "spawn". This is a setting in Python's multiprocessing library; please see the multiprocessing docs.
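A minimal sketch of the "spawn" setting mentioned above, using only Python's standard multiprocessing module. The helper name `use_spawn_start_method` is hypothetical, and the commented-out spaCy line assumes that `skill_ner` is the loaded pipeline and `texts` is the list of input strings, as in the question. With a single GPU the advice above still applies, so treat this as something to experiment with rather than a recommended setup.

```python
import multiprocessing as mp


def use_spawn_start_method() -> str:
    """Switch multiprocessing to the "spawn" start method and return the active method.

    force=True avoids a RuntimeError if a start method was already chosen.
    """
    mp.set_start_method("spawn", force=True)
    return mp.get_start_method()


if __name__ == "__main__":
    # Call this once, before any worker processes are created.
    method = use_spawn_start_method()
    print(f"multiprocessing start method: {method}")

    # Assumption: skill_ner and texts are defined as in the question.
    # docs = list(skill_ner.pipe(texts, n_process=2))
```

With "spawn", each worker starts a fresh interpreter and re-imports the main module, so the code that launches the workers should stay under the `if __name__ == "__main__":` guard.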
-
Hello everyone,
I implemented a custom NER extraction model some time ago. It performs OK on small batches; however, I am looking to scale up its use so that I can process several thousand documents in a reasonable amount of time.
First things first: I am using Python 3.9.9 and spaCy 3.2.1. The code I am using is as follows:
The pipeline obtained is the following:
Later, purely for testing purposes, I wrote the following code, which manages to process my test batch of texts:
However, after some comparison, I realized that this code performs only marginally better than just using skill_ner inside a for loop (code not shown here), dealing with a single text at a time:
Trying to speed things up, I noticed that nlp.pipe has the parameter n_process, which in theory should increase the processing speed, since I'd be using more cores of my computer. In other words:
Instead of processing things faster, this gets me the following error:
I suspect it has something to do with the transformer architecture of the original NER extractor, but I don't know what exactly to troubleshoot at this point. After some quick research, it seems the error is closely related to PyTorch, as seen here, here, and here, among the most popular results on Google; however, I found nothing closely related to spaCy as such.
Questions:
Thank you very much.