NER model inference on Big Data, using a Transformer-based model #10031

dave-espinosa · 2022-01-11T20:12:05Z


Hello everyone,

I implemented a custom NER extraction model some time ago. It performs fine on small batches; however, I am now looking to scale up its use so that it can process several thousand documents in a reasonable amount of time.

First things first: I am using Python 3.9.9 and spaCy 3.2.1. The code I am using is as follows:

import spacy
from spacy.language import Language

# A previously trained transformer-based NER model, 'Model 6':
skill_ner = spacy.load(r"./model-bucket/Model6/model-best")

@Language.component("skill_cleaner")
def skill_cleaner(doc):
    # Keep only the 'SKILL' (custom-made) and 'LANGUAGE' (spaCy default) entities
    doc.ents = [ent for ent in doc.ents if ent.label_ in ("SKILL", "LANGUAGE")]
    return doc

# Safety component removal, to enable re-running of this cell
if "skill_cleaner" in skill_ner.pipe_names:
    skill_ner.remove_pipe("skill_cleaner")

# Adding custom component (appended at the end of the pipeline, after 'ner')
skill_ner.add_pipe("skill_cleaner")

# What do we have here?
skill_ner.analyze_pipes()

The pipeline obtained is the following:

{'summary': {'transformer': {'assigns': ['doc._.trf_data'],
   'requires': [],
   'scores': [],
   'retokenizes': False},
  'ner': {'assigns': ['doc.ents', 'token.ent_iob', 'token.ent_type'],
   'requires': [],
   'scores': ['ents_f', 'ents_p', 'ents_r', 'ents_per_type'],
   'retokenizes': False},
  'skill_cleaner': {'assigns': [],
   'requires': [],
   'scores': [],
   'retokenizes': False}},
 'problems': {'transformer': [], 'ner': [], 'skill_cleaner': []},
 'attrs': {'token.ent_type': {'assigns': ['ner'], 'requires': []},
  'token.ent_iob': {'assigns': ['ner'], 'requires': []},
  'doc.ents': {'assigns': ['ner'], 'requires': []},
  'doc._.trf_data': {'assigns': ['transformer'], 'requires': []}}}

A bit later, and only for testing purposes, I have the following code, which manages to process my test batch of texts:

import time as tm

revision_texts = []
t1 = tm.time()
# df["posting"] contains the texts, with ~10K characters each
for doc in skill_ner.pipe(df["posting"], batch_size=10):
    revision_texts.append(list(doc.ents))
t2 = tm.time()
print(t2-t1)

However, after some comparison, I realized that the previous code performs only marginally better than just calling skill_ner inside a for loop (code not shown here) on a single text at a time, and at 1,000 texts it is actually slightly slower:

No. of processed texts    t_for_loop [s]    t_spacy [s]
10                        0.564377          0.443946
100                       4.382640          4.353652
1000                      51.291995         53.609408
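To make the comparison easier to read, here is a small sketch that turns the timings from the table above into a time ratio and a throughput figure. The function name summarize is only illustrative, and the numbers are copied from the table as posted, not re-measured.

```python
# Timings copied from the table above:
# (number of texts, for-loop seconds, nlp.pipe seconds)
timings = [
    (10, 0.564377, 0.443946),
    (100, 4.382640, 4.353652),
    (1000, 51.291995, 53.609408),
]


def summarize(rows):
    """Return (n_texts, pipe/loop time ratio, nlp.pipe texts per second) per row."""
    return [(n, t_pipe / t_loop, n / t_pipe) for n, t_loop, t_pipe in rows]


for n, ratio, throughput in summarize(timings):
    # A ratio below 1 means nlp.pipe was faster than the plain for loop
    print(f"{n:>5} texts: pipe/loop = {ratio:.3f}, nlp.pipe = {throughput:.1f} texts/s")
```

With these numbers, nlp.pipe is roughly 21% faster at 10 texts, about even at 100, and roughly 4.5% slower at 1,000, so batching alone does not change the overall throughput much in this setup.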

Trying to speed things up, I realized that nlp.pipe has the parameter n_process, which in theory should increase the processing speed, since I'd be using more cores of my computer. In other words:

revision_texts = []
t1 = tm.time()
# df["posting"] contains the texts, with ~10K characters each
for doc in skill_ner.pipe(df["posting"], batch_size=10, n_process=8):
    revision_texts.append(list(doc.ents))
t2 = tm.time()
print(t2-t1)

Instead of processing things faster, this gets me the following error:

ValueError: [E871] Error encountered in nlp.pipe with multiprocessing:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/spacy/language.py", line 2185, in _apply_pipes
    byte_docs = [(doc.to_bytes(), doc._context, None) for doc in docs]
  File "/opt/conda/lib/python3.7/site-packages/spacy/language.py", line 2185, in <listcomp>
    byte_docs = [(doc.to_bytes(), doc._context, None) for doc in docs]
  File "/opt/conda/lib/python3.7/site-packages/spacy/util.py", line 1609, in _pipe
    for doc in docs:
  File "/opt/conda/lib/python3.7/site-packages/spacy/util.py", line 1599, in _pipe
    yield from proc.pipe(docs, **kwargs)
  File "spacy/pipeline/transition_parser.pyx", line 230, in pipe
  File "/opt/conda/lib/python3.7/site-packages/spacy/util.py", line 1548, in minibatch
    batch = list(itertools.islice(items, int(batch_size)))
  File "/opt/conda/lib/python3.7/site-packages/spacy/util.py", line 1599, in _pipe
    yield from proc.pipe(docs, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/spacy_transformers/pipeline_component.py", line 212, in pipe
    self.set_annotations(subbatch, self.predict(subbatch))
  File "/opt/conda/lib/python3.7/site-packages/spacy_transformers/pipeline_component.py", line 228, in predict
    activations = self.model.predict(docs)
  File "/opt/conda/lib/python3.7/site-packages/thinc/model.py", line 315, in predict
    return self._func(self, X, is_train=False)[0]
  File "/opt/conda/lib/python3.7/site-packages/spacy_transformers/layers/transformer_model.py", line 185, in forward
    model_output, bp_tensors = transformer(wordpieces, is_train)
  File "/opt/conda/lib/python3.7/site-packages/thinc/model.py", line 291, in __call__
    return self._func(self, X, is_train=is_train)
  File "/opt/conda/lib/python3.7/site-packages/thinc/layers/pytorchwrapper.py", line 133, in forward
    Xtorch, get_dX = convert_inputs(model, X, is_train)
  File "/opt/conda/lib/python3.7/site-packages/spacy_transformers/layers/transformer_model.py", line 214, in _convert_transformer_inputs
    "input_ids": xp2torch(wps.input_ids).long().to(device=hf_device),
  File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 164, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

I suspect it has something to do with the transformer architecture of the original NER extractor; however, I don't know exactly what to troubleshoot at this point. After some quick research, the error seems to be closely related to PyTorch, as seen here, here, and here, among the most popular results in Google; however, I found nothing closely related to spaCy as such.

Questions:

1. What is the recommended and fastest option for performing NER extraction on hundreds of thousands of texts (keeping Issue # 7593 under control)? Please notice that this pipeline includes both custom components and a transformer-based architecture.
2. Maybe this question eventually becomes a bit redundant with "# 1", but how can I overcome the error I am getting, or what should I pay attention to in it?

Thank you very much.

polm · 2022-01-12T03:48:53Z


Let me start by linking the speed FAQ, though it looks like you're already following most of the advice in it.

6 replies

polm Jan 13, 2022

Use a smaller model: Not possible, I am aiming for high accuracy, and according to spaCy's usage, transformers are the recommended option.

You need to check whether the difference in accuracy between the CPU models and GPU models is actually significant for your application. Transformers are typically more accurate, but sometimes there's basically no difference.

dave-espinosa Jan 14, 2022
Author

Hello @polm ,

Due to some time restrictions today, I only had the chance to train a "small tok2vec-based model", and yeah, the accuracy drops by about 3 percentage points (from ~99% to 96%); however, since we are dealing with a lot of documents, the errors became more noticeable 🤔 ... Anyway, we know we have an "ace in the hole" there just in case, because processing time got reduced by a factor of nearly ~4x with the defaults, and by nearly ~16x with multiprocessing 😄 !

I will explore a "large tok2vec-based model" tomorrow, so I can see if I can get "the best of both worlds" (i.e., a balance between accuracy and speed) with it.

I'll let you know, thank you.

dave-espinosa Jan 15, 2022
Author

Hello everyone,

In case anyone is interested: training a (large) Tok2Vec-based model achieves a nice balance between prediction metrics and speed. In our use case, we have updated our NER model to use Tok2Vec, and will drop the Transformer-based model for the time being, because of all the issues it is currently causing, as mentioned throughout this thread.

This thread will remain open from our side, since our solution is only a workaround.

Thanks.

beingfanfan Mar 26, 2023

Hi dave,
Do you mean that you finally gave up the "en_core_web_trf" model and chose the "en_core_web_lg" model?

dave-espinosa Mar 26, 2023
Author

Hello @beingfanfan ,

Regarding your query:

Do you mean that you finally gave up the "en_core_web_trf" model and chose the "en_core_web_lg" model?

The answer is: yes. In fact, using multiprocessing with a GPU is currently not recommended, as noted at the bottom of the Multiprocessing documentation:

Multiprocessing on GPU: Multiprocessing is not generally recommended on GPU because RAM is too limited. If you want to try it out, be aware that it is only possible using spawn due to limitations in CUDA.

Multiprocessing with transformer models: In Linux, transformer models may hang or deadlock with multiprocessing due to an issue in PyTorch. One suggested workaround is to use spawn instead of fork and another is to limit the number of threads before loading any models using torch.set_num_threads(1).

If you're in a hurry, you may find en_core_web_lg faster to adapt to your needs, and as I said elsewhere in this thread, it has been working fine so far for my use case.

Hope it helps.

polm · 2022-01-12T03:54:43Z


It sounds like you are using a Transformer model with a single GPU, in which case we don't recommend using multiprocessing - your GPU memory will fill up too quickly to make it usable, and the benefit of using extra CPU cores is not significant since most of the computation is on GPU anyway.

The error is due to a limitation in CUDA, and like it says you need to set the multiprocessing mode to "spawn". This is a setting in the multiprocessing library in Python. Please see the multiprocessing docs.
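As a minimal sketch of the Python-level setting referred to here (it is not a spaCy-specific recipe), the start method can be selected through the standard multiprocessing module. The helper name use_spawn is hypothetical, and the sketch assumes that the worker processes created later follow the globally selected start method. It only addresses the "forked subprocess" CUDA error; it does not make multiprocessing on a single GPU advisable.

```python
import multiprocessing as mp


def use_spawn():
    """Select the 'spawn' start method globally and return the active method."""
    # force=True allows re-running (e.g. in a notebook) after a method was already set
    mp.set_start_method("spawn", force=True)
    return mp.get_start_method()


if __name__ == "__main__":
    # Must run before any worker processes are created
    print(use_spawn())
    # ... load the pipeline and call nlp.pipe(..., n_process=...) after this point
```

With spawn, each worker starts as a fresh interpreter, so everything the workers need (including custom components) has to be importable from a module rather than defined only in an interactive session.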

3 replies

dave-espinosa Jan 12, 2022
Author

Hello again @polm ,

My feedback and comments on yours:

It sounds like you are using a Transformer model with a single GPU [...]

Indeed, I am using exactly that.

[...] in which case we don't recommend using multiprocessing - your GPU memory will fill up too quickly to make it usable, and the benefit of using extra CPU cores is not significant since most of the computation is on GPU anyway.

Do you mean by any chance that "with my current setting, the speed I am getting is the highest I can achieve"? Do you think it would be worth pursuing "splitting the NER into multiple processes, a.k.a. 'using n_process'" (i.e., would CPU multiprocessing be faster than GPU processing, at least on a general or average level)?

Now, moving on to the multiprocessing docs (more specifically, "Multiprocessing with transformer models"), I have read two basic suggestions, which led me to more questions; I hope you can help me clarify them a little more:

use spawn instead of fork: Do you have any spaCy-focused working example showing what to modify and where? Searching Google for "spacy use spawn instead of fork" mostly returns (at least in the top results) the conceptual differences between those implementations, and does not shed much light on the code part itself.
limit the number of threads before loading any models using torch.set_num_threads(1): What blogs and help forums say about this boils down to "You have to set it at the beginning of the Worker() function for it to have an effect on the newly created process". But once again, I think this hint is mostly aimed at NN models built in PyTorch... Where or how does this suggestion fit into a spaCy architecture (or project documentation)?

Thank you very much for your time, hope to read about this topic soon.

polm Jan 13, 2022

Do you mean by any chance that "with my current setting, the speed I am getting is the highest I can achieve"?

No. I just said that multiprocessing with a GPU model on a single GPU doesn't help. There may be other things you can tweak but I'm honestly not sure what to recommend given that setup.

Do you think it would be worth pursuing "splitting the NER into multiple processes, a.k.a. 'using n_process'" (i.e., would CPU multiprocessing be faster than GPU processing, at least on a general or average level)?

If you use a CPU model I would expect it to handle parallel processing easily and fit better in RAM, so with the same number of machines you could get higher throughput.

use spawn instead of fork

In the docs you linked to there is a link to the Python multiprocessing docs that explains how to configure this - it is a Python setting, not a spaCy one.

limit the number of threads before loading any models using torch.set_num_threads(1)

HuggingFace Transformers uses Torch internally. You want it as a setting for your process, so just call it at the top of your code somewhere.
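A rough sketch of "call it at the top of your code" could look like the following. The helper name limit_torch_threads is hypothetical, and the try/except around the import is only there so the snippet also runs on a machine without PyTorch; in the real script torch would simply be imported directly.

```python
try:
    import torch  # optional here so the sketch also runs where PyTorch is absent
except ImportError:
    torch = None


def limit_torch_threads(n_threads=1):
    """Cap PyTorch's intra-op CPU threads; return the active count, or None without torch."""
    if torch is None:
        return None
    torch.set_num_threads(n_threads)
    return torch.get_num_threads()


# Call this first, before spacy.load(...) or any other model loading
active_threads = limit_torch_threads(1)
print(active_threads)
```

The point of the ordering is that the thread limit is already in place by the time the transformer weights are loaded and any worker processes are started.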

dave-espinosa Jan 14, 2022
Author

Hello @polm ,

Here are my thoughts about your input.

In the docs you linked to there is a link to the Python multiprocessing docs that explains how to configure this - it is a Python setting, not a spaCy one.

Yes, very general explanations. Long story short, I ended up trying a setup inspired by these experiments (of course, using spawn instead of fork), which led me to the following error:

/opt/conda/lib/python3.7/site-packages/spacy/language.py:1564: UserWarning: [W114] Using multiprocessing with GPU models is not recommended and may lead to errors.
  warnings.warn(Warnings.W114)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'skill_cleaner' on <module '__main__' (built-in)>

BTW, that's the closest I have got so far. Again, time ran short today 😕.
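One plausible reading of this AttributeError, offered as an assumption rather than a confirmed diagnosis: with spawn, each worker is a fresh interpreter that receives pickled objects, and Python's pickle stores a function as a reference (module name plus function name), not as code. A function that exists only in a notebook's __main__ cannot be looked up again by the worker. The stdlib-only sketch below illustrates the by-reference behaviour, using json.dumps as a stand-in for the custom component.

```python
import json
import pickle

# Pickling a function stores only where to find it, not its body
payload = pickle.dumps(json.dumps)

# Both the module name and the function name appear in the payload
print(b"json" in payload, b"dumps" in payload)

# Unpickling re-imports the module and looks the name up again;
# this lookup is the step that fails when the function only exists
# in a notebook's __main__ and the worker is a fresh interpreter
restored = pickle.loads(payload)
print(restored is json.dumps)
```

If this reading is right, a common remedy would be to move skill_cleaner (together with its @Language.component decorator) into an importable .py file, for example a hypothetical skill_components.py, and import that module before loading the pipeline, so the spawned workers can resolve the name.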

HuggingFace Transformers uses Torch internally. You want it as a setting for your process so just call it at the top of your code somewhere

Keeping in mind that spaCy's Multiprocessing documentation literally suggests "One suggested workaround is to use spawn instead of fork and another is to limit the number of threads before loading any models using torch.set_num_threads(1)", I first used this hint in isolation. The results:

ValueError: [E871] Error encountered in nlp.pipe with multiprocessing:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/spacy/language.py", line 2185, in _apply_pipes
    byte_docs = [(doc.to_bytes(), doc._context, None) for doc in docs]
  File "/opt/conda/lib/python3.7/site-packages/spacy/language.py", line 2185, in <listcomp>
    byte_docs = [(doc.to_bytes(), doc._context, None) for doc in docs]
  File "/opt/conda/lib/python3.7/site-packages/spacy/util.py", line 1609, in _pipe
    for doc in docs:
  File "/opt/conda/lib/python3.7/site-packages/spacy/util.py", line 1599, in _pipe
    yield from proc.pipe(docs, **kwargs)
  File "spacy/pipeline/transition_parser.pyx", line 230, in pipe
  File "/opt/conda/lib/python3.7/site-packages/spacy/util.py", line 1548, in minibatch
    batch = list(itertools.islice(items, int(batch_size)))
  File "/opt/conda/lib/python3.7/site-packages/spacy/util.py", line 1599, in _pipe
    yield from proc.pipe(docs, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/spacy_transformers/pipeline_component.py", line 212, in pipe
    self.set_annotations(subbatch, self.predict(subbatch))
  File "/opt/conda/lib/python3.7/site-packages/spacy_transformers/pipeline_component.py", line 228, in predict
    activations = self.model.predict(docs)
  File "/opt/conda/lib/python3.7/site-packages/thinc/model.py", line 315, in predict
    return self._func(self, X, is_train=False)[0]
  File "/opt/conda/lib/python3.7/site-packages/spacy_transformers/layers/transformer_model.py", line 185, in forward
    model_output, bp_tensors = transformer(wordpieces, is_train)
  File "/opt/conda/lib/python3.7/site-packages/thinc/model.py", line 291, in __call__
    return self._func(self, X, is_train=is_train)
  File "/opt/conda/lib/python3.7/site-packages/thinc/layers/pytorchwrapper.py", line 133, in forward
    Xtorch, get_dX = convert_inputs(model, X, is_train)
  File "/opt/conda/lib/python3.7/site-packages/spacy_transformers/layers/transformer_model.py", line 214, in _convert_transformer_inputs
    "input_ids": xp2torch(wps.input_ids).long().to(device=hf_device),
  File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 164, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Then, I decided to use both at the same time, which triggered the same error seen before anyway:

/opt/conda/lib/python3.7/site-packages/spacy/language.py:1564: UserWarning: [W114] Using multiprocessing with GPU models is not recommended and may lead to errors.
  warnings.warn(Warnings.W114)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'skill_cleaner' on <module '__main__' (built-in)>

As I mentioned in the reply above, for the time being we still have one more experiment, scheduled for tomorrow, to see if we can "bypass" this "transformer-based model need". Let's see how it goes 🤞...

Some (more) comments:

Right now, I totally agree with that "Using multiprocessing with GPU models is not recommended and may lead to errors" warning 😅... With transformer-based models, these speed-up optimizations still take a lot of trial and error when using the default commands mentioned in the documentation, which gives the developer the feeling that 'it is not quite there... but close enough'... At least they are not as easy to use as their "tok2vec-based" counterparts.

If you have another clue or suggestion, it'll be much appreciated 🤝.

Thank you.
