nlp.pipe is freezed after changing n_process for spaCy 3.0 #10678

binh0206 · 2022-04-20T07:55:06Z

If I keep n_process at 1, as in nlp.pipe(n_process=1), there is no issue; however, whenever I increase n_process to any number larger than 1, such as 2, 3, or 12, the PyCharm IDE stops responding. My code:

import tensorflow
import pandas as pd
import spacy
from spacy import displacy
from spacy.matcher import Matcher

comment_sentiment_data = pd.read_pickle("comment_sentiment_data.pkl")
nlp = spacy.load("en_core_web_lg")
from datetime import datetime
start_time = datetime.now()
matcher = Matcher(nlp.vocab)
pattern = [{"TEXT": {"IN": ["#", "$"]}, "OP": "+"}, {"TEXT": {"REGEX": "[A-Za-z]+"}}]
matcher.add("stockHashTag", [pattern])
doc = nlp.pipe(comment_sentiment_data["body"][:5000], n_process=2, batch_size=10000,disable=['parser', 'tagger', 'ner','attribute_ruler', 'lemmatizer'])
doc1 = list(doc)
matches = [matcher(subdoc) for subdoc in doc1]
outside = []
for t,sub in zip(matches,doc1):
    inside = []
    for x in t:
        inside.append(sub[x[1]:x[2]])
    outside.append(inside)
end_time = datetime.now()
print(end_time - start_time)

At line doc1 = list(doc), the code stops working. If n_process is 1, the code finishes in almost 4 seconds. I am using spaCy 3.2.4.

Answered by adrianeboyd

adrianeboyd · 2022-04-21T15:20:38Z

Hmm, I can't reproduce this. Can you provide more info about your platform with spacy info --markdown?

The main reason users report pipelines hanging with multiprocessing is running trf/transformer models on Linux, which is related to an issue in PyTorch.

Does this hang for you (it's fine on my end with v3.2.3)?

import spacy
from spacy.matcher import Matcher
from datetime import datetime

nlp = spacy.blank("en")
start_time = datetime.now()
matcher = Matcher(nlp.vocab)
pattern = [{"TEXT": {"IN": ["#", "$"]}, "OP": "+"}, {"TEXT": {"REGEX": "[A-Za-z]+"}}]
matcher.add("stockHashTag", [pattern])
docs = list(nlp.pipe(["a #AAAA"] * 1000, n_process=4, batch_size=100))
matches = [matcher(subdoc) for subdoc in docs]
outside = []
for t,sub in zip(matches,docs):
    inside = []
    for x in t:
        inside.append(sub[x[1]:x[2]])
    outside.append(inside)
end_time = datetime.now()
print(end_time - start_time)
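
One more thing that may be worth ruling out on Windows (an assumption, not something confirmed in this thread): Python's multiprocessing starts workers there with the spawn method, which re-imports the main script, so code that calls nlp.pipe with n_process > 1 is usually placed under an `if __name__ == "__main__":` guard. A minimal sketch of the snippet above restructured that way; the function name find_hashtag_spans is made up for illustration:

```python
import spacy
from spacy.matcher import Matcher


def find_hashtag_spans(texts, n_process=1, batch_size=100):
    """Return, for each text, the matched span texts (hypothetical helper)."""
    nlp = spacy.blank("en")
    matcher = Matcher(nlp.vocab)
    # Same pattern as in the thread: one or more "#"/"$" tokens, then a word.
    pattern = [
        {"TEXT": {"IN": ["#", "$"]}, "OP": "+"},
        {"TEXT": {"REGEX": "[A-Za-z]+"}},
    ]
    matcher.add("stockHashTag", [pattern])

    results = []
    for doc in nlp.pipe(texts, n_process=n_process, batch_size=batch_size):
        # Each match is a (match_id, start, end) tuple of token offsets.
        results.append([doc[start:end].text for _, start, end in matcher(doc)])
    return results


if __name__ == "__main__":
    # Worker processes are only started from inside the guard, so a
    # re-import of this module by a spawned worker does not start more.
    spans = find_hashtag_spans(["a #AAAA"] * 1000, n_process=2)
    print(len(spans))
```

If this version still hangs when run as a script from a terminal, the guard is not the cause and the environment itself is the more likely suspect.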

4 replies

binh0206 Apr 21, 2022
Author

I appreciate your response. I apologize for missing the info. Here it is:

Info about spaCy

spaCy version: 3.2.4
Platform: Windows-10-10.0.19041-SP0
Python version: 3.6.8rc1
Pipelines: en_core_web_lg (3.2.0), en_core_web_trf (3.2.0)

I am using Python 3.8 for this code in an Anaconda environment named nlp, so I do not know why it shows Python 3.6.8rc1. I think it might be because I ran python -m spacy info --markdown in the Windows 10 command prompt. The dataframe that I used has 2 columns and 5000 rows. I tried the code you posted above, but it still freezes.

adrianeboyd Apr 22, 2022

It sounds like something might be wrong with your Python installation or virtual environment. It's pretty unusual to be using a release candidate version like 3.6.8rc1, and if you're expecting 3.8, then something in your environment might be misconfigured. Try to make sure that you've activated conda correctly and created a new environment before installing spaCy.
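
To see which interpreter is actually being used, a small standard-library sketch along these lines could be run once from the command prompt and once from inside PyCharm, and the two outputs compared. The function name describe_interpreter is hypothetical:

```python
import platform
import sys


def describe_interpreter():
    """Collect basic facts about the Python interpreter running this code."""
    return {
        "executable": sys.executable,          # path to the python binary
        "version": platform.python_version(),  # e.g. "3.8.x" or "3.6.8rc1"
        "prefix": sys.prefix,                  # root of the active environment
        "platform": platform.platform(),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the executable or prefix differs between the two runs, the command prompt and the IDE are using different environments, which would explain the unexpected version in the spacy info output.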

binh0206 Apr 23, 2022
Author

I appreciate your answer. I created a new environment and it works; however, when I compare the two environments, the only difference I observe is what happens when I execute

import spacy

The old environment shows:

Backend QtAgg is interactive backend. Turning interactive mode on.

After a while, it shows an error:

Process finished with exit code -1073740791 (0xC0000409)

This issue does not show in the new environment.
I cannot figure out why.
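
For reference, the two numbers in that message are the same value: the first is the exit code read as a signed 32-bit integer, the second is its unsigned hexadecimal form. A short sketch of the conversion (the helper name to_ntstatus_hex is made up for illustration):

```python
def to_ntstatus_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_ntstatus_hex(-1073740791))  # prints 0xC0000409
```

As far as I know, 0xC0000409 is the Windows status code STATUS_STACK_BUFFER_OVERRUN, which points to a crash in native code rather than a Python exception. Which library triggered it cannot be determined from the information in this thread.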

adrianeboyd Apr 25, 2022

Glad to hear you got a new venv working! Sorry, it's really hard to diagnose/debug this kind of problem from a distance.

cphoover · 2023-01-26T05:54:10Z

cphoover
Jan 26, 2023

I'm having the same issue: if n_process = 1, everything runs, but if I increase it, processing freezes.

1 reply

danieldk Jan 26, 2023

Could you please open a new thread and post the output of spacy info --markdown? Could you also include a minimal code example to reproduce the issue?
