SpanRuler removes all existing labels if overwrite is set to True #11772
NixBiks asked this question in Help: Coding & Implementations (answered by NixBiks)
If I add a second span_ruler with overwrite set to True, I'd expect the following code to pass:

import spacy

text = "Apple is opening its first big office in San Francisco."
nlp = spacy.blank("en")
ruler = nlp.add_pipe(
    "span_ruler", config={"validate": True, "spans_key": None, "annotate_ents": True}
)
patterns = [
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]},
]
ruler.add_patterns(patterns)
assert len(nlp(text).ents) == 2
nlp.add_pipe(
    "span_ruler",
    name="span_ruler2",
    config={
        "validate": True,
        "spans_key": None,
        "annotate_ents": True,
        "overwrite": True,
    },
)
assert len(nlp(text).ents) == 2  # <- this fails because it now has length 0 instead

I'm not sure whether this is intended.
Answered by NixBiks on Nov 8, 2022
Ahh, it's probably intended. Instead of overwriting, I should replace the ents_filter with {"@misc": "spacy.prioritize_new_ents_filter.v1"}.

The following works:

import spacy

text = "Apple is opening its first big office in San Francisco."
nlp = spacy.blank("en")
ruler = nlp.add_pipe(
    "span_ruler", config={"validate": True, "spans_key": None, "annotate_ents": True}
)
patterns = [
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]},
]
ruler.add_patterns(patterns)
assert len(nlp(text).ents) == 2
ruler2 = nlp.add_pipe(
    "span_ruler",
    name="span_ruler2",
    config={
        "validate": True,
        "spans_key": None,
        "annotate_ents": True,
        "overwrite": False,
        "ents_filter": {"@misc": "spacy.prioritize_new_ents_filter.v1"},
    },
)
ruler2.add_patterns([
    {"label": "ABC", "pattern": "Apple"},
])
doc = nlp(text)
assert len(doc.ents) == 2
assert doc.ents[0].label_ == "ABC", doc.ents[0].label_
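For anyone who wants to see the difference between the two settings without spaCy, here is a rough pure-Python model of what I think is going on. It is a simplified sketch and not spaCy's actual code: the Span tuple and the helper names overlaps, prioritize_new and annotate_ents are made up for illustration, and the model ignores details such as overlaps among the new matches themselves. The assumption is that overwrite=True throws away the entities set by earlier components before the ruler's own matches are applied, while the prioritize-new filter only drops the existing entities that clash with a new match.

```python
from typing import List, Tuple

# (start_token, end_token, label): a stand-in for a spaCy Span, end exclusive.
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two token ranges share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def prioritize_new(existing: List[Span], new: List[Span]) -> List[Span]:
    """Keep every new span, plus the existing spans that overlap none of them."""
    kept = [e for e in existing if not any(overlaps(e, n) for n in new)]
    return sorted(kept + new)


def annotate_ents(existing: List[Span], matches: List[Span], overwrite: bool) -> List[Span]:
    """Toy version of one ruler step (assumed behaviour, not spaCy's code)."""
    # With overwrite=True the earlier entities are discarded up front.
    base = [] if overwrite else list(existing)
    return prioritize_new(base, matches)


# Entities the first ruler sets on the example sentence (token offsets).
first_pass = [(0, 1, "ORG"), (8, 10, "GPE")]

# Second ruler has no patterns and overwrite=True: nothing survives.
print(annotate_ents(first_pass, [], overwrite=True))
# -> []

# Second ruler relabels "Apple" with overwrite=False: only the clashing span is replaced.
print(annotate_ents(first_pass, [(0, 1, "ABC")], overwrite=False))
# -> [(0, 1, 'ABC'), (8, 10, 'GPE')]
```

Under this reading the failing assert in the question is expected: the second ruler has no patterns, so overwriting leaves an empty entity list, whereas keeping overwrite=False and switching the filter preserves the GPE span and relabels only "Apple".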
Answer selected by adrianeboyd