Entity ruler doesn't catch multi-token entities #12271
-
Hello! First off: Thanks for the great package! What I already checked:
Replies: 3 comments
-
This was probably caused by some old variables floating around (no idea how, since I restarted the kernel multiple times). The code where each word is a separate token pattern now works after restarting VS Code completely.
-
Hi @SjoerdBraaksma, two issues with this:
If you make the following adjustments, this will work:

```python
...
with nlp.select_pipes(enable="ner"):
    ruler_names.add_patterns(
        [
            {"label": "ACHTERNAAM", "pattern": [{"LOWER": "lepelaar"}]},
            {"label": "ACHTERNAAM", "pattern": [{"LOWER": "van"}, {"LOWER": "walderveen"}]},
        ]
    )
...
# Test:
with nlp.select_pipes(enable=["sentencizer", "names_ruler"]):
    doc1 = nlp("Ik ben Andre Lepelaar")
    doc2 = nlp("Ik ben Andre van Walderveen")

for ent in doc1.ents:
    print(ent, ent.label_)
print("---")
for ent in doc2.ents:
    print(ent, ent.label_)
```
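For anyone who wants to reproduce this without the elided setup, here is a minimal self-contained sketch. It assumes a blank Dutch pipeline (`spacy.blank("nl")`) rather than the trained pipeline from the original post, and it assumes the ruler was registered under the name `names_ruler`. Both are guesses about the omitted code, so adapt them to your own pipeline.

```python
import spacy

# Assumption: a blank Dutch pipeline stands in for the original (elided) setup.
nlp = spacy.blank("nl")
nlp.add_pipe("sentencizer")

# Assumption: the ruler was added under the name "names_ruler".
ruler_names = nlp.add_pipe("entity_ruler", name="names_ruler")
ruler_names.add_patterns(
    [
        # Single-token surname.
        {"label": "ACHTERNAAM", "pattern": [{"LOWER": "lepelaar"}]},
        # Multi-token surname: one dict per token, in order.
        {"label": "ACHTERNAAM", "pattern": [{"LOWER": "van"}, {"LOWER": "walderveen"}]},
    ]
)

with nlp.select_pipes(enable=["sentencizer", "names_ruler"]):
    doc1 = nlp("Ik ben Andre Lepelaar")
    doc2 = nlp("Ik ben Andre van Walderveen")

ents1 = [(ent.text, ent.label_) for ent in doc1.ents]
ents2 = [(ent.text, ent.label_) for ent in doc2.ents]
print(ents1)
print("---")
print(ents2)
```

With token patterns, a multi-token entity is written as a list with one dictionary per token, which is why "van Walderveen" needs two entries.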
-
Converting this to a discussion, as this is a usage question and not an issue with spaCy.
The value for `LOWER` should be in lowercase. I'm surprised that "Lepelaar" is recognized for you, because it doesn't work for me when running your code (and I wouldn't expect it to).
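To illustrate why the `LOWER` value has to be lowercase, here is a small sketch. `LOWER` compares the pattern value against the lowercased token text, so a value containing capitals can never be equal to it. The blank Dutch pipeline, the example sentence, and the `FOUT` label are made up for this illustration and are not from the original post.

```python
import spacy

# Assumption: blank Dutch pipeline (tokenizer only), no trained model needed.
nlp = spacy.blank("nl")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(
    [
        # Lowercase value: matches "Lepelaar", "lepelaar", "LEPELAAR", ...
        {"label": "ACHTERNAAM", "pattern": [{"LOWER": "lepelaar"}]},
        # Capitalised value: never equals a lowercased token, so it never matches.
        # "FOUT" is a hypothetical label used only to make a wrong match visible.
        {"label": "FOUT", "pattern": [{"LOWER": "Walderveen"}]},
    ]
)

doc = nlp("Ik ben Andre Lepelaar en zij is Anna Walderveen")
found = [(ent.text, ent.label_) for ent in doc.ents]
print(found)
```

Only the lowercase pattern produces an entity here; "Walderveen" is left unlabelled because its pattern value contains a capital letter.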