Closed
Labels
feat / spanruler (Feature: Entity and span ruler), usage (General spaCy usage)
Description
Applying 'LOWER' in span ruler patterns does not seem to work well.
brands = ['nike corp', 'adidas corp']
(1) patterns = [{"label": "BRAND", "pattern": brand} for brand in brands]
(2) patterns = [{"label": "BRAND", "pattern": [{"LOWER": brand}]} for brand in brands]
Neither scenario (1) nor scenario (2) reliably detects the brand names.
How to reproduce the behaviour
import spacy

brands = ['nike corp', 'adidas corp']
nlp = spacy.load("en_core_web_sm")
ruler = nlp.add_pipe("span_ruler")
# Scenario (1): phrase patterns given as plain strings
#patterns = [{"label": "BRAND", "pattern": brand} for brand in brands]
# Scenario (2): token patterns using LOWER
patterns = [{"label": "BRAND", "pattern": [{"LOWER": brand}]} for brand in brands]
with nlp.select_pipes(enable="tagger"):
    ruler.add_patterns(patterns)
print(patterns)
# Lowercase input
text = "nike corp is a brand."
doc1 = nlp(text)
print([(span.text, span.label_, span.start) for span in doc1.spans["ruler"]])
# Uppercase input
text = "NIKE CORP is a brand."
doc2 = nlp(text)
print([(span.text, span.label_, span.start) for span in doc2.spans["ruler"]])
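A possible explanation, offered as a sketch rather than a confirmed diagnosis: in a token pattern each dict describes a single token, so {"LOWER": "nike corp"} can only match one token whose lowercase form is "nike corp", and the tokenizer splits on whitespace and never produces such a token. Plain string patterns, as in scenario (1), are matched on the exact token text by default, which would explain why "NIKE CORP" is missed. The helper names below (lower_token_pattern, build_patterns, find_brands) are hypothetical and not part of spaCy, and splitting brand names on whitespace is an assumption that only holds when the tokenizer splits them the same way. A blank English pipeline is used so that no trained model is required.

```python
from typing import Dict, List, Tuple


def lower_token_pattern(phrase: str) -> List[Dict[str, str]]:
    """Build one {"LOWER": ...} dict per whitespace-separated word.

    Assumption: the tokenizer splits the phrase on whitespace only.
    """
    return [{"LOWER": word.lower()} for word in phrase.split()]


def build_patterns(brands: List[str], label: str = "BRAND") -> List[dict]:
    """Turn brand names into span ruler token patterns (hypothetical helper)."""
    return [{"label": label, "pattern": lower_token_pattern(b)} for b in brands]


def find_brands(text: str, brands: List[str]) -> List[Tuple[str, str, int]]:
    """Run a blank English pipeline with a span ruler over `text`."""
    import spacy  # imported here so the helpers above work without spaCy

    nlp = spacy.blank("en")
    ruler = nlp.add_pipe("span_ruler")
    ruler.add_patterns(build_patterns(brands))
    doc = nlp(text)
    # The span ruler stores its matches under the "ruler" key by default.
    return [(span.text, span.label_, span.start) for span in doc.spans["ruler"]]
```

Calling find_brands("NIKE CORP is a brand.", ['nike corp', 'adidas corp']) should then exercise the case-insensitive match. If string patterns are preferred, the span ruler's phrase_matcher_attr setting (for example config={"phrase_matcher_attr": "LOWER"} in add_pipe) may be another way to get the same effect.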
Your Environment
- Operating System: Windows
- Python Version Used: 3.7.2
- spaCy Version Used: 3.4.1
- Environment Information: