
LOWER not working for span matcher  #11814

@iamyihwa

Description

Applying `LOWER` to span ruler patterns does not seem to work.

```python
brands = ['nike corp', 'adidas corp']
```

(1) String (phrase) patterns:

```python
patterns = [{"label": "BRAND", "pattern": brand} for brand in brands]
```
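String patterns like these are handled by a phrase matcher, which compares the exact token text (`ORTH`) unless it is configured otherwise. Below is a minimal sketch of case-insensitive phrase matching. It assumes the installed spaCy version supports the `span_ruler` setting `phrase_matcher_attr`, and it uses `spacy.blank("en")` instead of `en_core_web_sm` so that no trained model is needed. The helper `brand_spans` is hypothetical and exists only for illustration.

```python
import spacy

# Blank English pipeline: only the tokenizer is needed here,
# so no trained model has to be downloaded.
nlp = spacy.blank("en")

# phrase_matcher_attr="LOWER" makes string patterns compare each token's
# lowercase form instead of the default exact text (ORTH).
ruler = nlp.add_pipe("span_ruler", config={"phrase_matcher_attr": "LOWER"})

brands = ["nike corp", "adidas corp"]
ruler.add_patterns([{"label": "BRAND", "pattern": brand} for brand in brands])


def brand_spans(text):
    """Hypothetical helper: run the pipeline and list (text, label) pairs."""
    doc = nlp(text)
    # The span ruler writes its matches to the default spans key "ruler".
    return [(span.text, span.label_) for span in doc.spans["ruler"]]


print(brand_spans("nike corp is a brand."))
print(brand_spans("NIKE CORP is a brand."))
```

Setting the attribute once on the component keeps the patterns themselves as plain strings.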


(2) Token patterns using `LOWER`:

```python
patterns = [{"label": "BRAND", "pattern": [{"LOWER": brand}]} for brand in brands]
```
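In a token pattern, each dict describes a single token. A pattern like `{"LOWER": "nike corp"}` therefore asks for one token whose lowercase text is `nike corp`. The sketch below instead builds one `LOWER` dict per token. It assumes the brand strings are tokenized by the same tokenizer that processes the searched texts. As before, it uses a blank English pipeline, and `brand_spans` is a hypothetical helper.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained model required.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("span_ruler")

brands = ["nike corp", "adidas corp"]

# Tokenize each brand with the pipeline's own tokenizer and emit one
# {"LOWER": ...} dict per token, so a multi-word brand spans several tokens.
patterns = [
    {"label": "BRAND", "pattern": [{"LOWER": token.lower_} for token in nlp.make_doc(brand)]}
    for brand in brands
]
ruler.add_patterns(patterns)


def brand_spans(text):
    """Hypothetical helper: run the pipeline and list (text, label) pairs."""
    doc = nlp(text)
    return [(span.text, span.label_) for span in doc.spans["ruler"]]


print(patterns[0]["pattern"])
print(brand_spans("NIKE CORP is a brand."))
```

Building the dicts from `nlp.make_doc` rather than `str.split` keeps the pattern aligned with the tokenizer, which matters for names that contain punctuation.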


In both scenarios (1) and (2), the brand names are not detected reliably.

How to reproduce the behaviour

```python
import spacy

brands = ['nike corp', 'adidas corp']
nlp = spacy.load("en_core_web_sm")
ruler = nlp.add_pipe("span_ruler")
# Scenario (1): string (phrase) patterns
#patterns = [{"label": "BRAND", "pattern": brand} for brand in brands]
# Scenario (2): token patterns using LOWER
patterns = [{"label": "BRAND", "pattern": [{"LOWER": brand}]} for brand in brands]
with nlp.select_pipes(enable="tagger"):
    ruler.add_patterns(patterns)
print(patterns)

# Lowercase input
text = "nike corp is a brand."
doc1 = nlp(text)
print([(span.text, span.label_, span.start) for span in doc1.spans["ruler"]])

# Uppercase input
text = "NIKE CORP is a brand."
doc2 = nlp(text)
print([(span.text, span.label_, span.start) for span in doc2.spans["ruler"]])
```
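One way to check what a token pattern can match in these texts is to print the tokenization and run the patterns through a plain `Matcher`. The sketch below uses a blank English pipeline. This is an assumption: the report uses `en_core_web_sm`, whose English tokenizer should split these sentences the same way. The match labels `ONE_TOKEN` and `TWO_TOKENS` are hypothetical.

```python
import spacy
from spacy.matcher import Matcher

# Blank English pipeline: tokenizer only, no trained model required.
nlp = spacy.blank("en")
doc = nlp("NIKE CORP is a brand.")

# Show each token next to its LOWER value.
print([(token.text, token.lower_) for token in doc])

matcher = Matcher(nlp.vocab)
# The pattern shape from scenario (2): ONE token whose lowercase text is "nike corp".
matcher.add("ONE_TOKEN", [[{"LOWER": "nike corp"}]])
# The same brand expressed as two consecutive tokens.
matcher.add("TWO_TOKENS", [[{"LOWER": "nike"}, {"LOWER": "corp"}]])

for match_id, start, end in matcher(doc):
    # Only TWO_TOKENS can match here, because no single token contains a space.
    print(nlp.vocab.strings[match_id], doc[start:end].text)
```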


Your Environment

  • Operating System: Windows
  • Python Version Used: 3.7.2
  • spaCy Version Used: 3.4.1
  • Environment Information:
