LOWER not working for span matcher #11815
Applying 'LOWER' in a span matcher pattern does not seem to work well.
brands = ['nike corp', 'adidas corp']
In scenarios (1) and (2) it does not reliably detect the brand names.
How to reproduce the behaviour
Your Environment
Replies: 1 comment 4 replies
Hi @iamyihwa, the components of the patterns passed to span_ruler have to correspond to spaCy's tokenization, i.e. each part of your pattern has to align with a token as recognized by spaCy.

If you want to match spans consisting of multiple tokens (such as "nike corp" or "adidas corp"), the pattern has to reflect this (see here for another example with a pattern for "san francisco"). So instead of

patterns = [{"label": "BRAND", "pattern": [{"LOWER": "nike corp"}]}]

you'll want to do:

patterns = [{"label": "BRAND", "pattern": [{"LOWER": "nike"}, {"LOWER": "corp"}]}]

In your example you can replace

patterns = [{"label": "BRAND", "pattern": [{"LOWER": brand}]} for brand in brands]

with

patterns = [{"label": "BRAND", "pattern": [{"LOWER": token} for token in brand.split()]} for brand in brands]

to obtain correct results.
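For illustration, here is a minimal sketch of the per-token approach, assuming a blank English pipeline and the span ruler's default spans key "ruler". The helper names build_patterns and find_brands are hypothetical and not part of spaCy. Splitting on whitespace assumes spaCy tokenizes the brand names the same way, which may not hold for names containing punctuation or hyphens. The extra .lower() call is an addition so that brand lists with capital letters still match against LOWER.

```python
def build_patterns(brands, label="BRAND"):
    """Build one token dict per whitespace-separated word of each brand.

    LOWER compares against the lowercased token text, so each word is
    lowercased here as well (an added safety step, not in the original).
    """
    return [
        {
            "label": label,
            "pattern": [{"LOWER": word.lower()} for word in brand.split()],
        }
        for brand in brands
    ]


def find_brands(text, brands):
    """Run a span ruler with the generated patterns over `text`."""
    # Imported here so build_patterns can be used without spaCy installed.
    import spacy

    nlp = spacy.blank("en")             # tokenizer only, no trained model
    ruler = nlp.add_pipe("span_ruler")  # writes matches to doc.spans["ruler"]
    ruler.add_patterns(build_patterns(brands))
    doc = nlp(text)
    return [(span.text, span.label_) for span in doc.spans["ruler"]]


if __name__ == "__main__":
    brands = ["nike corp", "adidas corp"]
    print(build_patterns(brands))
    print(find_brands("She works at NIKE Corp, not Adidas Corp.", brands))
```

If the brand names might be tokenized differently from a plain whitespace split, one option would be to build the token list from nlp.make_doc(brand) instead of brand.split().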