Hi @iamyihwa, the components of the patterns passed to span_ruler have to correspond to spaCy's tokenization, i.e. each part of your pattern has to align with a token as recognized by spaCy.

If you want to match spans consisting of multiple tokens (such as "nike corp" or "adidas corp"), the pattern has to reflect this with one entry per token (see here for another example with a pattern for "san francisco"). So instead of

patterns = [{"label": "BRAND", "pattern": [{"LOWER": "nike corp"}]}]

you'll want to use one dict per token:

patterns = [{"label": "BRAND", "pattern": [{"LOWER": "nike"}, {"LOWER": "corp"}]}]

In your example you can replace

patterns = [{"label": "BRAND", "pattern": [{"LOWER": brand}]} for brand in brands]
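
A replacement could look roughly like the sketch below. It is an assumption rather than the original snippet: `build_patterns` and the `brands` list are hypothetical names, and splitting on whitespace only approximates spaCy's tokenization.

```python
def build_patterns(brands, label="BRAND", tokenize=str.split):
    """Turn each brand name into one pattern with one dict per token."""
    patterns = []
    for brand in brands:
        # One {"LOWER": ...} dict per token, so "nike corp" becomes two entries.
        token_specs = [{"LOWER": token.lower()} for token in tokenize(brand)]
        patterns.append({"label": label, "pattern": token_specs})
    return patterns


# Hypothetical brand list, for illustration only.
brands = ["nike corp", "adidas corp", "puma"]
patterns = build_patterns(brands)
print(patterns[0])
# {'label': 'BRAND', 'pattern': [{'LOWER': 'nike'}, {'LOWER': 'corp'}]}
```

For brand names containing punctuation, whitespace splitting may not match how spaCy tokenizes the text. In that case you could pass a tokenizer based on your pipeline instead, e.g. `tokenize=lambda text: [t.text for t in nlp.make_doc(text)]`, and then hand the result to the ruler with `add_patterns(patterns)`.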

Answer selected by rmitsch
This discussion was converted from issue #11814 on November 16, 2022 15:55.