Not sure how Matcher with alignments is supposed to work #10818
Hello! Today, I discovered a For example, I want to match hashtags, but I want to cover more cases than are listed in the documentation example.
As you can see, I can distinguish the two patterns only because of the match length, but if the lengths were the same it wouldn't be possible. Maybe I am not supposed to put several different patterns under one match_id? But in that case, why are we able to specify multiple patterns at the same time?
Replies: 1 comment 2 replies
You're not missing anything here. It's fine to provide multiple patterns for a single `match_id`, but it's true it doesn't always work well with alignments.

The issue here is that while the Matcher has been in spaCy for a long time, alignments were not initially planned as a feature and were only added relatively recently. Unfortunately they don't cover this kind of case where it's not clear how they map back because you have multiple patterns.

That said, in this case you can check whether the first token of your match is a `#` or not to tell between your two patterns.

If you really need to differentiate two patterns, then it would be better to add them with different labels.
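
To make the two suggestions concrete, here is a minimal sketch rather than the code from the original question. The two patterns, the label names `HASHTAG_SYMBOL` and `HASHTAG_WORD`, and the sample words are all made up for illustration, and it assumes a spaCy v3 release recent enough that the Matcher accepts `with_alignments=True`. The `Doc` is built from a word list so the example doesn't depend on how the tokenizer splits `#`.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hypothetical patterns of equal length, so match length cannot tell them apart.
symbol_pattern = [{"ORTH": "#"}, {"IS_ALPHA": True}]
word_pattern = [{"LOWER": "hashtag"}, {"IS_ALPHA": True}]

# Pre-tokenized text, so "#" is guaranteed to be its own token.
words = ["I", "like", "#", "spacy", "and", "hashtag", "python"]
doc = Doc(nlp.vocab, words=words)

# Option 1: keep a single match_id and inspect the first token of each match.
matcher = Matcher(nlp.vocab)
matcher.add("HASHTAG", [symbol_pattern, word_pattern])
by_first_token = []
for match_id, start, end in matcher(doc):
    kind = "symbol" if doc[start].text == "#" else "word"
    by_first_token.append((kind, doc[start:end].text))

# Option 2: register each pattern under its own label, so the match_id
# identifies the pattern and the alignments are unambiguous.
labelled = Matcher(nlp.vocab)
labelled.add("HASHTAG_SYMBOL", [symbol_pattern])
labelled.add("HASHTAG_WORD", [word_pattern])
by_label = []
for match_id, start, end, alignments in labelled(doc, with_alignments=True):
    label = nlp.vocab.strings[match_id]
    by_label.append((label, doc[start:end].text))

print(by_first_token)
print(by_label)
```

Option 1 only works while the patterns differ in something you can inspect on the matched tokens. Option 2 is the more general route, at the cost of handling more than one label downstream.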