How can I use the spaCy Matcher (or PhraseMatcher) class to extract a sequence of 2 items? #10120
I'm trying to move from NLTK to spaCy, and one of the features I need is matching "subtrees" with regex-like patterns. In simple cases, Matcher works just fine:
The problem starts when I need to match only one of the groups. For instance, I may need a noun that follows an adjective, but only want to extract the noun and not the entire pattern. In a simple regex, I would put the desired group in parentheses, like so (with an imaginary function):
My temporary solution is to grab only some of the tokens in a callback function, like this:
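A minimal sketch of what such a callback might look like, not the original snippet. The pattern name `ADJ_NOUN`, the helper `keep_last_token`, and the hand-built `Doc` with preset POS tags are assumptions made for illustration; in practice the tags would come from a trained pipeline.

```python
from spacy.matcher import Matcher
from spacy.tokens import Doc
from spacy.vocab import Vocab

vocab = Vocab()

# Hand-annotated Doc so the sketch runs without a trained tagger (assumption).
doc = Doc(vocab, words=["a", "red", "car"], pos=["DET", "ADJ", "NOUN"])

collected = []

def keep_last_token(matcher, doc, i, matches):
    # Hypothetical callback: keep only the final token (the noun) of each match.
    match_id, start, end = matches[i]
    collected.append(doc[end - 1].text)

matcher = Matcher(vocab)
matcher.add(
    "ADJ_NOUN",
    [[{"POS": "ADJ"}, {"POS": "NOUN"}]],
    on_match=keep_last_token,
)
matcher(doc)  # the callback runs once per match
```

The token to keep is hard-coded inside the callback here, which is exactly what makes this approach awkward once patterns are loaded from an external source.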
However, this solution suffers from several problems: I want to move the list of patterns to an external source, so the callback function must be the same for all of them, yet each pattern needs a different group selected (sometimes the first, sometimes the second, sometimes the entire pattern).
Replies: 1 comment
I think you just need to use the `with_alignments` feature, which gives you a list that tells you which token of the input pattern each token in the match corresponds to. It's a relatively new feature, but it lets you map your matched tokens back to their position in the rule, so you can make the non-required parts optional.
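As a rough sketch of how this could be used, assuming a spaCy v3 release recent enough to support `with_alignments`: the `KEEP` mapping (pattern name to the pattern-token indices worth keeping) is a hypothetical convention, not part of spaCy, and the `Doc` is annotated by hand so the example does not depend on a trained tagger.

```python
from spacy.matcher import Matcher
from spacy.tokens import Doc
from spacy.vocab import Vocab

vocab = Vocab()

# Hand-annotated Doc (assumption); a real pipeline would supply the POS tags.
doc = Doc(
    vocab,
    words=["A", "big", "dog", "chased", "the", "small", "cat"],
    pos=["DET", "ADJ", "NOUN", "VERB", "DET", "ADJ", "NOUN"],
)

matcher = Matcher(vocab)
matcher.add("ADJ_NOUN", [[{"POS": "ADJ"}, {"POS": "NOUN"}]])

# Hypothetical external config: which pattern-token indices to keep per rule.
KEEP = {"ADJ_NOUN": {1}}  # index 1 is the NOUN token in the pattern

results = []
for match_id, start, end, alignments in matcher(doc, with_alignments=True):
    name = vocab.strings[match_id]
    # alignments[i] is the pattern-token index matched by doc[start + i]
    kept = [
        doc[start + i].text
        for i, pattern_idx in enumerate(alignments)
        if pattern_idx in KEEP[name]
    ]
    results.append((name, kept))
```

Because the selection lives in data rather than in a callback, the same loop can serve every pattern, and the indices to keep can be stored alongside the patterns themselves.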