Possible to get partial matches from PhraseMatcher? #10118
Suppose I have the following pattern texts in the PhraseMatcher.
Given a Doc with
I was thinking of implementing a trie-based matcher, but it turns out the PhraseMatcher's implementation already uses a trie behind the scenes. I basically want the furthest path into the trie, and the non-matching patterns beyond that point.
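To make the "furthest path into the trie" request concrete, here is a minimal sketch in plain Python. It does not use spaCy's internal trie: the nested-dict trie, the function names, and the example patterns are all hypothetical, and tokens are simple lowercase strings rather than spaCy tokens.

```python
from typing import Dict, List, Tuple

END = object()  # sentinel key marking the end of a complete pattern (hypothetical)


def build_trie(patterns: List[List[str]]) -> Dict:
    """Build a nested-dict trie keyed by token text."""
    root: Dict = {}
    for tokens in patterns:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
        node[END] = True
    return root


def remainders(node: Dict) -> List[List[str]]:
    """List every token sequence that would complete a pattern from this node."""
    out: List[List[str]] = []
    for key, child in node.items():
        if key is END:
            out.append([])  # a pattern ends exactly here
        else:
            out.extend([key] + rest for rest in remainders(child))
    return out


def furthest_path(trie: Dict, tokens: List[str], start: int) -> Tuple[int, List[List[str]]]:
    """Walk the trie from tokens[start].

    Returns (number of tokens matched, unmatched continuations at that depth).
    A depth of 0 means nothing matched at this position.
    """
    node, i = trie, start
    while i < len(tokens) and tokens[i] in node:
        node = node[tokens[i]]
        i += 1
    return i - start, remainders(node)


# Hypothetical patterns; the original pattern texts are not shown in the post.
trie = build_trie([
    ["new", "york", "city"],
    ["new", "york", "times"],
    ["new", "jersey"],
])
tokens = "i moved to new york last year".split()
depth, missing = furthest_path(trie, tokens, 3)
# depth is 2 ("new york" matched); missing holds ["city"] and ["times"]
```

Calling `furthest_path` at every start position and keeping the results with a depth above zero would give all partial matches in a token sequence.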
Replies: 1 comment 1 reply
No, there is no feature like that.

What you can do is individually tokenize each pattern and register the subpatterns. You'll have to handle the case where one prefix is shared between multiple patterns, but otherwise it should be pretty straightforward, if tedious. (I thought about making everything after the first token optional, but then you have to deal with things like "A B C" getting a partial match on "A C" and the like.)

I don't think I've ever seen a feature like that in another trie-based matcher either, though that's probably because, if you're matching characters, partial matches are so common as to be uninteresting.

I would need to look at it to see if the functions are structured the right way, but it might be possible to re-use our trie implementation to get the longest match at any point in the Doc, though it might require interfacing with Cython.
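The subpattern idea, including the shared-prefix bookkeeping, can be sketched roughly as follows. This is a plain-Python stand-in, not spaCy code: the helper names, labels, and example patterns are hypothetical, and whitespace-split strings stand in for tokenized patterns. With the real PhraseMatcher, each prefix in the map would presumably be added as its own pattern, and the map would be consulted to see which full patterns a match belongs to.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Prefix = Tuple[str, ...]


def expand_prefixes(patterns: Dict[str, List[str]]) -> Dict[Prefix, Set[str]]:
    """Map every prefix of every pattern to the labels of the patterns sharing it."""
    owners: Dict[Prefix, Set[str]] = defaultdict(set)
    for label, tokens in patterns.items():
        for end in range(1, len(tokens) + 1):
            owners[tuple(tokens[:end])].add(label)
    return dict(owners)


def longest_partial_matches(
    owners: Dict[Prefix, Set[str]], tokens: List[str]
) -> List[Tuple[int, int, List[str]]]:
    """Return (start, end, labels) for the longest registered prefix at each start."""
    if not owners:
        return []
    max_len = max(len(prefix) for prefix in owners)
    results = []
    for start in range(len(tokens)):
        # Try the longest candidate span first, then shrink it.
        for end in range(min(len(tokens), start + max_len), start, -1):
            labels = owners.get(tuple(tokens[start:end]))
            if labels:
                results.append((start, end, sorted(labels)))
                break
    return results


# Hypothetical patterns, keyed by a label for the full pattern.
patterns = {
    "NYC": ["new", "york", "city"],
    "NYT": ["new", "york", "times"],
}
owners = expand_prefixes(patterns)

partial = longest_partial_matches(owners, "a new york trip".split())
# partial == [(1, 3, ["NYC", "NYT"])]: "new york" is a prefix shared by both patterns

full = longest_partial_matches(owners, "she reads the new york times".split())
# full == [(3, 6, ["NYT"])]: the whole "NYT" pattern matched
```

Because only prefixes are registered, "A C" never counts as a partial match for "A B C", which avoids the problem with making later tokens optional.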