What you're trying to do makes sense, but keep in mind that the Matcher always matches at the token level. The expression {"ENT_TYPE": "GPE"} matches exactly one token that is part of a GPE entity. Because each entity here consists of two tokens, you're getting just "Francisco" and just "New" instead of the full entities.

To match more than one token, you can use the + operator like so:
patterns = [{"ENT_TYPE": "GPE", "OP": "+"}, {"ORTH": "+"}, {"ENT_TYPE": "GPE", "OP": "+"}]
Before 2.1.0, this operator behaved greedily and would have returned pretty much exactly what you want. Unfortunately, because of possible mixing of operators, this greedy behaviour was not consis…
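To make the effect of the + operator concrete, here is a minimal sketch rather than a definitive recipe. It assumes spaCy v3 (for `matcher.add(name, [pattern])` and `as_spans=True`), builds the Doc by hand, and sets the two GPE entities manually so no trained model is needed. The example words and the match name "GPE_PLUS_GPE" are made up for illustration. Since the matcher returns every possible match for the pattern, `spacy.util.filter_spans` is used as one possible way to keep only the longest one.

```python
from spacy.matcher import Matcher
from spacy.tokens import Doc, Span
from spacy.util import filter_spans
from spacy.vocab import Vocab

# Hand-built Doc (assumption: stands in for the output of a real NER pipeline).
words = ["San", "Francisco", "+", "New", "York"]
doc = Doc(Vocab(), words=words)

# Mark the two-token entities manually instead of running a trained model.
doc.ents = [Span(doc, 0, 2, label="GPE"), Span(doc, 3, 5, label="GPE")]

# One or more GPE tokens, a literal "+", then one or more GPE tokens.
pattern = [
    {"ENT_TYPE": "GPE", "OP": "+"},
    {"ORTH": "+"},
    {"ENT_TYPE": "GPE", "OP": "+"},
]

matcher = Matcher(doc.vocab)
matcher.add("GPE_PLUS_GPE", [pattern])  # hypothetical match name

# All candidate matches, including the shorter overlapping ones.
candidates = matcher(doc, as_spans=True)

# Keep the longest non-overlapping span(s).
longest = filter_spans(candidates)

for span in longest:
    print(span.text)
```

With a real pipeline you would replace the hand-built Doc with `nlp(text)`. The filtering step is the same either way.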

Answer selected by ines