`LIKE_NUM` behavior is inconsistent for English #10498
I am interested in using `LIKE_NUM` with the `Matcher`. Here is a minimal example:

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
text = [
    "three hundred and sixty five days",
    "fifty days",
    "45,646 days",
    "45, 646 days",
    "3 years is 1,095 days",
]

# A number-like token, then one or more tokens, then "days"
matcher = Matcher(nlp.vocab)
pattern = [{'LIKE_NUM': True}, {'OP': "+"}, {'LOWER': 'days'}]
matcher.add('num', [pattern], greedy="LONGEST")

for doc in nlp.pipe(text, disable=['ner']):
    print(f"# {doc.text}")
    # Show each token's tag and whether spaCy considers it number-like
    for token in doc:
        print(f"{token.text}\t{token.tag_}\tlike_num={token.like_num}")
    matches = matcher(doc)
    for match_id, start, end in matches:
        print(f"MATCH: {doc[start:end]}")
    if not matches:
        print("MATCH: none")
    print()
```

Output:
Removing …
Apologies, I realize now that … The docs are pretty clear about how the …
That's right, this is the expected behavior for that pattern. You might want to match any token at a certain position (`{}` is also a valid token dict) or one or more tokens between two other tokens, so I don't think a warning or error makes sense here.
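A minimal sketch of the distinction above, assuming spaCy v3. It uses a blank English pipeline (`spacy.blank("en")`) instead of `en_core_web_sm`, on the assumption that `LIKE_NUM` and `LOWER` are lexical attributes and need no trained components. The helper `longest_matches` and the `numbers_only` pattern are hypothetical illustrations, not something proposed in this thread.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline is enough, because LIKE_NUM and LOWER
# are lexical attributes computed from the token text alone.
nlp = spacy.blank("en")


def longest_matches(pattern, text):
    """Hypothetical helper: texts of the non-overlapping longest matches."""
    matcher = Matcher(nlp.vocab)
    matcher.add("num", [pattern], greedy="LONGEST")
    doc = nlp(text)
    return [doc[start:end].text for _, start, end in matcher(doc)]


# An empty token dict {} matches exactly one token of any kind; adding
# "OP": "+" turns it into a wildcard for one or more arbitrary tokens.
wildcard = [{"LIKE_NUM": True}, {"OP": "+"}, {"LOWER": "days"}]

# Hypothetical alternative: every token before "days" must be number-like.
numbers_only = [{"LIKE_NUM": True, "OP": "+"}, {"LOWER": "days"}]

for text in ["3 years is 1,095 days", "three hundred and sixty five days"]:
    print(text)
    print("  wildcard:    ", longest_matches(wildcard, text))
    print("  numbers only:", longest_matches(numbers_only, text))
```

With the wildcard, the match runs from the first number-like token up to "days", whatever lies in between, so "3 years is 1,095 days" matches as a whole. Requiring `LIKE_NUM` on every token yields `1,095 days` for that sentence instead, but for "three hundred and sixty five days" it only yields `sixty five days`, because "and" is not number-like; covering that case would need something like an extra optional token (`{"LOWER": "and", "OP": "?"}`) in the pattern.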