Spacy Regular Expression #10258
-
I am trying to match a pattern using spaCy's entity ruler. My code:
import re
import spacy

name = "logistics"
i = re.sub(r"[\([{})\]]", "", name)
i = re.split(r'(\W)', i)  # raw string so \W is not treated as a string escape
i = list(filter(lambda x: x != ' ', i))
up_pt = [{'LOWER': {'REGEX': rf'{i[j]}\w?'}} if i[j] not in ['/', '-'] else {'IS_PUNCT': True, 'OP': '?'} for j in range(len(i))]
pt = [{"label":"LO","pattern":up_pt}]
#print(pt)
nlp = spacy.blank("en")
ruler = nlp.add_pipe('entity_ruler')
ruler.add_patterns(pt)
doc = nlp("It is logisticssk")
print("Entities", [(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents])
I am trying to match words like logistics, or a misspelling such as logisticsk, so I added a zero-or-one quantifier. But when I type logistics + sk (logisticssk), which has two extra characters after logistics, it still matches the pattern. Is my syntax incorrect? If so, what is the correct syntax for zero or one occurrence? I would also like to know the syntax for one or more and for zero or more occurrences. Pointers to further documentation would be appreciated. Thanks.
Replies: 1 comment
-
Hey @tansifanzar, thanks for the question.
First, I would try to use a single regex to capture what you need. Right now, your chain of commands and the conditional if inside the match rule make it very difficult to trace back what was actually matched when your matcher applies.
Second, you can test out regular expressions with an online tool. I usually use www.regex101.com. There you type your regex at the top, type in an example in the box below, and you'll see an explanation of what your regular expression is doing on the right.
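The same kind of check can also be run locally. The following is a minimal sketch using only Python's re module, not spaCy itself. It assumes that spaCy's REGEX operator looks for the pattern anywhere inside the token text, like re.search (worth confirming in the spaCy documentation). Under that assumption, an unanchored pattern with ? still matches logisticssk, while anchoring with ^ and $ forces the whole token to fit. The names tokens, patterns, and matching_tokens are illustrative only.

```python
import re

# Candidate tokens: correct spelling, one extra letter, two extra letters.
tokens = ["logistics", "logisticsk", "logisticssk"]

# Quantifiers: ? = zero or one, + = one or more, * = zero or more.
patterns = {
    "unanchored ?": r"logistics\w?",
    "anchored ?": r"^logistics\w?$",
    "anchored +": r"^logistics\w+$",
    "anchored *": r"^logistics\w*$",
}


def matching_tokens(pattern):
    """Return the tokens in which re.search finds the pattern."""
    return [t for t in tokens if re.search(pattern, t)]


# Map each pattern name to the tokens it accepts.
results = {name: matching_tokens(p) for name, p in patterns.items()}

for name, matched in results.items():
    print(f"{name:>13}: {matched}")
```

In this sketch the unanchored pattern accepts all three tokens because the search only needs to find logistics somewhere in the string. The anchored ? variant accepts only logistics and logisticsk.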
Given all that, what you might want is a regular expression like logistic\S+ which will match the string logistic, then any non-whites…