Phrase Matcher space sensitive issue #4926
terms = ["Barack Obama", "Angela Merkel", "Washington, D.C."]
If I enter an extra space between the words "Barack Obama", the phrase matcher does not work, since it is space sensitive.
Your Environment
Replies: 2 comments
Thanks for your response @svlandeg.
input_text = "German Chancellor Angela Merkel and US President Barack Obama converse in the Oval Office inside the White House in Washington, D.C."
The `PhraseMatcher` is not just space sensitive - it really requires the terms to appear in the text exactly as in the `terms` list.
One option is to expand your `terms` list with lexical variants such as added spaces, but that's not a very elegant solution.
Another option is to pre-process your input texts and remove multiple spaces, if those are a frequent problem in your input text (do this before you do any spaCy processing at all).
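A minimal sketch of the pre-processing option, using only the standard library. The helper name `normalize_whitespace` and the sample sentence are made up for illustration and are not part of spaCy; the idea is simply to collapse runs of whitespace before the text ever reaches `nlp`.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace into a single space and trim the ends.

    Hypothetical helper for illustration; not a spaCy function.
    """
    return re.sub(r"\s+", " ", text).strip()


# Sample input with accidental double and triple spaces (made up for this sketch).
raw_text = "US President Barack   Obama met Angela  Merkel in Washington,  D.C."
clean_text = normalize_whitespace(raw_text)
print(clean_text)
```

The cleaned string is what you would then pass to spaCy and the `PhraseMatcher`. One thing to keep in mind: any character offsets you get back afterwards refer to the cleaned text, not to the original input.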
A final option I can think of is to look into Matching regular expressions on the full text. With regular expressions, you can succinctly match on various lexical spelling variants.
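As a rough sketch of the regular-expression option, the `terms` list from the question can be turned into one pattern that tolerates any amount of whitespace between the words of a term. The helper `term_to_pattern` and the sample text are assumptions made for this example; the code uses only Python's `re` module and runs on the raw string.

```python
import re

terms = ["Barack Obama", "Angela Merkel", "Washington, D.C."]


def term_to_pattern(term: str) -> str:
    """Escape each word of a term and allow any run of whitespace between words.

    Hypothetical helper for illustration; not a spaCy function.
    """
    return r"\s+".join(re.escape(word) for word in term.split())


# One alternation over all terms, e.g. Barack\s+Obama|Angela\s+Merkel|...
pattern = re.compile("|".join(term_to_pattern(term) for term in terms))

# Sample text with extra spaces inside two of the terms (made up for this sketch).
text = "Angela Merkel and Barack   Obama converse in Washington,  D.C."

# Collect (start offset, end offset, matched text) for every hit.
matches = [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]
for start, end, matched in matches:
    print(start, end, repr(matched))
```

If you need spaCy spans rather than plain offsets, the character offsets can be passed to `doc.char_span(start, end)`, which returns `None` when the offsets do not line up with token boundaries, so it is worth checking for that case.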