Phrase Matcher space sensitive issue #4926

AnishaMohandass · 2020-01-20T05:10:18Z

terms = ["Barack Obama", "Angela Merkel", "Washington, D.C."]
doc = nlp("German Chancellor Angela Merkel and US President Barack Obama "
"converse in the Oval Office inside the White House in Washington, D.C.")

If I enter an extra space between the words "Barack" and "Obama" in the text, the PhraseMatcher no longer matches, since it is space-sensitive.
Is there a way to overcome this space sensitivity?

Your Environment

Operating System: Windows 8
Python Version Used: 3.7
spaCy Version Used: 2.2.3
Environment Information: Conda

Answered by svlandeg

svlandeg · 2020-01-21T20:34:48Z

The PhraseMatcher is not just space sensitive - it really requires the terms to appear in the text exactly as in the terms list.

One option is to expand your terms list with lexical variants such as added spaces, but that's not a very elegant solution.
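
As a rough illustration of the variant-expansion idea, here is a minimal sketch that generates spacing variants for each term. The helper name `spacing_variants` and the `max_spaces` cutoff are made up for this example, and it uses the same number of spaces at every gap, which is enough for the two-word terms in the question but not a general solution.

```python
def spacing_variants(term, max_spaces=3):
    """Return copies of `term` with 1..max_spaces spaces between its words.

    Hypothetical helper for illustration only; not part of spaCy.
    """
    words = term.split()
    return [(" " * n).join(words) for n in range(1, max_spaces + 1)]


terms = ["Barack Obama", "Angela Merkel", "Washington, D.C."]

# Flatten the variants of every term into one expanded list.
expanded_terms = [variant for term in terms for variant in spacing_variants(term)]
```

The expanded list would then be turned into PhraseMatcher patterns the same way as the original terms. The list grows quickly, which is part of why this is not very elegant.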

Another option is to pre-process your input texts and remove multiple spaces, if those are a frequent problem in your input text (do this before you do any spaCy processing at all).

A final option I can think of is to look into matching regular expressions on the full text. With regular expressions, you can succinctly match various lexical and spelling variants.
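
A minimal sketch of the regular-expression route, using only Python's `re` module on the raw text. The helper name `term_to_pattern` and the sample text are assumptions for illustration: each term becomes a pattern that tolerates any run of whitespace between its words.

```python
import re

terms = ["Barack Obama", "Angela Merkel", "Washington, D.C."]


def term_to_pattern(term):
    # Escape each word literally, then allow one or more whitespace
    # characters between consecutive words.
    return r"\s+".join(re.escape(word) for word in term.split())


# One alternation covering all terms.
pattern = re.compile("|".join(term_to_pattern(term) for term in terms))

# Sample text with extra spaces inside two of the terms.
text = "US President Barack   Obama met Angela Merkel in Washington,  D.C."

# Character offsets and matched text for every hit.
matches = [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]
```

The character offsets can then be mapped back onto a processed text with `doc.char_span(start, end)`, which returns `None` when the offsets do not line up with token boundaries, so that case needs handling.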

AnishaMohandass · 2020-01-22T04:18:02Z

Thanks for your response, @svlandeg.
Pre-processing the input text to remove multiple spaces works fine.

import re
input_text = "German Chancellor Angela Merkel and US President Barack Obama converse in the Oval Office inside the White House in Washington, D.C."
# Collapse every run of spaces into a single space before any spaCy processing.
sentence = re.sub(' +', ' ', input_text)
