The PhraseMatcher is not just space sensitive - it really requires the terms to appear in the text exactly as in the terms list.

One option is to expand your terms list with lexical variants such as added spaces, but that's not a very elegant solution.

Another option is to pre-process your input texts and remove multiple spaces, if those are a frequent problem in your input text (do this before you do any spaCy processing at all).
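As a rough sketch of that pre-processing step, something like the following would collapse repeated whitespace before the text ever reaches spaCy. The helper name `normalize_whitespace` is made up for illustration, and it assumes you are fine with tabs and newlines also being turned into single spaces.

```python
import re

def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace into one space and trim the ends."""
    # \s+ matches spaces, tabs and newlines; adjust if newlines must be kept.
    return re.sub(r"\s+", " ", text).strip()

raw = "He lives in New   York\tCity."
clean = normalize_whitespace(raw)
print(clean)
```

Keep in mind that character offsets in the cleaned text no longer line up with the original input, which matters if you need to map matches back to the raw string.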

A final option I can think of is to look into matching regular expressions on the full text. With regular expressions, you can succinctly match various spelling variants.
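To illustrate the regular-expression route, here is a minimal sketch that turns a term into a pattern tolerating any amount of whitespace between its words. The helpers `term_to_pattern` and `find_term` are hypothetical names, and the pattern assumes each term starts and ends with a word character (otherwise the `\b` boundaries would need adjusting).

```python
import re

def term_to_pattern(term: str) -> re.Pattern:
    """Build a case-insensitive pattern that allows any whitespace between words."""
    words = [re.escape(word) for word in term.split()]
    # Assumption: the term begins and ends with a word character, so \b applies.
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)

def find_term(term: str, text: str):
    """Return (start, end, matched_text) for every occurrence of the term."""
    pattern = term_to_pattern(term)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]

text = "Flights to New   York and new\nyork are delayed."
matches = find_term("New York", text)
for start, end, matched in matches:
    print(start, end, repr(matched))
```

The character offsets can then be handed to spaCy's `Doc.char_span(start, end)` to get a `Span`; note that it returns `None` when the offsets do not line up with token boundaries.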

Answer selected by ines