Wrong location detection in Spanish #8778
Replies: 1 comment 1 reply
First, for general notes about wrong model predictions, please see #3052. It's important to understand that the models are statistical and will sometimes be wrong, even in apparently simple cases. This is a bit of a special case, though, because it isn't really a wrong prediction so much as undesirable tokenizer behavior. spaCy is designed with newspaper articles as the default model of text, and the way your text is punctuated (the spaces) is kind of unusual. It looks like the tokenizer behaves the same way in English and Spanish for hyphens like the one you have: the hyphen stays attached to the word, so India- ends up as a single token. spaCy can't apply entity labels to sub-parts of a token, so you need to modify the way the tokenizer works to fix this. I would take a good look at the tokenizer docs. I think you can fix this by adding |
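To make the "can't label sub-parts of a token" point concrete, here is a minimal plain-Python sketch. It does not use spaCy and is not how spaCy's tokenizer is implemented. The sample text and the helper names (`whitespace_tokens`, `split_trailing_hyphen`, `align_span`) are made up for illustration, since the original input is not shown in the post. It only shows that a character span for "India" cannot line up with token boundaries while the hyphen is glued to the word, and that it lines up once a suffix-style rule splits the hyphen off.

```python
import re

# Hypothetical sample text; the input from the original post is not shown.
TEXT = "India- Pakistan"


def whitespace_tokens(text):
    """Return (start, end, token) triples for whitespace-separated chunks."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]


def split_trailing_hyphen(tokens):
    """Apply one suffix-style rule: split a trailing '-' into its own token."""
    out = []
    for start, end, tok in tokens:
        if len(tok) > 1 and tok.endswith("-"):
            out.append((start, end - 1, tok[:-1]))
            out.append((end - 1, end, "-"))
        else:
            out.append((start, end, tok))
    return out


def align_span(tokens, start, end):
    """Return the tokens covering [start, end), or None if the span does not
    begin and end exactly on token boundaries."""
    starts = {s for s, _, _ in tokens}
    ends = {e for _, e, _ in tokens}
    if start not in starts or end not in ends:
        return None
    return [tok for s, e, tok in tokens if s >= start and e <= end]


india = (0, 5)  # character offsets of "India" in TEXT

coarse = whitespace_tokens(TEXT)      # hyphen stays attached to the word
fine = split_trailing_hyphen(coarse)  # hyphen becomes a separate token

print([t for _, _, t in coarse])   # ['India-', 'Pakistan']
print(align_span(coarse, *india))  # None: "India" is only part of a token
print([t for _, _, t in fine])     # ['India', '-', 'Pakistan']
print(align_span(fine, *india))    # ['India']
```

In spaCy itself, the corresponding change would be made through the tokenizer's prefix, suffix, and infix rules, which the tokenizer docs cover. Which rule the reply above had in mind is not stated, since the sentence is cut off.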
How to reproduce the behaviour
With this simple code:
The output is:
As you can see, it detects India- instead of India as a location.
Your Environment