Spanish lemmatizer doesn't work for future tense verbs #10376
buhrmann
started this conversation in
Language Support
-
I had a look at the underlying lemmatizer rules and I think this case would work correctly if the morphological features were predicted correctly:

    import spacy

    nlp = spacy.load("es_core_news_sm")
    doc = nlp("trabajaremos")
    # The model predicts present tense for this future-tense form
    assert str(doc[0].morph) == "Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin"
    # Override the analysis with the correct future-tense features
    doc[0].set_morph("Mood=Ind|Number=Plur|Person=1|Tense=Fut|VerbForm=Fin")
    assert nlp.get_pipe("lemmatizer").rule_lemmatize(doc[0]) == ['trabajar']

The training corpus probably does not contain many 1st person or future verbs, so the tense is mispredicted here. I think that improving the morph tags would improve the lemmas for this kind of case, but unfortunately it's not as simple as fixing the lemmatizer rules.
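To make the mechanism concrete, here is a minimal sketch of a rule lemmatizer whose suffix rules are selected by the `Tense` feature. The rule table `TOY_RULES` and the helpers `parse_feats` and `toy_lemmatize` are hypothetical and are not spaCy's actual Spanish rules; the sketch only illustrates how a correct rule set can still produce a lemma with an extra "er" when the tense is mispredicted.

```python
# Toy suffix rules keyed on the Tense feature.
# Illustrative assumption only: these are NOT spaCy's real rule tables.
TOY_RULES = {
    "Pres": [("emos", "er"), ("amos", "ar"), ("imos", "ir")],
    "Fut": [("emos", ""), ("án", ""), ("é", "")],
}


def parse_feats(morph: str) -> dict:
    """Split a UD feature string such as 'Mood=Ind|Tense=Fut' into a dict."""
    return dict(pair.split("=", 1) for pair in morph.split("|") if pair)


def toy_lemmatize(word: str, morph: str) -> str:
    """Apply the first matching suffix rule for the word's Tense value."""
    tense = parse_feats(morph).get("Tense")
    for suffix, replacement in TOY_RULES.get(tense, []):
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word  # no rule matched: fall back to the surface form


predicted = "Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin"
corrected = "Mood=Ind|Number=Plur|Person=1|Tense=Fut|VerbForm=Fin"

print(toy_lemmatize("trabajaremos", predicted))  # trabajarer
print(toy_lemmatize("trabajaremos", corrected))  # trabajar
```

With the present-tense analysis, the "-emos" ending is treated like the ending of an "-er" verb (as in "comemos"), which leaves the infinitive marker in place and appends another "er". With the future-tense analysis, the same ending is simply stripped.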
-
How to reproduce the behaviour
The following applies to the attached text: Castilla y León Programa electoral 2022.txt
As you can see, almost all lemmas are wrong: an extra "er" suffix is added to the correct lemma form.
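One rough way to find the affected tokens in a larger text is to flag lemmas that look like an infinitive followed by a stray "er". The sketch below is a heuristic under stated assumptions: `has_extra_er`, `suspicious`, and the `REAL_ERER_VERBS` allowlist are hypothetical names, and the sample pairs are made up for illustration rather than taken from the attached file.

```python
# Verbs that legitimately end in "-erer"; this allowlist is an assumption
# and may need extending for other texts.
REAL_ERER_VERBS = {"querer", "bienquerer", "malquerer"}


def has_extra_er(lemma: str) -> bool:
    """Heuristic: the lemma looks like an -ar/-er/-ir infinitive plus 'er'."""
    lemma = lemma.lower()
    if lemma in REAL_ERER_VERBS:
        return False
    return lemma.endswith(("arer", "erer", "irer"))


def suspicious(pairs):
    """Return the (text, lemma) pairs whose lemma matches the heuristic."""
    return [(text, lemma) for text, lemma in pairs if has_extra_er(lemma)]


# Made-up sample pairs for illustration
sample = [
    ("trabajaremos", "trabajarer"),
    ("queremos", "querer"),
    ("comemos", "comer"),
]
print(suspicious(sample))  # [('trabajaremos', 'trabajarer')]
```

For a spaCy `Doc`, the input pairs could be built with `[(t.text, t.lemma_) for t in doc]`.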
Here is more detail on one particular verb in context:
Your Environment
Environment Information:
es.meta: es_meta.json.txt
es.config: es_config.json.txt