Italian lemmatizer low performance on agglutinated verbs #12911
You can try switching from the trainable lemmatizer to the default lemmatizer (see https://spacy.io/models/#design-modify). This is a POS-lookup-based lemmatizer, so when the POS is wrong the lemma will frequently be wrong too, but you can always see exactly which lemmas the lookup tables provide: https://github.com/explosion/spacy-lookups-data/tree/master/spacy_lookups_data/data
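As a rough illustration of the switch described above, the sketch below removes the pipeline's existing lemmatizer and adds the language-default one, following the pattern on the linked spaCy page. It assumes `spacy`, the `it_core_news_lg` model and the `spacy-lookups-data` package are installed. The helper names `switch_to_default_lemmatizer` and `show_lemmas` are made up for this example and are not part of spaCy.

```python
def switch_to_default_lemmatizer(nlp):
    """Swap the pipeline's lemmatizer for the language-default (non-trainable) one.

    Hypothetical helper: it only wraps the remove_pipe / add_pipe / initialize
    calls shown in the spaCy "design-modify" documentation.
    """
    # Drop the existing (trainable) lemmatizer if the pipeline has one.
    if "lemmatizer" in nlp.pipe_names:
        nlp.remove_pipe("lemmatizer")
    # Add the default lemmatizer from the language defaults; it is appended at
    # the end of the pipeline, after the components that assign POS tags.
    lemmatizer = nlp.add_pipe("lemmatizer")
    # Load the lemma tables (assumes spacy-lookups-data is installed).
    lemmatizer.initialize()
    return nlp


def show_lemmas(text="aprimi la porta", model_name="it_core_news_lg"):
    """Print token, POS and lemma so the lookup-based result can be inspected."""
    import spacy  # imported here so the helper above has no hard dependency

    nlp = switch_to_default_lemmatizer(spacy.load(model_name))
    for token in nlp(text):
        print(token.text, token.pos_, token.lemma_)
```

Calling `show_lemmas()` or `show_lemmas("leggimi un libro")` prints one line per token. Because the default lemmatizer depends on the predicted POS, a form tagged as an adjective will still be looked up as an adjective.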
Hi,
I recently used spaCy 3.4.4 to classify Italian verbs, but ran into the following problem with the pretrained model it_core_news_lg:
`
sentence = "aprimi la porta"
--- output ---
`
Sadly, spaCy tags the verb "aprimi" as an adjective, and in other cases it fails to find the right base verb: for the sentence "leggimi un libro", spaCy said that "leggimi" comes from the verb "leggimare".
In general, spaCy seems to have difficulty recognizing agglutinated verbs that involve pronouns. I updated spaCy to version 3.6.1, but the problem persists.
Is there a known reason for this?
Many thanks!