Incorrect lemmas for Italian language #7939
acazzaro started this conversation in Language Support
Replies: 1 comment
Hi, the current Italian lemmatizer is a very simple lookup lemmatizer: it uses one big table for all words, with no information about parts of speech. For ambiguous forms it therefore has a single fixed lemma per form, which may not match how the word is used in the text. A user has been working on an improved lemmatizer that uses better POS-based lookup tables; see #7824.
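To make the ambiguity concrete, here is a minimal sketch using hypothetical toy tables (not spaCy's actual lookup data or API). In Italian, "studio" is both a noun and the first-person singular of "studiare", and "variante" is both a noun and the present participle of "variare". A flat table can store only one lemma per form, whereas a table keyed by (form, POS) can keep both readings. The names `FLAT_TABLE`, `POS_TABLE`, `flat_lemma` and `pos_lemma` are illustrative only.

```python
# Hypothetical toy tables that mimic the reported behavior.
# They are NOT spaCy's real lookup data.

# POS-agnostic table: exactly one lemma per surface form.
FLAT_TABLE = {
    "studi": "studio",
    "studio": "studiare",    # only the verb reading ("io studio") is stored
    "variante": "variare",   # only the participle reading is stored
    "anticorpo": "anticorpo",
}

# POS-aware table: lemma keyed by (form, coarse POS tag).
POS_TABLE = {
    ("studi", "NOUN"): "studio",
    ("studio", "NOUN"): "studio",
    ("studio", "VERB"): "studiare",
    ("variante", "NOUN"): "variante",
    ("variante", "VERB"): "variare",
    ("anticorpo", "NOUN"): "anticorpo",
}


def flat_lemma(form: str) -> str:
    """Look up a lemma ignoring POS; unknown forms map to themselves."""
    return FLAT_TABLE.get(form, form)


def pos_lemma(form: str, pos: str) -> str:
    """Look up a lemma by (form, POS); fall back to the flat table, then the form."""
    return POS_TABLE.get((form, pos), flat_lemma(form))


if __name__ == "__main__":
    tokens = [("studi", "NOUN"), ("variante", "NOUN"), ("studio", "NOUN"), ("studio", "VERB")]
    for form, pos in tokens:
        # Compare the POS-agnostic and POS-aware lookups side by side.
        print(f"{form:10} {pos:5} flat={flat_lemma(form):10} pos-aware={pos_lemma(form, pos)}")
```

Falling back to the flat table for unseen (form, POS) pairs is one design choice: it keeps coverage for words missing from the POS-aware table, but it can reintroduce the wrong lemma when that table has no entry for the predicted tag.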
Hello,
I think there might be a bug in the Italian lemmatizer. When I access the `lemma_` of a NOUN, the result is the lemma of a VERB. Below is my code:
```python
title = "Gli studi sul Covid: creato un 'super-anticorpo' che blocca anche la variante indiana"
description = "Lo studio europeo sul nuovo monoclonale pubblicato su Nature"
```
Result:
```
studi
anticorpo
variante
studio
monoclonale
nature
['studio', 'anticorpo', 'variare', 'studiare', 'monoclonale', 'natura']
```
As you can see, the NOUNs 'variante' and 'studio' are lemmatized to the VERBs 'variare' and 'studiare'. The plural 'studi' is correctly lemmatized to 'studio', so the issue seems to affect only singular NOUNs.
Thanks
My Environment: