German lemmatizer confused by capitalization #9466
giopina
started this conversation in
Language Support
Replies: 1 comment
The current lookup lemmatizer is extremely simple and just not very good; see the closely related discussion in #8695 (comment). It doesn't know anything about POS, casing, or spelling variation. We do have some good news to report, though: we have internal work in progress on a statistical lemmatizer that should be much, much better than the lookup lemmatizer, which was only meant to be a stopgap solution and has been the default for German for way too long at this point. With the new lemmatizer, the accuracy on TIGER is ~97%. Keep an eye out for an official announcement soon!
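To illustrate why a pure table lookup trips over casing, here is a toy sketch in plain Python. It is not spaCy's implementation: the table entries, the function names, and the capitalized input `Meldet` are all assumptions made up for the example. It also shows one possible workaround, retrying with the lowercased form when the POS tag says the token is not a noun. That fallback is only a heuristic and depends on the tagger being right.

```python
# Toy lookup table. The entries are illustrative, not taken from spaCy's data.
LOOKUP = {
    "meldet": "melden",
    "Häuser": "Haus",
}

# Hypothetical set of POS tags for which German capitalization is meaningful.
CASE_SENSITIVE_POS = {"NOUN", "PROPN"}


def lookup_lemma(token: str) -> str:
    """Plain table lookup: exact string match, else return the token unchanged."""
    return LOOKUP.get(token, token)


def lookup_lemma_with_case_fallback(token: str, pos: str) -> str:
    """Try the exact form first; for non-nouns, retry with the lowercased form."""
    if token in LOOKUP:
        return LOOKUP[token]
    if pos not in CASE_SENSITIVE_POS:
        lowered = token.lower()
        return LOOKUP.get(lowered, lowered)
    return token


print(lookup_lemma("Meldet"))                             # Meldet
print(lookup_lemma_with_case_fallback("Meldet", "VERB"))  # melden
print(lookup_lemma_with_case_fallback("Häuser", "NOUN"))  # Haus
```

The exact-match lookup returns the capitalized verb unchanged because only the lowercase form is in the table, while the fallback recovers the lemma without lowercasing nouns, whose capitalization is significant in German.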
I'm having issues with the lemmatizer for German (both in v3.0.6 and v2.3.2, using both de_core_news_lg and de_dep_news_trf).
Basically, the lemmatizer gets confused by the capitalization of the verb and can't assign the right lemma to it (while the POS tagger is actually correct).
Code:
Output:
(the lemma of meldet should be melden)
Is there some general solution/fix to this issue?
Originally posted by @giopina in #2668 (comment)