Incorrect lemma casing for English proper adjectives #9056
How to reproduce the behaviour
returns
Your Environment
Replies: 1 comment 3 replies
Hi, the lemmas depend on the POS (`token.pos_`), so the result depends on whether the token is tagged as `ADJ` or `PROPN`. I think it's currently the expected result that you'd get `American` for "He is an American" and `american` for "He is an American citizen". But also see #3052, and be aware that tagging errors can lead to unexpected lemmas in some cases.

The issue is that the relatively simple `ADJ` rules in the rule-based lemmatizer treat "happy" and "American" in the same way and lowercase both. If you really need the lemma "American" here, you can add exceptions to the `adj` table in the lemma exceptions table stored here: `nlp.get_pipe("lemmatizer").lookups.get_table("lemma_exc")`
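
A minimal sketch of adding such an exception, under a few assumptions: the helper name `add_lemma_exception` is hypothetical, the table is treated as a plain nested mapping (`pos -> form -> list of lemmas`), and the lowercase key `"american"` assumes the lemmatizer lowercases the token text before looking up exceptions. Whether the exception actually changes `token.lemma_` may depend on the spaCy version and on how the token is tagged, so check the result on your own pipeline.

```python
def add_lemma_exception(lemma_exc, pos, form, lemma):
    """Register `lemma` as an exception for `form` in the `pos` sub-table.

    `lemma_exc` is any dict-like table mapping a lowercase POS name
    (e.g. "adj") to a dict of {form: [lemmas]}.
    """
    if pos not in lemma_exc:
        lemma_exc[pos] = {}
    pos_table = lemma_exc[pos]
    lemmas = pos_table.setdefault(form, [])
    if lemma not in lemmas:
        # Put the new lemma first so it is the preferred one.
        lemmas.insert(0, lemma)
    return lemma_exc


# Stand-in for the real table, so the sketch runs without spaCy installed.
lemma_exc = {"adj": {"better": ["good"]}}
add_lemma_exception(lemma_exc, "adj", "american", "American")
print(lemma_exc["adj"]["american"])  # ['American']

# With spaCy, the same helper could be applied to the table named in the
# reply (the pipeline name "en_core_web_sm" is an assumption):
#
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   table = nlp.get_pipe("lemmatizer").lookups.get_table("lemma_exc")
#   add_lemma_exception(table, "adj", "american", "American")
#   doc = nlp("He is an American citizen")
#   print([(t.text, t.pos_, t.lemma_) for t in doc])
```

Printing `token.pos_` next to `token.lemma_`, as in the commented usage, makes it easy to see whether an unexpected lemma comes from the `ADJ` rules or from a tagging difference between `ADJ` and `PROPN`.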