Hi, the lemmas depend on the POS (token.pos_), so here the result depends on whether the token is tagged as ADJ or PROPN. I think these are currently the expected results: you get "American" for "He is an American" and "american" for "He is an American citizen". Also see #3052, and be aware that tagging errors can lead to unexpected lemmas in some cases.

The issue is that the relatively simple ADJ rules in the rule-based lemmatizer treat "happy" and "American" in the same way and lowercase both. If you really need the lemma "American" here, you can add an exception to the adj entry of the lemma exceptions table, which you can access with: nlp.get_pipe("lemmatizer").lookups.get_table("lemma_exc")
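
To make the mechanism concrete, here is a toy model in plain Python. It is not spaCy's implementation: the function toy_lemma and the LEMMA_EXC dict are hypothetical stand-ins, and the table layout (lowercase POS, then lowercased token text, then a list of lemmas) is an assumption. It only mirrors the behavior described above: ADJ tokens are lowercased, PROPN tokens keep their case, and an exception entry overrides the default.

```python
from typing import Dict, List

# Hypothetical stand-in for the "lemma_exc" table.
# Assumed layout: lowercase POS -> lowercased token text -> list of lemmas.
LEMMA_EXC: Dict[str, Dict[str, List[str]]] = {"adj": {}}


def toy_lemma(text: str, pos: str) -> str:
    """Toy POS-dependent lemma lookup (a simplification, not spaCy's code)."""
    exceptions = LEMMA_EXC.get(pos.lower(), {})
    forms = exceptions.get(text.lower(), [])
    if forms:
        # An exception entry takes priority over the default behavior.
        return forms[0]
    if pos == "PROPN":
        # Proper nouns keep their original casing.
        return text
    # Everything else, including ADJ, is lowercased by default.
    return text.lower()


before = toy_lemma("American", "ADJ")    # default ADJ behavior
proper = toy_lemma("American", "PROPN")  # proper noun keeps its case

# Add an exception so the adjective keeps its capital letter.
LEMMA_EXC["adj"]["american"] = ["American"]
after = toy_lemma("American", "ADJ")
happy = toy_lemma("Happy", "ADJ")        # unaffected by the new exception

print(before, proper, after, happy)
```

In a real pipeline the analogous edit would go into the table returned by the call above. Whether an added exception takes effect can depend on how the token is tagged, so it is worth checking the result on your own sentences.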

Answer selected by adrianeboyd
Labels: lang / en (English language data and models), feat / lemmatizer (Feature: Rule-based and lookup lemmatization)
2 participants
This discussion was converted from issue #9051 on August 26, 2021 06:33.