The type of lemmatizer varies across languages, so check nlp.get_pipe("lemmatizer").mode to confirm which mode a particular pipeline uses. Some are rule-based and some are lookup or POS-based lookup lemmatizers, and some languages have their own customizations for what a mode such as "rule" does.
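As a minimal sketch, assuming spaCy v3 is installed, the mode can be read straight off the pipeline component. A blank English pipeline is used so nothing has to be downloaded; `lemmatizer_mode` is a hypothetical helper name, and the second pipeline only shows that the mode is a configurable setting of the component.

```python
import spacy


def lemmatizer_mode(nlp):
    """Return the lemmatizer's mode, or None if the pipeline has no lemmatizer."""
    if "lemmatizer" not in nlp.pipe_names:
        return None
    return nlp.get_pipe("lemmatizer").mode


# Blank English pipeline: the English "lemmatizer" factory defaults to mode="rule".
# With an installed trained pipeline you would use spacy.load("en_core_web_sm").
nlp = spacy.blank("en")
nlp.add_pipe("lemmatizer")

# The mode is part of the component config, so it can be overridden.
# Actually running "lookup" mode needs lookup tables at initialization
# (e.g. from spacy-lookups-data); reading .mode does not.
nlp_lookup = spacy.blank("en")
nlp_lookup.add_pipe("lemmatizer", config={"mode": "lookup"})

print(lemmatizer_mode(nlp))         # rule
print(lemmatizer_mode(nlp_lookup))  # lookup
```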

The en_core pipelines include the default English rule-based lemmatizer, and rule-based lemmatizers depend on token.pos. So what's typically happening in cases like this is that the tagger has confused NOUN with PROPN, NOUN with VERB, or ADJ with VERB, and different rules are applied as a result. Very short phrases are more likely to be tagged incorrectly than words with more surrounding context.
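To illustrate why a tagging error changes the lemma, here is a toy POS-keyed suffix lemmatizer. It is a hypothetical, heavily simplified sketch and NOT spaCy's real English rules (which also use word indexes and exception tables); it only shows that the same word form goes through different rules depending on the POS it was given.

```python
# Hypothetical, simplified rule tables: NOT spaCy's actual English rules.
# Each coarse POS maps to (suffix, replacement) pairs, tried in order.
RULES = {
    "NOUN": [("ies", "y"), ("s", "")],
    "VERB": [("ing", ""), ("ed", ""), ("s", "")],
}


def toy_lemmatize(word, pos):
    """Apply the first matching suffix rule for this POS; otherwise keep the word."""
    for suffix, replacement in RULES.get(pos, []):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)] + replacement
    # No rules for this POS (e.g. PROPN here) or no suffix matched.
    return word


# Same form, different POS tag, different lemma:
print(toy_lemmatize("building", "NOUN"))  # building
print(toy_lemmatize("building", "VERB"))  # build
print(toy_lemmatize("apples", "NOUN"))    # apple
print(toy_lemmatize("apples", "PROPN"))   # apples
```

A NOUN/VERB or NOUN/PROPN mistag upstream therefore shows up as a "wrong" lemma even though the lemmatizer itself did exactly what its rules say.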

In your example, look a…
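When a lemma looks wrong, it can help to print each token's coarse POS next to its lemma to see which tag drove the rule choice. A minimal sketch, assuming spaCy v3: `pos_and_lemmas` is a hypothetical helper, and the Doc below is built by hand with made-up tags and lemmas (including a deliberate PROPN mistag) so it runs without a trained pipeline. With a real pipeline you would pass it `nlp("...")` instead.

```python
import spacy
from spacy.tokens import Doc


def pos_and_lemmas(doc):
    """Pair each token's text with its coarse POS tag and assigned lemma."""
    return [(token.text, token.pos_, token.lemma_) for token in doc]


# Hand-built Doc with illustrative values: "apples" is deliberately tagged
# PROPN to mimic a tagging error, so its lemma is left unchanged.
nlp = spacy.blank("en")
doc = Doc(
    nlp.vocab,
    words=["building", "apples"],
    pos=["VERB", "PROPN"],
    lemmas=["build", "apples"],
)

for text, pos, lemma in pos_and_lemmas(doc):
    print(f"{text}\t{pos}\t{lemma}")
```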

Replies: 1 comment

Answer selected by svlandeg
Labels
lang / en English language data and models feat / lemmatizer Feature: Rule-based and lookup lemmatization
2 participants
Converted from issue

This discussion was converted from issue #9185 on September 17, 2021 08:30.