The lemmas and the normalizations come from two separate sources that may or may not be in sync depending on the language defaults and pipeline configuration. There were some regressions in lemmas for contractions in the v3.0.0 pretrained pipelines vs. the v2.3.x pipelines. In the upcoming v3.1.0 models, lemmas for contractions in English will be improved to be more like the v2.3.x models.

If you want to modify the normalizations or lemmas provided by an existing pipeline, there's no good alternative to making manual changes in some form: modifying the language defaults, the lemmatization tables, or the attribute ruler rules, or adding a custom component. In this case, my first recommendation would…
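As a rough illustration of the attribute ruler option, here is a minimal sketch that overrides the lemma of the English contraction token `n't`. It assumes spaCy v3 and uses a blank English pipeline so that no pretrained model is needed; the choice of `"not"` as the target lemma and the example sentence are assumptions for illustration, not something prescribed in this thread. In a pretrained pipeline you would more likely add the pattern to the existing `attribute_ruler` component than create a new one.

```python
import spacy

# Blank English pipeline: tokenizer only, so lemmas start out empty.
nlp = spacy.blank("en")

# Add an attribute ruler and map the contraction token "n't" to the
# lemma "not" (an illustrative choice, not taken from the discussion).
ruler = nlp.add_pipe("attribute_ruler")
ruler.add(patterns=[[{"ORTH": "n't"}]], attrs={"LEMMA": "not"})

doc = nlp("I don't know")

# norm_ comes from the tokenizer exceptions / language defaults,
# while lemma_ here is set only by the attribute ruler rule above.
for token in doc:
    print(token.text, token.norm_, token.lemma_)
```

Printing `norm_` next to `lemma_` makes it easier to see that the two attributes are filled in by different parts of the pipeline, which is why they can drift out of sync.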

Answer selected by svlandeg
Labels: lang / en (English language data and models), feat / lemmatizer (Feature: Rule-based and lookup lemmatization)

This discussion was converted from issue #8615 on July 07, 2021 06:49.