The English models use a rule-based lemmatizer that depends on the POS tag, so the lemma can be wrong if the POS tag is incorrect or if the rules don't cover a particular case.

In this case, I think the POS is probably incorrect due to the missing article ("a storytelling game"). If I add the article with en_core_web_sm (v3.4.0), I do get storytelling/NOUN.

If this is a frequent case or a problem for your task, you can add exceptions to the lemmatizer like this. Here is an individual exception for "storytelling" as a verb:

import spacy

nlp = spacy.load("en_core_web_sm")
nlp.get_pipe("lemmatizer").lookups.get_table("lemma_exc")["verb"]["storytelling"] = ["storytelling"]

There's a lemmatizer cache, so you need to do this before processin…
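For reuse, the same exception can be wrapped in a small helper. This is only a sketch: `add_lemma_exception` and `demo` are hypothetical names rather than part of spaCy, and the example sentence is made up for illustration. It assumes the pipeline has a rule-based `lemmatizer` component with a `lemma_exc` table keyed by lowercase POS, as in the snippet above.

```python
def add_lemma_exception(nlp, pos, word, lemmas):
    """Register an exception in the lemmatizer's "lemma_exc" table.

    Hypothetical helper, not a spaCy API. `pos` is the lowercase POS key
    used by the table (for example "verb" or "noun").
    """
    table = nlp.get_pipe("lemmatizer").lookups.get_table("lemma_exc")
    # Fetch the per-POS dict (or start a new one), update it, and write it
    # back so the entry is stored even if that POS key did not exist yet.
    exceptions = table.get(pos, {})
    exceptions[word] = list(lemmas)
    table[pos] = exceptions


def demo():
    """Usage sketch; needs spaCy and en_core_web_sm installed."""
    import spacy

    nlp = spacy.load("en_core_web_sm")
    # Add the exception before the first nlp(...) call, because the
    # lemmatizer caches its results.
    add_lemma_exception(nlp, "verb", "storytelling", ["storytelling"])
    doc = nlp("We played storytelling game.")  # made-up sentence, article omitted
    for token in doc:
        print(token.text, token.pos_, token.lemma_)
```

Calling `demo()` prints the tag and lemma for each token, so you can check whether the exception is picked up for your own text. The tags depend on the model version.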

Answer selected by pejmannavi
This discussion was converted from issue #11161 on July 21, 2022 11:57.