Hi, to handle cases like this you want to add a custom norm value for the lexeme (https://spacy.io/api/lexeme#attributes). The norm is used as a model feature, which makes it easier for the model to generalize across variants like this, and also across spelling variants such as favorite / favourite.

We already do this for some currency symbols by default for all languages, treating them all as $:

"€": "$",
"£": "$",
"¥": "$",
"฿": "$",
"US$": "$",
"C$": "$",
"A$": "$",
"₺": "$",
"₹": "$",
"৳": "$",
"₩": "$",
"Mex$": "$",
"₣": "$",
"E£": "$",

So to have a similar entry for Rs, set the norm of its lexeme to $ as well.
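A minimal sketch of what that could look like, assuming a blank English pipeline and a made-up sample sentence (with a trained pipeline you would apply the same assignment to its `nlp.vocab`). It relies on the writable `norm_` attribute of the lexeme:

```python
import spacy

# Assumption: a blank English pipeline, used only to keep the sketch self-contained.
nlp = spacy.blank("en")

# Set a custom norm on the lexeme so that "Rs" is normalized like the
# currency symbols listed above.
nlp.vocab["Rs"].norm_ = "$"

# Hypothetical sample text; the tokenizer yields: The, ticket, costs, Rs, 500
doc = nlp("The ticket costs Rs 500")
rs_token = doc[3]
print(rs_token.text, rs_token.norm_)
```

The assignment has to happen before the text is processed, since components that use the norm as a feature read it when the pipeline runs.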

Answer selected by ines
Labels
feat / doc Feature: Doc, Span and Token objects
This discussion was converted from issue #7042 on February 12, 2021 08:43.