Add custom token to the nlp #7045
Replies: 2 comments
Hi, to handle cases like this you want to add a custom `norm` value for the lexeme. We already do this for some currency symbols by default for all languages, treating them all as `$` (see spaCy/spacy/lang/norm_exceptions.py, lines 45 to 58 in 26bf642). So to have a similar entry for `Rs`: `nlp.vocab["Rs"].norm_ = "$"`. If you save this model with […] Some pretrained pipelines have additional language-specific norms, which are available in the package […]
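As a minimal sketch of the suggestion above, the snippet below sets the custom norm and checks that it shows up on the tokenized text. It assumes a blank English pipeline created with `spacy.blank("en")` and a made-up example sentence; neither comes from the original reply, and a trained pipeline would be loaded with `spacy.load` instead.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer and vocab only) is
# enough to illustrate the idea; swap in your own loaded pipeline.
nlp = spacy.blank("en")

# Give the lexeme "Rs" the same norm that currency symbols share.
nlp.vocab["Rs"].norm_ = "$"

# Hypothetical example sentence, used only to inspect the result.
doc = nlp("The price is Rs 500")
norms = [(token.text, token.norm_) for token in doc]
print(norms)
```

The norm is stored on the lexeme in the shared vocab, so any token with the text `Rs` that has no token-level norm of its own should report it through `token.norm_`.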
It works like a charm. Thanks a lot for your quick reply 👍.
Hi, to handle cases like this you want to add a custom `norm` value for the lexeme (https://spacy.io/api/lexeme#attributes). The `norm` is used as a model feature to make it easier for the model to generalize across variants like this, also for things like `favorite`/`favourite`. We already do this for some currency symbols by default for all languages, treating them all as `$`:
spaCy/spacy/lang/norm_exceptions.py, lines 45 to 58 in 26bf642
So to have a similar entry for `Rs`