Customizing default construction of lemmas and norms #9885
How are token lemmas and norms generated? I've changed norms on specific tokens in a pipeline component before, but I'd like to change them as a whole in my spaCy model in the following ways.
I'm not sure how to globally influence the lemmas and norms.
Replies: 1 comment 3 replies
Norms and lemmas are completely separate from each other. What do you want to do with norms and lemmas?
Norms are primarily used as input features for the models (all currency symbols are replaced with the norm $, British and American spellings of common English words are mapped to the same form, etc.), while lemmas are typically just output and not used as model features (it's technically possible with some extra settings, but no components use LEMMA as a feature by default).
The norms need to be something that you can generate without any previous statistical components in the pipeline, so you could create norms from lookup lemmas, but not from rule-based lemmas, since you need the norms to predict the POS for use with the rule-based lemmatizer. But you could use a lookup lemmatizer table to set custom norms for all the lexemes in the table in your vocab before processing, or you could set token norms however you want (from a table, with a method) with a pipeline component that runs before the statistical components.
The lexeme norms are stored in a table in the vocab, so if you want generated norms, you should set token norms in a pipeline component. If you take a trained pipeline like …
Set a lexeme norm: nlp.vocab["WORD"].norm_ = "word"
Set a token norm: doc[0].norm_ = "word"
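For reference, here is a minimal sketch of the two options mentioned in the reply: setting lexeme norms in the vocab before processing, and setting token norms from a component that runs ahead of everything else. It assumes a blank English pipeline rather than a trained one. The table `NORM_TABLE`, the component name `custom_token_norms`, and the example words are made up for illustration and are not part of spaCy.

```python
import spacy
from spacy.language import Language

# Hypothetical lookup table (an assumption for this sketch); in practice the
# entries could come from a lookup lemmatizer table or any other source.
NORM_TABLE = {"colour": "color", "favourite": "favorite"}


@Language.component("custom_token_norms")  # component name is made up
def custom_token_norms(doc):
    # Override the norm only for tokens found in the table; every other
    # token keeps the norm it already gets from its lexeme.
    for token in doc:
        norm = NORM_TABLE.get(token.lower_)
        if norm is not None:
            token.norm_ = norm
    return doc


# Assumption: a blank pipeline with no trained components.
nlp = spacy.blank("en")

# Option 1: set a lexeme norm in the vocab before processing any text.
nlp.vocab["grey"].norm_ = "gray"

# Option 2: set token norms in a component placed first in the pipeline,
# so it would run before any statistical components that use NORM.
nlp.add_pipe("custom_token_norms", first=True)

doc = nlp("My favourite colour is grey")
norms = [token.norm_ for token in doc]
print(norms)
```

With a trained pipeline the same component would presumably also be added with `first=True`, but whether changed norms help or hurt the trained components depends on the norms they saw during training, so this is best treated as a starting point.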