Does lemmatization of training data help to improve real performance? #8407
Pandalei97 started this conversation in Help: Best practices
Replies: 1 comment

Reply:
By default, lemmas are not used in training any models. I think it's possible to specify them as a feature to use in the tok2vec, though you'd have to put a lemmatizer before the tok2vec in the pipeline to make them available. (Technically the extra attributes are meant for subword features, but I think any token attribute can be used.) If you're working with English, the types of words that are most useful for NER or textcat don't usually inflect, so using lemmas doesn't help you much. An exception would be if you have noisy data with a lot of case issues, in which case you can just use LOWER instead.
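A minimal sketch of what that could look like as excerpted pieces of a spaCy v3 training config, assuming the MultiHashEmbed embedding layer, a lookup-mode lemmatizer, and spaCy v3.1+ for `annotating_components`; the row sizes, widths, and pipeline layout are illustrative rather than a recommended setup:

```ini
[nlp]
lang = "en"
# Lemmatizer comes before tok2vec so lemmas are already set when features are extracted.
pipeline = ["lemmatizer","tok2vec","ner"]

[components.lemmatizer]
factory = "lemmatizer"
# Lookup mode avoids needing a tagger for POS tags; it requires the
# spacy-lookups-data package to be installed.
mode = "lookup"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v2"
# Should match the encoder width used elsewhere in the config.
width = 96
# LEMMA added to the default attribute set; LOWER could be used instead
# for noisy, badly cased text.
attrs = ["NORM","PREFIX","SUFFIX","SHAPE","LEMMA"]
rows = [5000,2500,2500,2500,5000]
include_static_vectors = false

[training]
# Let the (non-trainable) lemmatizer annotate the training docs, so the
# tok2vec actually sees lemmas during training (spaCy v3.1+).
annotating_components = ["lemmatizer"]
```

The `annotating_components` setting is what lets the lemmatizer set lemmas on the training docs, so the tok2vec sees the same attribute values at training time as it will at prediction time, when the whole pipeline runs in order.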
Original post by Pandalei97:
Hi! I'm thinking of training NER and textcat models on lemmatized data in order to improve my models' scores.
When the NER/textcat models make predictions, do they use the lemmatized form or the raw form of the doc? If they only use the raw form, then even if my models get good scores, they may not be useful in practice.
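If the lemmatization is done as a separate preprocessing step on the training texts, the same step has to be applied to every text at prediction time, which is the mismatch the question is pointing at. A minimal Python sketch under that assumption; `en_core_web_sm`, `lemmatize_text`, and `nlp_trained` are illustrative names, not anything from the thread:

```python
import spacy

# Assumed: a pipeline that can produce lemmas (en_core_web_sm includes a lemmatizer).
nlp_lemmas = spacy.load("en_core_web_sm")

def lemmatize_text(text: str) -> str:
    """Replace every token by its lemma, preserving the original whitespace."""
    doc = nlp_lemmas(text)
    return "".join(token.lemma_ + token.whitespace_ for token in doc)

# Hypothetical usage: both the training texts and any text sent to the trained
# NER/textcat pipeline (nlp_trained, assumed to exist) must go through the same
# preprocessing, otherwise the model is evaluated on text it was never trained on.
# train_texts = [lemmatize_text(t) for t in raw_train_texts]
# doc = nlp_trained(lemmatize_text("The cats were chasing mice in the gardens."))
```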