Adding lemmatizer and ner to pipeline #7091
-
The attribute ruler doesn't include rules by default, so you'll need to add rules that map/copy […] If you have separate training corpora, it works best to have separate […] We usually train separate pipelines for each corpus (so no freezing, just one config with tagger+parser, one config with ner) and then use the […]
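The reply above mentions that the attribute ruler ships with no rules. A minimal sketch of adding a tag-to-POS mapping rule (assuming spaCy v3; the `"NN"` → `"NOUN"` mapping is just an illustrative example, not a full Swedish tag map):

```python
import spacy

# Blank Swedish pipeline; in practice the tagger would be sourced from a
# trained model and the attribute_ruler added after it.
nlp = spacy.blank("sv")
ruler = nlp.add_pipe("attribute_ruler")

# Copy the fine-grained tag "NN" to the coarse POS "NOUN" (example mapping;
# use the tagset your sourced tagger actually produces).
ruler.add(patterns=[[{"TAG": "NN"}]], attrs={"POS": "NOUN"})

doc = nlp.make_doc("hus")
doc[0].tag_ = "NN"   # normally set by the tagger component
doc = ruler(doc)
print(doc[0].pos_)   # -> NOUN
```

With such rules in place, `token.pos_` is filled in, which is what the rule-based lemmatizer needs.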
-
Hi @adrianeboyd, as you can imagine I managed to train my model following your instructions, thank you for your help :) Now I have another question. I want to make a spaCy project with my pipeline so that you might consider adding Swedish to your core models. I was looking at the existing transformer models, and even uncompressed they are at most ~500 MB. My model would have one transformer for the tagger+parser and one for ner, resulting in almost 1 GB. Thank you again!
-
I'm trying to train a pipeline for Swedish that will do tagging, parsing, lemmatizing, sentence segmentation and ner.
I have to use different datasets for the tagger and parser on the one hand and the ner component on the other hand because I don't have everything annotated on the same data.
I have a couple of questions:
1. I can't seem to add a lemmatizer to the already trained tagger. I sourced the tagger from a trained model and I added an `AttributeRuler` to the pipeline, but I still get the warning that the lemmatizer won't work. Does the lemmatizer need to be added to the pipeline at the same time as the tagger? I am hypothesizing that since the tagger is frozen in my pipeline, the `AttributeRuler` isn't doing anything and `token.pos_` is still empty.
2. I keep getting a warning that the performance of the tagger and parser will be degraded if I freeze them and keep training the transformer alone. As I understand it, I need the transformer for the ner component, so I can't freeze it as well. I tried to use `replace_listeners = ["model.tok2vec"]` to make a copy and decouple the tagger and parser from the transformer, but it doesn't seem to be working. What is it that I am not getting?