How to train the trainable lemmatizer #10806
Hi, I'd like to train the trainable lemmatizer from spaCy 3.3. I have added the correct lemma labels to the dataset by assigning the lemma value for all tokens like this:
I have not initialized the labels of the lemmatizer yet. I wanted to generate the lemmatizer labels JSON file with init labels, as described here, but that results in this error, which kind of feels like a loop:
And I haven't been able to find out how to provide a representative batch of examples either. Here is the config file I'm using:
So the question is: what am I doing wrong, and how can I initialize the trainable lemmatizer component?
Replies: 1 comment 6 replies
Be aware that `spacy.read_labels.v1` fails silently unless you add `require = true`, so it could be something as simple as an incorrect path.
Or you can skip this and just provide your training corpus with `spacy train`, which will also initialize the labels from `train.spacy`.
If you want to use `nlp.initialize`, then it looks like this:
```python
import spacy
from spacy.training import Corpus

nlp = spacy.blank("en")
# Read the training examples once so they can serve as the representative batch
examples = list(Corpus("/path/to/train.spacy")(nlp))
nlp.initialize(lambda: examples)
```
It is on our to-do list to improve all the `Component.initialize` docs because they don't really show how to do it properly.
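For reference, a labels block with `require = true` might look roughly like the sketch below. The component name `trainable_lemmatizer` and the path are assumptions for illustration only; use whatever name the component has in your pipeline and the file that `spacy init labels` actually wrote for it.

```ini
[initialize.components.trainable_lemmatizer]

[initialize.components.trainable_lemmatizer.labels]
@readers = "spacy.read_labels.v1"
# Hypothetical path: replace with the JSON file produced by `spacy init labels`
path = "corpus/labels/trainable_lemmatizer.json"
# Raise an error if the file is missing instead of silently skipping it
require = true
```

With `require = true`, a wrong path should surface as an error during initialization rather than being ignored, which makes this kind of problem much easier to spot.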