config file - [initialize] vectors = null meaning #11575
-
Hi, I am trying to train new NER models from scratch for a number of existing spaCy pipelines (en_core_web_lg, plus en_core_sci_lg and en_core_sci_scibert from scispaCy, if you need to know). To generate the config files for training new NER models in these pipelines, I used the config generation script from Explosion's ner_demo_replace project (https://github.com/explosion/projects/blob/v3/pipelines/ner_demo_replace/scripts/create_config.py). I noticed that in either the [initialize] or the [paths] section of my config files, vectors is given the value 'null'. Does this mean spaCy is also training the word vectors for these pipelines from scratch? I want to take advantage of the pre-trained embeddings for each model, so this would not be ideal. I am copying and pasting one such config file I've generated using this script.
Thank you in advance - cheers! :)
Replies: 1 comment 1 reply
-
There are two kinds of vectors in spaCy:
Note that in generated configs, vectors will usually have a value if you are not using a GPU and choose "accuracy". In this case it sounds like you want to use whatever you can from the existing pipelines, so I would recommend using the word vectors by simply writing the pipeline name in the vectors setting of your config.
About your config: you're sourcing and then freezing many components, but if you want to train new NER models and add them to that pipeline, I would recommend you train the NER components by themselves one at a time, with no sourced components, just using the word vectors. Then you can assemble your NER components and the original pipeline into one pipeline; this example project may be helpful. Also note that when sourcing, you should replace listeners on any statistical components, which in this case would include the parser, tagger, and (if you want it) the existing NER.
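For the word-vector suggestion, a rough sketch of what the relevant part of the config could look like is below. It assumes the usual layout of a generated spaCy v3 config, where [initialize] refers back to [paths]; en_core_web_lg is simply the pipeline named in the question, and the sections in your own file may be arranged differently.

```ini
# Sketch only: point the vectors setting at an installed pipeline that ships word vectors.
[paths]
# Assumption: en_core_web_lg is installed; swap in e.g. en_core_sci_lg as needed.
vectors = "en_core_web_lg"

[initialize]
# Generated configs usually reference the path above instead of repeating the name.
vectors = ${paths.vectors}
```

Setting the vectors alone is probably not enough: the embedding layer of the tok2vec component also has to be configured to use static vectors, otherwise the loaded vectors are not used as features.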
There are two kinds of vectors in spaCy:
The vectors entry in the config, as well as use_static_vectors, refers to word vectors. If those are null, spaCy will just not use word vectors at all, and use other features of tokens as input to the tok2vec/…