Additional vectors usage in training process to increase model accuracy #7776

notiqq · 2021-04-13T23:10:13Z

Hi,

I'm looking for ways to increase the accuracy of a trained model, and I was thinking that using more vectors might help.
While training a model for Ukrainian, I noticed that spaCy downloaded a data package (I assume it was a vectors package for Ukrainian?). But the documentation says that spaCy is based on tok2vec vectors that are generated during pretraining. Does that mean there is no direct way to replace the existing vectors package with something else?
Could you please clarify whether the model is actually pretrained, or whether it uses some additional tokens that are downloaded?
Are there any other suggestions for increasing model accuracy?
P.S. The standard UD package for Ukrainian was used for training.

Answered by polm


polm · 2021-04-14T05:40:52Z

The way vectors are used in spaCy is that pretrained vectors are used as input when training a tok2vec model, which is then used as the input to other pipeline components like the tagger, parser, and NER. The pretrained vectors are also exposed in the final pipeline and can be used in similarity calculations, for example.
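
To make "similarity calculations" a bit more concrete: spaCy's default similarity is the cosine of two vectors. The sketch below computes that measure with plain Python. The 3-dimensional vectors and their names are made up for illustration; real static vectors usually have a few hundred dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # An all-zero vector has no direction, so report 0.0 instead of dividing by zero.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, not taken from any real vectors package.
vec_a = [0.9, 0.1, 0.3]
vec_b = [0.8, 0.2, 0.4]
vec_c = [-0.5, 0.9, 0.0]

print(cosine_similarity(vec_a, vec_b))  # close to 1: similar direction
print(cosine_similarity(vec_a, vec_c))  # much lower: different direction
```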

For the pretrained models, the small models do not contain static vectors, but medium and large ones do. All models have a trained tok2vec layer.

You can add existing vectors to a model, but you need to convert them first with spacy init vectors. You can read more about how to specify custom vectors for training models here, or how the static vectors are used here.
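
As a rough sketch of the conversion step, the snippet below only assembles the two command lines involved and prints them; it does not run anything. All file and directory names (`uk_vectors.vec`, `./uk_vectors`, `config.cfg`, `train.spacy`, `dev.spacy`, `./output`) are hypothetical placeholders. The `--paths.vectors` override assumes a config generated by `spacy init config` or the quickstart, where `[initialize] vectors` refers to `${paths.vectors}`.

```python
import shlex


def init_vectors_cmd(lang, vectors_loc, output_dir):
    """Build the `spacy init vectors` call that converts a word-vectors file
    into a directory spaCy can load."""
    return ["python", "-m", "spacy", "init", "vectors", lang, vectors_loc, output_dir]


def train_cmd(config_path, output_dir, train_path, dev_path, vectors_dir):
    """Build a `spacy train` call that points the config at the converted vectors.
    Assumes the config defines paths.train, paths.dev and paths.vectors."""
    return [
        "python", "-m", "spacy", "train", config_path,
        "--output", output_dir,
        "--paths.train", train_path,
        "--paths.dev", dev_path,
        "--paths.vectors", vectors_dir,
    ]


# Hypothetical paths, for illustration only.
convert = init_vectors_cmd("uk", "uk_vectors.vec", "./uk_vectors")
train = train_cmd("config.cfg", "./output", "train.spacy", "dev.spacy", "./uk_vectors")

print(shlex.join(convert))
print(shlex.join(train))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; in practice you would paste them into a shell or pass the lists to `subprocess.run`.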

"Are there any other suggestions for increasing model accuracy?"

Generally I would suggest using the quickstart with the "accuracy" option to establish a baseline, and then clarify what you specifically want to improve the accuracy of.
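
The quickstart's "accuracy" option has a command-line counterpart in `spacy init config --optimize accuracy`. The sketch below assembles that command, plus a `spacy evaluate` call for inspecting per-component scores afterwards; it prints the commands and does not run them. The component list (`tagger`, `parser`) and all paths are assumptions chosen for illustration, not something stated in this thread.

```python
import shlex


def baseline_config_cmd(lang, pipeline, config_path="config.cfg"):
    """Build a `spacy init config` call that generates an accuracy-optimized
    baseline config for the given language and pipeline components."""
    return [
        "python", "-m", "spacy", "init", "config", config_path,
        "--lang", lang,
        "--pipeline", ",".join(pipeline),
        "--optimize", "accuracy",
    ]


def evaluate_cmd(model_dir, data_path):
    """Build a `spacy evaluate` call that scores a trained pipeline on held-out data."""
    return ["python", "-m", "spacy", "evaluate", model_dir, data_path]


# Hypothetical component choice and paths.
print(shlex.join(baseline_config_cmd("uk", ["tagger", "parser"])))
print(shlex.join(evaluate_cmd("./output/model-best", "dev.spacy")))
```

Depending on the spaCy version, the accuracy-optimized config may enable static vectors in the tok2vec embedding layer, in which case a vectors path has to be supplied at training time. The evaluation output then shows which component's scores are worth targeting.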
