More information on transformers and hyperparameter tuning #6479
Replies: 2 comments
-
Hi! Happy to hear you've found the documentation useful. I'll try to fill in some remaining gaps. I can't really comment on the difference with HuggingFace's implementations though, because I don't know their codebase all that well.
Hm, I guess in theory you should be able to run spacy's pretraining on a transformer model. I'm really unsure how that will work out though. In general we've had two main use-cases in mind:
I've recently proposed a change that would allow you to specify the layers you'd add to the tok2vec component for pretraining, so that would in theory allow you to implement whatever similarity objective you'd like: PR #6451
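To give a rough idea of where that would live in the config, a pretraining block with a pluggable objective could look something like the sketch below. The keys and the architecture name are illustrative (they may not match the nightly defaults exactly); thinc's `Config` is only used here to show that it parses as a regular config section.

```python
from thinc.api import Config

# Sketch of a [pretraining] block with a configurable objective. The keys and
# the "spacy.PretrainCharacters.v1" name are illustrative and may not match
# the exact nightly defaults.
pretrain_cfg = Config().from_str("""
[pretraining]
max_epochs = 1000
dropout = 0.2
component = "tok2vec"
layer = ""

[pretraining.objective]
@architectures = "spacy.PretrainCharacters.v1"
maxout_pieces = 3
hidden_size = 300
n_characters = 4
""")

print(pretrain_cfg["pretraining"]["objective"]["@architectures"])
```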
The losses that are being backpropagated to tune the transformer indeed depend on the type of challenge and architecture you're building on top of it. I think spaCy 3 provides a huge improvement here because the model architectures are now very explicitly recorded in the config file. You can find the pre-defined ones here, and their implementations can be found in this folder. You can also implement your own - cf. the documentation here (I know you said you read the docs, but I also know there are a lot of them ;-))
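Just to illustrate the registration mechanics (the name, the layers and the arguments below are made up, and a real component architecture has to produce the inputs/outputs the component expects, e.g. per-token scores for a tagger):

```python
import spacy
from thinc.api import Model, chain, Relu, Softmax

# Register a made-up architecture so it can be referenced from the config as
# @architectures = "demo.SimpleClassifier.v1". A real tagger model would need
# to map a batch of Docs to per-token scores; this only shows the mechanics.
@spacy.registry.architectures("demo.SimpleClassifier.v1")
def build_simple_classifier(hidden_width: int, nO: int) -> Model:
    return chain(Relu(nO=hidden_width), Softmax(nO=nO))
```

You'd then point the relevant model block in the config at `demo.SimpleClassifier.v1` and pass the file in with `--code` when running `spacy train`, so the registered name can be resolved.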
We currently don't have functionality to do automated hyperparameter search. With the config system though, it should be relatively easy to set that up yourself. https://github.com/explosion/projects/tree/v3/integrations/wandb contains an example project where we programmatically create variants of the config to run a simple hyperparameter grid search, logging the results with Weights & Biases. It would be better to properly parallelize that though ;-) I hope that gives you some useful pointers!
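For example, a minimal (sequential) version of that idea could look roughly like this, assuming a complete `config.cfg` and assuming the usual `training.dropout` / `training.optimizer.learn_rate` settings are present:

```python
import itertools
import subprocess
from pathlib import Path

from thinc.api import Config

# Hypothetical grid of hyperparameter values to try.
dropouts = [0.1, 0.2]
learn_rates = [0.0005, 0.001]

Path("configs").mkdir(exist_ok=True)

for i, (dropout, lr) in enumerate(itertools.product(dropouts, learn_rates)):
    # Re-read the base config for every variant so they don't share nested dicts.
    cfg = Config().from_disk(Path("config.cfg"))
    cfg["training"]["dropout"] = dropout
    cfg["training"]["optimizer"]["learn_rate"] = lr
    variant = Path("configs") / f"variant_{i}.cfg"
    cfg.to_disk(variant)
    # Train each variant with the regular CLI; this is the part you'd want to
    # parallelize (or hand off to something like spacy-ray or a W&B sweep).
    subprocess.run(
        ["python", "-m", "spacy", "train", str(variant), "--output", f"output/variant_{i}"],
        check=True,
    )
```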
-
Thank you!
-
Hello, this concerns spacy-nightly (v3).
In general, I have three questions about how things are being handled in the new spacy, but first let me thank you for your outstanding work!
1. Would using the `pretrain` command be correct here, e.g. for building some kind of `SciBERT` out of `BERT`? That is, continuing to pretrain a language model on a new domain such as scientific texts, user reviews, etc. But is the training process here the same as the one used in huggingface's `transformers` library? In what ways do they differ?
2. The transformer gets tuned the `spacy` way. But what way exactly is that, compared to the approach that huggingface's `transformers` usually takes? Is the last layer different here? I am asking because this must be known when reporting what architecture was used to build a `tagger`.
3. Is there a way to do automated hyperparameter tuning? I found [spacy-ray](https://github.com/explosion/spacy-ray), but that doesn't seem to do that yet. Transformers have just introduced a hyperparameter tuning method (also using ray).

I am currently building my first project in spacy v3 and of course I read the documentation extensively. :) Especially this was helpful so far, but also this, and especially this concerning transformers in spacy. I also came across these issues that may be related to the transformers: #5408 #4204
However, I was not able to answer my questions myself, so I am asking them here. Thanks in advance!