More information on transformers and hyperparameter tuning #6479
Replies: 2 comments
-
Hi! Happy to hear you've found the documentation useful. I'll try to fill in some remaining gaps. I can't really comment on the difference with HuggingFace's implementations though, because I don't know their codebase all that well.
Hm, I guess in theory you should be able to run spacy's pretraining on a transformer model. I'm really unsure how that will work out though. In general we've had two main use-cases in mind:
I've recently proposed a change that would allow you to specify the layers you'd add to the tok2vec component for pretraining, so that would in theory allow you to implement whatever similarity objective you'd like: PR #6451
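To give a rough idea of where that would live in the config, a pretraining block with a pluggable objective could look something like the sketch below. The keys and the architecture name are illustrative (they may not match the nightly defaults exactly); thinc's `Config` is only used here to show that it parses as a regular config section.

```python
from thinc.api import Config

# Sketch of a [pretraining] block with a configurable objective. The keys and
# the "spacy.PretrainCharacters.v1" name are illustrative and may not match
# the exact nightly defaults.
pretrain_cfg = Config().from_str("""
[pretraining]
max_epochs = 1000
dropout = 0.2
component = "tok2vec"
layer = ""

[pretraining.objective]
@architectures = "spacy.PretrainCharacters.v1"
maxout_pieces = 3
hidden_size = 300
n_characters = 4
""")

print(pretrain_cfg["pretraining"]["objective"]["@architectures"])
```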
The losses that are being backpropagated to tune the transformer indeed depend on the type of challenge and architecture you're building on top of it. I think spaCy 3 provides a huge improvement here because the model architectures are now very explicitly recorded in the config file. You can find the pre-defined ones here, and their implementations can be found in this folder. You can also implement your own - cf. the documentation here (I know you said you read the docs, but I also know there are a lot of them ;-))
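Just to illustrate the registration mechanics (the name, the layers and the arguments below are made up, and a real component architecture has to produce the inputs/outputs the component expects, e.g. per-token scores for a tagger):

```python
import spacy
from thinc.api import Model, chain, Relu, Softmax

# Register a made-up architecture so it can be referenced from the config as
# @architectures = "demo.SimpleClassifier.v1". A real tagger model would need
# to map a batch of Docs to per-token scores; this only shows the mechanics.
@spacy.registry.architectures("demo.SimpleClassifier.v1")
def build_simple_classifier(hidden_width: int, nO: int) -> Model:
    return chain(Relu(nO=hidden_width), Softmax(nO=nO))
```

You'd then point the relevant model block in the config at `demo.SimpleClassifier.v1` and pass the file in with `--code` when running `spacy train`, so the registered name can be resolved.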
We currently don't have functionality to do automated hyperparameter search. With the config system though, it should be relatively easy to set that up yourself. https://github.com/explosion/projects/tree/v3/integrations/wandb contains an example project where we programmatically create variants of the config to run a simple hyperparameter grid search, logging the results with Weights & Biases. It would be better to properly parallelize that though ;-) I hope that gives you some useful pointers!
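For example, a minimal (sequential) version of that idea could look roughly like this, assuming a complete `config.cfg` and assuming the usual `training.dropout` / `training.optimizer.learn_rate` settings are present:

```python
import itertools
import subprocess
from pathlib import Path

from thinc.api import Config

# Hypothetical grid of hyperparameter values to try.
dropouts = [0.1, 0.2]
learn_rates = [0.0005, 0.001]

Path("configs").mkdir(exist_ok=True)

for i, (dropout, lr) in enumerate(itertools.product(dropouts, learn_rates)):
    # Re-read the base config for every variant so they don't share nested dicts.
    cfg = Config().from_disk(Path("config.cfg"))
    cfg["training"]["dropout"] = dropout
    cfg["training"]["optimizer"]["learn_rate"] = lr
    variant = Path("configs") / f"variant_{i}.cfg"
    cfg.to_disk(variant)
    # Train each variant with the regular CLI; this is the part you'd want to
    # parallelize (or hand off to something like spacy-ray or a W&B sweep).
    subprocess.run(
        ["python", "-m", "spacy", "train", str(variant), "--output", f"output/variant_{i}"],
        check=True,
    )
```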
-
Thank you!
-
Hello, this concerns spacy-nightly (v3).
In general, I have three questions about how things are being handled in the new spacy, but first let me thank you for your outstanding work!
1. Would using the `pretrain` command be correct here, e.g. for building some kind of `SciBERT` out of `BERT`? That is, continuing to pretrain a language model on a new domain such as scientific texts, user reviews, etc. But is the training process here the same as the one used in huggingface's `transformers` library? In what ways do they differ?
2. The transformer gets tuned the `spacy` way. But what way exactly is that, compared to the approach that huggingface's `transformers` usually takes? Is the last layer different here? I am asking because this must be known when reporting what architecture was used to build a `tagger`.
3. Is there a way to do automated hyperparameter tuning? I found [spacy-ray](https://github.com/explosion/spacy-ray), but that doesn't seem to do that yet. Transformers have just introduced a hyperparameter tuning method (also using ray).

I am currently building my first project in spacy v3 and of course I read the documentation extensively. :) Especially this was helpful so far, but also this, and especially this concerning transformers in spacy. I also came across these issues that may be related to the transformers: #5408 #4204
However, I was not able to answer my questions myself, so I am asking them here. Thanks in advance!