Using encoder-decoder models with spaCy: What to change so it works. Starting with ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds #12232
-
Hi,
(For the full error and traceback, see the end of this message.) In this example I'm trying to work with NLLB, but the issue applies to all such models, such as T5 and XLM-RoBERTa.
```
File j:\Anaconda3\envs\nlp\lib\site-packages\spacy_transformers\layers\transformer_model.py:157, in init(model, X, Y)
File ~\AppData\Roaming\Python\Python310\site-packages\thinc\model.py:315, in Model.predict(self, X)
```
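For context, this is roughly the setup that triggers the error. It's a minimal sketch reconstructed from the traceback, not the poster's exact config; the checkpoint name is taken from the NLLB mention above, and it assumes `spacy-transformers` is installed:

```python
# Hypothetical reproduction: add an encoder-decoder checkpoint as the
# transformer component, then initialize, which runs a sample forward pass.
import spacy

nlp = spacy.blank("en")
nlp.add_pipe(
    "transformer",  # factory registered by spacy-transformers
    config={"model": {"name": "facebook/nllb-200-distilled-600M"}},
)
nlp.initialize()  # raises: ValueError: You have to specify either
                  # decoder_input_ids or decoder_inputs_embeds
```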
-
Hi @Premshay, can you elaborate on what you are trying to do with your seq2seq model in spaCy? In general, if you haven't done so, I strongly recommend reading our FAQ on loading HF models.
-
Hi Premshay,

Stepping back a bit, what's the overall goal of using the encoder-decoder model?

spaCy is very focussed on adding markup to `Doc` objects -- adding information to them calculated from statistical models, and then letting you access that information while iterating over tokens, spans, etc. Encoder-decoder models don't necessarily fit within this paradigm well: the output of the model is other text, rather than markup over the original text, so it doesn't sit well within the API design of spaCy.

Rather than generalise things to handle all sorts of input and output structures, we want to keep spaCy's purpose specific to its use-cases, which centre on natural language understanding. We think it's best for libraries to stay focussed on their goal, rather than taking on adjacent tasks that can easily be done alongside the library using other tooling.

So, it could be that the best answer for what you're trying to do (depending on what that is) is to load and use the encoder-decoder models separately, and combine their outputs with what spaCy does in some way that makes sense for your problem. You don't necessarily need to make the encoder-decoder part a pipeline component within spaCy; there's nothing wrong with running it before or after to do what you need.

If you're interested in using the model as a representation layer (that is, to backprop to it), I would suggest using just the encoder part. That's the configuration we use ourselves, and it's well tested and working.
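(For anyone landing here later: a minimal sketch of the "run it before or after" pattern described above. The checkpoint, language codes, and the `en_core_web_sm` pipeline are illustrative assumptions, not anything prescribed by spaCy.)

```python
# Run the encoder-decoder model in plain transformers, then hand its text
# output to an ordinary spaCy pipeline. Checkpoint and language codes are
# illustrative assumptions.
import spacy
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, src_lang="fra_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

def translate(text: str) -> str:
    batch = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **batch,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
        max_new_tokens=128,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

nlp = spacy.load("en_core_web_sm")  # spaCy handles the markup side
doc = nlp(translate("Le chat est assis sur le tapis."))
print([(token.text, token.pos_) for token in doc])
```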
-
Thanks.

That is exactly what I'm (initially) trying to do: use the encoder. The intuition is that the model has better contextual features for the words, so that the POS tagging will be better, and the NER and lemmatization models I wish to train will be better.

In order to test that assumption I need to be able to use it in place of the transformers I've already worked with for the same tasks, and this is where it breaks.

Back to the issue: it seems that it breaks on the decoder input IDs. Alternatively, using just the NLLB tokenizer could possibly be good as well. Maybe you know of a way to pass a config that only uses the encoder and doesn't require stating these IDs; I haven't found one that works yet. When using the transformers library directly it works fine; the issue is with `spacy_transformers`.

How would you do it per your suggestion, i.e. run it outside spaCy and then use spaCy for the rest of the pipeline with its results?

Any advice would be appreciated.
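(For reference, here is one way to see that the encoder half works fine in plain transformers without `decoder_input_ids`. A minimal sketch; the checkpoint name is an assumption:)

```python
# Pull per-token features from just the encoder half in plain transformers,
# sidestepping decoder_input_ids entirely.
import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)
encoder = model.get_encoder()  # seq2seq models expose their encoder directly

batch = tokenizer(["The cat sat on the mat."], return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state  # (batch, seq_len, width)
print(hidden.shape)
```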
-
Thanks for the feedback. The focus makes a lot of sense, though it does seem strange that the decoder input ID error is such an issue. I'll try to resolve it as you suggested.
On Sat, Feb 11, 2023, Matthew Honnibal wrote:

Okay thanks, that all makes a lot of sense.

Reading back over your first post, you're already trying all the steps of customization I would've recommended. It's really not an easy problem overall. The constraints are essentially:

- The details of coupling the transformers to spaCy are pretty intricate, overall. The tokenization in particular is a difficult impedance mismatch. Ensuring we save and load correctly is also difficult.
- There's little consistency in the HF models. Even within AutoModel, differences in the underlying models sometimes leak through and affect us.
- Support for the custom classes will need to be done on a case-by-case basis.
- The number of future models is continually expanding, and the long-term utility of any single model is unclear.

What we're trying to provide with spaCy is mostly a "one best" (or at least, few best) set of configurations that are ready to go. It's a different approach from the way HF goes about things, where they are essentially a neutral, unopinionated platform that makes it faster to adopt models produced by research, with little intermediation from HF to get in the way.

So, support for absolutely any HF model, regardless of whether it has a common API, will fall into some tension with what we're trying to do. We want to be able to consistently maintain things after we release them, and that's not going to be possible if we're shipping new code for the latest models.

So, the best that we can do here is offer an extendable system (e.g. via the registry, config, and the ability to write your own pipeline components). An alternative approach would be to work with the HF code and somehow repackage or wrap it so that it works consistently with AutoModel. To me that seems a less promising approach, but that might just be because I'm less familiar with the details there.

My main advice if you're setting out to do this is to start from the bottom and work your way up, testing carefully. If you work on a class like WordpieceBatch individually, and you design good, well-parametrized tests, you should be able to get it working and move on. There are only a few such pieces to get right before you'll have the whole thing. Then you can publish this as your own extension, and I'm sure people will find it useful, if the results are indeed better.
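(Picking up the registry/custom-component route from the quoted reply above: a minimal sketch of wrapping the encoder in your own pipeline component. All names here are hypothetical, and this is inference-only; a trainable representation layer would still need the `spacy-transformers` internals discussed above.)

```python
# Hypothetical custom component that stores frozen encoder features on the
# Doc. It does not backprop into the encoder; it only illustrates the
# extension points (Language.factory, Doc extensions).
import torch
from spacy.language import Language
from spacy.tokens import Doc
from transformers import AutoModel, AutoTokenizer

Doc.set_extension("encoder_features", default=None)

@Language.factory("encoder_features")
def create_encoder_features(nlp, name, checkpoint: str):
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    encoder = AutoModel.from_pretrained(checkpoint).get_encoder()

    def encoder_features(doc):
        batch = tokenizer(doc.text, return_tensors="pt")
        with torch.no_grad():
            # One hidden state per wordpiece; aligning these to spaCy tokens
            # is the hard part described in the quoted reply.
            doc._.encoder_features = encoder(**batch).last_hidden_state
        return doc

    return encoder_features

# Usage (hypothetical):
# nlp = spacy.blank("en")
# nlp.add_pipe("encoder_features",
#              config={"checkpoint": "facebook/nllb-200-distilled-600M"})
```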
-
No, I haven't.
On Wed, Mar 1, 2023, GenV wrote:

@Premshay Have you been able to train an encoder-decoder-based NER model like T5?