Using encoder-decoder models with spaCy: What to change so it works. Starting with ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds #12232
-
Hi,
(For the full error and traceback, see the end of this message.) In this example I'm trying to work with NLLB, but the issue applies to all such models, such as T5 and XLM-RoBERTa.
```
File j:\Anaconda3\envs\nlp\lib\site-packages\spacy_transformers\layers\transformer_model.py:157, in init(model, X, Y)
File ~\AppData\Roaming\Python\Python310\site-packages\thinc\model.py:315, in Model.predict(self, X)
```
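For context, this is roughly the setup that triggers the error. It's a minimal sketch reconstructed from the traceback, not the poster's exact config; the checkpoint name is taken from the NLLB mention above, and it assumes `spacy-transformers` is installed:

```python
# Hypothetical reproduction: add an encoder-decoder checkpoint as the
# transformer component, then initialize, which runs a sample forward pass.
import spacy

nlp = spacy.blank("en")
nlp.add_pipe(
    "transformer",  # factory registered by spacy-transformers
    config={"model": {"name": "facebook/nllb-200-distilled-600M"}},
)
nlp.initialize()  # raises: ValueError: You have to specify either
                  # decoder_input_ids or decoder_inputs_embeds
```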
-
Hi @Premshay, can you elaborate on what you are trying to do with your seq2seq model in spaCy? In general, if you haven't done so, I strongly recommend reading our FAQ on loading HF models.
-
Hi Premshay,

Stepping back a bit, what's the overall goal of using the encoder-decoder model?

spaCy is very focussed on adding markup to `Doc` objects -- adding information to them calculated from statistical models, and then letting you access that information while iterating over tokens, spans, etc. Encoder-decoder models don't necessarily fit within this paradigm well: the output of the model is other text, rather than markup over the original text, so it doesn't sit well within the API design of spaCy.

Rather than generalise things to handle all sorts of input and output structures, we want to keep spaCy's purpose specific to its use-cases, which centre on natural language understanding. We think it's best for libraries to stay focussed on their goal, rather than taking on adjacent tasks that can easily be done alongside the library using other tooling.

So, it could be that the best answer for what you're trying to do (depending on what that is) is to load and use the encoder-decoder models separately, and combine their outputs with what spaCy does in some way that makes sense for your problem. You don't necessarily need to make the encoder-decoder part a pipeline component within spaCy; there's nothing wrong with running it before or after to do what you need.

If you're interested in using the model as a representation layer (that is, to backprop to it), I would suggest using just the encoder part. That's the configuration we use ourselves, and it's well tested and working.
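(For anyone landing here later: a minimal sketch of the "run it before or after" pattern described above. The checkpoint, language codes, and the `en_core_web_sm` pipeline are illustrative assumptions, not anything prescribed by spaCy.)

```python
# Run the encoder-decoder model in plain transformers, then hand its text
# output to an ordinary spaCy pipeline. Checkpoint and language codes are
# illustrative assumptions.
import spacy
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, src_lang="fra_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

def translate(text: str) -> str:
    batch = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **batch,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
        max_new_tokens=128,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

nlp = spacy.load("en_core_web_sm")  # spaCy handles the markup side
doc = nlp(translate("Le chat est assis sur le tapis."))
print([(token.text, token.pos_) for token in doc])
```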
-
Thanks.

That is exactly what I'm (initially) trying to do: use the encoder. The intuition is that the model has better contextual features for the words, so that the POS tagging will be better, and the NER and lemmatization models I wish to train will be better.

In order to test that assumption I need to be able to use it in place of the transformers I've already worked with for the same tasks, and this is where it breaks.

Back to the issue: it seems that it breaks on the decoder input IDs. Alternatively, using just the NLLB tokenizer could possibly be good as well. Maybe you know of a way to pass a config that only uses the encoder and doesn't require stating these IDs; I haven't found one that works yet. When using the transformers library directly it works fine; the issue is with `spacy_transformers`.

How would you do it per your suggestion, i.e. run it outside spaCy and then use spaCy for the rest of the pipeline with its results?

Any advice would be appreciated.
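(For reference, here is one way to see that the encoder half works fine in plain transformers without `decoder_input_ids`. A minimal sketch; the checkpoint name is an assumption:)

```python
# Pull per-token features from just the encoder half in plain transformers,
# sidestepping decoder_input_ids entirely.
import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)
encoder = model.get_encoder()  # seq2seq models expose their encoder directly

batch = tokenizer(["The cat sat on the mat."], return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state  # (batch, seq_len, width)
print(hidden.shape)
```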
-
Thanks for the feedback. The focus makes a lot of sense, though it does seem strange that the decoder input ID error is such an issue. I'll try to resolve it as you suggested.
On Sat, Feb 11, 2023, Matthew Honnibal wrote:

Okay thanks, that all makes a lot of sense.

Reading back over your first post, you're already trying all the steps of customization I would've recommended. It's really not an easy problem overall. The constraints are essentially:

- The details of coupling the transformers to spaCy are pretty intricate, overall. The tokenization in particular is a difficult impedance mismatch. Ensuring we save and load correctly is also difficult.
- There's little consistency in the HF models. Even within AutoModel, differences in the underlying models sometimes leak through and affect us.
- Support for the custom classes will need to be done on a case-by-case basis.
- The number of future models is continually expanding, and the long-term utility of any single model is unclear.

What we're trying to provide with spaCy is mostly a "one best" (or at least, few best) set of configurations that are ready to go. It's a different approach from the way HF goes about things, where they are essentially a neutral, unopinionated platform that makes it faster to adopt models produced by research, with little intermediation from HF to get in the way.

So, support for absolutely any HF model, regardless of whether it has a common API, will fall into some tension with what we're trying to do. We want to be able to consistently maintain things after we release them, and that's not going to be possible if we're shipping new code for the latest models.

So, the best that we can do here is offer an extendable system (e.g. via the registry, config, and the ability to write your own pipeline components). An alternative approach would be to work with the HF code and somehow repackage or wrap it so that it works consistently with AutoModel. To me that seems a less promising approach, but that might just be because I'm less familiar with the details there.

My main advice if you're setting out to do this is to start from the bottom and work your way up, testing carefully. If you work on a class like WordpieceBatch individually, and you design good, well-parametrized tests, you should be able to get it working and move on. There are only a few such pieces to get right before you'll have the whole thing. Then you can publish this as your own extension, and I'm sure people will find it useful, if the results are indeed better.
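(Picking up the registry/custom-component route from the quoted reply above: a minimal sketch of wrapping the encoder in your own pipeline component. All names here are hypothetical, and this is inference-only; a trainable representation layer would still need the `spacy-transformers` internals discussed above.)

```python
# Hypothetical custom component that stores frozen encoder features on the
# Doc. It does not backprop into the encoder; it only illustrates the
# extension points (Language.factory, Doc extensions).
import torch
from spacy.language import Language
from spacy.tokens import Doc
from transformers import AutoModel, AutoTokenizer

Doc.set_extension("encoder_features", default=None)

@Language.factory("encoder_features")
def create_encoder_features(nlp, name, checkpoint: str):
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    encoder = AutoModel.from_pretrained(checkpoint).get_encoder()

    def encoder_features(doc):
        batch = tokenizer(doc.text, return_tensors="pt")
        with torch.no_grad():
            # One hidden state per wordpiece; aligning these to spaCy tokens
            # is the hard part described in the quoted reply.
            doc._.encoder_features = encoder(**batch).last_hidden_state
        return doc

    return encoder_features

# Usage (hypothetical):
# nlp = spacy.blank("en")
# nlp.add_pipe("encoder_features",
#              config={"checkpoint": "facebook/nllb-200-distilled-600M"})
```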
-
No, I haven't.
On Wed, Mar 1, 2023, GenV wrote:

@Premshay Have you been able to train an encoder-decoder-based NER model like T5?