Extend the support of T5 models with different encoder-decoder layers #15909
Conversation
Signed-off-by: Jie Fu <[email protected]>
This PR is an extension of #8141, adding support for T5 and FLAN-T5 model families with different numbers of encoder and decoder layers.
Sooo, I started reviewing this, but when checking HF for encoder-only T5 models it turns out that all of them (the ones I could find at least) have
I think the proper solution here is to just make the decoder layers optional and then simply check that they are set before running the decoder part.
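The approach suggested above could be sketched roughly as follows. This is an illustrative sketch only, assuming an HF-style config dict; the function name `read_t5_layer_counts` is hypothetical and is not the actual `convert_hf_to_gguf.py` code.

```python
# Hypothetical sketch: treat the decoder layer count as optional and
# fall back to the encoder layer count when it is absent.

def read_t5_layer_counts(hparams: dict) -> tuple[int, int]:
    """Return (n_enc_layer, n_dec_layer) from an HF-style config dict.

    HF T5 configs use "num_layers" for the encoder; "num_decoder_layers"
    is optional and conventionally defaults to the encoder count.
    """
    n_enc = hparams["num_layers"]
    # Optional key: fall back to the encoder depth if not set.
    n_dec = hparams.get("num_decoder_layers", n_enc)
    return n_enc, n_dec
```

For example, `read_t5_layer_counts({"num_layers": 4, "num_decoder_layers": 2})` yields `(4, 2)`, while a config without `num_decoder_layers` yields equal counts.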
Thanks @CISC for your review. We have The issue you mentioned above is about What do you think? |
I see, I'll continue the review then, do you have links to some example models? |
Sorry, I couldn't find a T5 model with different encoder-decoder layers in the open-source world.
Co-authored-by: Sigbjørn Skjæret <[email protected]>
🤣 Poor CI (it's faster/easier to add changes to a batch, then commit).
Thanks @CISC for your excellent review! It seems much better now. I will run some checks and report back here.
Thank you.
Signed-off-by: Jie Fu <[email protected]>
Signed-off-by: Jie Fu <[email protected]>
I tested t5-small, flan-t5-small, and a local T5 model (4 encoder layers, 2 decoder layers).
…rs (ggml-org#15909)

* Extend the support of T5 models with different encoder-decoder layers
  Signed-off-by: Jie Fu <[email protected]>
* Update convert_hf_to_gguf.py
* Update gguf-py/gguf/constants.py
* Update gguf-py/gguf/gguf_writer.py
* Update src/llama-arch.cpp
* Update src/llama-arch.h
* Update src/llama-hparams.h
* Update src/llama-model.cpp (multiple review commits)
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Rename n_dec_layer --> dec_n_layer
  Signed-off-by: Jie Fu <[email protected]>
* Adapt to cases when dec_n_layer > n_layer
  Signed-off-by: Jie Fu <[email protected]>

---------
Signed-off-by: Jie Fu <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
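The last commit in the log above ("Adapt to cases when dec_n_layer > n_layer") can be illustrated with a tiny sketch. All names here are hypothetical, not the actual llama.cpp identifiers: the point is that any per-layer container shared by both stacks must be sized by the deeper of the two, not by the encoder depth alone.

```python
# Illustrative sketch: per-layer storage must cover whichever stack is
# deeper, since the decoder may now have more layers than the encoder.

def n_layer_slots(n_layer: int, dec_n_layer: int) -> int:
    """Number of per-layer slots a shared container needs."""
    return max(n_layer, dec_n_layer)
```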
Thanks @CISC and @ggerganov for your help. |
Thanks to @fairydreaming, we can run T5 models with llama.cpp.
However, the current implementation appears to support only models with equal numbers of encoder and decoder layers.
T5 models with different encoder-decoder layer counts fail to run with llama.cpp.
Since HF transformers runs T5 models with different encoder-decoder layer counts without issue, it would be good to support them in llama.cpp as well.
Thanks.
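The core point of the request can be shown with a toy sketch (pure Python, illustrative only, no llama.cpp or HF code): in an encoder-decoder model, the two stacks are independent, so the decoder loop bound must not be tied to the encoder's layer count.

```python
# Toy encoder-decoder pass where each "layer" is just a function on a
# float. The decoder list may be shorter (or longer) than the encoder's;
# each loop iterates over exactly its own stack.

def run_t5_like(x: float, enc_layers: list, dec_layers: list) -> float:
    # Encoder pass over exactly the encoder layers.
    h = x
    for f in enc_layers:
        h = f(h)
    # Decoder pass over the decoder layers, independent of encoder depth.
    y = h
    for g in dec_layers:
        y = g(y)
    return y
```

With 4 encoder "layers" that add 1 and 2 decoder "layers" that double, `run_t5_like(0, ...)` runs the 4-encoder/2-decoder case that an implementation with a single shared layer count cannot express.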