Extend the support of T5 models with different encoder-decoder layers #15909
Conversation
Signed-off-by: Jie Fu <[email protected]>
This PR is an extension of #8141, adding support for T5 and FLAN-T5 model families with different numbers of encoder and decoder layers.
Sooo, I started reviewing this, but when checking HF for encoder-only T5 models it turns out that all of them (the ones I could find at least) have
I think the proper solution here is to just make the decoder layers optional and then simply check that they are set before running the decoder part.
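The approach suggested above could be sketched roughly as follows. This is an illustrative sketch only, assuming an HF-style config dict; the function name `read_t5_layer_counts` is hypothetical and is not the actual `convert_hf_to_gguf.py` code.

```python
# Hypothetical sketch: treat the decoder layer count as optional and
# fall back to the encoder layer count when it is absent.

def read_t5_layer_counts(hparams: dict) -> tuple[int, int]:
    """Return (n_enc_layer, n_dec_layer) from an HF-style config dict.

    HF T5 configs use "num_layers" for the encoder; "num_decoder_layers"
    is optional and conventionally defaults to the encoder count.
    """
    n_enc = hparams["num_layers"]
    # Optional key: fall back to the encoder depth if not set.
    n_dec = hparams.get("num_decoder_layers", n_enc)
    return n_enc, n_dec
```

For example, `read_t5_layer_counts({"num_layers": 4, "num_decoder_layers": 2})` yields `(4, 2)`, while a config without `num_decoder_layers` yields equal counts.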
Thanks @CISC for your review. We have The issue you mentioned above is about What do you think? |
I see, I'll continue the review then, do you have links to some example models? |
Sorry, I couldn't find a T5 model with different encoder-decoder layers in the open-source world.
Co-authored-by: Sigbjørn Skjæret <[email protected]>
🤣 Poor CI (it's faster/easier to add changes to a batch, then commit).
Thanks @CISC for your excellent review! It seems much better now. I will run some checks and report back here.
Thank you.
Signed-off-by: Jie Fu <[email protected]>
Signed-off-by: Jie Fu <[email protected]>
I tested t5-small, flan-t5-small, and a local T5 model (4 encoder layers, 2 decoder layers).
…rs (ggml-org#15909)

* Extend the support of T5 models with different encoder-decoder layers
  Signed-off-by: Jie Fu <[email protected]>
* Update convert_hf_to_gguf.py
* Update gguf-py/gguf/constants.py
* Update gguf-py/gguf/gguf_writer.py
* Update src/llama-arch.cpp
* Update src/llama-arch.h
* Update src/llama-hparams.h
* Update src/llama-model.cpp (multiple review commits)
  Co-authored-by: Sigbjørn Skjæret <[email protected]>
* Rename n_dec_layer --> dec_n_layer
  Signed-off-by: Jie Fu <[email protected]>
* Adapt to cases when dec_n_layer > n_layer
  Signed-off-by: Jie Fu <[email protected]>

---------
Signed-off-by: Jie Fu <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
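The last commit in the log above ("Adapt to cases when dec_n_layer > n_layer") can be illustrated with a tiny sketch. All names here are hypothetical, not the actual llama.cpp identifiers: the point is that any per-layer container shared by both stacks must be sized by the deeper of the two, not by the encoder depth alone.

```python
# Illustrative sketch: per-layer storage must cover whichever stack is
# deeper, since the decoder may now have more layers than the encoder.

def n_layer_slots(n_layer: int, dec_n_layer: int) -> int:
    """Number of per-layer slots a shared container needs."""
    return max(n_layer, dec_n_layer)
```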
Thanks @CISC and @ggerganov for your help. |
Thanks to @fairydreaming, we can run T5 models with llama.cpp.
However, the current implementation appears to support only models with equal numbers of encoder and decoder layers.
T5 models with different encoder-decoder layer counts fail to run with llama.cpp.
Since HF transformers runs T5 models with different encoder-decoder layer counts without issue, it would be good to support them in llama.cpp as well.
Thanks.
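The core point of the request can be shown with a toy sketch (pure Python, illustrative only, no llama.cpp or HF code): in an encoder-decoder model, the two stacks are independent, so the decoder loop bound must not be tied to the encoder's layer count.

```python
# Toy encoder-decoder pass where each "layer" is just a function on a
# float. The decoder list may be shorter (or longer) than the encoder's;
# each loop iterates over exactly its own stack.

def run_t5_like(x: float, enc_layers: list, dec_layers: list) -> float:
    # Encoder pass over exactly the encoder layers.
    h = x
    for f in enc_layers:
        h = f(h)
    # Decoder pass over the decoder layers, independent of encoder depth.
    y = h
    for g in dec_layers:
        y = g(y)
    return y
```

With 4 encoder "layers" that add 1 and 2 decoder "layers" that double, `run_t5_like(0, ...)` runs the 4-encoder/2-decoder case that an implementation with a single shared layer count cannot express.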