It may be beneficial to support the T5Gemma (and upcoming T5Gemma 2) architectures. Here's the basic idea from the transformers T5Gemma documentation:
T5Gemma (aka encoder-decoder Gemma) was proposed in a research paper by Google. It is a family of encoder-decoder large language models, developed by adapting pretrained decoder-only models into encoder-decoder. T5Gemma includes pretrained and instruction-tuned variants. The architecture is based on transformer encoder-decoder design following T5, with improvements from Gemma 2: GQA, RoPE, GeGLU activation, RMSNorm, and interleaved local/global attention.
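Since T5Gemma is already exposed through the standard transformers seq2seq classes, it should be usable with the usual encoder-decoder workflow. Here is a minimal sketch; the checkpoint name is an assumption for illustration only, and the exact variant names on the Hub may differ:

```python
# Sketch: running a T5Gemma checkpoint via the generic transformers seq2seq API.
# "google/t5gemma-2b-2b-ul2" is a hypothetical checkpoint id used for illustration.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/t5gemma-2b-2b-ul2"  # assumed variant name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# The encoder consumes the input text; the decoder generates the output tokens.
inputs = tokenizer("Summarize: Encoder-decoder models separate input understanding from output generation.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```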
The upcoming T5Gemma 2 is the same idea, but based on Gemma 3. Here's an overview from the transformers T5Gemma 2 documentation:
T5Gemma 2 is a family of pretrained encoder-decoder large language models with strong multilingual, multimodal and long-context capability, available in 270M-270M, 1B-1B and 4B-4B parameters. Following T5Gemma, it is built via model adaptation (based on Gemma 3) using UL2. The architecture is similar to T5Gemma and Gemma 3, enhanced with tied word embeddings and merged self- and cross-attention to save model parameters.
These architectures modernize and improve upon T5 by blending the improved performance of modern Gemma models with the efficiency of the encoder-decoder architecture.
For reference, here are the PRs that merged model support for these architectures into transformers: