T5Gemma #1946

@jncraton

Description

It may be beneficial to support the T5Gemma (and upcoming T5Gemma2) architectures. Here's the basic idea from the transformers T5Gemma documentation:

T5Gemma (aka encoder-decoder Gemma) was proposed in a research paper by Google. It is a family of encoder-decoder large language models, developed by adapting pretrained decoder-only models into encoder-decoder. T5Gemma includes pretrained and instruction-tuned variants. The architecture is based on transformer encoder-decoder design following T5, with improvements from Gemma 2: GQA, RoPE, GeGLU activation, RMSNorm, and interleaved local/global attention.

The upcoming T5Gemma 2 follows the same idea, but is based on Gemma 3. Here's an overview from the transformers T5Gemma 2 documentation:

T5Gemma 2 is a family of pretrained encoder-decoder large language models with strong multilingual, multimodal and long-context capability, available in 270M-270M, 1B-1B and 4B-4B parameters. Following T5Gemma, it is built via model adaptation (based on Gemma 3) using UL2. The architecture is similar to T5Gemma and Gemma 3, enhanced with tied word embeddings and merged self- and cross-attention to save model parameters.

These architectures modernize and improve upon T5 by blending the stronger performance of modern Gemma models with the efficiency advantages of the encoder-decoder architecture.
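Since T5Gemma is exposed in transformers as a standard encoder-decoder (seq2seq) model, loading and generation should follow the usual Auto-class flow. A minimal sketch of what that could look like; the checkpoint ID below is illustrative, not a confirmed Hugging Face model name:

```python
# Sketch: loading a T5Gemma checkpoint through the generic seq2seq API.
# The model ID is a placeholder; real published checkpoints may differ.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/t5gemma-2b-2b-it"  # hypothetical checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Encoder-decoder generation: the encoder reads the full prompt once,
# then the decoder generates the response autoregressively.
inputs = tokenizer("Translate to French: The weather is nice today.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```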

For reference, here are the PRs that merged model support for these architectures into transformers:
