
T5Gemma #1

Opened by @jncraton

Support the T5Gemma architecture. Here's the basic idea from the transformers T5Gemma documentation:

> T5Gemma (aka encoder-decoder Gemma) was proposed in a research paper by Google. It is a family of encoder-decoder large language models, developed by adapting pretrained decoder-only models into encoder-decoder. T5Gemma includes pretrained and instruction-tuned variants. The architecture is based on transformer encoder-decoder design following T5, with improvements from Gemma 2: GQA, RoPE, GeGLU activation, RMSNorm, and interleaved local/global attention.

This architecture modernizes and improves upon T5 by blending the stronger performance of modern Gemma models with the efficiency of the encoder-decoder design.
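As one concrete example of those Gemma 2 improvements, the GeGLU activation replaces T5's original ReLU feed-forward block with a gated GELU MLP. Here is a minimal PyTorch sketch of that block (the class name and dimensions are illustrative, not taken from any T5Gemma checkpoint):

```python
import torch
import torch.nn as nn

class GeGLUFeedForward(nn.Module):
    """Gated GELU feed-forward block in the style of Gemma models.

    Computes down_proj(GELU(x @ W_gate) * (x @ W_up)).
    hidden_size and intermediate_size are placeholder values.
    """

    def __init__(self, hidden_size: int = 512, intermediate_size: int = 2048):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        # Gemma models use the tanh approximation of GELU
        self.act = nn.GELU(approximate="tanh")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gate branch is passed through GELU, then multiplied elementwise
        # with the ungated "up" branch before the down-projection.
        return self.down_proj(self.act(self.gate_proj(x)) * self.up_proj(x))
```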

For reference, here is the PR that merged model support for this architecture into transformers:

It might also be valuable to see how recent models were added to this package:

Once this is implemented, a user should be able to use the converter to convert a model and run inference. The following model should be openly available for testing purposes:

https://huggingface.co/harshaljanjani/tiny-t5gemma-test
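For a sanity check before any conversion work, the tiny test model should be loadable directly with a recent transformers release that includes T5Gemma. A sketch, assuming the checkpoint is registered with the Auto classes (the prompt and generation settings are placeholders):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "harshaljanjani/tiny-t5gemma-test"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Encoder-decoder generation: encode a prompt, decode new tokens.
inputs = tokenizer("Translate to French: Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Output from this tiny randomly-initialized test checkpoint will be meaningless; the point is only to confirm the architecture loads and runs, which gives a reference path to compare converted-model inference against.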
