Description
Support the T5Gemma architecture. Here's the basic idea from the transformers T5Gemma documentation:
T5Gemma (aka encoder-decoder Gemma) was proposed in a research paper by Google. It is a family of encoder-decoder large language models, developed by adapting pretrained decoder-only models into encoder-decoder. T5Gemma includes pretrained and instruction-tuned variants. The architecture is based on transformer encoder-decoder design following T5, with improvements from Gemma 2: GQA, RoPE, GeGLU activation, RMSNorm, and interleaved local/global attention.
This architecture modernizes and improves upon T5 by blending the stronger performance of modern Gemma models with the inference efficiency of the encoder-decoder design.
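To make those Gemma 2 features concrete, here is a minimal sketch that inspects a T5Gemma config through transformers. It assumes the config nests separate encoder/decoder sub-configs as in the transformers implementation, and the checkpoint id is a placeholder, not the confirmed test model:

```python
from transformers import AutoConfig

# Placeholder checkpoint id -- substitute the openly available test model.
config = AutoConfig.from_pretrained("google/t5gemma-2b-2b-ul2")

# T5Gemma's config nests separate encoder and decoder sub-configs (assumed
# here), each carrying the Gemma 2 features listed above.
enc = config.encoder
print(enc.num_attention_heads, enc.num_key_value_heads)  # GQA: fewer KV heads than query heads
print(enc.rope_theta)          # RoPE base frequency
print(enc.hidden_activation)   # GeGLU-style gated activation
print(enc.sliding_window)      # interleaved local (sliding-window) / global attention
```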
For reference, here is the PR that merged model support for this architecture into transformers:
It might also be valuable to see how recent models were added to this package:
Once this is implemented, a user should be able to use the converter to convert a model and run inference. The following model should be openly available for testing purposes:
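Once the model link above is filled in, the converted model's output can be sanity-checked against transformers itself. A minimal reference sketch, assuming the checkpoint loads through the standard seq2seq auto classes (the model id below is a placeholder):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/t5gemma-2b-2b-ul2"  # placeholder/assumed checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Encoder-decoder generation: the encoder reads the prompt once, the decoder
# generates the output autoregressively.
inputs = tokenizer("Summarize: The quick brown fox jumps over the lazy dog.", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```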