Implement MetaCLIP 2 model#2527

Open
sineeli wants to merge 18 commits into keras-team:master from
sineeli:metaclip_2

Conversation

Collaborator

@sineeli sineeli commented Jan 19, 2026

Description of the change

This update introduces the MetaCLIP 2 model components, including the tokenizer, vision encoder, and image converter, along with necessary conversion utilities.

Key Points:

  • Architecture: MetaCLIP 2 uses the same vanilla CLIP architecture as OpenAI CLIP; there are no architectural changes, and the only component swap is the multilingual tokenizer (see below).
  • Data scale: Training data is expanded from English-only to worldwide data covering 300+ languages.
  • Curation: Language-aware curation using human-curated metadata with language-specific thresholds (t_lang), where each language preserves the same ~6% tail mass as English (t_en = 170k).
  • Seen pairs scaling: Global training exposure is scaled from 13B (1.0×) to 29B (2.3×) seen pairs to prevent English downsampling (English ≈ 44% of data).
  • Model capacity: Increasing capacity from ViT-L/14 to ViT-H/14 yields consistent gains and is necessary to learn worldwide-scale data.
  • Comparison: MetaCLIP2 outperforms prior multilingual CLIP models (mSigLIP, SigLIP2) using open data and 2.3× seen pairs, versus private WebLI data and 3.0× seen pairs.
  • Tokenizer: For tokenizing multilingual data, MetaCLIP 2 uses the XLM-V tokenizer (facebook/xlm-v-base), a multilingual SentencePiece tokenizer with a ~901K-token vocabulary covering 100+ languages.
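The language-aware curation point above can be sketched numerically. The snippet below is a toy illustration of the tail-mass idea only; the counts, the 6% target, and the linear search are simplified placeholders, not the paper's exact algorithm:

```python
# Toy sketch of language-aware curation thresholds (illustration only).
# Given per-metadata-entry match counts for one language, pick the smallest
# threshold t_lang whose "tail" (entries with count <= t) contributes the
# target fraction of total matches (~6% per the MetaCLIP 2 description).

def tail_mass(counts, t):
    """Fraction of all matches contributed by entries with count <= t."""
    total = sum(counts)
    return sum(c for c in counts if c <= t) / total

def find_threshold(counts, target=0.06):
    """Smallest threshold whose tail mass reaches the target fraction."""
    for t in sorted(set(counts)):
        if tail_mass(counts, t) >= target:
            return t
    return max(counts)

# Made-up counts: head entries (100, 200) dominate, so the threshold lands
# on the first value that captures at least 6% tail mass.
counts = [1, 1, 2, 100, 200]
t_lang = find_threshold(counts)
```

In the actual pipeline, head entries (count > t_lang) would then be subsampled toward the threshold, balancing head and tail per language; t_en = 170k is the English reference point.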

Reference

Paper: https://arxiv.org/pdf/2507.22062
HF: https://github.com/huggingface/transformers/tree/main/src/transformers/models/metaclip_2

Colab Notebook

No Colab notebook is provided.

Checklist

  • I have added all the necessary unit tests for my change.
  • I have verified that my change does not break existing code and works with all backends (TensorFlow, JAX, and PyTorch).
  • My PR is based on the latest changes of the main branch (if unsure, rebase the code).
  • I have followed the Keras Hub Model contribution guidelines in making these changes.
  • I have followed the Keras Hub API design guidelines in making these changes.
  • I have signed the Contributor License Agreement.

structure it better

@gemini-code-assist
Contributor

Summary of Changes

Hello @sineeli, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly.

This pull request integrates the MetaCLIP 2 model into the Keras Hub, providing a comprehensive set of components for multimodal understanding. It includes the model's backbone, vision and text encoders, a multilingual tokenizer, and an image preprocessor. The integration allows Keras users to leverage MetaCLIP 2's advanced capabilities, which stem from its innovative training data curation and robust architecture, offering enhanced performance, especially in multilingual contexts, compared to earlier CLIP-like models.

Highlights

  • MetaCLIP 2 Model Integration: Introduces the MetaCLIP 2 model, including its tokenizer, vision encoder, image converter, and necessary conversion utilities, into the Keras Hub.
  • Enhanced Training Data Strategy: While sharing architecture with CLIP, MetaCLIP 2 distinguishes itself through a significantly larger and more diverse training dataset, featuring meticulously curated, handcrafted language thresholds for over 300 languages, rather than relying on LLMs for language identification.
  • Architectural Upgrade for Performance: The model incorporates a ViT-H/14 (Huge) vision transformer, a direct upgrade from the ViT-L/14 (Large) used in previous versions, leading to improved accuracy.
  • Multilingual Robustness: MetaCLIP 2 is designed to overcome the degradation issues observed in prior models like mSigLIP and SigLIP2 when trained on multilingual data, offering more robust performance.


@sineeli
Collaborator Author

sineeli commented Jan 19, 2026

Will attach a Colab example in a few days.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces the MetaCLIP 2 model, including its backbone, encoders, tokenizer, preprocessor, and associated utilities like conversion scripts and tests. The implementation is well-structured and largely adheres to the repository's style guide regarding file structure, naming conventions, and component design. My review focuses on improving the documentation for clarity, correcting examples to use valid presets, and completing an unfinished test case to ensure correctness. A key point from the contribution guidelines is the requirement for a Colab notebook to validate numerical equivalence with the original model; this appears to be missing and should be addressed.

@sachinprasadhs sachinprasadhs self-requested a review January 20, 2026 21:28
@sineeli sineeli mentioned this pull request Jan 22, 2026
@sineeli
Collaborator Author

sineeli commented Jan 29, 2026

Basic Tutorial: https://www.kaggle.com/code/sravanneeli/metaclip2-inference-tutorial
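As context for what the inference tutorial covers, CLIP-style zero-shot classification reduces to: embed the image and the candidate texts, L2-normalize, scale the cosine similarities by a learned logit scale, and softmax over the texts. A minimal NumPy sketch with toy embeddings (not real model outputs; the logit scale value of 100 is illustrative):

```python
import numpy as np

def clip_scores(image_emb, text_embs, logit_scale=100.0):
    # L2-normalize the image embedding and each text embedding.
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    # Scaled cosine similarity of the image against each candidate text.
    logits = logit_scale * (txt @ img)
    # Softmax over candidate texts gives per-text match probabilities.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Toy 2-D embeddings: the image aligns with the first text, so the first
# probability dominates.
probs = clip_scores(np.array([1.0, 0.0]),
                    np.array([[1.0, 0.0], [0.0, 1.0]]))
```

The real model produces the embeddings with the ViT-H/14 vision encoder and the XLM-V-tokenized text encoder; only the scoring step is shown here.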

@sachinprasadhs sachinprasadhs added the new model For PRs that contribute a new model to the Keras Hub registry. label Feb 9, 2026
Collaborator

@sachinprasadhs sachinprasadhs left a comment


Thanks for the contribution. The code structure looks good.
I made some comments and suggestions; please take a look.

…with causal LM variant

convert metaclip2 checkpoint using on the fly hf preset way
Collaborator

@sachinprasadhs sachinprasadhs left a comment


Thanks, looks good; only two small comments to address.
Could you also attach a numerics-validation Colab gist for the preset? Thanks.

Examples:
```python
# Load the preprocessor from a preset.
preprocessor = keras_hub.models.MetaCLIP2Preprocessor.from_preset(
```

MetaCLIP2Preprocessor --> MetaCLIP2CausalLMPreprocessor

```python
self.assertEqual(outputs["vision_logits"].shape, (1, 1))
self.assertEqual(outputs["text_logits"].shape, (1, 1))

@pytest.mark.large
```

Mark this as extra_large; the GPU tests are failing due to OOM.
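The suggested fix is a one-line marker change on the failing test. A sketch, assuming keras-hub's test suite uses an `extra_large` pytest marker as described in the comment (the test name below is hypothetical):

```python
import pytest

# Hypothetical test name for illustration; only the marker line matters.
# Replacing `@pytest.mark.large` with `@pytest.mark.extra_large` moves the
# OOM-prone test out of the default GPU "large" suite.
@pytest.mark.extra_large
def test_saved_model():
    pass
```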
