Conversation
…oder, and conversion utilities; add tests for tokenizer and conversion functions.
… classes, streamline pooling logic, and update text and vision encoders to use EOS token for pooling. Adjust checkpoint conversion to reflect changes in layer normalization application.
…g with ops.take_along_axis for improved handling of EOS token positions.
… and add image converter config loading function for preprocessing runtime loading of hf presets
…nable not required while transfer as well
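One of the commits above switches EOS pooling to `ops.take_along_axis`. A minimal NumPy sketch of that gather pattern (shapes and variable names here are illustrative, not taken from the PR; `np.take_along_axis` has the same semantics as the Keras op):

```python
import numpy as np

# Pool the hidden state at each sequence's EOS position.
# hidden_states: (batch, seq_len, dim); eos_index: one position per sample.
batch, seq_len, dim = 2, 4, 3
hidden_states = np.arange(batch * seq_len * dim, dtype="float32").reshape(
    batch, seq_len, dim
)
# Suppose EOS sits at position 3 for sample 0 and position 1 for sample 1.
eos_index = np.array([3, 1])
# Expand to (batch, 1, dim) so take_along_axis gathers one timestep per sample.
idx = eos_index[:, None, None].repeat(dim, axis=2)
pooled = np.take_along_axis(hidden_states, idx, axis=1).squeeze(axis=1)
# pooled has shape (batch, dim): the EOS-position vector for each sample.
```

This avoids Python-level loops over the batch and works identically across backends when written with `keras.ops`.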
Summary of Changes
This pull request integrates the MetaCLIP 2 model into Keras Hub, providing a comprehensive set of components for multimodal understanding. It includes the model's backbone, vision and text encoders, a multilingual tokenizer, and an image preprocessor. The integration lets Keras users leverage MetaCLIP 2's capabilities, which stem from its training data curation and robust architecture, offering improved performance, especially in multilingual contexts, compared to earlier CLIP-like models.
Will attach a Colab example in a few days.
Code Review
This pull request introduces the MetaCLIP 2 model, including its backbone, encoders, tokenizer, preprocessor, and associated utilities like conversion scripts and tests. The implementation is well-structured and largely adheres to the repository's style guide regarding file structure, naming conventions, and component design. My review focuses on improving the documentation for clarity, correcting examples to use valid presets, and completing an unfinished test case to ensure correctness. A key point from the contribution guidelines is the requirement for a Colab notebook to validate numerical equivalence with the original model; this appears to be missing and should be addressed.
sachinprasadhs
left a comment
Thanks for the contribution. Code structure looks good.
Made some comments and suggestions; please check.
…with causal LM variant convert metaclip2 checkpoint using on the fly hf preset way
sachinprasadhs
left a comment
Thanks, looks good; only two small comments to address.
Could you also attach a numerics-validation Colab gist for the preset. Thanks.
Examples:
```python
# Load the preprocessor from a preset.
preprocessor = keras_hub.models.MetaCLIP2Preprocessor.from_preset(
```
MetaCLIP2Preprocessor --> MetaCLIP2CausalLMPreprocessor
```python
self.assertEqual(outputs["vision_logits"].shape, (1, 1))
self.assertEqual(outputs["text_logits"].shape, (1, 1))
```
```python
@pytest.mark.large
```
Mark this as `extra_large`; the GPU tests are failing due to OOM.
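For context, the suggested change is just swapping the pytest marker on the heavy test (sketch only; the test name below is made up):

```python
import pytest

# Moving a test from the `large` marker to `extra_large` keeps it out of
# the GPU test job that was hitting out-of-memory errors.
@pytest.mark.extra_large
def test_metaclip2_backbone_smoke():
    # Placeholder body for illustration.
    assert True
```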
Description of the change
This update introduces the MetaCLIP 2 model components, including the tokenizer, vision encoder, and image converter, along with necessary conversion utilities.
Key Points:
- …t_lang), where each language preserves the same ~6% tail mass as English (t_en = 170k).
- Tokenizer: `facebook/xlm-v-base`, a multilingual SentencePiece BPE tokenizer with a ~901K vocabulary supporting 100+ languages.

Reference
Paper: https://arxiv.org/pdf/2507.22062
HF: https://github.com/huggingface/transformers/tree/main/src/transformers/models/metaclip_2
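The per-language threshold rule in the Key Points can be sketched as follows: pick the smallest vocabulary cutoff t such that token types ranked below t account for at most the target tail mass (~6% in the paper). The frequency list and helper name here are made up for illustration:

```python
def threshold_for_tail_mass(freqs, tail_mass=0.06):
    """Smallest cutoff t so tokens ranked below t hold <= tail_mass of occurrences."""
    freqs = sorted(freqs, reverse=True)
    total = sum(freqs)
    kept = 0.0
    for t, f in enumerate(freqs, start=1):
        kept += f
        if (total - kept) / total <= tail_mass:
            return t
    return len(freqs)

# Toy frequency counts summing to 1000; the top-4 types keep >= 94% of mass.
freqs = [500, 300, 100, 50, 30, 12, 5, 3]
t = threshold_for_tail_mass(freqs, tail_mass=0.06)
```

Applying this rule per language yields a t_lang for each language that plays the role t_en = 170k plays for English.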
Colab Notebook
No Colab notebook is provided.
Checklist
structure it better