Add Moondream architecture skeleton #2553

Open
BharathC0 wants to merge 2 commits into keras-team:master from BharathC0:moondream-architecture

@BharathC0

Description

Fixes #2549.
This PR introduces the initial architecture skeleton for the Moondream model (a tiny vision-language model).

It implements the core components required to run the model, bridging the SigLIP vision encoder and the Phi-1.5 text decoder.

Added Components:

  • MoondreamBackbone: Handles the connection between the vision encoder and text decoder using a projection layer. It supports mixed input (images + text).
  • MoondreamCausalLM: The end-to-end task model for image-to-text generation.
  • MoondreamPreprocessor: Handles tokenization and image resizing/rescaling logic.
  • __init__.py: Registers the new classes.
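
As a rough illustration of the bridging step described above, the sketch below projects image features to the text embedding width and joins them with the text embeddings into one sequence. It is plain Python for readability, not the Keras code from this PR. The function names, the bias-free projection, and the choice to place image tokens before text tokens are all assumptions made for the example.

```python
from typing import List

Matrix = List[List[float]]


def project(features: Matrix, weights: Matrix) -> Matrix:
    """Bias-free linear projection: (n, d_in) @ (d_in, d_out) -> (n, d_out)."""
    d_in = len(weights)
    d_out = len(weights[0])
    for row in features:
        if len(row) != d_in:
            raise ValueError(f"expected feature width {d_in}, got {len(row)}")
    return [
        [sum(row[k] * weights[k][j] for k in range(d_in)) for j in range(d_out)]
        for row in features
    ]


def merge_image_and_text(
    image_features: Matrix, text_embeddings: Matrix, weights: Matrix
) -> Matrix:
    """Project image features to the text width, then prepend them to the text.

    Prepending image tokens is an assumption of this sketch, not a
    confirmed detail of the PR.
    """
    projected = project(image_features, weights)
    return projected + text_embeddings


# Hypothetical toy sizes: 2 image patches of width 3, text width 2, 3 text tokens.
image_features = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

merged = merge_image_and_text(image_features, text_embeddings, weights)
print(len(merged), len(merged[0]))  # sequence length and embedding width
```

The merged sequence has one row per image patch plus one row per text token, all at the text embedding width, which is the form a text decoder would consume.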

Verification

I have verified the architecture locally using mock encoders and decoders to ensure:

  • Correct connectivity between Vision and Text components.
  • Correct output shapes for forward passes.
  • Compatibility with Keras functional API.
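
A shape-only check of this kind could look roughly like the sketch below. It is not the author's verification script: the mock functions, the patch-based encoder, and every dimension used here are hypothetical, and the mocks track tensor shapes only rather than running any model.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def mock_vision_encoder(image_shape: Shape, patch_size: int, hidden_dim: int) -> Shape:
    """Return the feature shape a patch-based encoder would produce."""
    batch, height, width, _channels = image_shape
    num_patches = (height // patch_size) * (width // patch_size)
    return (batch, num_patches, hidden_dim)


def mock_projection(features_shape: Shape, text_dim: int) -> Shape:
    """Map the vision hidden width to the text embedding width."""
    batch, num_tokens, _ = features_shape
    return (batch, num_tokens, text_dim)


def mock_text_decoder(sequence_shape: Shape, vocab_size: int) -> Shape:
    """Return the logits shape for a decoder-only language model."""
    batch, seq_len, _ = sequence_shape
    return (batch, seq_len, vocab_size)


def forward_shape(
    image_shape: Shape,
    num_text_tokens: int,
    *,
    patch_size: int,
    vision_dim: int,
    text_dim: int,
    vocab_size: int,
) -> Shape:
    """Trace shapes through encoder -> projection -> concat -> decoder."""
    image_features = mock_vision_encoder(image_shape, patch_size, vision_dim)
    batch, num_image_tokens, _ = mock_projection(image_features, text_dim)
    merged = (batch, num_image_tokens + num_text_tokens, text_dim)
    return mock_text_decoder(merged, vocab_size)


# Hypothetical sizes chosen only to keep the arithmetic easy to follow.
logits_shape = forward_shape(
    (2, 28, 28, 3),
    num_text_tokens=5,
    patch_size=14,
    vision_dim=8,
    text_dim=16,
    vocab_size=100,
)
assert logits_shape == (2, 9, 100)  # 4 image patches + 5 text tokens
```

Checking shapes with mocks confirms that the components connect, but it says nothing about numerical correctness against the reference model.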

Note

This PR focuses on the architecture implementation. Pre-trained weights (presets) and extensive unit tests will be added in follow-up PRs.

@gemini-code-assist
Contributor

Summary of Changes

Hello @BharathC0, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request establishes the foundational architecture for the Moondream vision-language model within Keras Hub. It defines the core components necessary to integrate a vision encoder with a text decoder, enabling image-to-text generation capabilities. The implementation includes the backbone for feature projection, the causal language model wrapper, and a dedicated preprocessor for handling multimodal inputs.

Highlights

  • Moondream Architecture Skeleton: This PR introduces the initial architecture skeleton for the Moondream model, a tiny vision-language model, laying the groundwork for its full implementation.
  • Core Component Integration: It implements the core components required to bridge the SigLIP vision encoder and the Phi-1.5 text decoder, enabling multimodal processing.
  • MoondreamBackbone: The MoondreamBackbone class is added to handle the connection between the vision encoder and text decoder using a projection layer, supporting mixed input (images + text).
  • MoondreamCausalLM: The MoondreamCausalLM class is introduced as the end-to-end task model for image-to-text generation.
  • MoondreamPreprocessor: A MoondreamPreprocessor is included to manage tokenization and image resizing/rescaling logic for multimodal inputs.
  • Local Verification: The architecture has been verified locally using mock encoders and decoders to ensure correct connectivity, output shapes, and compatibility with the Keras functional API.
  • Future Work: This PR focuses solely on the architecture implementation; pre-trained weights (presets) and extensive unit tests will be added in follow-up pull requests.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the architecture skeleton for the Moondream model. The overall structure is well-defined, but there are several areas that need to be aligned with the repository's contribution guidelines. My review focuses on ensuring adherence to the style guide, particularly regarding the use of the Keras Functional API for model definitions, the inclusion of comprehensive docstrings, and correct implementation patterns for multimodal data handling. Addressing these points will improve the maintainability and consistency of the new model within the KerasHub ecosystem.

@BharathC0 force-pushed the moondream-architecture branch 2 times, most recently from ce78cce to 3355ea5, on January 30, 2026 15:22
@BharathC0
Author

@gemini-code-assist review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the initial architecture skeleton for the Moondream model, including the MoondreamBackbone, MoondreamCausalLM, and MoondreamPreprocessor components. The architecture has been locally verified with mock encoders and decoders. The review focuses on identifying potential issues related to correctness and adherence to the repository's style guide, particularly naming conventions, file structure, and code implementation style.

@BharathC0 force-pushed the moondream-architecture branch 2 times, most recently from dda3541 to 7dba12e, on January 30, 2026 16:03
@BharathC0
Author

@gemini-code-assist review

@gemini-code-assist
Contributor

Hi @BharathC0, thanks for the ping! I've kicked off the review process as requested.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the architectural skeleton for the Moondream model, including the MoondreamBackbone, MoondreamCausalLM, and MoondreamPreprocessor. The overall structure aligns well with the repository's conventions for new model contributions. My review focuses on a few areas to improve consistency with the style guide's implementation patterns and to address a missing piece of functionality in the preprocessor that is crucial for text generation with image inputs. These changes will help ensure the new model is robust and consistent with other models in KerasHub.

@BharathC0 force-pushed the moondream-architecture branch from 6b1c3ec to 95e3804 on January 30, 2026 16:20
@BharathC0
Author

@gemini-code-assist review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request introduces the architecture skeleton for the Moondream model, including the backbone, causal language model, and preprocessor. The implementation generally follows the established patterns for KerasHub models, with appropriate docstrings and get_config methods. However, there are a couple of issues related to file management and completeness of exports that need to be addressed.

@BharathC0
Author

@gemini-code-assist review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request introduces the architectural skeleton for the Moondream model, integrating a vision encoder and a text decoder. The implementation includes MoondreamBackbone, MoondreamCausalLM, and MoondreamPreprocessor, which are well-structured and follow the Keras functional API where appropriate. The docstrings are generally good, providing clear explanations and examples. However, a critical issue was identified regarding the modification of an autogenerated file, and a minor improvement can be made to a docstring for better adherence to the style guide.

@BharathC0 force-pushed the moondream-architecture branch from 9ce3307 to 9e66a2f on January 30, 2026 16:31
@BharathC0
Author

@gemini-code-assist review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the architectural skeleton for the Moondream model, including the MoondreamBackbone, MoondreamCausalLM, and MoondreamPreprocessor. The overall structure adheres to the repository's modularity principles and naming conventions. Docstrings are well-formatted and include examples, which is excellent for maintainability. The use of keras.ops ensures backend agnosticism, aligning with the project's goals. I've identified a missing import in the __init__.py file and a potential runtime issue related to the text_decoder interface in the backbone. These are detailed in the specific review comments below.

@BharathC0
Author

@sachinprasadhs please review

@BharathC0 force-pushed the moondream-architecture branch from 16751d2 to 553839f on January 30, 2026 16:47
@sachinprasadhs self-requested a review on February 4, 2026 20:04
@sachinprasadhs added the "new model" label (for PRs that contribute a new model to the Keras Hub registry) on Feb 9, 2026
@sachinprasadhs
Collaborator

@BharathC0, thanks for the implementation. Please add the complete implementation in a single PR; that is the standard process we follow here to make review easier. Thanks.

Labels

new model (for PRs that contribute a new model to the Keras Hub registry), stat:awaiting response from contributor

Development

Successfully merging this pull request may close these issues.

Add Moondream VLM Architecture

2 participants