
[Model] Add Moondream3 model support #32325

Open
sniper35 wants to merge 18 commits into vllm-project:main from sniper35:add-moondream3-model

Conversation


@sniper35 sniper35 commented Jan 14, 2026

Purpose

Closes #25215.

Test Plan

Offline inference examples:

  from vllm import LLM, SamplingParams

  llm = LLM(
      model="moondream/moondream3-preview",
      tokenizer="moondream/starmie-v1",
      trust_remote_code=True, dtype="bfloat16",
      max_model_len=2048, enforce_eager=True,
      limit_mm_per_prompt={"image": 1},
  )

  # --- Query ---
  llm.generate(
      {"prompt": "<|endoftext|><image><|md_reserved_0|>query<|md_reserved_1|>What is this?<|md_reserved_2|>",
       "multi_modal_data": {"image": image}},
      SamplingParams(max_tokens=50, temperature=0),
  )

  # --- Caption ---
  llm.generate(
      {"prompt": "<|endoftext|><image><|md_reserved_0|>describe<|md_reserved_1|>normal<|md_reserved_2|>",
       "multi_modal_data": {"image": image}},
      SamplingParams(max_tokens=100, temperature=0),
  )

  # --- Detect (needs extra_args) ---
  llm.generate(
      {"prompt": "<|endoftext|><image><|md_reserved_0|>detect<|md_reserved_1|> sign<|md_reserved_2|>",
       "multi_modal_data": {"image": image}},
      SamplingParams(max_tokens=500, temperature=0,
                     extra_args={"moondream3_task": "detect"}),
  )
  # Returns JSON: {"objects": [{"x_min": ..., "y_min": ..., "x_max": ..., "y_max": ...}]}

  # --- Point (needs extra_args) ---
  llm.generate(
      {"prompt": "<|endoftext|><image><|md_reserved_0|>point<|md_reserved_1|> sign<|md_reserved_2|>",
       "multi_modal_data": {"image": image}},
      SamplingParams(max_tokens=500, temperature=0,
                     extra_args={"moondream3_task": "point"}),
  )
  # Returns JSON: {"points": [{"x": ..., "y": ...}]}
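The four prompts above share the same reserved-token scaffolding, so they can be assembled with a small helper. This is a hypothetical convenience function, not part of this PR; the token names and ordering are copied verbatim from the examples above.

```python
# Hypothetical helper (not part of this PR) that assembles the Moondream3
# prompt format used in the examples above. The reserved-token names are
# copied verbatim from the PR description.
def build_moondream3_prompt(task: str, argument: str) -> str:
    """Wrap a task name ('query', 'describe', 'detect', 'point') and its
    argument in Moondream3's reserved tokens."""
    return (
        "<|endoftext|><image>"
        f"<|md_reserved_0|>{task}"
        f"<|md_reserved_1|>{argument}"
        "<|md_reserved_2|>"
    )

print(build_moondream3_prompt("query", "What is this?"))
```

For example, `build_moondream3_prompt("detect", " sign")` reproduces the detect prompt above, including its leading space before the object name.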

Test Result

Comparison of the outputs from vLLM and HF:

  "inputs": {
    "caption_image": "cherry_blossom",
    "detect_image": "stop_sign",
    "object": "sign",
    "point_image": "stop_sign",
    "query_image": "stop_sign"
  },

  "hf_outputs": {
    "caption": "A tall, slender tower with a white top and gray framework stands against a bright blue sky. Branches with pink blossoms frame the tower in the foreground, creating a layered effect. The blossoms appear dense and full, with some showing hints of orange or yellow at the edges. The perspective is from below, looking up at the tower.",
    "detect": {
      "objects": [
        {
          "x_max": 0.3590589910745621,
          "x_min": 0.16633163392543793,
          "y_max": 0.4200967699289322,
          "y_min": 0.1345907300710678
        }
      ]
    },
    "point": {
      "points": [
        {
          "x": 0.2822265625,
          "y": 0.318359375
        }
      ]
    },
    "query": "The image shows a red stop sign mounted on a pole in the foreground of a street. Behind the sign, there is a red Chinese archway with Chinese characters on it. In the background, a black SUV is driving down the street. Buildings are visible on both sides of the street, and there are several pedestrians walking along the sidewalk. A tree is visible behind the archway. The scene captures a typical urban street setting with Asian architectural elements."
  },

  "vllm_outputs": {
    "caption": "A tall tower with a white top and light blue-green horizontal bands is visible through a dense canopy of pink cherry blossom trees. The sky is a clear, bright blue. The trees frame the tower, creating a layered effect with branches and blossoms in the foreground.",
    "detect": {
      "objects": [
        {
          "x_max": 0.3590589910745621,
          "x_min": 0.16633163392543793,
          "y_max": 0.4210672974586487,
          "y_min": 0.13362020254135132
        }
      ]
    },
    "point": {
      "points": [
        {
          "x": 0.2822265625,
          "y": 0.3095703125
        }
      ]
    },
    "query": "A black SUV is parked on a city street near a red Chinese-style gate or archway. A red octagonal stop sign is mounted on a pole in the foreground. Buildings with signage are visible in the background, and there are decorative stone statues flanking the gate. A tree is visible behind the"
  }

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small, essential subset of tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify mergify bot added the new-model (Requests to new models) label Jan 14, 2026

mergify bot commented Jan 14, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @sniper35.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 14, 2026
@sniper35 sniper35 marked this pull request as draft January 14, 2026 11:20
@mergify mergify bot removed the needs-rebase label Jan 14, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for the Moondream3 model. The implementation includes the model architecture for both the vision and text components, a custom processor for handling Moondream3's specific image tiling and tokenization, and necessary registrations. The code is comprehensive, but I've found a couple of critical issues in the model implementation that would prevent it from working correctly. One is a fragile dependency in the image encoding logic, and the other is an incorrect weight name remapping during model loading. Addressing these will be crucial for the model to function as intended.

pixel_values = pixel_values.to(device=device, dtype=dtype)

features = self.vision(pixel_values)
grid_size = self.config.vision.enc_n_layers

critical

The grid_size is being set to self.config.vision.enc_n_layers. While the value (27) is coincidentally correct for the default configuration (crop_size 378 / patch_size 14 = 27), this is semantically incorrect and very brittle. The grid size of the vision encoder output depends on the image crop size and patch size, not the number of encoder layers. This will break if the model configuration changes in a way that decouples these values. The grid size should be calculated from the vision config's crop_size and enc_patch_size for correctness and robustness.

Suggested change
grid_size = self.config.vision.enc_n_layers
grid_size = self.config.vision.crop_size // self.config.vision.enc_patch_size
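Under the default configuration cited in the review, the suggested formula can be checked numerically. The values below are hard-coded for illustration; the variable names mirror the vision-config fields.

```python
# Default Moondream3 vision-config values cited in the review, hard-coded
# here for illustration: 378 // 14 = 27, which only coincidentally equals
# the encoder depth.
crop_size = 378
enc_patch_size = 14
enc_n_layers = 27

grid_size = crop_size // enc_patch_size  # the suggested, robust formula
assert grid_size == enc_n_layers  # holds today, but only by coincidence
print(grid_size)
```

If a future checkpoint changed the crop size or patch size without changing the encoder depth, the formula above would still yield the correct grid size while `enc_n_layers` would not.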

Comment on lines +1256 to +1257
name = name.replace(".attn.qkv.", ".attn.qkv_proj.")
name = name.replace(".attn.proj.", ".attn.out_proj.")

critical

The weight name remapping for the attention layers is incorrect. The code replaces .attn.qkv. with .attn.qkv_proj. and .attn.proj. with .attn.out_proj.. However, the Moondream3Attention module defines its layers with prefixes that result in parameter names containing ...attn.qkv.weight and ...attn.proj.weight. This mismatch will cause the weights for the attention QKV and output projections to fail to load, leading to model errors. These remapping lines should be removed to match the parameter names defined in Moondream3Attention.
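The mismatch is easy to reproduce in isolation: applying the remap to a parameter name of the shape Moondream3Attention actually produces yields a key that no longer exists in the module. The layer path below is made up for the example.

```python
# Illustration of the reviewer's point (the layer path is invented for
# this example): the remap rewrites names that the module's own parameters
# never use, so the rewritten key fails to match at load time.
param_names = {"model.layers.0.attn.qkv.weight",
               "model.layers.0.attn.proj.weight"}

name = "model.layers.0.attn.qkv.weight"
remapped = name.replace(".attn.qkv.", ".attn.qkv_proj.")
remapped = remapped.replace(".attn.proj.", ".attn.out_proj.")

assert remapped not in param_names  # this weight would fail to load
print(remapped)
```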

@sniper35 sniper35 changed the title from "[Model] Add Moondream3 model support" to "[Model] Add Moondream3 model support [WIP]" Jan 14, 2026
@DarkLight1337
Copy link
Member

Heads up that you might need to update some imports after #32327

@mergify
Copy link

mergify bot commented Jan 20, 2026

Documentation preview: https://vllm--32325.org.readthedocs.build/en/32325/

@mergify mergify bot added the documentation (Improvements or additions to documentation) and multi-modality (Related to multi-modality, #4194) labels Jan 20, 2026
<sup>E</sup> Pre-computed embeddings can be inputted for this modality.

Author

This is to keep the documentation from becoming cluttered.

Author

Current:
(screenshot taken 2026-01-19 at 6:52 PM)

After:
(screenshot taken 2026-01-19 at 6:51 PM)

@sniper35 sniper35 force-pushed the add-moondream3-model branch 2 times, most recently from 2544de1 to f8ded0c, on January 20, 2026 02:59
@sniper35 sniper35 force-pushed the add-moondream3-model branch 3 times, most recently from 3e885c4 to c82c1bd, on February 11, 2026 07:36

mergify bot commented Feb 23, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @sniper35.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@sniper35
Author

Hey @copumpkin, you can pull my branch to test it yourself. All four skills are supported. Here are instructions for running it.
moondream3_testing.md

Signed-off-by: Dong Wang <dongw2019@gmail.com>
(cherry picked from commit 03c4c7c)
(cherry picked from commit 008cdac)
@sniper35 sniper35 force-pushed the add-moondream3-model branch from 5aee540 to c7be284 on February 24, 2026 09:49
@sniper35 sniper35 marked this pull request as ready for review February 24, 2026 10:40
@sniper35 sniper35 changed the title from "[Model] Add Moondream3 model support [WIP]" to "[Model] Add Moondream3 model support" Feb 24, 2026

Labels

documentation (Improvements or additions to documentation), multi-modality (Related to multi-modality, #4194), new-model (Requests to new models), v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Could support moondream vlm model?

3 participants