[feat] NanoVLM Training support #134
examples/nanovlm/nanovlm_train.sh (Outdated)
```bash
DATASET_PATH="/mnt/umm/users/pufanyi/workspace/Show/lmms-engine/data/llava_next.yaml"
PROCESSOR_NAME="/mnt/umm/users/pufanyi/workspace/Show/CKPT/Qwen/Qwen3-0.6B"
MODEL_PATH="/mnt/umm/users/pufanyi/workspace/Show/CKPT/Qwen/Qwen3-0.6B"
SIGLIP_PROCESSOR="/mnt/umm/users/pufanyi/workspace/Show/CKPT/google/siglip2-so400m-patch16-naflex"
```
It would be better to later change these to public paths (checked into the repo) or Hugging Face Hub paths.
```python
def _normalize_messages_for_template(self, hf_messages):
    normalized = []
    for message in hf_messages:
        content = message.get("content")
        if isinstance(content, list):
            parts = []
            for item in content:
                if not isinstance(item, dict):
                    parts.append(str(item))
                    continue
                item_type = item.get("type")
                if item_type in ["image", "image_url"] or "image" in item:
                    parts.append("<|vision_start|><|image_pad|><|vision_end|>\n")
                elif item_type in ["video", "video_url"] or "video" in item:
                    parts.append("<|vision_start|><|video_pad|><|vision_end|>\n")
                elif item_type in ["audio", "audio_url"] or "audio" in item:
                    parts.append("<|AUDIO|>\n")
                elif "text" in item:
                    parts.append(item["text"])
            normalized.append({"role": message["role"], "content": "".join(parts)})
        else:
            normalized.append(message)
    return normalized
```
Using a placeholder is okay, but the visual tokens seem a bit hardcoded here. Examples such as https://github.com/EvolvingLMMs-Lab/lmms-engine/blob/main/src/lmms_engine/datasets/processor/qwen3_vl_processor.py don't do this; they rely on the chat template instead: applying the chat template to {"type": "image"} directly generates the special image tokens, which are then expanded. That said, I'm fine with this approach if it's necessary. See the sketch below.
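For reference, a minimal sketch of the template-driven alternative, assuming a Qwen-VL-style processor whose chat template already knows how to render `{"type": "image"}` content entries into the model's special vision tokens (the checkpoint name below is illustrative only, not the one used in this PR):

```python
from transformers import AutoProcessor

# Illustrative checkpoint; any processor whose chat template understands
# {"type": "image"} content entries would behave the same way.
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# The chat template itself emits the special vision tokens
# (<|vision_start|><|image_pad|><|vision_end|> for Qwen-style models), so no
# hand-written placeholder string is needed; the processor later expands the
# image pad token to match the number of vision patches.
text = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
print(text)
```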
```python
        vision_model: Optional[PreTrainedModel] = None,
        language_model: Optional[PreTrainedModel] = None,
        **kwargs,
    ):
        super().__init__(config)
        attn_implementation = kwargs.pop("attn_implementation", None)
        torch_dtype = kwargs.pop("torch_dtype", None)

        if language_model is None:
            language_model = AutoModelForCausalLM.from_pretrained(
                config.llm_model_name,
                attn_implementation=attn_implementation,
                torch_dtype=torch_dtype,
            )
        if vision_model is None:
            vision_model = Siglip2VisionModel.from_pretrained(
                config.vision_model_name,
                torch_dtype=torch_dtype,
            )
```
In a transformers-style model we usually don't pass in a module object or call from_pretrained inside __init__; otherwise from_pretrained ends up being called twice when the model class itself is loaded with from_pretrained. I believe the current transformers practice is to initialize sub-modules from the config only and call AutoXXX.from_config or something similar. A rough sketch of that pattern is below.
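For illustration only, a sketch of the config-only pattern; the class name and the nested sub-config attribute names (text_config / vision_config) are placeholders and may not match this PR's actual config:

```python
from transformers import AutoModel, AutoModelForCausalLM, PreTrainedModel


class NanoVLMModelSketch(PreTrainedModel):
    # Hypothetical sketch: sub-modules are built from sub-configs only, so
    # weights are loaded exactly once by the outer from_pretrained() call.
    def __init__(self, config):
        super().__init__(config)
        # Assumes the composite config nests the sub-model configs; the real
        # attribute names in this PR (e.g. llm_model_name) may differ.
        self.language_model = AutoModelForCausalLM.from_config(config.text_config)
        self.vision_model = AutoModel.from_config(config.vision_config)
        self.post_init()
```

Pretrained sub-module weights would then be loaded once up front (e.g. copied into the composed model before saving the initial checkpoint) instead of calling from_pretrained inside __init__.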
examples/nanovlm/nanovlm_train.sh (Outdated)
```diff
@@ -1,12 +1,13 @@
 #!/bin/bash
 export PYTHONPATH=/mnt/afs/niuyuwei/Job/lmms-engine/src:$PYTHONPATH
```
examples/nanovlm/nanovlm_train.sh (Outdated)
```bash
MODEL_PATH="/mnt/umm/users/pufanyi/workspace/Show/CKPT/Qwen/Qwen3-0.6B"
SIGLIP_PROCESSOR="/mnt/umm/users/pufanyi/workspace/Show/CKPT/google/siglip2-so400m-patch16-naflex"

DATASET_PATH="./data/llava_next.yaml"
```
Feel free to add an example or an instruction/script for preparing the data to the example directory, so the speedrun can be set up conveniently.