## Description
Four VLM model classes have `to_turbomind()` wired up to call `to_turbomind_aux()`, but their `forward()` methods raise `NotImplementedError`, making the TurboMind backend unusable for these models. One of them (`qwen3.py`) has `to_turbomind()` as a bare `pass`.

Users who set `--backend turbomind` with any of these models will hit a runtime error when processing vision inputs.
## Affected Models
| File | `to_turbomind()` | `forward()` | Issue |
| --- | --- | --- | --- |
| `vl/model/deepseek_vl2.py` | Calls `to_turbomind_aux()` | Raises `NotImplementedError` (lines 104-117) | Runtime crash |
| `vl/model/llama4.py` | Calls `to_turbomind_aux()` | Raises `NotImplementedError` (lines 85-98) | Runtime crash |
| `vl/model/gemma3_vl.py` | Calls `to_turbomind_aux()` | Raises `NotImplementedError` (lines 92-105) | Runtime crash |
| `vl/model/qwen3.py` | Bare `pass` (line ~) | Bare `pass` | Silent no-op |
## Working Reference
`vl/model/internvl.py` has a working implementation:

- `to_turbomind()` calls `self.proc_messages()` to format the prompt + image tokens, then delegates to `self.to_turbomind_aux()`
- `forward()` extracts visual features from the pixel values and appends them to the messages with `role='forward'`
- `to_turbomind_aux()` (in `base.py`) reads these features to build `input_embeddings`
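The flow between these three methods can be sketched with a minimal, self-contained mock. All class and helper names below are illustrative stand-ins, not the real lmdeploy API:

```python
# Illustrative mock of the internvl.py-style flow; names are stand-ins,
# not the real lmdeploy API.

def fake_vision_encoder(pixel_values):
    """Stand-in for a real vision tower: one 4-dim feature per image."""
    return [[0.0, 0.0, 0.0, 0.0] for _ in pixel_values]

class MockVisionModel:

    def forward(self, messages):
        """Extract visual features and append them with role='forward'."""
        pixel_values = [m['pixel_values'] for m in messages
                        if 'pixel_values' in m]
        feats = fake_vision_encoder(pixel_values)
        messages.append(dict(role='forward', content=feats))
        return messages

    def to_turbomind_aux(self, messages):
        """Read the role='forward' entry to build input_embeddings."""
        feats = next(m['content'] for m in reversed(messages)
                     if m.get('role') == 'forward')
        return dict(input_embeddings=feats)

    def to_turbomind(self, messages):
        # The real implementation first calls self.proc_messages() to
        # format the prompt + image tokens; omitted here for brevity.
        return self.to_turbomind_aux(self.forward(messages))


messages = [dict(role='user', content='describe the image',
                 pixel_values=[[1, 2], [3, 4]])]
out = MockVisionModel().to_turbomind(messages)
```

The key point the stubs are missing is the middle step: without a `forward()` that appends the `role='forward'` entry, `to_turbomind_aux()` has no features to turn into `input_embeddings`.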
## What Needs to Be Implemented
For each model, `forward()` must:

- Accept preprocessed messages containing pixel values
- Run the model's vision encoder to extract visual features
- Append the features to the message list with `role='forward'`

Additionally, `qwen3.py` needs its `to_turbomind()` and `build_model()` implemented from scratch.
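The three steps above might look like the following sketch; the `'preprocess'` message schema and the `run_vision_encoder` helper are assumptions for illustration, not the actual lmdeploy internals:

```python
def run_vision_encoder(pixel_values):
    """Placeholder for the model-specific vision tower + projector."""
    return [[float(len(pv))] for pv in pixel_values]

def forward(messages, max_batch_size=16):
    """Hypothetical per-model forward(); the schema is an assumption."""
    # Step 1: accept preprocessed messages containing pixel values.
    pixel_values = [item['pixel_values']
                    for m in messages if m['role'] == 'preprocess'
                    for item in m['content']]
    # Step 2: run the vision encoder in batches to extract visual features.
    feats = []
    for i in range(0, len(pixel_values), max_batch_size):
        feats.extend(run_vision_encoder(pixel_values[i:i + max_batch_size]))
    # Step 3: append the features to the message list with role='forward'.
    messages.append(dict(role='forward', content=feats))
    return messages


msgs = [dict(role='preprocess', content=[dict(pixel_values=[1, 2, 3])])]
out = forward(msgs)
```

The batching loop reflects that real vision encoders are usually run in bounded batches; check each model's actual module names and preprocess output before reusing this shape.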
## Suggested Approach
Start with one model (e.g., Gemma3) as a reference implementation following the `internvl.py` pattern, then replicate for the other three.
## Key Files
- `lmdeploy/vl/model/base.py` — `VisionModel` base class, `to_turbomind_aux()` helper (lines 289-323)
- `lmdeploy/vl/model/internvl.py` — working reference (lines 283-296)
- `lmdeploy/vl/model/deepseek_vl2.py` — stub
- `lmdeploy/vl/model/llama4.py` — stub
- `lmdeploy/vl/model/gemma3_vl.py` — stub
- `lmdeploy/vl/model/qwen3.py` — stub