Fix Qwen35 VLM crash on text-only inference by dirvine · Pull Request #149 · ml-explore/mlx-swift-lm

dirvine · 2026-03-15T17:52:18Z

Problem

Loading a Qwen3.5 MoE model (e.g. mlx-community/Qwen3.5-35B-A3B-4bit) via VLMModelFactory works: the container loads successfully. But the first text-only inference call crashes with:

Fatal error: SmallVector out of range.
at mlx-c/mlx/c/array.cpp:335

The same model loads and runs without issue through LLMModelFactory.

Root Cause

Qwen35Language.LanguageModel.callAsFunction passes inputs directly to Qwen3VLLanguage.getRopeIndex(), which does:

let (batchSize, seqLength) = (inputIds.dim(0), inputIds.dim(1))

When the VLM container is used for text-only generation (no images), callers like WiredMemoryUtils.tune() and TokenIterator pass 1D [seq] token tensors. dim(1) on a 1D array is out of range → fatal crash.

The same issue affects inputs.dim(1) at lines 946 and 966 in the same function.
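
To make the failure mode concrete, here is a minimal sketch of the rank mismatch, assuming the MLX Swift package is available. The helper `sizeIfPresent` is hypothetical (it is not part of MLX or of this PR) and exists only to show that a 1D token array has no axis 1 to read, without triggering the crash itself.

```swift
import MLX

/// Hypothetical helper (not part of MLX): returns the size of `axis`,
/// or nil when the array has no such axis.
func sizeIfPresent(_ array: MLXArray, axis: Int) -> Int? {
    axis >= 0 && axis < array.ndim ? array.dim(axis) : nil
}

// What text-only callers pass: a 1D [seq] token array.
let tokens1D = MLXArray([101, 2023, 2003] as [Int32])   // shape [3]
// What getRopeIndex() expects: a 2D [batch, seq] array.
let tokens2D = tokens1D.reshaped([1, 3])                // shape [1, 3]

// Axis 1 only exists on the 2D array; calling dim(1) directly on
// tokens1D is the out-of-range access described above.
print(tokens1D.ndim, sizeIfPresent(tokens1D, axis: 1) as Any)
print(tokens2D.ndim, sizeIfPresent(tokens2D, axis: 1) as Any)
```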

Fix

Add an ndim check at the top of callAsFunction to expand 1D inputs to 2D [1, seq] before any dimension-dependent logic runs. This matches what the LLM path does via tokens[text: .newAxis] in prefillOnly.
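
The check described above might look roughly like the sketch below. This is not the exact diff from this PR: the free function `ensureBatched` is a hypothetical name used for illustration, and the real change lives inline at the top of callAsFunction.

```swift
import MLX

/// Hypothetical helper: returns `inputs` as a 2D [batch, seq] array.
/// A 1D [seq] array gains a leading batch axis of size 1; arrays of
/// any other rank are returned unchanged.
func ensureBatched(_ inputs: MLXArray) -> MLXArray {
    inputs.ndim == 1 ? inputs.expandedDimensions(axis: 0) : inputs
}

let tokens = MLXArray([101, 2023, 2003] as [Int32])   // 1D, shape [3]
let batched = ensureBatched(tokens)                   // 2D, shape [1, 3]

// The destructuring used in getRopeIndex() is now safe to evaluate.
let (batchSize, seqLength) = (batched.dim(0), batched.dim(1))
print(batched.shape, batchSize, seqLength)
```

Because inputs that are already 2D pass through untouched, the check should be harmless for callers that supply [batch, seq].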

Testing

Verified locally with mlx-community/Qwen3.5-35B-A3B-4bit on M4 Max 96GB:

Container loads via VLMModelFactory ✅
WiredMemoryUtils.tune() completes (was crashing) ✅
Text inference produces correct output ✅
Full e2e test suite passes ✅

Fixes #148

Qwen35Language.LanguageModel.callAsFunction assumes inputs is always 2D [batch, seq], but text-only callers like WiredMemoryUtils.tune and TokenIterator can pass 1D [seq] token arrays. This causes getRopeIndex() and subsequent dim(1) calls to crash with "SmallVector out of range" when accessing a non-existent dimension. Add an ndim check at the top of callAsFunction to expand 1D inputs to 2D before any dimension-dependent logic runs. Fixes ml-explore#148

davidkoski · 2026-03-19T23:07:44Z

Libraries/MLXVLM/Models/Qwen35.swift

            videoGridTHW: [THW]? = nil
        ) -> LMOutput {
+            // Ensure inputs is 2D [batch, seq]. Text-only callers (e.g.
+            // WiredMemoryUtils, TokenIterator) may pass 1D token arrays.


I wonder if that is incorrect (as in this is the bug). I would expect the shape to always be [B, S].

So:

Are these paths buggy in that they allow 1D arrays?

If not, perhaps the callers should ensure the correct shape -- otherwise this is a one-off fix for a single model.
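
For comparison, a caller-side approach could be sketched as follows. Both function names are hypothetical and nothing here is taken from the repository: the idea is that the caller normalizes the shape once, and the model only asserts the [B, S] contract so that a violation fails with a readable message instead of deep inside mlx-c.

```swift
import MLX

/// Hypothetical caller-side normalization, applied once before any
/// model is invoked.
func batchedTokens(_ tokens: MLXArray) -> MLXArray {
    switch tokens.ndim {
    case 1: return tokens.expandedDimensions(axis: 0)   // [seq] -> [1, seq]
    case 2: return tokens                               // already [batch, seq]
    default: fatalError("expected 1D or 2D tokens, got shape \(tokens.shape)")
    }
}

/// Hypothetical model-side contract check: no reshaping, only validation.
func checkBatched(_ inputs: MLXArray) {
    precondition(inputs.ndim == 2,
                 "inputs must be [batch, seq], got shape \(inputs.shape)")
}

let prepared = batchedTokens(MLXArray([101, 2023, 2003] as [Int32]))
checkBatched(prepared)    // passes: shape is [1, 3]
print(prepared.shape)
```

The trade-off is that every caller has to go through the normalization step, but each model can then rely on a single shape.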

davidkoski

Please look at my comment on the shape -- let me know what you think. I suspect it is a caller issue, but not 100% sure.

davidkoski reviewed Mar 19, 2026


davidkoski mentioned this pull request Mar 23, 2026

[BUG] Qwen 3.5 VLM model crashes on subsequent requests with manual KV cache prefix matching #157

Open

davidkoski requested changes Mar 27, 2026

dirvine wants to merge 1 commit into ml-explore:main from dirvine:fix/qwen35-vlm-1d-input
