
Fix Qwen35 VLM crash on text-only inference#149

Open
dirvine wants to merge 1 commit into ml-explore:main from dirvine:fix/qwen35-vlm-1d-input

@dirvine dirvine commented Mar 15, 2026

Problem

Loading a Qwen3.5 MoE model (e.g. mlx-community/Qwen3.5-35B-A3B-4bit) via VLMModelFactory works: the container loads successfully. But the first text-only inference call crashes with:

Fatal error: SmallVector out of range.
at mlx-c/mlx/c/array.cpp:335

The same model loads and runs without issue through LLMModelFactory.

Root Cause

Qwen35Language.LanguageModel.callAsFunction passes inputs directly to Qwen3VLLanguage.getRopeIndex(), which does:

let (batchSize, seqLength) = (inputIds.dim(0), inputIds.dim(1))

When the VLM container is used for text-only generation (no images), callers like WiredMemoryUtils.tune() and TokenIterator pass 1D [seq] token tensors. Calling dim(1) on a 1D array asks for a dimension that does not exist, which triggers the fatal "SmallVector out of range" error.

The same issue affects inputs.dim(1) at lines 946 and 966 in the same function.

Fix

Add an ndim check at the top of callAsFunction to expand 1D inputs to 2D [1, seq] before any dimension-dependent logic runs. This matches what the LLM path does via tokens[text: .newAxis] in prefillOnly.
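For illustration, the guard described above could look roughly like the following. This is a minimal sketch, not the code in this PR's diff: `ensureBatched` is a hypothetical helper name, and the token values are made up. It assumes only the MLX Swift calls `ndim`, `expandedDimensions(axis:)`, `dim(_:)` and `shape`.

```swift
import MLX

/// Hypothetical helper (not part of this PR): promote a 1D [seq] token
/// array to 2D [1, seq]; arrays of any other rank pass through unchanged.
func ensureBatched(_ tokens: MLXArray) -> MLXArray {
    tokens.ndim == 1 ? tokens.expandedDimensions(axis: 0) : tokens
}

// A 1D token array, like the ones text-only callers may pass.
// The token ids here are arbitrary example values.
let tokens = MLXArray([101, 2023, 2003] as [Int32])

// After the guard, dim(1) is safe because the array is [batch, seq].
let batched = ensureBatched(tokens)
let (batchSize, seqLength) = (batched.dim(0), batched.dim(1))
```

Running the guard once at the top of the function means every later `dim(1)` call sees a 2D array, so the individual call sites do not need their own checks.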

Testing

Verified locally with mlx-community/Qwen3.5-35B-A3B-4bit on M4 Max 96GB:

  • Container loads via VLMModelFactory
  • WiredMemoryUtils.tune() completes (was crashing) ✅
  • Text inference produces correct output ✅
  • Full e2e test suite passes ✅

Fixes #148

Qwen35Language.LanguageModel.callAsFunction assumes inputs is always 2D
[batch, seq], but text-only callers like WiredMemoryUtils.tune and
TokenIterator can pass 1D [seq] token arrays. This causes
getRopeIndex() and subsequent dim(1) calls to crash with
"SmallVector out of range" when accessing a non-existent dimension.

Add an ndim check at the top of callAsFunction to expand 1D inputs
to 2D before any dimension-dependent logic runs.

Fixes ml-explore#148
videoGridTHW: [THW]? = nil
) -> LMOutput {
// Ensure inputs is 2D [batch, seq]. Text-only callers (e.g.
// WiredMemoryUtils, TokenIterator) may pass 1D token arrays.
Collaborator


I wonder if that is incorrect (as in, passing a 1D array is the bug). I would expect the shape to always be [B, S].

So:

  • are these paths buggy in that they allow 1d arrays?
  • if not, perhaps the callers should ensure the correct shape -- otherwise this is a one-off fix for a single model

Collaborator

@davidkoski davidkoski left a comment


Please look at my comment on the shape -- let me know what you think. I suspect it is a caller issue, but not 100% sure.


Development

Successfully merging this pull request may close these issues.

Qwen3.5 MoE VLM crashes on text-only inference (SmallVector out of range)
