Description
Loading mlx-community/Qwen3.5-35B-A3B-4bit via VLMModelFactory.shared.loadContainer() works fine — the container loads and weights map correctly. But the first text-only inference call crashes with:
MLX/ErrorHandler.swift:343: Fatal error: SmallVector out of range.
at mlx-c/mlx/c/array.cpp:335
The same model loads and runs fine through LLMModelFactory.
Root Cause
Qwen3VLLanguage.getRopeIndex() (Qwen3VL.swift L1341) assumes inputIds is always 2D [batch, seq]:
let (batchSize, seqLength) = (inputIds.dim(0), inputIds.dim(1))
When the VLM container is used for text-only generation (no images), the token tensor can be 1D [seq]. Calling inputIds.dim(1) on a 1D array triggers the SmallVector fatal error.
The call path is:
1. Qwen35.prepare(input:cache:windowSize:): no image/video, so pixelValues is nil
2. languageModel(inputIds, ...): forwards raw token IDs
3. Qwen35Language.LanguageModel.callAsFunction: positionIds == nil and ropeDeltas == nil, so it enters the getRopeIndex branch
4. Qwen3VLLanguage.getRopeIndex(inputIds:...): crashes on inputIds.dim(1) because the tensor is 1D
This shows up concretely when WiredMemoryUtils.tune() runs its prefill measurement — it creates a plain LMInput(tokens:) with 1D tokens, calls model.prepare(), and the VLM path crashes.
Reproduction
import MLXLMCommon
import MLXVLM
let config = ModelConfiguration(id: "mlx-community/Qwen3.5-35B-A3B-4bit")
let container = try await VLMModelFactory.shared.loadContainer(configuration: config)
// This crashes:
try await container.perform { context in
    try await WiredMemoryUtils.tune(
        context: context,
        tokenCount: 512,
        parameters: GenerateParameters(maxTokens: 1)
    )
}
Suggested Fix
The simplest fix would be to ensure inputIds is 2D before calling getRopeIndex. Something like:
// In Qwen35Language.LanguageModel.callAsFunction, before calling getRopeIndex:
var inputsForRope = inputs
if inputsForRope.ndim == 1 {
    inputsForRope = inputsForRope.expandedDimensions(axis: 0)
}
Or alternatively, getRopeIndex itself could handle 1D input gracefully.
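For the second option, a rough sketch of what handling 1D input inside getRopeIndex might look like. The helper name batchedTokenIds is hypothetical and does not exist in mlx-swift-lm. It only illustrates normalizing the rank before the (batchSize, seqLength) destructuring, using the ndim, dim(_:), shape and expandedDimensions(axis:) members of MLXArray:

```swift
import MLX

/// Hypothetical helper (not part of mlx-swift-lm): returns token IDs with an
/// explicit batch axis so callers can safely read dim(0) and dim(1).
func batchedTokenIds(_ inputIds: MLXArray) -> MLXArray {
    switch inputIds.ndim {
    case 1:
        // [seq] -> [1, seq]
        return inputIds.expandedDimensions(axis: 0)
    case 2:
        // Already [batch, seq]; nothing to do.
        return inputIds
    default:
        fatalError("Expected 1D or 2D token IDs, got shape \(inputIds.shape)")
    }
}

// Usage: a 1D token array, like the one LMInput(tokens:) carries in the repro.
let tokens = MLXArray([101, 102, 103] as [Int32])   // shape [3]
let batched = batchedTokenIds(tokens)                // shape [1, 3]

// The destructuring from getRopeIndex is now in range for both ranks.
let (batchSize, seqLength) = (batched.dim(0), batched.dim(1))
print(batchSize, seqLength)
```

Doing the normalization inside getRopeIndex would cover every caller, while the callAsFunction variant above keeps the change local to Qwen35. Either way, any outputs derived from the batched array (such as position IDs) would need to match the shape the rest of the forward pass expects. I have not verified that part.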
Environment
- mlx-swift-lm: main branch (HEAD = bc3c20e)
- mlx-swift: main branch (HEAD = b6e128c)
- Model: mlx-community/Qwen3.5-35B-A3B-4bit (model_type: qwen3_5_moe)
- macOS 26.0, Apple M4 Max 96GB