Qwen3.5 MoE VLM crashes on text-only inference (SmallVector out of range)

### Description

Loading `mlx-community/Qwen3.5-35B-A3B-4bit` via `VLMModelFactory.shared.loadContainer()` works fine — the container loads and weights map correctly. But the first text-only inference call crashes with:

```
MLX/ErrorHandler.swift:343: Fatal error: SmallVector out of range.
at mlx-c/mlx/c/array.cpp:335
```

The same model loads and runs fine through `LLMModelFactory`.

### Root Cause

`Qwen3VLLanguage.getRopeIndex()` ([Qwen3VL.swift L1341](https://github.com/ml-explore/mlx-swift-lm/blob/main/Libraries/MLXVLM/Models/Qwen3VL.swift#L1341)) assumes `inputIds` is always 2D `[batch, seq]`:

```swift
let (batchSize, seqLength) = (inputIds.dim(0), inputIds.dim(1))
```

When the VLM container is used for text-only generation (no images), the token tensor can be 1D `[seq]`. `inputIds.dim(1)` on a 1D array triggers the SmallVector fatal error.
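
To make the shape mismatch concrete, here is a minimal sketch using plain `MLXArray` values. The token values are made up for illustration, and the snippet only inspects ranks: it deliberately avoids calling `dim(1)` on the 1D array, since that is the call that aborts.

```swift
import MLX

// Illustrative token IDs (made-up values), shape [seq] = [4].
let tokens1D = MLXArray([101, 2023, 2003, 102] as [Int32])

// The same tokens with a leading batch axis, shape [batch, seq] = [1, 4].
let tokens2D = tokens1D.expandedDimensions(axis: 0)

// Checking ndim first is safe for either rank.
for tokens in [tokens1D, tokens2D] {
    if tokens.ndim >= 2 {
        print("batch:", tokens.dim(0), "seq:", tokens.dim(1))
    } else {
        // Only axis 0 exists here; dim(1) would hit the fatal error above.
        print("unbatched, seq:", tokens.dim(0))
    }
}
```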

The call path is:
1. `Qwen35.prepare(input:cache:windowSize:)`: no image/video, so `pixelValues` is nil
2. `languageModel(inputIds, ...)`: forwards raw token IDs
3. `Qwen35Language.LanguageModel.callAsFunction`: `positionIds == nil`, `ropeDeltas == nil`, so enters the `getRopeIndex` branch
4. `Qwen3VLLanguage.getRopeIndex(inputIds:...)`: crashes on `inputIds.dim(1)` because the tensor is 1D

This shows up concretely when `WiredMemoryUtils.tune()` runs its prefill measurement — it creates a plain `LMInput(tokens:)` with 1D tokens, calls `model.prepare()`, and the VLM path crashes.

### Reproduction

```swift
import MLXLMCommon
import MLXVLM

let config = ModelConfiguration(id: "mlx-community/Qwen3.5-35B-A3B-4bit")
let container = try await VLMModelFactory.shared.loadContainer(configuration: config)

// This crashes:
try await container.perform { context in
    try await WiredMemoryUtils.tune(
        context: context,
        tokenCount: 512,
        parameters: GenerateParameters(maxTokens: 1)
    )
}
```

### Suggested Fix

The simplest fix would be to ensure `inputIds` is 2D before calling `getRopeIndex`. Something like:

```swift
// In Qwen35Language.LanguageModel.callAsFunction, before calling getRopeIndex:
var inputsForRope = inputs
if inputsForRope.ndim == 1 {
    inputsForRope = inputsForRope.expandedDimensions(axis: 0)
}
```

Alternatively, `getRopeIndex` itself could handle 1D input gracefully.
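
If the guard goes inside `getRopeIndex` instead, one possible shape for it is sketched below. `batchedTokenDims` is a hypothetical helper name, not an existing function in mlx-swift-lm, and the sketch assumes nothing downstream depends on the input staying 1D.

```swift
import MLX

/// Hypothetical helper (not part of mlx-swift-lm): returns the tokens with a
/// leading batch axis, plus the batch and sequence sizes read from them.
func batchedTokenDims(_ inputIds: MLXArray) -> (tokens: MLXArray, batchSize: Int, seqLength: Int) {
    // Promote [seq] to [1, seq]; any other rank is passed through unchanged
    // (a 0-D input would still fail on dim(1)).
    let tokens = inputIds.ndim == 1 ? inputIds.expandedDimensions(axis: 0) : inputIds
    return (tokens, tokens.dim(0), tokens.dim(1))
}

// Example with made-up token IDs of shape [3].
let (tokens, batchSize, seqLength) = batchedTokenDims(MLXArray([1, 2, 3] as [Int32]))
```

Either placement should behave the same for the text-only path described here; guarding inside `getRopeIndex` would presumably also cover any other caller that passes 1D tokens.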

### Environment

- mlx-swift-lm: main branch (HEAD = bc3c20e)
- mlx-swift: main branch (HEAD = b6e128c)  
- Model: `mlx-community/Qwen3.5-35B-A3B-4bit` (model_type: `qwen3_5_moe`)
- macOS 26.0, Apple M4 Max 96GB
