[BUG] gemma3text crashes if the attention mask is used #27

**Describe the bug**

With `LLMRegistry.gemma3_1B_qat_4bit`, sending a second message so that the attention mask is engaged causes a crash:

```
MLX/ErrorHandler.swift:343: Fatal error: [broadcast_shapes] Shapes (19,19) and (1,4,19,531) cannot be broadcast. at /Users/dkoski/Library/Developer/Xcode/DerivedData/mlx-swift-examples-eimbjcofifunwybkcvhnzjbqwyri/SourcePackages/checkouts/mlx-swift/Source/Cmlx/mlx-c/mlx/c/fast.cpp:598
```

Steps to reproduce, roughly:

- generate 512+ tokens of output
- issue a second prompt on the same KVCache
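
The two shapes in the error can be checked against the usual broadcasting rule (align trailing dimensions; each pair must match or contain a 1). The sketch below is plain Swift, not MLX code, and `broadcastShapes` is a hypothetical helper written for illustration. It reads `(1, 4, 19, 531)` as presumably (batch, heads, query length, key length), where 531 = 512 + 19 would be consistent with 512 cached positions plus 19 new tokens.

```swift
/// Hypothetical helper: returns the broadcast of two shapes, or nil if they are incompatible.
/// Dimensions are aligned from the trailing end; each pair must be equal or contain a 1.
func broadcastShapes(_ a: [Int], _ b: [Int]) -> [Int]? {
    let rank = max(a.count, b.count)
    // Left-pad the shorter shape with 1s so both shapes have the same rank.
    let pa = Array(repeating: 1, count: rank - a.count) + a
    let pb = Array(repeating: 1, count: rank - b.count) + b
    var result: [Int] = []
    for (x, y) in zip(pa, pb) {
        if x == y || y == 1 {
            result.append(x)
        } else if x == 1 {
            result.append(y)
        } else {
            return nil  // e.g. 19 vs 531 in the last dimension
        }
    }
    return result
}

let scores = [1, 4, 19, 531]
print(broadcastShapes([19, 19], scores) as Any)   // incompatible: last dims are 19 and 531
print(broadcastShapes([19, 531], scores) as Any)  // compatible: mask covers all 531 keys
```

Under that reading, a mask of shape `(19, 531)` would broadcast cleanly, while `(19, 19)` only covers the new tokens and ignores the cached ones.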

The `(19, 19)` is the mask:

```
slidingWindowMask = createAttentionMask(h: h, cache: allCaches)
```

Here is what `createCausalMask` returns, inspected in lldb:

```
(lldb) p t
(Int) 13
(lldb) p offset
(Int) 512
(lldb) p windowSize
(Int?) 512
(lldb) p createCausalMask(n: t, offset: offset, windowSize: windowSize).shape
([Int]) 2 values {
  [0] = 13
  [1] = 525
}
```

So the mask is created with a `[t, offset + t]` shape, and somewhere after that it is getting mangled.
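
For comparison, the `[13, 525]` shape reported by lldb is what a causal mask should have when the cache already holds `offset` positions: one row per new query, one column per key (cached plus new). The following is a minimal plain-Swift sketch of that idea, not the MLX implementation. `causalMaskSketch` is a hypothetical name, and the exact window boundary (a query sees at most `windowSize` keys, itself included) is an assumption that may differ from the library.

```swift
/// Hypothetical sketch: boolean causal mask of shape [n, offset + n].
/// true means "query row may attend to key column".
func causalMaskSketch(n: Int, offset: Int = 0, windowSize: Int? = nil) -> [[Bool]] {
    return (0..<n).map { i -> [Bool] in
        // Query i sits at absolute position offset + i; keys cover 0 ..< offset + n.
        let q = offset + i
        return (0..<(offset + n)).map { k -> Bool in
            let causal = k <= q
            // Assumed convention: at most windowSize keys are visible, including the query itself.
            let inWindow = windowSize.map { q - k < $0 } ?? true
            return causal && inWindow
        }
    }
}

let mask = causalMaskSketch(n: 13, offset: 512, windowSize: 512)
print([mask.count, mask[0].count])  // [13, 525]
```

A mask built this way has `offset + n` columns, so it lines up with the key dimension of the attention scores. A square `[n, n]` mask, as in the crash, corresponds to calling this with `offset: 0`.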

Ah, this is the same problem as:

- https://github.com/ml-explore/mlx-lm/issues/463

The code has moved forward since the original port.
