[BUG] gemma3text crashes if the attention mask is used #27

@davidkoski

Describe the bug

With LLMRegistry.gemma3_1B_qat_4bit, sending a second message so that the attention mask is engaged crashes:

MLX/ErrorHandler.swift:343: Fatal error: [broadcast_shapes] Shapes (19,19) and (1,4,19,531) cannot be broadcast. at /Users/dkoski/Library/Developer/Xcode/DerivedData/mlx-swift-examples-eimbjcofifunwybkcvhnzjbqwyri/SourcePackages/checkouts/mlx-swift/Source/Cmlx/mlx-c/mlx/c/fast.cpp:598

Steps to reproduce, roughly:

  • generate 512+ tokens of output
  • issue a second prompt on the same KVCache
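To make the error message concrete, here is a minimal sketch of the usual trailing-dimension broadcasting rule in plain Swift. The helper `broadcastShapes` is hypothetical and written only for illustration; it is not an MLX API. Reading (1,4,19,531) as (batch, heads, query length, key length) is also an assumption on my part.

```swift
/// Hypothetical helper, not part of MLX: returns the broadcast shape of `a` and `b`,
/// or nil when they are incompatible. Shapes are aligned from the trailing dimension;
/// each pair of sizes must be equal, or one of them must be 1.
func broadcastShapes(_ a: [Int], _ b: [Int]) -> [Int]? {
    let rank = max(a.count, b.count)
    // Left-pad the shorter shape with 1s so both shapes have the same rank.
    let pa = Array(repeating: 1, count: rank - a.count) + a
    let pb = Array(repeating: 1, count: rank - b.count) + b
    var result: [Int] = []
    for (x, y) in zip(pa, pb) {
        if x == y || y == 1 {
            result.append(x)
        } else if x == 1 {
            result.append(y)
        } else {
            return nil  // sizes differ and neither is 1
        }
    }
    return result
}

// Shapes taken from the crash message.
let scoresShape = [1, 4, 19, 531]
let squareMask = broadcastShapes([19, 19], scoresShape)   // nil: 19 vs 531 in the last dimension
let wideMask = broadcastShapes([19, 531], scoresShape)    // succeeds
print(squareMask == nil, wideMask ?? [])
```

Under that reading, a mask whose last dimension matched the 531 keys would broadcast, while the square (19, 19) mask cannot.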

The (19, 19) is the mask:

            slidingWindowMask = createAttentionMask(h: h, cache: allCaches)

here is what is created:

(lldb) p t
(Int) 13
(lldb) p offset
(Int) 512
(lldb) p windowSize
(Int?) 512
(lldb) p createCausalMask(n: t, offset: offset, windowSize: windowSize).shape
([Int]) 2 values {
  [0] = 13
  [1] = 525
}
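The (13, 525) shape is what one would expect from a mask with one row per new token and one column per key position, i.e. t rows and offset + t columns. The following is only a sketch using plain Swift arrays: `sketchCausalMask` is a hypothetical stand-in, not the library's createCausalMask, and the exact window comparison it uses is an assumption.

```swift
/// Hypothetical stand-in for a causal mask with an optional sliding window.
/// Row i is the query at absolute position offset + i; column k is key position k.
/// `true` means the query may attend to that key.
func sketchCausalMask(n: Int, offset: Int, windowSize: Int?) -> [[Bool]] {
    let keyCount = offset + n
    var mask = Array(repeating: Array(repeating: false, count: keyCount), count: n)
    for i in 0..<n {
        let q = offset + i
        for k in 0..<keyCount {
            var allowed = k <= q  // causal: never look ahead
            if let w = windowSize {
                // Assumed window convention: drop keys more than w positions back.
                allowed = allowed && q <= k + w
            }
            mask[i][k] = allowed
        }
    }
    return mask
}

// Values from the lldb session above.
let mask = sketchCausalMask(n: 13, offset: 512, windowSize: 512)
print([mask.count, mask[0].count])  // [13, 525]
```

With these inputs the sketch reproduces the (13, 525) shape from the debugger, so the shape coming out of mask creation looks plausible and the square shape in the crash must arise elsewhere.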

So somewhere the mask is getting mangled.

Ah, which is this:

The code has moved forward since the original port.
