Open
Description
Describe the bug
With LLMRegistry.gemma3_1B_qat_4bit, sending a second message so that the attention mask is engaged crashes:
MLX/ErrorHandler.swift:343: Fatal error: [broadcast_shapes] Shapes (19,19) and (1,4,19,531) cannot be broadcast. at /Users/dkoski/Library/Developer/Xcode/DerivedData/mlx-swift-examples-eimbjcofifunwybkcvhnzjbqwyri/SourcePackages/checkouts/mlx-swift/Source/Cmlx/mlx-c/mlx/c/fast.cpp:598
Roughly, to reproduce:
- generate 512+ tokens of output
- issue a second prompt on the same KVCache
The (19, 19) shape in the error is the mask, which comes from:
slidingWindowMask = createAttentionMask(h: h, cache: allCaches)
Here is what createCausalMask produces in the debugger (from a run with t = 13):
(lldb) p t
(Int) 13
(lldb) p offset
(Int) 512
(lldb) p windowSize
(Int?) 512
(lldb) p createCausalMask(n: t, offset: offset, windowSize: windowSize).shape
([Int]) 2 values {
  [0] = 13
  [1] = 525
}
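So createCausalMask itself returns the expected (t, offset + t) shape, and somewhere between there and the attention call the mask is getting mashed down to (t, t).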
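To make the shape mismatch concrete, here is a minimal sketch of the arithmetic in plain Swift, with no MLX dependency. `causalMaskShape` and `broadcastShapes` are hypothetical helpers written for this illustration, not functions from mlx-swift or mlx-swift-examples. It assumes the window size only changes which mask entries are set, not the mask's shape, which matches the lldb output above.

```swift
// Hypothetical helpers for illustration only; not part of MLX or mlx-swift-examples.

/// Shape of a causal mask for `n` new query tokens on top of `offset` cached keys.
/// Assumption: the window size affects mask values only, not the shape.
func causalMaskShape(n: Int, offset: Int) -> [Int] {
    [n, offset + n]
}

/// NumPy-style broadcast check: right-align the shapes, then each pair of
/// dimensions must be equal or one of them must be 1. Returns nil on mismatch.
func broadcastShapes(_ a: [Int], _ b: [Int]) -> [Int]? {
    let rank = max(a.count, b.count)
    let pa = Array(repeating: 1, count: rank - a.count) + a
    let pb = Array(repeating: 1, count: rank - b.count) + b
    var out: [Int] = []
    for (x, y) in zip(pa, pb) {
        if x == y || y == 1 {
            out.append(x)
        } else if x == 1 {
            out.append(y)
        } else {
            return nil  // incompatible dimensions
        }
    }
    return out
}

// Values from the lldb session: t = 13, offset = 512.
let expectedMask = causalMaskShape(n: 13, offset: 512)

// Shapes from the crash: attention scores are (1, 4, 19, 531).
let scores = [1, 4, 19, 531]
let mashedMask = broadcastShapes([19, 19], scores)    // mask as seen in the error
let fullMask = broadcastShapes([19, 531], scores)     // mask with the offset included
```

With these inputs, `expectedMask` is `[13, 525]`, `mashedMask` is `nil` (19 cannot broadcast against 531), and `fullMask` is `[1, 4, 19, 531]`. Since 531 = 512 + 19, the key dimension of the scores is consistent with a mask of shape (t, offset + t), while the (19, 19) mask is what a zero offset would give.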
Ah, which is this:
The code has moved forward since the original port.