
Flash Attention on 1.83.1 breaks Mistral Small 3 models #1372

@Magiwarriorx

Description

Enabling Flash Attention on 1.83.1 with a Mistral Small 3 model causes the model to reply with Unicode garbage once the prompt exceeds 4k tokens, regardless of context size. Prompts only slightly over 4k tokens can occasionally produce passable results, so try around 8k tokens for a clear reproduction. Disabling Flash Attention, or downgrading to 1.82.4, fixes the issue.

Additional Information:
Windows 10, AMD 7800X3D, RTX 4090, latest Nvidia drivers

Metadata

    Labels: bug (Something isn't working)
