
Flash Attention on 1.83.1 breaks Mistral Small 3 models #1372

@Magiwarriorx

Description

Enabling Flash Attention on 1.83.1 with a Mistral Small 3 model causes the model to reply with Unicode garbage once the prompt exceeds 4k tokens, regardless of context size. Prompts only slightly over 4k tokens can occasionally produce passable results, so try around 8k tokens for a clear reproduction. Disabling Flash Attention, or downgrading to 1.82.4, fixes the issue.

Additional Information:
Windows 10, AMD 7800X3D, RTX 4090, latest Nvidia drivers

Metadata

    Labels: bug (Something isn't working)
