
JohannesGaessler
Collaborator

With FlashAttention enabled, Gemma 3n can crash when using many GPUs because GGML_MAX_SPLIT_INPUTS is too low, see #15434 (comment) . The highest number of split inputs with which I can provoke the issue is 19 when using 6 GPUs. This PR raises the value to 30 so that there is some margin above that.

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Sep 1, 2025
@slaren slaren merged commit 5d804a4 into ggml-org:master Sep 1, 2025
48 checks passed
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025

