
JohannesGaessler
Collaborator

With FlashAttention enabled, Gemma 3n can crash when using many GPUs because GGML_MAX_SPLIT_INPUTS is too low, see #15434 (comment) . The highest number of split inputs with which I can provoke the issue is 19 when using 6 GPUs. This PR raises the value to 30 so that there is some margin above that.

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Sep 1, 2025
@slaren slaren merged commit 5d804a4 into ggml-org:master Sep 1, 2025
48 checks passed
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025

