llama: fix -fa auto for multiple GPUs #15693
Closed
Fixes `-fa auto` for multiple GPUs. The problem is that the same compute graph is being passed to `ggml_backend_sched` twice, see #15434 (comment). This PR extends `llama_context::graph_reserve` with a flag `alloc` so that the graph is only used once.
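A minimal standalone sketch of the idea (hypothetical names and stand-in types, not the actual diff): the reserve path takes an `alloc` flag so the worst-case graph is handed to the scheduler at most once, while callers that only need the graph for inspection pass `alloc = false`.

```cpp
#include <cstdio>

struct graph {};          // stand-in for ggml_cgraph
struct scheduler {        // stand-in for ggml_backend_sched
    bool reserved = false;
    bool reserve(graph *) {
        if (reserved) {
            // passing the same graph twice is the failure mode being avoided
            std::fprintf(stderr, "error: graph already reserved\n");
            return false;
        }
        reserved = true;
        return true;
    }
};

// hypothetical counterpart of llama_context::graph_reserve with the new flag
graph * graph_reserve(scheduler & sched, graph & g, bool alloc) {
    if (alloc) {
        if (!sched.reserve(&g)) {
            return nullptr;
        }
    }
    return &g; // with alloc == false the caller only inspects the graph
}

int main() {
    scheduler sched;
    graph g;
    graph_reserve(sched, g, /*alloc =*/ false); // probe only, no allocation
    graph_reserve(sched, g, /*alloc =*/ true);  // the single real reservation
}
```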