
JohannesGaessler (Collaborator)

Fixes `-fa auto` for multiple GPUs. The problem is that the same compute graph is passed to `ggml_backend_sched` twice; see #15434 (comment). This PR extends `llama_context::graph_reserve` with an `alloc` flag so that the graph is only used once.
