Conversation

ggerganov
Member

PoC for #14482 (comment)

This is a hacky implementation of the idea in the comment. It works, but it does not lead to any measurable improvement. The reason is that we already overlap CPU and GPU work by submitting the first 128 graph nodes and, while they are computing, preparing and submitting the rest of the graph. This is already enough to completely mask the CPU overhead of constructing the Metal graph, so there is no point in adding logic to reuse a previous Metal graph.
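
For illustration, here is a minimal sketch of the split-submit pattern described above. It is not the actual ggml-metal implementation: `encode_chunk` and `submit_graph_overlapped` are hypothetical stand-ins. The key property it relies on is real, though: command buffers committed to the same `MTLCommandQueue` execute in submission order, so committing the first chunk early lets the GPU start while the CPU keeps encoding.

```objc
// Minimal sketch of the CPU/GPU overlap described above (not the actual
// ggml-metal code). `encode_chunk` is a hypothetical stand-in for encoding
// a range of graph nodes into a command buffer.
#import <Metal/Metal.h>

static void encode_chunk(id<MTLCommandBuffer> cb, int i0, int i1) {
    // hypothetical: encode graph nodes [i0, i1) into cb
    (void) cb; (void) i0; (void) i1;
}

void submit_graph_overlapped(id<MTLCommandQueue> queue, int n_nodes) {
    const int n_first = n_nodes < 128 ? n_nodes : 128; // prefix submitted immediately

    // encode and commit the first chunk so the GPU starts working right away
    id<MTLCommandBuffer> cb0 = [queue commandBuffer];
    encode_chunk(cb0, 0, n_first);
    [cb0 commit];

    // while the GPU executes cb0, the CPU encodes the remainder;
    // buffers on the same queue run in submission order
    id<MTLCommandBuffer> cb1 = [queue commandBuffer];
    encode_chunk(cb1, n_first, n_nodes);
    [cb1 commit];

    // wait for the whole graph to finish
    [cb1 waitUntilCompleted];
}
```

Under this pattern, the CPU-side encoding cost of the second chunk is hidden behind the GPU execution of the first, which is why caching a previously constructed Metal graph buys nothing extra.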

@ggerganov added the demo label on Jul 7, 2025
@github-actions bot added the ggml and Apple Metal labels on Jul 7, 2025
@ggerganov mentioned this pull request on Jul 7, 2025
@ggerganov force-pushed the gg/llama-reuse-graphs branch 3 times, most recently from 8303a68 to 3d28b3b on July 12, 2025 13:35
Base automatically changed from gg/llama-reuse-graphs to master on July 17, 2025 16:08
