Conversation

ggerganov
Member

PoC for #14482 (comment)

This is a hacky implementation of the idea in the comment. It works, but it does not lead to any measurable improvement. The reason is that we already overlap CPU and GPU work by submitting the first 128 graph nodes and, while they are computing, preparing and submitting the rest of the graph. This is already enough to completely mask the CPU overhead of constructing the Metal graph, so there is no point in adding logic to reuse a previous Metal graph.
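
For illustration, here is a minimal sketch of the split-submit pattern described above. It is not the actual ggml-metal implementation: `encode_chunk` and `submit_graph_overlapped` are hypothetical stand-ins. The key property it relies on is real, though: command buffers committed to the same `MTLCommandQueue` execute in submission order, so committing the first chunk early lets the GPU start while the CPU keeps encoding.

```objc
// Minimal sketch of the CPU/GPU overlap described above (not the actual
// ggml-metal code). `encode_chunk` is a hypothetical stand-in for encoding
// a range of graph nodes into a command buffer.
#import <Metal/Metal.h>

static void encode_chunk(id<MTLCommandBuffer> cb, int i0, int i1) {
    // hypothetical: encode graph nodes [i0, i1) into cb
    (void) cb; (void) i0; (void) i1;
}

void submit_graph_overlapped(id<MTLCommandQueue> queue, int n_nodes) {
    const int n_first = n_nodes < 128 ? n_nodes : 128; // prefix submitted immediately

    // encode and commit the first chunk so the GPU starts working right away
    id<MTLCommandBuffer> cb0 = [queue commandBuffer];
    encode_chunk(cb0, 0, n_first);
    [cb0 commit];

    // while the GPU executes cb0, the CPU encodes the remainder;
    // buffers on the same queue run in submission order
    id<MTLCommandBuffer> cb1 = [queue commandBuffer];
    encode_chunk(cb1, n_first, n_nodes);
    [cb1 commit];

    // wait for the whole graph to finish
    [cb1 waitUntilCompleted];
}
```

Under this pattern, the CPU-side encoding cost of the second chunk is hidden behind the GPU execution of the first, which is why caching a previously constructed Metal graph buys nothing extra.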

@ggerganov added the demo label on Jul 7, 2025
@github-actions bot added the ggml and Apple Metal labels on Jul 7, 2025
@ggerganov mentioned this pull request on Jul 7, 2025
@ggerganov force-pushed the gg/llama-reuse-graphs branch 3 times, most recently from 8303a68 to 3d28b3b on July 12, 2025 13:35
Base automatically changed from gg/llama-reuse-graphs to master on July 17, 2025 16:08
