Conversation
@BB-fat BB-fat commented Mar 8, 2025

Currently, Metal shaders are recompiled for every llama context initialization, which is redundant and hurts performance when multiple contexts are created.
This change caches the compiled Metal library at the device-context level (g_ggml_ctx_dev_main) and reuses it for subsequent context initializations.

Fixes #12199

@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) labels Mar 8, 2025
@BB-fat BB-fat force-pushed the metal-library-cache branch 2 times, most recently from 6b3a511 to 0569909 Compare March 9, 2025 12:27
@BB-fat BB-fat marked this pull request as ready for review March 9, 2025 12:29

BB-fat commented Mar 10, 2025

During testing, I found an Objective-C double-release issue; I am working on a fix.

@BB-fat BB-fat force-pushed the metal-library-cache branch from 0569909 to 70432c7 Compare March 10, 2025 05:33
BB-fat commented Mar 10, 2025

@ggerganov Please review when convenient.

@ggerganov ggerganov merged commit 6ab2e47 into ggml-org:master Mar 11, 2025
47 checks passed
@BB-fat BB-fat deleted the metal-library-cache branch March 12, 2025 02:19
ishaangandhi pushed a commit to ishaangandhi/llama.cpp that referenced this pull request Mar 12, 2025
jpohhhh pushed a commit to Telosnex/llama.cpp that referenced this pull request Mar 14, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025



Development

Successfully merging this pull request may close these issues.

[Metal] Context init optimization opportunity: metal library is compiled for every llama context
