ggml : add support for dynamic loading of backends #10469
Co-authored-by: Georgi Gerganov <[email protected]>
use MODULE target type for dl backend
set backend output directory to the runtime directory
ggml_backend_load_all searches backends in the system path first, then in the executable directory
ggml-ci
  # keep standard at C11 and C++11
- MK_CPPFLAGS = -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon
+ MK_CPPFLAGS = -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -DGGML_USE_CPU
GGML_USE_CPU now needs to be defined to use the CPU backend with the backend registry. This is necessary because the CPU backend may now be loaded dynamically, so it can no longer be assumed that it is linked into the build. This may break other build scripts.
On Linux, it may also be necessary to link against dl for dlopen.
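For readers unfamiliar with why dl comes into play, here is a minimal sketch of loading a shared library and resolving an entry point with the POSIX dlopen/dlsym calls. It is not the code from this PR: the function name load_entry_point, the backend_init_fn signature, and the symbol name passed in are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical entry-point signature; a real backend defines its own. */
typedef void *(*backend_init_fn)(void);

/*
 * Opens the shared library at `path` and looks up `symbol` in it.
 * On success, stores the library handle in *out_handle and returns the
 * entry point. On failure, prints a diagnostic and returns NULL with
 * *out_handle set to NULL.
 */
static backend_init_fn load_entry_point(const char *path, const char *symbol,
                                        void **out_handle) {
    assert(path != NULL && symbol != NULL && out_handle != NULL);
    *out_handle = NULL;

    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }

    void *sym = dlsym(handle, symbol);
    if (sym == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym failed: %s\n",
                err != NULL ? err : "symbol resolved to NULL");
        dlclose(handle);
        return NULL;
    }

    *out_handle = handle;

    /* POSIX idiom for converting the dlsym result to a function pointer. */
    backend_init_fn fn;
    *(void **) (&fn) = sym;
    return fn;
}
```

On older glibc versions this needs -ldl on the link line; newer glibc ships dlopen in libc itself. The caller is expected to dlclose the returned handle once the library is no longer needed.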
May want to mention this change in: #9289
I spent a few hours scratching my head on why I had no devices.
On the side, when no devices are loaded, this causes a segfault due to cpu_dev being a nullptr:
We probably should assert or something here, or perhaps anywhere when 0 devices are present. Let the user know something is wrong.
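A rough sketch of the kind of guard suggested here, failing early with a message instead of dereferencing a null device. The device_t type and both function names are hypothetical stand-ins, not ggml types or functions; in real code the count and the CPU device would come from the backend registry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a device handle; not the real ggml type. */
typedef struct {
    const char *name;
} device_t;

/*
 * Validates the result of device discovery before anything uses it.
 * Returns 0 if at least one device exists and a CPU device is available,
 * -1 otherwise, after printing a hint about the likely cause.
 */
static int check_devices(size_t n_devices, const device_t *cpu_dev) {
    if (n_devices == 0) {
        fprintf(stderr, "no backend devices found: was any backend loaded or linked?\n");
        return -1;
    }
    if (cpu_dev == NULL) {
        fprintf(stderr, "no CPU device among %zu device(s)\n", n_devices);
        return -1;
    }
    return 0;
}

/* Assert-style variant: aborts in debug builds rather than returning an error. */
static const device_t *require_cpu_device(const device_t *cpu_dev) {
    assert(cpu_dev != NULL && "CPU device missing: no CPU backend registered");
    return cpu_dev;
}
```

The error-code variant lets an application print a helpful message and exit cleanly, while the assert variant only catches the problem in builds where assertions are enabled.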
Is
It is ok to call
It does, but allocates memory again, essentially duplicating total memory usage. I suppose it's a mistake on my end? It shouldn't behave like that on a combination of a static build with a dynamic backend?
It could happen if you have a static backend and the same backend as a dynamic backend, but that does not happen normally, because backends build without
* ggml : add support for dynamic loading of backends --------- Co-authored-by: Georgi Gerganov <[email protected]>
is this exposed in llama-server?
Adds support for loading backends dynamically at load time, without needing to link to them in the build.
- GGML_BACKEND_DL enabled
- ggml_backend_load(const char * path) to load a backend dynamically
- ggml_backend_load_all(void) to load all the known backends
- ggml_backend_unload(ggml_backend_reg_t reg) to unregister and unload a backend
- ggml_backend_get_features to obtain a list of flags of a backend. This replaces the calls to the ggml_cpu_has_xx functions from the CPU backend in llama.cpp
- ggml_backend_get_features, which returns the list of archs included in the build and the build flags used such as GGML_CUDA_FORCE_MMQ. Other backends should also implement this function to report compile-time flags and features.
TODO
- ggml_backend_load_all search paths
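The commit notes in this PR describe the search order as system path first, then the executable directory. One way such a fallback could look with plain dlopen is sketched below. This is an illustration under assumptions, not the PR's implementation: open_backend_lib is a hypothetical helper, and the caller is assumed to supply exe_dir, since finding the executable's directory is platform-specific.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/*
 * Tries to open `libname` through the dynamic linker's default search
 * (a name without '/' is resolved via the system library paths), then
 * falls back to `exe_dir`/`libname`. Returns the handle, or NULL if
 * neither attempt succeeds.
 */
static void *open_backend_lib(const char *libname, const char *exe_dir) {
    assert(libname != NULL && exe_dir != NULL);

    /* First attempt: let the dynamic linker search the system paths. */
    void *handle = dlopen(libname, RTLD_NOW | RTLD_LOCAL);
    if (handle != NULL) {
        return handle;
    }

    /* Second attempt: look next to the executable. */
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", exe_dir, libname);
    if (n < 0 || (size_t) n >= sizeof(path)) {
        return NULL; /* the joined path would not fit in the buffer */
    }
    return dlopen(path, RTLD_NOW | RTLD_LOCAL);
}
```

Trying the bare name first means a system-wide install takes precedence over a copy shipped next to the binary; reversing the two attempts would give the opposite priority.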