if GGML_BACKEND_DL defined use one of the ggml_backend_load functions #3057
Conversation
…ay be more cases, two identified and trapped
I decided to remove all references to whisper_load_backends and to update the bench, cli and stream examples to illustrate basic GGML_BACKEND_DL usage. In ggml-backend.cpp, at line 1475 at the start of ggml_backend_sched_new, you assert that the last available device is a CPU. The examples could now allow a selectable device when using GGML_BACKEND_DL (e.g. benchmark CPU vs Vulkan vs CUDA).
The changes to whisper.cpp look good to me, but the changes to ggml-backend.cpp/h need to be removed. ggml is a separate library and cannot be modified in this way. There are also good reasons why ggml_backend_sched requires a CPU backend: even when using a GPU, the CPU backend must be available.
OK - I'll modify (and un-modify) as required. I presume the CPU is required as a fallback if another backend fails? I just had that happen to me with BLAS.
examples/bench/bench.cpp (outdated)

```cpp
if (ggml_backend_load_best(params.device.c_str(), true, nullptr) == nullptr) {
    fprintf(stderr, "error: could not load device %s\n", params.device.c_str());
    return 5;
}
```
This function is not currently available to applications, but I agree that it should be. I will make the change necessary to make this function public, but at the moment it cannot be used here.

There is also an important distinction between devices and backends. A backend may have multiple devices, e.g. in a system with multiple GPUs, and it would be good to add the ability to whisper.cpp to choose which device to use, but that would need to be done in a different way (e.g. by making whisper.cpp accept a `ggml_backend_dev_t` object in `whisper_context_params`). The implementation in llama.cpp may be useful as a guide, although it is a bit more complicated since llama.cpp can use multiple devices at the same time.

In conclusion:
- Adding a `--backend` parameter to choose which backend to load would be good, but it either needs to use `ggml_backend_load` to load specifically the file given by the user, or it would need to wait until `ggml_backend_load_best` is made public
- Adding a `--device` parameter to choose which device to use would also be good, but it must be a separate setting

Probably better left for a separate PR.
So in the meantime, can I safely expose ggml_backend_load_best before it's public? That's a good start from my point of view (I'm fairly lost without it loading the CPU backend on older machines).

I'll wrap it in a whisper_load_device function, making sure that there's at least one CPU backend at the end of the list; should that be OK?
The two modifications I made to ggml-backend.cpp dealt with a nullptr being passed to functions that return a member of the passed parameter.
Addressed all objections as far as I can see:
- As suggested when I exposed whisper_load_backends, updated the bench, cli and stream examples with conditional backend-load calls
- Added a helpful error message if a null device access is attempted, in the two places I've seen it happen
- Next I need to add load_best etc. to allow a specific device to be demanded (which can cause other problems)