
Conversation

@peardox peardox commented Apr 16, 2025

As suggested when I exposed whisper_load_backends:
Updated the bench, cli and stream examples with conditional backend load calls.
Added a helpful error message when a null device access is attempted, in the two places I've seen it happen.

Next I need to add load_best etc. so that a specific device alone can be demanded (which can cause other problems).
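For context, a minimal sketch of the kind of conditional load call this adds to the examples — assuming GGML_BACKEND_DL is the build flag that enables dynamic backend loading, and using the public ggml_backend_load_all() loader:

    #ifdef GGML_BACKEND_DL
        // with dynamic backend loading enabled, backends are not linked in
        // statically and must be loaded explicitly before any device is used
        ggml_backend_load_all();
    #endif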


peardox commented Apr 17, 2025

I decided to remove all references to whisper_load_backends and to update bench, cli and stream to illustrate basic GGML_BACKEND_DL usage.

In ggml-backend.cpp (line 1475), at the start of ggml_backend_sched_new, you assert that the last available device is a CPU. I had to add a GGML_BACKEND_DL conditional to remove that check, as it is now possible to select a specific device. For example, in bench.cpp you can change the conditional ggml_backend_load_all() call to ggml_backend_load_best("gpu", true, nullptr) and only a GPU will be available (which would cause issues with use_gpu = false).

When using GGML_BACKEND_DL, the examples could now offer a selectable device (e.g. to benchmark CPU vs Vulkan vs CUDA).
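To illustrate the swap described above — note that ggml_backend_load_best is internal to ggml at this point, so the signature here is simply the one used in this discussion:

    #ifdef GGML_BACKEND_DL
        // load only the best available GPU backend instead of all backends;
        // with no CPU backend loaded, use_gpu = false would have nothing to
        // fall back to, and the scheduler's CPU assert would fire
        ggml_backend_load_best("gpu", true, nullptr);
    #endif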

@peardox peardox marked this pull request as ready for review April 17, 2025 05:26
@slaren slaren left a comment


The changes to whisper.cpp look good to me, but the changes to ggml-backend.cpp/h need to be removed; ggml is a separate library and cannot be modified in this way. There are also good reasons why ggml_backend_sched requires a CPU backend: even when using a GPU, the CPU backend must be available.
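A sketch of the convention being described here — this assumes gpu and cpu are ggml_backend_t handles created earlier, and uses ggml's public device-inspection functions:

    #include <vector>
    #include "ggml-backend.h"

    // assumed created elsewhere: ggml_backend_t gpu (may be null), cpu
    // the backend list handed to ggml_backend_sched_new must end with the CPU
    // backend: any op a GPU backend cannot run falls back to the CPU backend
    std::vector<ggml_backend_t> backends;
    if (gpu != nullptr) {
        backends.push_back(gpu);
    }
    backends.push_back(cpu); // CPU backend always last
    GGML_ASSERT(ggml_backend_dev_type(ggml_backend_get_device(backends.back())) == GGML_BACKEND_DEVICE_TYPE_CPU);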


peardox commented Apr 17, 2025

OK - I'll modify (and un-modify) as required.

I presume the CPU backend is required as a fallback if another backend fails? I just had that happen to me with BLAS.

@peardox peardox marked this pull request as draft April 17, 2025 13:51
Comment on lines 86 to 89
    if (ggml_backend_load_best(params.device.c_str(), true, nullptr) == nullptr) {
        fprintf(stderr, "error: could not load device %s\n", params.device.c_str());
        return 5;
    }
@slaren slaren Apr 17, 2025


This function is not currently available to applications, but I agree that it should be. I will make the changes necessary to make it public, but at the moment it cannot be used here.

There is also an important distinction between devices and backends. A backend may have multiple devices, e.g. in a system with multiple GPUs, and it would be good to add the ability for whisper.cpp to choose which device to use, but that would need to be done in a different way (e.g. by making whisper.cpp accept a ggml_backend_dev_t object in whisper_context_params). The implementation in llama.cpp may be useful as a guide, although it is a bit more complicated since llama.cpp can use multiple devices at the same time.

In conclusion:

  • Adding a --backend parameter to choose which backend to load would be good, but it either needs to use ggml_backend_load to load exactly the file given by the user, or it needs to wait until ggml_backend_load_best is made public
  • Adding a --device parameter to choose which device to use would also be good, but it must be a separate setting (see the device-lookup sketch below)

Probably better left for a separate PR.
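As a sketch of how a --device name might be resolved to a ggml_backend_dev_t using ggml's public registry functions (ggml_backend_dev_count, ggml_backend_dev_get, ggml_backend_dev_name) — the helper name find_device_by_name is illustrative only:

    #include <cstring>
    #include "ggml-backend.h"

    // illustrative only: map a user-supplied --device name to a ggml_backend_dev_t
    static ggml_backend_dev_t find_device_by_name(const char * name) {
        for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
            ggml_backend_dev_t dev = ggml_backend_dev_get(i);
            if (strcmp(ggml_backend_dev_name(dev), name) == 0) {
                return dev; // registry names look like "CPU", "CUDA0", "Vulkan0"
            }
        }
        return nullptr; // unknown device name
    }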

@peardox peardox replied

So in the meantime I can safely expose ggml_backend_load_best before it's made public? That's a good start from my point of view (I'm fairly lost without it for loading the CPU backend on older machines).

I'll wrap it in a whisper_load_device function that makes sure there is at least one CPU backend at the end of the list - that should be OK?

Two of the modifications I made to ggml-backend.cpp dealt with a nullptr being passed to functions that want to return a member of the passed parameter.
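A minimal sketch of what that wrapper might look like — whisper_load_device is the hypothetical name proposed above, and it assumes ggml_backend_load_best has been exposed with the signature used in this thread:

    // hypothetical wrapper (sketch only): load the requested backend, then make
    // sure the CPU backend is also loaded so the scheduler's CPU fallback exists
    static bool whisper_load_device(const char * name) {
        bool ok = ggml_backend_load_best(name, true, nullptr) != nullptr;
        ggml_backend_load_best("cpu", true, nullptr); // CPU must always be present
        return ok;
    }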

@peardox peardox marked this pull request as ready for review April 18, 2025 16:50

peardox commented Apr 18, 2025

I've addressed all the objections, as far as I can see.

@peardox peardox marked this pull request as draft April 18, 2025 18:40
@peardox peardox marked this pull request as ready for review April 19, 2025 19:21
@peardox peardox closed this Apr 20, 2025