llama-bench : use local GPUs along with RPC servers #14917

rgerganov · 2025-07-28T12:19:53Z

Currently, if RPC servers are specified with '--rpc' and a local GPU is available (e.g. CUDA), the benchmark runs only on the RPC device(s), but the backend column in the results says "CUDA,RPC", which is incorrect. This patch adds all local GPU devices to the benchmark, which makes llama-bench consistent with llama-cli.
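
To illustrate the intended behavior, here is a minimal, self-contained sketch of the device selection described above. It is not the code from this patch: `DeviceKind`, `Device` and `select_bench_devices` are hypothetical names that stand in for the real ggml device types, and listing RPC devices ahead of local GPUs is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a backend device; not the ggml device type.
enum class DeviceKind { Cpu, LocalGpu, Rpc };

struct Device {
    std::string name;
    DeviceKind  kind;
};

// Keep every RPC device and every local GPU, and skip CPU devices.
// Before the patch, only the RPC devices would have been used.
// Placing RPC devices first is an assumption of this sketch.
std::vector<Device> select_bench_devices(const std::vector<Device> & available) {
    std::vector<Device> rpc;
    std::vector<Device> local;
    for (const Device & d : available) {
        if (d.kind == DeviceKind::Rpc) {
            rpc.push_back(d);
        } else if (d.kind == DeviceKind::LocalGpu) {
            local.push_back(d);
        }
    }
    // Append the local GPUs after the RPC devices.
    rpc.insert(rpc.end(), local.begin(), local.end());
    return rpc;
}
```

With one CPU, one local CUDA device and one RPC device as input, the function returns both the RPC device and the CUDA device, so a "CUDA,RPC" backend label would match the devices actually benchmarked.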

rgerganov requested a review from slaren July 28, 2025 12:19

github-actions bot added the examples label Jul 28, 2025

slaren approved these changes Jul 28, 2025

rgerganov merged commit c556418 into ggml-org:master Jul 28, 2025
47 checks passed

rgerganov mentioned this pull request Aug 10, 2025

Running on iGPUs with Vulkan can be fun - llama.cpp OOM error debugging geerlingguy/beowulf-ai-cluster#2

Open
