Feature Request: Add --device and --device-draft parameters #785

@MRGRD56

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

With upstream llama.cpp, I can specify which device(s) each model runs on, using -dev/--device for the target model and -devd/--device-draft for the draft model.
With speculative decoding, this lets me run the target model on CUDA0 and the draft model on CUDA1, making use of both GPUs. For example:

C:\apps\llama.cpp\llama-server.exe --port 15900 `
      --model "E:\AI\LLM\gguf\unsloth\Qwen3-235B-A22B-Instruct-2507-GGUF\Qwen3-235B-A22B-Instruct-2507-UD-Q4_K_XL-00001-of-00003.gguf" `
      --ctx-size 16384 `
      -ctk q8_0 -ctv q8_0 `
      -fa on `
      --n-gpu-layers 999 `
      -mg 0 `
      -dev CUDA0 `
      -ot "blk\.(0?[0-9]|1[0-6])\.ffn_.*_exps.=CUDA0" `
      -ot ".ffn_.*_exps.=CPU" `
      --parallel 1 `
      -tb 30 -t 15 `
      --jinja --reasoning-budget 0 --no-mmap `
      --model-draft "E:\AI\LLM\gguf\unsloth\Qwen3-4B-Instruct-2507-GGUF\Qwen3-4B-Instruct-2507-UD-Q6_K_XL.gguf" `
      -devd CUDA1 `
      -ngld 999 `
      --ctx-size-draft 16384 `
      -ctkd q8_0 -ctvd q8_0

For reference, my two GPUs are different models: RTX 5090 (32GB) and RTX 4070 Ti Super (16GB).
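
As a rough illustration of what the two -ot overrides in the command above do, here is a small Python sketch that applies the same two patterns to expert tensor names. It assumes overrides are tried in command-line order with the first match winning, and that tensors matching no override stay on the main device; both are my understanding of llama.cpp's behavior, not something verified here. The helper names (`placement`, `expert_layers_on`) and the layer count in the usage note are hypothetical.

```python
import re

# The two -ot overrides from the command, in command-line order.
# Assumption: the first pattern that matches a tensor name wins.
OVERRIDES = [
    (r"blk\.(0?[0-9]|1[0-6])\.ffn_.*_exps.", "CUDA0"),
    (r".ffn_.*_exps.", "CPU"),
]


def placement(tensor_name, default="CUDA0"):
    """Return the buffer a tensor would land on under the overrides.

    `default` stands in for the main device (-dev CUDA0, -mg 0) and is an
    assumption of this sketch.
    """
    for pattern, buffer in OVERRIDES:
        if re.search(pattern, tensor_name):
            return buffer
    return default


def expert_layers_on(buffer, n_layers):
    """List the block indices whose expert tensors land on `buffer`."""
    return [
        i for i in range(n_layers)
        if placement(f"blk.{i}.ffn_up_exps.weight") == buffer
    ]
```

With a hypothetical 20-layer model, `expert_layers_on("CUDA0", 20)` gives blocks 0 through 16 and `expert_layers_on("CPU", 20)` gives blocks 17 through 19, so the first pattern keeps the experts of the first 17 blocks on the GPU and the second sends the remaining experts to the CPU.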

In ik_llama.cpp, as far as I understand, there's no way to specify which GPUs to use other than the CUDA_VISIBLE_DEVICES environment variable. But that variable applies to the whole process, so it doesn't let you use different cards for the target and the draft model.
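
To make the limitation concrete, here is a minimal Python model of how a CUDA process enumerates GPUs under CUDA_VISIBLE_DEVICES. It is a sketch rather than anything taken from ik_llama.cpp: it handles only integer indices (not GPU UUIDs), the function name `visible_devices` is hypothetical, and it assumes physical index 0 is the RTX 5090.

```python
def visible_devices(physical, cuda_visible_devices):
    """Return the GPUs a process sees, in the order it numbers them.

    physical: GPU names in physical index order.
    cuda_visible_devices: the variable's value, or None if it is unset.
    """
    if cuda_visible_devices is None:
        return list(physical)
    visible = []
    for token in cuda_visible_devices.split(","):
        token = token.strip()
        if not token.isdigit() or int(token) >= len(physical):
            break  # an invalid index hides itself and everything after it
        visible.append(physical[int(token)])
    return visible


gpus = ["RTX 5090", "RTX 4070 Ti Super"]  # assumed physical order
only_second = visible_devices(gpus, "1")
swapped = visible_devices(gpus, "1,0")
```

Here `only_second` contains just the RTX 4070 Ti Super, which the process then sees as device 0, and `swapped` lists both cards in reverse order. In both cases there is a single device list for the whole process, so the target and the draft model inside one server instance cannot be given different lists this way.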

Motivation

When using speculative decoding, placing the target model and the draft model on different GPUs would make better use of multi-GPU systems.

Possible Implementation

No response

Labels: enhancement (New feature or request)
