Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
With the original llama.cpp, I can specify which device(s) the model will be running on, using -dev/--device and -devd/--device-draft. This way, when using speculative decoding, I can run the target model on CUDA0 and the draft model on CUDA1, making use of both GPUs. Like this:
C:\apps\llama.cpp\llama-server.exe --port 15900 `
--model "E:\AI\LLM\gguf\unsloth\Qwen3-235B-A22B-Instruct-2507-GGUF\Qwen3-235B-A22B-Instruct-2507-UD-Q4_K_XL-00001-of-00003.gguf" `
--ctx-size 16384 `
-ctk q8_0 -ctv q8_0 `
-fa on `
--n-gpu-layers 999 `
-mg 0 `
-dev CUDA0 `
-ot "blk\.(0?[0-9]|1[0-6])\.ffn_.*_exps.=CUDA0" `
-ot ".ffn_.*_exps.=CPU" `
--parallel 1 `
-tb 30 -t 15 `
--jinja --reasoning-budget 0 --no-mmap `
--model-draft "E:\AI\LLM\gguf\unsloth\Qwen3-4B-Instruct-2507-GGUF\Qwen3-4B-Instruct-2507-UD-Q6_K_XL.gguf" `
-devd CUDA1 `
-ngld 999 `
--ctx-size-draft 16384 `
-ctkd q8_0 -ctvd q8_0
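Reduced to just the placement-related flags (the model paths below are placeholders), the split is simply:
llama-server.exe --model target.gguf -dev CUDA0 -ngl 999 --model-draft draft.gguf -devd CUDA1 -ngld 999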
Note that my two GPUs are different: an RTX 5090 (32 GB) and an RTX 4070 Ti Super (16 GB).
In ik_llama.cpp, as far as I understand, the only way to choose which GPUs to use is the CUDA_VISIBLE_DEVICES environment variable. But since that applies to the whole process, it doesn't let you use different cards for the target and the draft model.
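To illustrate, here is a hypothetical ik_llama.cpp invocation (binary and model paths are placeholders, and I'm assuming its server accepts --model-draft the way llama.cpp's does):
# The variable filters the devices visible to the whole process:
$env:CUDA_VISIBLE_DEVICES = "0,1"
C:\apps\ik_llama.cpp\llama-server.exe --model target.gguf --model-draft draft.gguf
# Both models now pick from the same visible device set; there is no flag to
# pin the target to GPU 0 and the draft to GPU 1.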
Motivation
Being able to run the target and the draft model on different GPUs would make speculative decoding more effective, since each model would get a card's VRAM and compute to itself.
Possible Implementation
No response