Feature Request: Add separate --override-tensor control for draft models. #15185

### Prerequisites

- [x] I am running the latest code. Mention the version if possible as well.
- [x] I carefully followed the [README.md](https://github.com/ggml-org/llama.cpp/blob/master/README.md).
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the [Discussions](https://github.com/ggml-org/llama.cpp/discussions), and have a new and useful enhancement to share.

### Feature Description

I'm using `--override-tensor` to selectively load parts of Qwen3 480B onto a GPU and keep the rest in system memory. I am trying to use Qwen3 30B as a draft model to improve performance, since it fits entirely in GPU memory. However, draft-model performance is significantly degraded: the draft model shares the main model's tensor-override setting, so its MoE expert tensors are also placed in system memory and executed on the CPU.
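
To illustrate why a shared override list hurts the draft model, here is a minimal Python model of how such a list could be applied. It is not llama.cpp's loader code. It assumes, as a simplification, that each `--override-tensor` entry is a `<regex>=<buffer type>` pair, that the regex is searched against each tensor name with the first matching entry winning, and that every tensor not matched stays on the GPU (i.e. full offload was requested). The tensor names are illustrative GGUF-style MoE names, not a dump of the actual Qwen3 models.

```python
import re

# Example --override-tensor entry that keeps MoE expert tensors in system memory.
MAIN_OVERRIDES = [r"\.ffn_.*_exps\.=CPU"]

def placement(tensor_names, overrides, default="GPU"):
    """Map each tensor name to a buffer type (simplified model, first match wins)."""
    rules = []
    for entry in overrides:
        pattern, _, buft = entry.rpartition("=")  # split "<regex>=<buffer type>"
        rules.append((re.compile(pattern), buft))
    result = {}
    for name in tensor_names:
        result[name] = default  # assumes full GPU offload when nothing matches
        for regex, buft in rules:
            if regex.search(name):
                result[name] = buft
                break
    return result

# Illustrative tensor names for one block of an MoE draft model (hypothetical).
draft_tensors = [
    "blk.0.attn_q.weight",
    "blk.0.ffn_gate_exps.weight",
    "blk.0.ffn_up_exps.weight",
    "blk.0.ffn_down_exps.weight",
]

# The draft model inherits the main model's list, so its experts also land on the CPU.
for name, buft in placement(draft_tensors, MAIN_OVERRIDES).items():
    print(f"{name:30} -> {buft}")
```

Under these assumptions the attention tensor stays on the GPU while all three expert tensors are sent to the CPU, even though the whole draft model would fit in GPU memory.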

The enhancement would add a new command-line flag, `--override-tensor-draft`, to specify separate offload parameters for the draft model. In addition, once this flag exists, the default behavior when `--override-tensor-draft` is not specified should be to offload all of the draft model's layers/tensors to the GPU (if GPU offloading is requested with `-ngld`), matching how the main model behaves when no override is given, instead of inheriting the main model's `--override-tensor` setting.
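
The proposed defaulting rule can be sketched as a small resolution step. This is a hypothetical Python model, not existing llama.cpp code; the function names are made up for illustration, and override lists are represented as plain `<regex>=<buffer type>` strings.

```python
from typing import List, Optional, Sequence

def current_draft_overrides(main: Sequence[str]) -> List[str]:
    """Behavior described in this issue: the draft model reuses the main model's list."""
    return list(main)

def proposed_draft_overrides(main: Sequence[str], draft: Optional[Sequence[str]]) -> List[str]:
    """Proposed behavior (hypothetical helper).

    The main model's list is never inherited. If --override-tensor-draft was not
    given (draft is None), the draft model gets no overrides, so every layer
    offloaded via -ngld stays on the GPU.
    """
    del main  # intentionally ignored under the proposal
    if draft is None:
        return []
    return list(draft)

main_ot = [r"\.ffn_.*_exps\.=CPU"]
print(current_draft_overrides(main_ot))
print(proposed_draft_overrides(main_ot, None))
```

Using `None` for "flag not given" keeps that case distinct from an explicit draft-specific list, which is exactly the distinction the proposal relies on.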

### Motivation

Better speculative-decoding performance when the draft model uses a MoE architecture.

### Possible Implementation

_No response_
