
Feature Request: Model group support (router mode / config.ini) #18312

@GitEventhandler

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

It would be great to add a router-group parameter for models in router mode. When the router counts loaded models against models-max, models that share the same router group should be treated as a single model.

Motivation

In the current router mode of llama.cpp, the models-max parameter limits how many models can be loaded at the same time. However, in a dual-GPU environment there is a scenario where the system can run either two smaller models simultaneously (one per GPU) or one larger model split across both GPUs. Here is an example configuration:

[gemma3-27b]
model = /path/to/model
tensor-split = 1,1

[qwen3-vl-8b]
model = /path/to/model
tensor-split = 1,0

[ministral3-8b]
model = /path/to/model
tensor-split = 0,1

In this hardware setup, the latter two models can run at the same time, but gemma3-27b needs both GPUs. If models-max is set to 2, loading gemma3-27b prevents any other model from being loaded, even though the limit would still allow one more. Introducing a group ID and counting models that share the same group ID as a single model would resolve this.

Possible Implementation

For example, we could set models-max to 1 and configure the models as follows (any model that does not have a router-group set would be assigned its own independent group ID at load time):

[gemma3-27b]
model = /path/to/model
tensor-split = 1,1
router-group = 0

[qwen3-vl-8b]
model = /path/to/model
tensor-split = 1,0
router-group = 1

[ministral3-8b]
model = /path/to/model
tensor-split = 0,1
router-group = 1

Then the latter two models, which share group 1, would be able to run concurrently, while gemma3-27b would occupy a slot of its own. This configuration would allow more flexible use of multiple GPUs, and it would also be useful for running several smaller models or one larger model on a single GPU.
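To illustrate the idea, below is a minimal C++ sketch of how the router could count "slots" per group rather than per model. The model_entry struct and the count_router_slots function are hypothetical names used only for illustration; they are not part of the existing llama.cpp code.

#include <cstdio>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one [section] from config.ini.
struct model_entry {
    std::string name;
    int router_group; // -1 means no router-group was set in config.ini
};

// Count how many "slots" the loaded models occupy against models-max:
// models sharing a router-group count once, each ungrouped model counts on its own.
static size_t count_router_slots(const std::vector<model_entry> & loaded) {
    std::set<int> groups;
    size_t ungrouped = 0;
    for (const auto & m : loaded) {
        if (m.router_group < 0) {
            ungrouped++;                 // gets an implicit independent group
        } else {
            groups.insert(m.router_group);
        }
    }
    return groups.size() + ungrouped;
}

int main() {
    const size_t models_max = 1;

    // qwen3-vl-8b and ministral3-8b share router-group 1, so together they
    // occupy a single slot and can stay loaded at the same time.
    std::vector<model_entry> loaded = {
        { "qwen3-vl-8b",   1 },
        { "ministral3-8b", 1 },
    };

    printf("slots used: %zu / %zu\n", count_router_slots(loaded), models_max);
    // Loading gemma3-27b (group 0) would first require evicting group 1,
    // because that would bring the slot count to 2 > models_max.
    return 0;
}

With this counting rule, models-max keeps its current meaning for ungrouped models, so existing configurations without router-group would behave exactly as they do today.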
