Misc. bug: Vulkan backend shows negative scaling at low batch sizes with MOE models #16134

@Mushoz

Description

Name and Version

[docker@7158e8afaf9c ~]$ llama-cli --version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
version: 6527 (7f76692)
built with cc (GCC) 15.2.1 20250813 for x86_64-pc-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

No response

Command line

llama-batched-bench -m .cache/llama.cpp/ggml-org_gpt-oss-120b-GGUF_gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 --no-mmap -c 0 -ntg 128 -npp 512 -npl 1,2,3,4,5,6,7,8
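
(This sweeps parallel batch sizes 1 through 8, with each sequence processing a 512-token prompt and generating 128 tokens, matching the PP/TG/B columns in the tables below.)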

Problem description & steps to reproduce

When benchmarking dense models with llama-batched-bench, the Vulkan backend shows good scaling across all batch sizes. E.g., Qwen3-8b q8_0:

|    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
|   512 |    128 |    1 |    640 |    0.758 |   675.37 |    4.953 |    25.84 |    5.711 |   112.06 |
|   512 |    128 |    2 |   1280 |    1.382 |   740.84 |    5.058 |    50.61 |    6.440 |   198.75 |
|   512 |    128 |    3 |   1920 |    2.282 |   673.16 |    5.257 |    73.04 |    7.539 |   254.67 |
|   512 |    128 |    4 |   2560 |    2.913 |   702.98 |    5.441 |    94.09 |    8.355 |   306.41 |
|   512 |    128 |    5 |   3200 |    3.684 |   694.80 |    5.593 |   114.43 |    9.277 |   344.93 |
|   512 |    128 |    6 |   3840 |    4.408 |   696.92 |    5.841 |   131.47 |   10.249 |   374.66 |
|   512 |    128 |    7 |   4480 |    5.227 |   685.71 |    6.002 |   149.29 |   11.228 |   398.99 |
|   512 |    128 |    8 |   5120 |    5.935 |   690.16 |    6.202 |   165.11 |   12.137 |   421.85 |
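
For the dense model, token-generation throughput scales almost linearly with batch size: 50.61 / 25.84 ≈ 1.96x at B=2, up to 165.11 / 25.84 ≈ 6.39x at B=8.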

But when trying the same with a MOE model (gpt-oss-120b in this case), there is negative scaling at batch sizes 2 and 3. I know MOE models scale worse, since not every sequence activates the same experts (so there is less weight sharing between sequences), but I would still expect some positive improvement as batch size increases, not the current negative scaling:

|    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
|   512 |    128 |    1 |    640 |    1.281 |   399.82 |    2.531 |    50.58 |    3.811 |   167.92 |
|   512 |    128 |    2 |   1280 |    2.527 |   405.27 |    7.296 |    35.09 |    9.823 |   130.31 |
|   512 |    128 |    3 |   1920 |    3.879 |   395.98 |    8.605 |    44.62 |   12.484 |   153.79 |
|   512 |    128 |    4 |   2560 |    4.960 |   412.93 |    9.623 |    53.21 |   14.582 |   175.55 |
|   512 |    128 |    5 |   3200 |    6.187 |   413.78 |   10.704 |    59.79 |   16.891 |   189.45 |
|   512 |    128 |    6 |   3840 |    7.419 |   414.05 |   11.554 |    66.47 |   18.974 |   202.39 |
|   512 |    128 |    7 |   4480 |    8.851 |   404.92 |   12.547 |    71.41 |   21.398 |   209.36 |
|   512 |    128 |    8 |   5120 |    9.971 |   410.79 |   13.604 |    75.27 |   23.575 |   217.18 |
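
Here B=2 runs at 35.09 / 50.58 ≈ 0.69x of the single-sequence throughput and B=3 at ≈ 0.88x; the B=1 rate is only exceeded from B=4 onwards. A small Python snippet (S_TG numbers copied from the two tables above) that computes these scaling factors:

```python
# Per-batch TG scaling relative to the B=1 baseline, using the S_TG
# columns from the two llama-batched-bench tables above.
dense_s_tg = [25.84, 50.61, 73.04, 94.09, 114.43, 131.47, 149.29, 165.11]  # Qwen3-8b q8_0
moe_s_tg   = [50.58, 35.09, 44.62, 53.21, 59.79, 66.47, 71.41, 75.27]      # gpt-oss-120b

for name, s_tg in (("dense", dense_s_tg), ("moe", moe_s_tg)):
    base = s_tg[0]
    print(name, [round(x / base, 2) for x in s_tg])

# dense [1.0, 1.96, 2.83, 3.64, 4.43, 5.09, 5.78, 6.39]
# moe   [1.0, 0.69, 0.88, 1.05, 1.18, 1.31, 1.41, 1.49]
```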

First Bad Commit

No response

Relevant log output
