Description
Name and Version
federico@Sogliola:~$ llama-server --version
load_backend: loaded RPC backend from /home/federico/.local/share/llamacpp/libggml-rpc.so
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Graphics (RADV GFX1150) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
load_backend: loaded Vulkan backend from /home/federico/.local/share/llamacpp/libggml-vulkan.so
load_backend: loaded CPU backend from /home/federico/.local/share/llamacpp/libggml-cpu-zen4.so
version: 7966 (8872ad2)
built with GNU 11.4.0 for Linux x86_64
Operating systems
Linux
GGML backends
Vulkan
Hardware
AMD Ryzen AI 9 HX 370 (Strix Point, RADV GFX1150) with integrated Radeon 890M GPU
Models
https://huggingface.co/bartowski/Kimi-Linear-48B-Instruct-GGUF (model-00001-of-00002.gguf, Q8_0 quantization)
Problem description & steps to reproduce
Problem Description
When running Kimi-Linear-48B-Instruct Q8_0 with llama-server and Vulkan backend, the server crashes with a GGML_ASSERT failure during prompt processing. The assertion fails because the computed workgroup dimensions exceed the GPU's maxComputeWorkGroupCount limits.
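For context, the limits named in the assertion are standard Vulkan device properties; the spec only guarantees 65535 workgroups per dispatch dimension, so any dispatch derived from large batches has to stay under whatever the driver reports. A minimal standalone sketch (not llama.cpp code) that reads those limits with the plain Vulkan C API:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>
#include <vulkan/vulkan.h>

int main() {
    // Minimal instance; no layers or extensions are needed just to read limits.
    VkInstanceCreateInfo ici = {};
    ici.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;

    VkInstance instance = VK_NULL_HANDLE;
    if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) {
        std::fprintf(stderr, "vkCreateInstance failed\n");
        return 1;
    }

    uint32_t n = 0;
    vkEnumeratePhysicalDevices(instance, &n, nullptr);
    std::vector<VkPhysicalDevice> devs(n);
    vkEnumeratePhysicalDevices(instance, &n, devs.data());

    for (VkPhysicalDevice d : devs) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(d, &props);
        const uint32_t *wg = props.limits.maxComputeWorkGroupCount;
        // These are the same values the failing GGML_ASSERT compares
        // wg0/wg1/wg2 against in ggml-vulkan.cpp.
        std::printf("%s: maxComputeWorkGroupCount = [%u, %u, %u]\n",
                    props.deviceName, wg[0], wg[1], wg[2]);
    }

    vkDestroyInstance(instance, nullptr);
    return 0;
}
```

The same values are also printed by vulkaninfo under the device limits section, if that is easier to check.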
Steps to Reproduce
- Start llama-server with these parameters:
  llama-server -c 65536 --context-shift -b 8192 -ub 2048 -fa on --no-mmap --jinja --host 0.0.0.0 --port 1234 -m /path/to/Kimi-Linear-48B-Instruct-Q8_0.gguf
- Load a moderately sized text file (~38 KB, approximately 13,609 tokens)
- The server crashes during prompt processing after 8192 tokens have been processed
Error Output
slot update_slots: id 3 | task 0 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 8192, progress = 0.601955
/home/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:6225: GGML_ASSERT(wg0 <= ctx->device->properties.limits.maxComputeWorkGroupCount[0] && wg1 <= ctx->device->properties.limits.maxComputeWorkGroupCount[1] && wg2 <= ctx->device->properties.limits.maxComputeWorkGroupCount[2]) failed
Additional Context
- The same test with Qwen3-VL-30B-8bit works perfectly with identical parameters
- The issue appears specific to the Kimi-Linear-48B model with large batch sizes (see the sketch after this list)
- The crash consistently happens at the same point during prompt processing
- The model works fine with small inputs (chat mode) and only fails with larger contexts
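To make the batch-size dependence concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, a kernel that dispatches a number of workgroups proportional to the token count; the rows_per_tok factor below is invented and the actual grid computation in ggml-vulkan.cpp may be organized differently, but the overflow mechanism would be the same:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // Hypothetical dispatch-size calculation, for illustration only;
    // the real ggml-vulkan shaders compute their grids differently.
    const uint32_t max_wg_count = 65535; // Vulkan's guaranteed minimum per dimension
    const uint32_t n_tokens     = 8192;  // physical batch size from the repro (-b 8192)
    const uint32_t rows_per_tok = 16;    // assumed per-token factor for a hypothetical kernel

    uint32_t wg0 = n_tokens * rows_per_tok; // 131072 workgroups on axis 0
    if (wg0 > max_wg_count) {
        // This is the condition the GGML_ASSERT at ggml-vulkan.cpp:6225 guards:
        // such a dispatch would need to be split or remapped across dimensions.
        std::printf("wg0 = %u exceeds limit %u -> would trip the assert\n",
                    wg0, max_wg_count);
    }
    return 0;
}
```

This would also be consistent with small chat inputs working: at small token counts the computed grid stays below the per-dimension limit.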
First Bad Commit
No response
Relevant log output
slot update_slots: id 3 | task 0 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 8192, progress = 0.601955
/home/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:6225: GGML_ASSERT(wg0 <= ctx->device->properties.limits.maxComputeWorkGroupCount[0] && wg1 <= ctx->device->properties.limits.maxComputeWorkGroupCount[1] && wg2 <= ctx->device->properties.limits.maxComputeWorkGroupCount[2]) failed
/home/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:6225: GGML_ASSERT(wg0 <= ctx->device->properties.limits.maxComputeWorkGroupCount[0] && wg1 <= ctx->device->properties.limits.maxComputeWorkGroupCount[1] && wg2 <= ctx->device->properties.limits.maxComputeWorkGroupCount[2]) failed