Name and Version
llama-server version: 4292 (1a050047)
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4080 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | matrix cores: KHR_coopmat
version: 4292 (1a050047)
built with MSVC 19.42.34435.0 for x64
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-server
Problem description & steps to reproduce
Build:
cmake -S . -B build -DGGML_VULKAN=ON -DGGML_OPENMP=OFF -DGGML_STATIC=OFF -DBUILD_SHARED_LIBS=ON
cmake --build build --config Release
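As a quick sanity check after building (binary path is an assumption based on the usual multi-config MSVC layout), the version banner should report the same Vulkan device and KHR_coopmat path shown above:

```powershell
# Sanity-check sketch: assumed output location for a Release build with MSVC.
# The banner should list the Vulkan device and "matrix cores: KHR_coopmat".
.\build\bin\Release\llama-server.exe --version
```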
Command:
llama-server.exe -m Phi-3-mini-4k-instruct-fp16.gguf -ngl 100
The model was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/blob/main/Phi-3-mini-4k-instruct-fp16.gguf (other models may also reproduce the issue).
Open a browser at http://127.0.0.1:8080 and start a new conversation with "Hi" (tested on an RTX 4080 with driver 565.90):

I get an invalid reply that is just the character "#" repeated.
runlog: llama-server-runlog-verbose.txt
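The same behavior should also be reproducible without the web UI; below is a minimal sketch against the server's OpenAI-compatible chat endpoint (endpoint path and request shape are assumptions, port is the default from above):

```powershell
# Hypothetical direct-API reproduction of the same "Hi" prompt;
# assumes llama-server is listening on the default 127.0.0.1:8080.
$body = @{
    messages   = @(@{ role = "user"; content = "Hi" })
    max_tokens = 64
} | ConvertTo-Json -Depth 5

Invoke-RestMethod -Uri "http://127.0.0.1:8080/v1/chat/completions" `
                  -Method Post -ContentType "application/json" -Body $body
```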
If I set $env:GGML_VK_DISABLE_COOPMAT=1 and $env:GGML_VK_DISABLE_COOPMAT2=1, I get a normal reply: "Hello! How can I assist you today?". A minimal PowerShell sketch of that workaround (same command as above, environment variables apply to the current session only):
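```powershell
# Workaround: disable the Vulkan coopmat/coopmat2 code paths for this session,
# then relaunch the server with the same options as before.
$env:GGML_VK_DISABLE_COOPMAT  = "1"
$env:GGML_VK_DISABLE_COOPMAT2 = "1"
.\llama-server.exe -m Phi-3-mini-4k-instruct-fp16.gguf -ngl 100
```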
First Bad Commit
I think the issue is related to: #10597
Commit: 3df784b
Relevant log output
See https://github.com/user-attachments/files/18058540/llama-server-runlog-verbose.txt.