Name and Version
version: 6399 (61bdfd5)
built with clang version 19.1.5 for x86_64-pc-windows-msvc
Operating systems
No response
Which llama.cpp modules do you know to be affected?
llama-server
Command line
.\llama-server.exe -m unsloth_Qwen3-0.6B-GGUF_Qwen3-0.6B-Q4_K_M.gguf -dev none
Problem description & steps to reproduce
- Start llama-serverwith any model
- Open web browser and go to http://127.0.0.1:8080
- Go to Settings -> Advanced and turn on the "Show tokens per second" toggle
- Click Save
- Type any prompt and see the generation
Before b6399 (61bdfd5) the web interface (http://localhost:8080) immediately showed a label Speed: xx t/s with the current speed, which was constantly updated during the generation and streaming (and when the mouse pointer was over it, it also showed details like pp tokens, generated tokens, times, speed, etc.).
Since b6399, the Speed label is shown only when the token generation is fully complete and is not visible during the streaming. This is especially annoying when the generation is slow or there is a big thinking part.
NOTE: Tested with both Firefox and Chrome, as well as on Windows and macOS, so the bug is not OS, or browser specific.
First Bad Commit
b6399 (61bdfd5)
Relevant log output