
Conversation

ggerganov (Member)

target #16391

Setting the LLAMA_SERVER_SLOTS_DEBUG=1 environment variable makes the /slots endpoint return a more detailed response, including the prompt and the generated text of the current or last task. This is useful for debugging.
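
For example, a minimal usage sketch (the model path and port are placeholders, and it assumes the server build exposes the /slots endpoint via the --slots flag):

```sh
# Start the server with detailed slot output enabled.
# LLAMA_SERVER_SLOTS_DEBUG=1 is the env var introduced by this PR;
# --slots enables the /slots monitoring endpoint.
LLAMA_SERVER_SLOTS_DEBUG=1 ./llama-server -m model.gguf --port 8080 --slots

# After sending a completion request, inspect the slot contents,
# which should now include the prompt and generated text:
curl http://localhost:8080/slots
```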

ngxson merged commit c5e5167 into gg/prompt-cache-ext on Oct 9, 2025
59 checks passed
ggerganov deleted the gg/server-slot-contents branch on October 9, 2025 at 14:20
ggerganov added a commit that referenced this pull request on Oct 9, 2025
* minor : code style

* server : fix prompt similarity calculation

* server : initial host-memory prompt caching

* cont

* server : refactor

* cont

* cont : make the server task of the slot const

* cont : minor [no ci]

* server : cache prompts and checkpoints only for completion tasks

* server : improve prompt caching logic

* cont : fix check for number of cached prompts [no ci]

* server : improve caching logic, add -cram CLI arg

* server : print prompt mismatch info

* cont : better naming [no ci]

* server : improve prompt cache loading logic

* server : add option to debug the slot contents (#16482)

* server : add option to debug the slot contents

* Update tools/server/server.cpp

---------

Co-authored-by: Xuan-Son Nguyen <[email protected]>

* server : add option to disable prompt cache

---------

Co-authored-by: Xuan-Son Nguyen <[email protected]>
yael-works pushed a commit to yael-works/llama.cpp that referenced this pull request on Oct 15, 2025, with the same commit message as above (the PR reference written as ggml-org#16482).
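
For context, the commit log above mentions a -cram CLI arg and an option to disable the prompt cache. A hedged sketch of how these might be used (the flag names come from the commit messages; the MiB unit and the 0-disables behavior are assumptions, so check `llama-server --help` in your build):

```sh
# Cap the host-memory prompt cache (size assumed to be in MiB):
./llama-server -m model.gguf -cram 8192

# Disable the prompt cache entirely (assumed spelling; corresponds to the
# "server : add option to disable prompt cache" commit):
./llama-server -m model.gguf -cram 0
```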