Name and Version
version: 6721 (56b4795)
built with clang version 19.1.5 for x86_64-pc-windows-msvc
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server -m LFM2-2.6B-Q8_0.gguf --jinja -ngl 999 -fa on -c 32768
Starting the server with --swa-full does not change the behavior either.
Problem description & steps to reproduce
For both LFM2-2.6B and LFM2-8B-A1 (arch lfm2 and lfm2moe), the server always logs "forcing full prompt re-processing" and clears the prompt cache on every request, even when most of the prompt is unchanged.
To reproduce: modify only the last part of the prompt and send the request again; the cached prefix is discarded and the whole prompt is re-evaluated (see the reproduction sketch below).
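A minimal reproduction sketch, assuming the server started with the command line above is listening on the default http://localhost:8080 (the prompt text itself is arbitrary). Two requests share a long prefix and differ only near the end, so the second request should be served mostly from the prompt cache instead of being fully re-processed:

```python
# Reproduction sketch (assumed host/port; prompt content is placeholder text).
import json
import urllib.request

URL = "http://localhost:8080/v1/chat/completions"  # assumed default llama-server address

def chat(user_text: str) -> str:
    """Send one chat completion request and return the generated text."""
    payload = {
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": 32,
    }
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Long shared prefix so the slot is selected with a high lcs similarity.
prefix = "word " * 8000

chat(prefix + "First question?")
chat(prefix + "Second, slightly different question?")
# Watch the server log: the second request reports a high lcs similarity,
# yet still prints "forcing full prompt re-processing" and re-evaluates
# the entire prompt from n_past = 0.
```

With the log below, the slot is picked at similarity 0.922, but processing still restarts from n_past = 0.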
First Bad Commit
No response
Relevant log output
srv params_from_: Chat format: Content-only
slot get_availabl: id 0 | task 1912 | selected slot by lcs similarity, lcs_len = 7657, similarity = 0.922 (> 0.100 thold)
slot launch_slot_: id 0 | task 2068 | processing task
slot update_slots: id 0 | task 2068 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 8155
slot update_slots: id 0 | task 2068 | n_past = 7657, cache_tokens.size() = 8306, seq_id = 0, pos_min = 8305, n_swa = 1
slot update_slots: id 0 | task 2068 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 0 | task 2068 | erased invalidated context checkpoint (pos_min = 8092, pos_max = 8092, n_swa = 1, size = 0.344 MiB)
slot update_slots: id 0 | task 2068 | n_past = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 2068 | prompt processing progress, n_past = 2048, n_tokens = 2048, progress = 0.251134
slot update_slots: id 0 | task 2068 | n_past = 2048, memory_seq_rm [2048, end)
slot update_slots: id 0 | task 2068 | prompt processing progress, n_past = 4096, n_tokens = 2048, progress = 0.502269
slot update_slots: id 0 | task 2068 | n_past = 4096, memory_seq_rm [4096, end)
slot update_slots: id 0 | task 2068 | prompt processing progress, n_past = 6144, n_tokens = 2048, progress = 0.753403
slot update_slots: id 0 | task 2068 | n_past = 6144, memory_seq_rm [6144, end)
slot update_slots: id 0 | task 2068 | prompt processing progress, n_past = 8091, n_tokens = 1947, progress = 0.992152
slot update_slots: id 0 | task 2068 | n_past = 8091, memory_seq_rm [8091, end)
slot update_slots: id 0 | task 2068 | prompt processing progress, n_past = 8155, n_tokens = 64, progress = 1.000000
slot update_slots: id 0 | task 2068 | prompt done, n_past = 8155, n_tokens = 64
slot update_slots: id 0 | task 2068 | saved context checkpoint 1 of 3 (pos_min = 8090, pos_max = 8090, size = 0.344 MiB)
srv cancel_tasks: cancel task, id_task = 2068
srv log_server_r: request: POST /v1/chat/completions 192.168.1.199 200
slot release: id 0 | task 2068 | stop processing: n_past = 8299, truncated = 0
srv update_slots: all slots are idle