Commit 06a92a1
server : fix cache reuse logic (ggml-org#12161)
The first KV shift offsets the positions of all tokens after head_c.
A subsequent call to llama_kv_cache_seq_rm that uses head_c then removes valid tokens, because their positions have already been shifted.

1 parent a057897 commit 06a92a1
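
The failure mode described in the commit message can be reproduced with a small toy model. The sketch below is not the llama.cpp implementation: `ToyCache`, `Token`, `reuse_two_chunks`, and the names `head_p`, `n_match`, and `kv_shift` are hypothetical and chosen only for illustration, and its `seq_rm` / `seq_add` methods only loosely mimic the position-range semantics of the real KV cache calls. Bounding the shift to `[head_c, head_c + n_match)` is an assumption about one way to avoid the problem in this toy.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical toy model: each cached token keeps its original index (id)
// and its current position (pos). This is not the llama.cpp data structure.
struct Token { int id; int pos; };

struct ToyCache {
    std::vector<Token> cells;

    explicit ToyCache(int n) {
        for (int i = 0; i < n; ++i) cells.push_back({i, i});
    }

    // Remove tokens whose position lies in [p0, p1).
    void seq_rm(int p0, int p1) {
        assert(p0 <= p1);
        cells.erase(std::remove_if(cells.begin(), cells.end(),
                        [=](const Token & t) { return t.pos >= p0 && t.pos < p1; }),
                    cells.end());
    }

    // Shift positions in [p0, p1) by delta; p1 < 0 means "up to the end".
    void seq_add(int p0, int p1, int delta) {
        for (Token & t : cells) {
            if (t.pos >= p0 && (p1 < 0 || t.pos < p1)) t.pos += delta;
        }
    }
};

// Reuse two chunks of a 12-token cache: ids 4..6 first, then ids 9..10.
// Returns the ids found at positions [0, 7) after both moves.
std::vector<int> reuse_two_chunks(bool bounded_shift) {
    ToyCache kv(12);
    struct Chunk { int head_p, head_c, n_match; };
    const Chunk chunks[] = { {2, 4, 3}, {5, 9, 2} };

    for (const Chunk & c : chunks) {
        const int kv_shift = c.head_p - c.head_c;
        // Drop the tokens between the write head and the matched chunk.
        kv.seq_rm(c.head_p, c.head_c);
        // Move the matched chunk down. With an open-ended range (-1),
        // every later token is shifted too, so head_c of the next chunk
        // no longer refers to the position it was computed from.
        kv.seq_add(c.head_c, bounded_shift ? c.head_c + c.n_match : -1, kv_shift);
    }

    std::sort(kv.cells.begin(), kv.cells.end(),
              [](const Token & a, const Token & b) { return a.pos < b.pos; });
    std::vector<int> ids;
    for (const Token & t : kv.cells) {
        if (t.pos < 7) ids.push_back(t.id);
    }
    return ids;
}
```

In this toy, `reuse_two_chunks(false)` returns ids 0, 1, 4, 5, 6, 11: the second `seq_rm(5, 9)` drops ids 9 and 10, which the first open-ended shift had already moved to positions 7 and 8. `reuse_two_chunks(true)` returns ids 0, 1, 4, 5, 6, 9, 10, because only the matched chunk is moved and later tokens keep the positions that `head_c` refers to.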
1 file changed, +1 -1 lines changed

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 3003 | 3003 | |
| 3004 | 3004 | |
| 3005 | 3005 | |
| 3006 | | - |
| | 3006 | + |
| 3007 | 3007 | |
| 3008 | 3008 | |
| 3009 | 3009 | |