Commit 24aff7f

fix(main): Check the output of seq_rm for prefix matching
This prefix matching is explicitly attempting to remove the tokens at the end of the sequence that don't match. This is the operation that can't be performed on a recurrent cache due to the state being updated in place, so if this removal fails, we need to clear the whole cache.

#16768

Branch: HybridContextShift-16768

Signed-off-by: Gabe Goodhart <[email protected]>
1 parent 4fb572c commit 24aff7f

1 file changed

+4
-1
lines changed

tools/main/main.cpp

Lines changed: 4 additions & 1 deletion
@@ -354,7 +354,10 @@ int main(int argc, char ** argv) {
         }
 
         // remove any "future" tokens that we might have inherited from the previous session
-        llama_memory_seq_rm(mem, -1, n_matching_session_tokens, -1);
+        if (!llama_memory_seq_rm(mem, -1, n_matching_session_tokens, -1)) {
+            LOG_INF("%s: unable to resuse common prefix\n", __func__);
+            llama_memory_seq_rm(mem, -1, -1, -1);
+        }
     }
 
     LOG_DBG("recalculate the cached logits (check): embd_inp.size() %zu, n_matching_session_tokens %zu, embd_inp.size() %zu, session_tokens.size() %zu\n",
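
The fallback in this hunk can be illustrated with a minimal, self-contained sketch. It does not use the llama.cpp API: `MockMemory`, its `seq_rm` method, and `reuse_prefix` are hypothetical names introduced here, and the rule that a recurrent cache accepts only a full clear is a simplifying assumption based on the commit message, not a description of the real implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a model memory; NOT the llama.cpp API.
struct MockMemory {
    bool recurrent;          // true: state is updated in place, so no partial rollback
    std::vector<int> tokens; // tokens currently represented in the cache

    // Remove positions [p0, p1). Negative p0 means 0, negative p1 means "to the end".
    // Returns false when the removal cannot be performed.
    bool seq_rm(int p0, int p1) {
        const int n = static_cast<int>(tokens.size());
        if (p0 < 0) p0 = 0;
        if (p1 < 0 || p1 > n) p1 = n;
        if (p0 >= p1) return true; // nothing to remove
        // Assumption: a recurrent cache can only be cleared as a whole.
        if (recurrent && !(p0 == 0 && p1 == n)) return false;
        tokens.erase(tokens.begin() + p0, tokens.begin() + p1);
        return true;
    }
};

// Try to keep the first n_matching tokens; if the tail cannot be removed,
// clear the whole cache. Returns the number of cached tokens left.
std::size_t reuse_prefix(MockMemory & mem, std::size_t n_matching) {
    if (!mem.seq_rm(static_cast<int>(n_matching), -1)) {
        std::fprintf(stderr, "%s: unable to reuse common prefix\n", __func__);
        const bool cleared = mem.seq_rm(-1, -1);
        assert(cleared); // a full clear is expected to succeed in this mock
        (void) cleared;
    }
    return mem.tokens.size();
}
```

With five cached tokens and three matching ones, a non-recurrent `MockMemory` keeps the three-token prefix, while a recurrent one falls back to an empty cache and the whole prompt has to be re-evaluated.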
