Skip to content
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 4 additions & 3 deletions src/llama-memory-recurrent.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -151,7 +151,8 @@ bool llama_memory_recurrent::seq_rm(llama_seq_id seq_id, llama_pos p0, llama_pos
p1 = std::numeric_limits<llama_pos>::max();
}

// models like Mamba or RWKV can't have a state partially erased
// models like Mamba or RWKV can't have a state partially erased at the end
// of the sequence because their state isn't preserved for previous tokens
if (seq_id >= (int64_t) size) {
// could be fatal
return false;
Expand All @@ -160,8 +161,8 @@ bool llama_memory_recurrent::seq_rm(llama_seq_id seq_id, llama_pos p0, llama_pos
int32_t & tail_id = cells[seq_id].tail;
if (tail_id >= 0) {
const auto & cell = cells[tail_id];
// partial intersection is invalid
if ((0 < p0 && p0 < cell.pos) || (0 < p1 && p1 <= cell.pos)) {
// partial intersection is invalid if it includes the final pos
if ((0 < p0 && p0 <= cell.pos && p1 > cell.pos)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we check strictly larger than 0 rather than 0 <= p0?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the range includes the first token and extends beyond the last token, it's no longer a partial intersection and can simply clear the whole sequence which is a valid operation here since the answer is just to zero-out everything. The case this is trying to catch is an attempt to go part-way back in the sequence which is invalid since those states are not preserved.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we check strictly larger than 0 rather than 0 <= p0?

Because when p0 == 0 (and p1 > cell.pos), then the sequence can be removed entirely, since there is no partial intersection.

p0 larger than 0 means there's at least one remaining token in the context, which would require state rollback.

Copy link
Collaborator Author

@gabe-l-hart gabe-l-hart Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(jinx!)

((edit: I hope that translates correctly as "we said the same thing at the same time" to people that don't have little kids))

Copy link
Collaborator

@CISC CISC Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snap! is the British equivalent. :)

Edit: (and for some reason skylder meg en kald cola (owe me a cold coke) the Norwegian one)

//printf("[DEBUG] inside `llama_memory_recurrent::seq_rm`: partial intersection is invalid, so returning false\n");
return false;
}
Expand Down
5 changes: 4 additions & 1 deletion tools/main/main.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -354,7 +354,10 @@ int main(int argc, char ** argv) {
}

// remove any "future" tokens that we might have inherited from the previous session
llama_memory_seq_rm(mem, -1, n_matching_session_tokens, -1);
if (!llama_memory_seq_rm(mem, -1, n_matching_session_tokens, -1)) {
LOG_INF("%s: unable to resuse common prefix\n", __func__);
llama_memory_seq_rm(mem, -1, -1, -1);
Comment on lines 357 to 360
Copy link
Collaborator

@compilade compilade Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this will correctly trigger reevaluation of the context afterwards.

n_matching_session_tokens seems to be used mostly for cosmetic purposes except when it's equal to embd_inp.size(), so maybe it could be set to 0.

But I think there might be something else necessary here. (For example, should session_tokens be cleared?)

(This is mostly relevant when reloading a saved session, and not necessarily during context shifting, though)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, yeah, I'm still a little murky on the logic going on here. I'll look more carefully and see what I can find.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry folks,
I am new to merge management of commits.
b8f9c56923ab34bf24d4fe5476efe9b02736918d
Don't know who and when presses merge button for the commit after all checks are passed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're holding on this until I resolve the question in this thread. I'll try to get to that today!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see what you're getting at here with the clause below that recalculates the cached logits for the final token. I think the right thing to do here is to set n_matching_session_tokens to 0 since we're essentially saying there are no matching tokens anymore.

}
}

LOG_DBG("recalculate the cached logits (check): embd_inp.size() %zu, n_matching_session_tokens %zu, embd_inp.size() %zu, session_tokens.size() %zu\n",
Expand Down
Loading