
Commit bcf14fd

tools/main: llama-cli: prevent spurious assistant token (#13402)
During prompt ingestion, prompt tokens are accepted into the sampler history (for repetition penalties). The conversation-mode path then appended `common_sampler_last(smpl)` to `assistant_ss` before any new token was sampled. At that point, "last" was a prompt-side token (e.g., an input prefix), so the assistant chat message began with an extra piece.

Fix: append to `assistant_ss` only for a newly sampled (non-EOG) token. This affects only chat message assembly (`assistant_ss` / `chat_msgs` / `common_chat_format_single`); terminal stdout is unchanged. Sampling order/logits are unchanged.

Fixes #13402.

Signed-off-by: Vinkal Chudgar <[email protected]>
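The failure mode is easiest to see in isolation. Below is a minimal standalone sketch (not llama.cpp code): MockSampler is a hypothetical stand-in for the sampler's accepted-token history, which both prompt ingestion and sampling push into, mirroring the roles of common_sampler_accept / common_sampler_last. Right after ingestion, "last" is necessarily a prompt-side token.

#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the sampler history: prompt ingestion and
// sampling both accept tokens into the same buffer.
struct MockSampler {
    std::vector<std::string> history;                 // accepted tokens (prompt + sampled)
    void accept(const std::string & tok) { history.push_back(tok); }
    const std::string & last() const { return history.back(); }
};

int main() {
    MockSampler smpl;

    // Prompt ingestion: prompt tokens are accepted into the history so
    // repetition penalties can see them.
    smpl.accept("<|user|>");
    smpl.accept("Hi");
    smpl.accept("<|assistant|>");

    // Old behavior: the assistant message was seeded from last() before
    // any new token was sampled, so it began with a prompt-side piece.
    assert(smpl.last() == "<|assistant|>");

    // Fixed behavior: append only tokens that were actually sampled
    // (and are not end-of-generation).
    std::string assistant_msg;
    for (const char * tok : {"Hello", "!"}) {         // newly sampled tokens
        smpl.accept(tok);
        assistant_msg += tok;
    }
    assert(assistant_msg == "Hello!");                // no stray prefix
    return 0;
}

Moving the append to the point where a token has just been sampled (and guarding it with llama_vocab_is_eog) is what keeps the change confined to chat message assembly: stdout echoing and the sampler itself never depended on the removed lines.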
1 parent 138c87c commit bcf14fd


tools/main/main.cpp

Lines changed: 4 additions & 3 deletions
@@ -706,6 +706,10 @@ int main(int argc, char ** argv) {
             // LOG_DBG("last: %s\n", string_from(ctx, smpl->prev.to_vector()).c_str());
 
             embd.push_back(id);
+
+            if (params.conversation_mode && !waiting_for_first_input && !llama_vocab_is_eog(vocab, id)) {
+                assistant_ss << common_token_to_piece(ctx, id, false);
+            }
 
             // echo this to console
             input_echo = true;
@@ -826,9 +830,6 @@
 
             // if current token is not EOG, we add it to current assistant message
             if (params.conversation_mode && !waiting_for_first_input) {
-                const auto id = common_sampler_last(smpl);
-                assistant_ss << common_token_to_piece(ctx, id, false);
-
                 if (!prompt.empty()) {
                     prompt.clear();
                     is_interacting = false;
