* tools/main: llama-cli: prevent spurious assistant token (ggml-org#13402)
During prompt ingestion, prompt tokens are accepted into the sampler history (for repetition penalties). The conversation-mode path then appended `common_sampler_last(smpl)` to `assistant_ss` before any new token was sampled. At that point, "last" was a prompt-side token (e.g., an input prefix), so the assistant chat message began with an extra piece.
Fix: append to `assistant_ss` only for a newly sampled (non-EOG) token. This affects only chat-message assembly (`assistant_ss` / `chat_msgs` / `common_chat_format_single`); terminal stdout and the sampling order/logits are unchanged.
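For context, a minimal sketch of the guarded append as it might sit in the generation loop of `tools/main/main.cpp`. The identifiers `smpl`, `assistant_ss`, `common_sampler_last`, and `common_token_to_piece` come from the commit text and the upstream codebase; the surrounding control flow and variable names (`ctx`, `embd`, `params`, `vocab`) are illustrative, not the verbatim diff:

```cpp
// Sketch only; the real loop in tools/main/main.cpp carries more state.
//
// Before: the conversation path fed assistant_ss from
// common_sampler_last(smpl), which during prompt ingestion still points
// at a prompt-side token (e.g. the input prefix).
//
// After: append to assistant_ss only when a token was just sampled and
// it is not an end-of-generation token.

const llama_token id = common_sampler_sample(smpl, ctx, -1);
common_sampler_accept(smpl, id, /* accept_grammar = */ true);

embd.push_back(id);

if (params.conversation_mode && !llama_vocab_is_eog(vocab, id)) {
    // Only freshly sampled, non-EOG tokens enter the chat message;
    // terminal output and sampling order/logits are untouched.
    assistant_ss << common_token_to_piece(ctx, id, false);
}
```

The key point is that the append now lives inside the sampling branch, so it can never run while prompt tokens are still being consumed.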
Fixes ggml-org#13402.
Signed-off-by: Vinkal Chudgar <[email protected]>
* Update tools/main/main.cpp
Co-authored-by: Sigbjørn Skjæret <[email protected]>
* tools/main: remove outdated comment
Signed-off-by: Vinkal Chudgar <[email protected]>
---------
Signed-off-by: Vinkal Chudgar <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>