Commit 2f61c0f
llama-cli: prevent spurious assistant token (ggml-org#16202)
* tools/main: llama-cli: prevent spurious assistant token (ggml-org#13402)
During prompt ingestion, prompt tokens are accepted into the sampler history (for repetition penalties). The conversation-mode path then appended `common_sampler_last(smpl)` to `assistant_ss` before any new token was sampled. At that point, "last" was a prompt-side token (e.g., an input prefix), so the assistant chat message began with an extra piece.
Fix: append to `assistant_ss` only for a newly sampled (non-EOG) token. This affects only chat message assembly (`assistant_ss` / `chat_msgs` / `common_chat_format_single`); terminal stdout is unchanged. Sampling order/logits are unchanged.
Fixes ggml-org#13402.
Signed-off-by: Vinkal Chudgar <[email protected]>
* Update tools/main/main.cpp
Co-authored-by: Sigbjørn Skjæret <[email protected]>
* tools/main: remove outdated comment
Signed-off-by: Vinkal Chudgar <[email protected]>
---------
Signed-off-by: Vinkal Chudgar <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
1 file changed, 4 insertions(+), 4 deletions(-)

[Diff content not captured: `tools/main/main.cpp` adds 4 lines (new lines 710–713) and removes 4 lines (original lines 827, 829–831).]