
Conversation

@ggerganov (Member)

fix #9606

During vocab construction, iterate over all tokens and store every token that looks like it might cause an "end of generation" event (e.g. <EOT>, <endoftext>, <im_end>, etc.). llama_token_is_eog will now check this set of tokens to determine the EOG status.
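The idea can be sketched roughly as follows. This is a minimal, hypothetical illustration of the heuristic, not llama.cpp's actual implementation: the struct and member names are invented for the example, and the substring list is an assumption based on the token names mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: scan the vocab once at load time and remember every
// token whose text looks like an end-of-generation marker.
struct vocab_sketch {
    std::vector<std::string> id_to_text; // token id -> token text
    std::set<int>            eog_ids;    // ids detected as EOG tokens

    void detect_eog_tokens() {
        for (int id = 0; id < (int) id_to_text.size(); ++id) {
            std::string text = id_to_text[id];
            // normalize case so "<EOT>" and "<eot>" match the same pattern
            std::transform(text.begin(), text.end(), text.begin(),
                           [](unsigned char c) { return std::tolower(c); });
            // substrings that commonly appear in end-of-generation tokens
            // (illustrative list, not the exact one used by the PR)
            if (text.find("eot")       != std::string::npos ||
                text.find("endoftext") != std::string::npos ||
                text.find("im_end")    != std::string::npos) {
                eog_ids.insert(id);
            }
        }
    }

    // llama_token_is_eog-style check: membership in the detected set
    bool token_is_eog(int id) const {
        return eog_ids.count(id) > 0;
    }
};
```

With this approach, a model whose tokenizer metadata marks only one of its stop tokens as EOS (as with Qwen2.5-Coder's <|endoftext|> vs. <|im_end|>) still stops correctly, because all EOG-looking tokens are caught by the scan.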

Detected EOG tokens are printed like this (Qwen2.5-Coder):

0.00.190.685 I llm_load_print_meta: model size       = 14.19 GiB (16.00 BPW) 
0.00.190.685 I llm_load_print_meta: general.name     = Qwen2.5 Coder 7B Instruct
0.00.190.685 I llm_load_print_meta: BOS token        = 151643 '<|endoftext|>'
0.00.190.686 I llm_load_print_meta: EOS token        = 151645 '<|im_end|>'
0.00.190.686 I llm_load_print_meta: PAD token        = 151643 '<|endoftext|>'
0.00.190.686 I llm_load_print_meta: LF token         = 148848 'ÄĬ'
0.00.190.696 I llm_load_print_meta: EOT token        = 151645 '<|im_end|>'
0.00.190.697 I llm_load_print_meta: EOG token        = 151643 '<|endoftext|>'
0.00.190.697 I llm_load_print_meta: EOG token        = 151645 '<|im_end|>'
0.00.190.698 I llm_load_print_meta: max token length = 256
0.00.190.734 I llm_load_tensors: ggml ctx size =    0.30 MiB

This is yet another hack for handling end-of-... tokens. The best way to fix this is to have proper tokenizer configurations, but as discussed in #9606, this is unlikely to happen.

@tristandruyen (Contributor)

Seems to fix Qwen2.5-Coder at least #9606 (comment)

@ggerganov ggerganov merged commit 31ac583 into master Sep 24, 2024
59 checks passed
@ggerganov ggerganov deleted the gg/eog-ids branch September 24, 2024 07:16
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024

Development

Successfully merging this pull request may close these issues.

Bug: Qwen2.5-Coder variants do not properly stop in FIM mode
