Commit 027f7bc
fix: Avoid duplicate special tokens in chat formats (#1439)
* Templates sometimes have BOS in them, remove duplicate
* tokenize chat format prompts before completion
This is to ensure that we don't duplicate any special tokens.
Hopefully I amended the existing formats correctly?
* updated comment
* corrected a few
* add some missing internals
* proper bos/eos detection
* just let tokenizer do the job
* typo--
* align test with new response
* changed to a warning
* move to another PR
* Use python warnings module
---------
Co-authored-by: Andrei Betlen <[email protected]>
1 parent 951e39c commit 027f7bc
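The duplicate-BOS problem described in the commit message can be illustrated with a small sketch. This is not the code from this commit: the function name `dedupe_leading_bos`, its parameters, and the choice to strip the extra tokens (rather than only warn) are assumptions made for illustration. It shows a template that already contains BOS producing two leading BOS tokens once the tokenizer adds its own, and the use of the Python `warnings` module to report it.

```python
import warnings
from typing import List


def dedupe_leading_bos(tokens: List[int], bos_id: int) -> List[int]:
    """Collapse a run of leading BOS tokens into one (hypothetical helper).

    A warning is emitted when more than one leading BOS token is found,
    which typically means the chat template and the tokenizer both added it.
    """
    # Count how many BOS tokens appear at the very start of the prompt.
    run = 0
    while run < len(tokens) and tokens[run] == bos_id:
        run += 1

    if run > 1:
        warnings.warn(
            f"Detected {run} leading BOS tokens in prompt; keeping one.",
            RuntimeWarning,
        )
        # Keep exactly one BOS followed by the rest of the prompt.
        return tokens[run - 1:]

    # Zero or one leading BOS: nothing to change.
    return tokens


if __name__ == "__main__":
    # Assumed token ids: 1 is BOS, the others are ordinary tokens.
    print(dedupe_leading_bos([1, 1, 42, 7], bos_id=1))
```

Only leading tokens are inspected, so a BOS id that appears later in the prompt is left untouched.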
File tree
4 files changed: +25 -10 lines changed
- llama_cpp
- tests