Report for google/codegemma-7b

Model info

  • Model Info:
    • Tied embeddings: True
    • LM head uses bias: False
    • Embeddings shape: [256000, 3072]
  • Tokenizer Info:
    • Vocab Size: 256000
    • Tokenizer Class: GemmaTokenizer
    • Tokenizer Type: BPE
    • Bytes handling: Byte Fallback
    • Token for verification prompt building: TouchableOpacity
    • Token id for verification prompt building: 39886
  • Indicator summary:
    • Indicator for under-trained tokens: E_{out} Cosine Distance
    • Overall distribution: 0.104 +/- 0.044
  • Detected Token Counts:
    • Number of tested under-trained tokens: 5117 (5015 non-special); 834 below the p = 0.01 threshold; 259 below the soft indicator threshold
    • Number of single byte tokens: 380, of which 144 below indicator threshold
    • Number of special tokens: 1, of which 1 below indicator threshold
    • Number of non-single-byte unreachable tokens: 1, of which 1 below indicator threshold
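The summary names the indicator (E_{out} cosine distance) but does not say how it is computed. The following is a minimal sketch, assuming the indicator is the cosine distance between each output-embedding row and the mean of a set of reference rows (for example the `<unusedN>` tokens). The function name, the choice of reference set, and the toy matrix are illustrative assumptions, not the tool's actual code; the 0.002 cut-off is the one quoted in the table captions of this report.

```python
import numpy as np

def cosine_distance_indicator(embeddings, reference_ids):
    """Cosine distance from every embedding row to the mean of the reference rows.

    Assumption: rows close to the reference mean (distance near 0) are
    candidates for being under-trained.
    """
    emb = np.asarray(embeddings, dtype=np.float64)
    reference = emb[reference_ids].mean(axis=0)
    eps = 1e-12  # guards against division by zero for all-zero rows
    emb_unit = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + eps)
    ref_unit = reference / (np.linalg.norm(reference) + eps)
    return 1.0 - emb_unit @ ref_unit

# Toy data: rows 0 and 1 stand in for known-unused tokens.
toy = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
scores = cosine_distance_indicator(toy, [0, 1])

# Threshold taken from the "below threshold of 0.002" captions in this report.
candidates = np.flatnonzero(scores < 0.002)
```

With the real model, `embeddings` would be the [256000, 3072] matrix listed above; because the embeddings are tied, the same matrix serves as input and output embeddings.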

Under-trained token indicators plot

[Figure: indicators scatter plots]

Verification plot

[Figure: verification plot]

Under-trained token verification results

259 entries below threshold of 0.001

token_id token indicator max_prob in_other_tokens
229433 ^(@)$_ 5.33462e-05 0.00041
164525 हिंदीखरीदारी 5.38826e-05 0.00038
196609 \u200cآمباردا 5.72205e-05 0.00041 ▁ویکی\u200cآمباردا
134910 ſammen 6.634e-05 0.00037 ▁zuſammen
127237 ▁coachTry 7.39098e-05 0.00063
213138 ſſung 7.67708e-05 0.00039
121349 ▁AcceptedLoading 7.75456e-05 0.00044
59098 EnglishChoose 8.11219e-05 0.00046 ▁EnglishChoose
185507 ▁queſto 8.22544e-05 0.00032
222309 ▁queſta 8.29697e-05 0.00035
225573 ▁Geiſt 8.29697e-05 0.00041
158454 ▁unſer 8.32677e-05 0.00037
216622 ▁Dieſe 8.5175e-05 0.00041
91282 ▁ſelb 8.86917e-05 0.00035 ▁ſelbſt
227644 ▁ſeines 8.92282e-05 0.00051
220218 ▁ſehen 8.97646e-05 0.00033
184138 ▁zuſammen 9.01818e-05 0.00051
121705 ▁ſondern 9.06587e-05 0.00037
252915 \uf3f5 9.23872e-05 0.0005
210616 ▁geweſen 9.38177e-05 0.00037
239 additional entries below threshold
255245 \uf3cc 9.41753e-05 0.0005
161080 ▁ſeyn 0.000103295 0.00041
230983 ▁wiſſen 0.000107765 0.00037
123984 ▁ſeinen 0.000122726 0.0005
192547 ▁erſten 0.000123262 0.00042
174176 ▁ſoll 0.000125945 0.00055
203019 ▁daſs 0.000127614 0.0005
148617 ▁deſſen 0.000129461 0.00037
113990 ▁ſehr 0.000136435 0.00043
143114 ▁ſeinem 0.000140607 0.00046
151521 ▁müſſen 0.000141859 0.00039
254455 \ued90 0.000142813 0.00065
254175 𐁘 0.000143409 0.00054
153473 ▁Menſchen 0.000144064 0.00057
173899 ▁メンテナ 0.000145495 0.00047 ▁メンテナンス
123221 >\<^ 0.000145912 0.00051
42380 ▁stockbild 0.000146747 0.00061 ▁stockbilder
193385 iſen 0.000151336 0.00032
255011 𓇠 0.000153184 0.00063
195121 ▁Waſſer 0.000155449 0.00038
224365 ikusbot 0.000155807 0.00041 haikusbot
254350 \uf5ce 0.000155985 0.00077
151848 ▁ſei 0.000158429 0.00034 ▁ſeines
143473 )$_. 0.00015986 0.0005
233201 ▁Weiſe 0.000163913 0.00043
167982 ▁stockfotografie 0.000165462 0.00034
128625 ▁dieſem 0.000167489 0.00039
254071 \uef5a 0.000168681 0.00076
153064 ▁stockbilder 0.000170112 0.00042
109547 ▁ſchon 0.000171363 0.00031
96098 ▁ſelbſt 0.000176013 0.0004
232866 ▁stiefe 0.000176787 0.00045
45971 ▁linkCC 0.00018084 0.0008
255807 𝆣 0.000182271 0.00039
97619 ▁ſeiner 0.000183225 0.00041
195351 niſſe 0.000186503 0.00039
123190 ſelben 0.000188887 0.00039
202616 ▁erſt 0.000191152 0.00033
254591 \u0e72 0.000191629 0.00047
254908 𖧹 0.000195503 0.00093
172465 iſche 0.000196278 0.00039 ▁zwiſchen
255645 \uef0e 0.000199676 0.00097
159234 ſehen 0.000200033 0.00056 ▁ſehen
136616 ▁verſch 0.000207007 0.00034
75807 ▁dieſe 0.000207543 0.00032 ▁dieſer, ▁dieſes, ▁dieſem, ▁dieſen
167630 ▁PeEn 0.000211477 0.00047 ▁PeEnEo
255267 \u0e63 0.000211477 0.0005
125919 Билгалда 0.000212669 0.00049 Билгалдахарш
2873 ICTOGRAM 0.000216663 0.00044 ▁PICTOGRAM, PICTOGRAM
254944 0.000219047 0.001
199696 ſicht 0.000219405 0.00036
135639 ▁dieſen 0.000221372 0.00045
255510 \ue51e 0.000222564 0.00061
253034 \uf7a0 0.000223279 0.00043
208438 ▁ſuo 0.000227094 0.00072
155980 ▁beſch 0.000231504 0.00065
255154 0.000235736 0.00093
255647 \uf35e 0.000237942 0.00046
89379 ▁ſeine 0.00023824 0.00036 ▁ſeiner, ▁ſeinen, ▁ſeinem, ▁ſeines
255122 \uf540 0.000238776 0.00087
255849 0.000241101 0.0011
208306 ▁beſte 0.000241697 0.0005
250800 \u0ba1 0.000243902 0.0005
251499 0.00024581 0.00093
206857 ▁tartalo 0.000246823 0.00045 ▁tartalomajánló
255795 \uec4c 0.000252128 0.00069
254885 0.0002563 0.00078
118456 ロウィン 0.000258029 0.00037 ハロウィン, ▁ハロウィン
108162 久しぶ 0.000259101 0.00033 久しぶりに, 久しぶり, 久しぶりの
177069 ▁티즈 0.000260115 0.00088
225539 isGridAdvEx 0.000263393 0.00059
253613 \U000e0041 0.000275612 0.0017
171300 rbrakk 0.000284612 0.0018
120213 iſchen 0.000284791 0.00062 ▁zwiſchen
88138 ſchaft 0.00028646 0.00036
198203 ▁zwiſchen 0.000286579 0.00035
252631 \uf51a 0.000288665 0.0011
114402 ▁Geſch 0.00029093 0.00057
80527 ▁dieſer 0.000291049 0.00023
128951 ▁laſſen 0.000295281 0.00037
200906 ▁ſua 0.000299692 0.0007
255242 \ue6f0 0.000322819 0.00084
171654 lbrakk 0.000323534 0.0017
254456 \uefa6 0.000326395 0.00056
214340 ▁パンチラ 0.000329971 0.00054
181784 ▁་་ 0.000331998 0.00044
254549 0.000342607 0.00042
255420 0.000347137 0.0012
255279 0.000349283 0.0024
251525 \ueae4 0.000352442 0.00064
253441 \ue984 0.00035584 0.0015
254258 \ue5d0 0.000359654 0.0024
255790 \ue734 0.000364721 0.0034
253247 0.000369549 0.0018
169039 ▁ſche 0.000375509 0.00072
252436 0.000377715 0.0014
254686 𑄮 0.000397742 0.014
252790 0.000400186 0.0028
150747 ſcher 0.000401437 0.00066
207398 ▁plufieurs 0.000404358 0.00061
255271 0.000405133 0.0016
253030 0.000405729 0.00034
254566 \ue776 0.000413954 0.0026
68314 ▁例证 0.000421643 0.00069
167294 ▁GoogleContinue 0.000425756 0.0016
255705 0.000433445 0.00059
209936 ▁展板 0.00043422 0.0013
255379 \uf2ba 0.000443161 0.0011
253828 0.000446856 0.00042
152266 ▁imagui 0.000448346 0.00092
253758 0.000448406 0.003
249717 0.000449479 0.0013
253510 \uec1d 0.000449836 0.00047
220260 ▁beſti 0.000458598 0.00029
253523 \U000900b0 0.000458777 0.001
255123 𑄥 0.000458956 0.014
254486 0.000460088 0.0007
255806 𑄠 0.000461161 0.0085
205674 нгред 0.000469267 0.001 нгредіє, нгредієнти
220916 ▁vooz 0.000470042 0.0004
182427 )$_, 0.000473917 0.0014
252858 0.000474215 0.0022
225065 bildtitel 0.000480175 0.00099
250433 0.000481844 0.00047
255248 𐑥 0.000482202 0.0043
253027 0.000490725 0.00038
253927 0.000492513 0.0005
255955 \ue6ec 0.000492632 0.00083
116882 ▁geſch 0.000492692 0.00041
187776 ▁Verſ 0.000500202 0.00052
254349 \uf412 0.000504732 0.0053
255517 𑄝 0.00051713 0.042
253926 0.000518322 0.00071
254574 𖡻 0.000521541 0.00037
255792 \ue762 0.000522673 0.0097
253904 𑄣 0.000530243 0.03
255380 \uf8e0 0.000531971 0.0032
255124 𑄪 0.000533938 0.04
253326 0.00053668 0.0042
72920 ▁ſind 0.000537097 0.00062
254565 \ue67b 0.000541568 0.0028
255376 \ueb9a 0.000542223 0.0013
254600 0.000548542 0.0064
251632 𑄨 0.000549674 0.045
255814 𞤑 0.000553846 0.0016
64069 ディネート 0.000560284 0.00068 ▁コーディネート, コーディネート
195112 ▁好文分享 0.000562727 0.0014
251560 \ue978 0.000565708 0.012
176775 ▁盗撮 0.000566781 0.0015
254482 \u0bab 0.000567257 0.00054
159995 ▁剪影 0.000569046 0.001
72182 ▁版税 0.00056982 0.0015
254213 0.000570416 0.0067
253103 𑄚 0.000576437 0.055
255275 0.000579834 0.0021
253104 𓆱 0.00058198 0.0024
249784 ܞ 0.000586271 0.0017
249361 0.000593662 0.0028
252852 🜲 0.00059402 0.0012
248337 \uf21d 0.000595629 0.0038
75991 ▁indígen 0.000595808 0.00058 ▁indígenas, ▁indígena
172769 征詢我 0.000595868 0.0013
252567 0.000598133 0.0075
254798 \ufe67 0.000598967 0.0012
206788 majánló 0.000610769 0.0011 ▁tartalomajánló
134830 往下閱讀 0.000616968 0.00094 請繼續往下閱讀
25269 NdEx 0.000624597 0.0026 iNdEx, ▁iNdEx
171222 征詢 0.000624597 0.0003 征詢我
253706 0.00062716 0.00083
254076 𑄟 0.000629663 0.05
196059 باردا 0.00063169 0.00075 \u200cآمباردا, ▁ویکی\u200cآمباردا
252966 𝆺 0.000635505 0.00038
253052 \u0bc4 0.000639737 0.00039
254460 0.000640333 0.0049
251496 \u0ba5 0.000642717 0.00048
248691 0.000646532 0.00096
141456 isOraColElement 0.000647962 0.0005
253187 ݯ 0.000648081 0.003
112171 Diwed 0.000649452 0.00076 Diwedd, Diweddar, Diweddarwch
255663 \U000f023b 0.000651777 0.0016
131560 ▁desmotivaciones 0.000654697 0.0014
248384 0.000655651 0.0028
254114 0.000656962 0.018
255953 \ue65a 0.000659168 0.0053
254911 𞤶 0.000659883 0.0022
254075 𑄇 0.000660419 0.027
140439 ▁stockfotos 0.000660539 0.0027
255934 0.000660837 0.0017
253841 𑄬 0.000679135 0.034
252372 𑄢 0.000679672 0.12
253901 \ue676 0.000681221 0.0032
255389 𞤼 0.000682354 0.003
253992 \ue7b5 0.000682712 0.0066
253371 0.000684977 0.0016
254484 0.000692546 0.0024
250918 0.000697255 0.0013
247641 ܇ 0.000697851 0.0084
253511 \uf563 0.000710249 0.0023
136017 ▁简谱 0.000711441 0.00084
251670 0.000712276 0.007
251965 \u0bc5 0.000725448 0.00044
255382 0.000733137 0.0082
255728 0.000734329 0.00099
248911 \ue5f1 0.000735939 0.0016
35321 ſchen 0.000737846 0.00069 iſchen, ▁Menſchen, ▁zwiſchen
254626 0.000748098 0.0014
250185 0.000757098 0.0047
254270 𞤴 0.000760078 0.0044
254573 𑄃 0.000764251 0.042
250887 0.000768125 0.039
252083 \uf565 0.000769138 0.0035
245817 \U00071706 0.000769556 0.00032
247780 0.00077337 0.0029
254833 0.000775993 0.0032
65939 \<^ 0.000787199 0.00094 >\<^
255018 𞥄 0.000792086 0.0054
253460 \u0b8b 0.000793457 0.00059
251778 0.000798047 0.019
253075 0.000814557 0.00083
90675 ▁Geſ 0.000814974 0.00033 ▁Geſch
251780 0.000816703 0.0064
247445 0.000817895 0.0058
180346 ſſo 0.000818133 0.00093
255439 0.000823796 0.00083
248619 0.000827789 0.00059
254496 0.000828981 0.003
115459 ſem 0.000835717 0.00045 ▁dieſem
129755 ſam 0.000837982 0.0025 ſammen, ▁ſame, ▁zuſammen
212547 ▁Pardavimas 0.000844717 0.0039
114373 ▁témoig 0.000846148 0.0011 ▁témoignage, ▁témoignages
139931 Дерекк 0.000846386 0.0084 Дереккөздер
254927 0.000846386 0.016
253723 0.000847101 0.00094
252682 \uf55f 0.000851274 0.0029
32602 ▁ſich 0.000852525 0.00082
254903 \ue66e 0.000857472 0.0016
254089 \u0e6c 0.000862837 0.00068
176309 enablog 0.000863552 0.0054 hatenablog
115666 ▁verſ 0.000863731 0.00089 ▁verſch
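The report does not define the in_other_tokens column. One simple reading is that it lists other vocabulary entries containing the token as a substring. The sketch below implements plain substring containment under that assumption; the helper name is hypothetical. It would not reproduce every row above (for instance, ▁ſei lists only ▁ſeines although ▁ſeine also contains it), so the tool evidently applies some further rule.

```python
def containing_tokens(token, vocab):
    """Return the vocabulary entries that contain `token` as a proper substring."""
    return [other for other in vocab if other != token and token in other]

# A handful of token strings copied from the table above.
sample_vocab = ["▁ſelb", "▁ſelbſt", "ſchen", "iſchen", "▁Menſchen", "▁zwiſchen", "▁ſehr"]

for tok in ("▁ſelb", "ſchen", "▁ſehr"):
    print(tok, containing_tokens(tok, sample_vocab))
```

A fragment that only ever occurs inside a longer token (such as ▁ſelb inside ▁ſelbſt) is rarely emitted on its own by the tokenizer, which is one way a token can end up under-trained.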

Byte tokens

144 entries below threshold of 0.002

token_id token indicator ord hex byte_type reencoded
313 <0x60> 4.85778e-05 96 0x60 ascii 235376: `
300 <0x53> 4.88758e-05 83 0x53 ascii 235277: S
225 <0x08> 4.97699e-05 8 0x08 ascii 245584: \x08
412 <0xC3> 4.98891e-05 195 0xC3 utf8
466 <0xF9> 4.99487e-05 249 0xF9 unused_utf8
265 <0x30> 5.01275e-05 48 0x30 ascii 235276: 0
317 <0x64> 5.02467e-05 100 0x64 ascii 235258: d
278 <0x3D> 5.03659e-05 61 0x3D ascii 235293: =
292 <0x4B> 5.04851e-05 75 0x4B ascii 235333: K
299 <0x52> 5.04851e-05 82 0x52 ascii 235294: R
315 <0x62> 5.06639e-05 98 0x62 ascii 235268: b
219 <0x02> 5.07236e-05 2 0x02 ascii 247977: \x02
266 <0x31> 5.08428e-05 49 0x31 ascii 235274: 1
258 <0x29> 5.0962e-05 41 0x29 ascii 235275: )
263 <0x2E> 5.10812e-05 46 0x2E ascii 235265: .
284 <0x43> 5.11408e-05 67 0x43 ascii 235288: C
230 <0x0D> 5.13792e-05 13 0x0D ascii 235316: \r
252 <0x23> 5.13792e-05 35 0x23 ascii 235345: #
323 <0x6A> 5.14388e-05 106 0x6A ascii 235312: j
248 <0x1F> 5.1558e-05 31 0x1F ascii 251698: \x1f
124 additional entries below threshold
325 <0x6C> 5.1558e-05 108 0x6C ascii 235257: l
330 <0x71> 5.17964e-05 113 0x71 ascii 235317: q
262 <0x2D> 5.1856e-05 45 0x2D ascii 235290: -
289 <0x48> 5.19156e-05 72 0x48 ascii 235314: H
264 <0x2F> 5.19753e-05 47 0x2F ascii 235283: /
307 <0x5A> 5.19753e-05 90 0x5A ascii 235382: Z
232 <0x0F> 5.20945e-05 15 0x0F ascii 249949: \x0f
310 <0x5D> 5.21541e-05 93 0x5D ascii 235307: ]
277 <0x3C> 5.22137e-05 60 0x3C ascii 235322: <
282 <0x41> 5.22137e-05 65 0x41 ascii 235280: A
236 <0x13> 5.23329e-05 19 0x13 ascii 252752: \x13
316 <0x63> 5.23329e-05 99 0x63 ascii 235260: c
257 <0x28> 5.23925e-05 40 0x28 ascii 235278: (
296 <0x4F> 5.23925e-05 79 0x4F ascii 235302: O
309 <0x5C> 5.23925e-05 92 0x5C ascii 235286: \
283 <0x42> 5.24521e-05 66 0x42 ascii 235305: B
293 <0x4C> 5.24521e-05 76 0x4C ascii 235301: L
222 <0x05> 5.25117e-05 5 0x05 ascii 250940: \x05
244 <0x1B> 5.25713e-05 27 0x1B ascii 242385: \x1b
270 <0x35> 5.26309e-05 53 0x35 ascii 235308: 5
276 <0x3B> 5.26309e-05 59 0x3B ascii 235289: ;
280 <0x3F> 5.28097e-05 63 0x3F ascii 235336: ?
312 <0x5F> 5.28097e-05 95 0x5F ascii 235298: _
340 <0x7B> 5.28097e-05 123 0x7B ascii 235282: {
301 <0x54> 5.28693e-05 84 0x54 ascii 235279: T
333 <0x74> 5.29885e-05 116 0x74 ascii 235251: t
250 <0x21> 5.30481e-05 33 0x21 ascii 235341: !
335 <0x76> 5.30481e-05 118 0x76 ascii 235272: v
228 <0x0B> 5.31077e-05 11 0x0B ascii 249154: \x0b
274 <0x39> 5.31673e-05 57 0x39 ascii 235315: 9
290 <0x49> 5.31673e-05 73 0x49 ascii 235285: I
304 <0x57> 5.32866e-05 87 0x57 ascii 235325: W
332 <0x73> 5.32866e-05 115 0x73 ascii 235256: s
231 <0x0E> 5.3525e-05 14 0x0E ascii 252689: \x0e
336 <0x77> 5.3525e-05 119 0x77 ascii 235271: w
251 <0x22> 5.35846e-05 34 0x22 ascii 235281: "
319 <0x66> 5.35846e-05 102 0x66 ascii 235266: f
238 <0x15> 5.37038e-05 21 0x15 ascii 253776: \x15
241 <0x18> 5.37038e-05 24 0x18 ascii 250600: \x18
249 <0x20> 5.37038e-05 32 0x20 ascii 235248:
422 <0xCD> 5.37634e-05 205 0xCD utf8
271 <0x36> 5.3823e-05 54 0x36 ascii 235318: 6
302 <0x55> 5.38826e-05 85 0x55 ascii 235327: U
320 <0x67> 5.38826e-05 103 0x67 ascii 235264: g
334 <0x75> 5.40018e-05 117 0x75 ascii 235261: u
342 <0x7D> 5.40018e-05 125 0x7D ascii 235270: }
409 <0xC0> 5.40018e-05 192 0xC0 unused_utf8
295 <0x4E> 5.40614e-05 78 0x4E ascii 235300: N
259 <0x2A> 5.4121e-05 42 0x2A ascii 235287: *
285 <0x44> 5.4121e-05 68 0x44 ascii 235299: D
267 <0x32> 5.41806e-05 50 0x32 ascii 235284: 2
318 <0x65> 5.41806e-05 101 0x65 ascii 235249: e
467 <0xFA> 5.41806e-05 250 0xFA unused_utf8
255 <0x26> 5.42402e-05 38 0x26 ascii 235343: &
343 <0x7E> 5.43594e-05 126 0x7E ascii 235436: ~
275 <0x3A> 5.47171e-05 58 0x3A ascii 235292: :
303 <0x56> 5.47171e-05 86 0x56 ascii 235330: V
308 <0x5B> 5.47171e-05 91 0x5B ascii 235309: [
234 <0x11> 5.47767e-05 17 0x11 ascii 253614: \x11
470 <0xFD> 5.48363e-05 253 0xFD unused_utf8
233 <0x10> 5.48959e-05 16 0x10 ascii 248775: \x10
305 <0x58> 5.49555e-05 88 0x58 ascii 235356: X
326 <0x6D> 5.49555e-05 109 0x6D ascii 235262: m
471 <0xFE> 5.49555e-05 254 0xFE unused_utf8
339 <0x7A> 5.50151e-05 122 0x7A ascii 235306: z
414 <0xC5> 5.50747e-05 197 0xC5 utf8
465 <0xF8> 5.51343e-05 248 0xF8 unused_utf8
235 <0x12> 5.51939e-05 18 0x12 ascii 252232: \x12
268 <0x33> 5.52535e-05 51 0x33 ascii 235304: 3
464 <0xF7> 5.52535e-05 247 0xF7 unused_utf8
472 <0xFF> 5.53131e-05 255 0xFF unused_utf8
254 <0x25> 5.54323e-05 37 0x25 ascii 235358: %
281 <0x40> 5.54323e-05 64 0x40 ascii 235348: @
227 <0x0A> 5.54919e-05 10 0x0A ascii 108: \n
247 <0x1E> 5.54919e-05 30 0x1E ascii 253777: \x1e
311 <0x5E> 5.54919e-05 94 0x5E ascii 235393: ^
287 <0x46> 5.55515e-05 70 0x46 ascii 235311: F
243 <0x1A> 5.56111e-05 26 0x1A ascii 243931: \x1a
298 <0x51> 5.56111e-05 81 0x51 ascii 235368: Q
331 <0x72> 5.56707e-05 114 0x72 ascii 235255: r
237 <0x14> 5.57899e-05 20 0x14 ascii 250861: \x14
229 <0x0C> 5.59092e-05 12 0x0C ascii 238092: \x0c
288 <0x47> 5.59092e-05 71 0x47 ascii 235319: G
223 <0x06> 5.60284e-05 6 0x06 ascii 251368: \x06
272 <0x37> 5.61476e-05 55 0x37 ascii 235324: 7
306 <0x59> 5.62668e-05 89 0x59 ascii 235342: Y
245 <0x1C> 5.63264e-05 28 0x1C ascii 255818: \x1c
337 <0x78> 5.63264e-05 120 0x78 ascii 235297: x
279 <0x3E> 5.6386e-05 62 0x3E ascii 235313: >
273 <0x38> 5.65052e-05 56 0x38 ascii 235321: 8
468 <0xFB> 5.65052e-05 251 0xFB unused_utf8
220 <0x03> 5.65648e-05 3 0x03 ascii 249006: \x03
253 <0x24> 5.65648e-05 36 0x24 ascii 235323: $
291 <0x4A> 5.6684e-05 74 0x4A ascii 235338: J
218 <0x01> 5.69224e-05 1 0x01 ascii 238213: \x01
294 <0x4D> 5.71609e-05 77 0x4D ascii 235296: M
322 <0x69> 5.72205e-05 105 0x69 ascii 235252: i
341 <0x7C> 5.72205e-05 124 0x7C ascii 235371: |
246 <0x1D> 5.72801e-05 29 0x1D ascii 254363: \x1d
411 <0xC2> 5.72801e-05 194 0xC2 utf8
260 <0x2B> 5.73993e-05 43 0x2B ascii 235340: +
469 <0xFC> 5.76973e-05 252 0xFC unused_utf8
344 <0x7F> 5.77569e-05 127 0x7F ascii 244423: \x7f
462 <0xF5> 5.78165e-05 245 0xF5 unused_utf8
328 <0x6F> 5.78761e-05 111 0x6F ascii 235253: o
239 <0x16> 5.79953e-05 22 0x16 ascii 254362: \x16
286 <0x45> 5.81145e-05 69 0x45 ascii 235291: E
324 <0x6B> 5.81145e-05 107 0x6B ascii 235273: k
421 <0xCC> 5.81741e-05 204 0xCC utf8
242 <0x19> 5.84126e-05 25 0x19 ascii 254472: \x19
410 <0xC1> 5.84126e-05 193 0xC1 unused_utf8
256 <0x27> 5.84722e-05 39 0x27 ascii 235303: '
329 <0x70> 5.85318e-05 112 0x70 ascii 235263: p
463 <0xF6> 5.8651e-05 246 0xF6 unused_utf8
338 <0x79> 5.87106e-05 121 0x79 ascii 235267: y
269 <0x34> 5.9545e-05 52 0x34 ascii 235310: 4
327 <0x6E> 6.01411e-05 110 0x6E ascii 235254: n
224 <0x07> 6.02007e-05 7 0x07 ascii 249340: \x07
297 <0x50> 6.03199e-05 80 0x50 ascii 235295: P
314 <0x61> 6.04391e-05 97 0x61 ascii 235250: a
221 <0x04> 6.04987e-05 4 0x04 ascii 250124: \x04
261 <0x2C> 6.15716e-05 44 0x2C ascii 235269: ,
413 <0xC4> 6.17504e-05 196 0xC4 utf8
321 <0x68> 6.25849e-05 104 0x68 ascii 235259: h
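The ord, hex and byte_type columns can be derived from the token string alone. A small sketch, assuming the three category names mean what they do in the UTF-8 encoding rules: values below 0x80 are ASCII, 0xC0, 0xC1 and 0xF5 to 0xFF never occur in valid UTF-8, and the remaining values are lead or continuation bytes. The function names are hypothetical.

```python
import re

def parse_byte_token(token):
    """Return the byte value encoded in a '<0xNN>' token string, or None."""
    match = re.fullmatch(r"<0x([0-9A-Fa-f]{2})>", token)
    return int(match.group(1), 16) if match else None

def byte_type(value):
    """Classify a byte value using the category names from the table."""
    if value < 0x80:
        return "ascii"
    if value in (0xC0, 0xC1) or value >= 0xF5:
        return "unused_utf8"  # can never appear in well-formed UTF-8
    return "utf8"  # lead or continuation byte of a multi-byte sequence

for tok in ("<0x60>", "<0xC3>", "<0xF9>"):
    value = parse_byte_token(tok)
    print(tok, value, hex(value), byte_type(value))
```

This agrees with the rows above: `<0x60>` is ascii, `<0xC3>` is utf8, and `<0xF9>` is unused_utf8.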

Special tokens

102 entries below threshold of 0.002

token_id token indicator max_prob
12 <unused5> 4.8995e-05 0.00039
38 <unused31> 4.94123e-05 0.00042
85 <unused78> 4.94719e-05 0.00047
25 <unused18> 4.99487e-05 0.00046
55 <unused48> 5.02467e-05 0.00039
88 <unused81> 5.06043e-05 0.00042
97 <unused90> 5.07832e-05 0.00039
90 <unused83> 5.08428e-05 0.00033
11 <unused4> 5.10216e-05 0.00046
87 <unused80> 5.10216e-05 0.00042
14 <unused7> 5.12004e-05 0.00047
31 <unused24> 5.126e-05 0.0004
35 <unused28> 5.126e-05 0.00044
18 <unused11> 5.14388e-05 0.00043
74 <unused67> 5.14388e-05 0.00042
76 <unused69> 5.14388e-05 0.00048
100 <unused93> 5.16176e-05 0.00033
0 <pad> 5.17368e-05 7e-13
21 <unused14> 5.19156e-05 0.00036
104 <unused97> 5.19156e-05 0.00038
82 additional entries below threshold
66 <unused59> 5.20349e-05 0.00041
49 <unused42> 5.20945e-05 0.00043
62 <unused55> 5.21541e-05 0.0004
72 <unused65> 5.22137e-05 0.00044
23 <unused16> 5.22733e-05 0.00035
33 <unused26> 5.22733e-05 0.00043
91 <unused84> 5.22733e-05 0.00043
15 <unused8> 5.24521e-05 0.00042
58 <unused51> 5.24521e-05 0.00039
102 <unused95> 5.24521e-05 0.00042
78 <unused71> 5.25117e-05 0.00041
43 <unused36> 5.25713e-05 0.00044
75 <unused68> 5.25713e-05 0.00039
81 <unused74> 5.25713e-05 0.0004
103 <unused96> 5.26309e-05 0.00036
80 <unused73> 5.27501e-05 0.00043
42 <unused35> 5.30481e-05 0.00045
83 <unused76> 5.30481e-05 0.00042
92 <unused85> 5.32269e-05 0.0005
86 <unused79> 5.32866e-05 0.00041
105 <unused98> 5.32866e-05 0.00046
34 <unused27> 5.36442e-05 0.00044
79 <unused72> 5.3823e-05 0.00042
93 <unused86> 5.3823e-05 0.00031
27 <unused20> 5.40614e-05 0.00043
48 <unused41> 5.40614e-05 0.00043
52 <unused45> 5.42998e-05 0.00045
24 <unused17> 5.46575e-05 0.00039
71 <unused64> 5.47767e-05 0.00044
107 <end_of_turn> 5.47767e-05 0.00038
17 <unused10> 5.48363e-05 0.00041
40 <unused33> 5.48363e-05 0.00041
95 <unused88> 5.48363e-05 0.00046
22 <unused15> 5.48959e-05 0.00043
106 <start_of_turn> 5.48959e-05 0.00044
3 <unk> 5.49555e-05 0.00036
28 <unused21> 5.49555e-05 0.00049
73 <unused66> 5.49555e-05 0.00035
99 <unused92> 5.49555e-05 0.00045
36 <unused29> 5.51343e-05 0.00039
101 <unused94> 5.51939e-05 0.00054
13 <unused6> 5.52535e-05 0.00043
57 <unused50> 5.54919e-05 0.00038
61 <unused54> 5.55515e-05 0.00048
84 <unused77> 5.55515e-05 0.00038
37 <unused30> 5.56111e-05 0.00047
59 <unused52> 5.56111e-05 0.00041
94 <unused87> 5.56111e-05 0.00035
50 <unused43> 5.57303e-05 0.00042
26 <unused19> 5.59688e-05 0.00045
56 <unused49> 5.59688e-05 0.00039
10 <unused3> 5.60284e-05 0.00039
20 <unused13> 5.61476e-05 0.00044
98 <unused91> 5.62072e-05 0.00044
29 <unused22> 5.6386e-05 0.00043
82 <unused75> 5.6386e-05 0.00039
19 <unused12> 5.65052e-05 0.00044
6 [@BOS@] 5.6982e-05 0.00043
39 <unused32> 5.6982e-05 0.00043
54 <unused47> 5.6982e-05 0.00037
53 <unused46> 5.70416e-05 0.00043
65 <unused58> 5.70416e-05 0.00041
32 <unused25> 5.72205e-05 0.00039
46 <unused39> 5.72205e-05 0.00044
30 <unused23> 5.72801e-05 0.00042
47 <unused40> 5.72801e-05 0.00038
77 <unused70> 5.72801e-05 0.0004
16 <unused9> 5.73397e-05 0.00042
51 <unused44> 5.74589e-05 0.00043
96 <unused89> 5.75185e-05 0.0004
44 <unused37> 5.78761e-05 0.00041
89 <unused82> 5.79357e-05 0.00043
60 <unused53> 5.87106e-05 0.00047
45 <unused38> 5.9247e-05 0.00041
5 <2mass> 5.99623e-05 0.00042
63 <unused56> 6.06775e-05 0.00047
64 <unused57> 6.07967e-05 0.00051
41 <unused34> 6.91414e-05 0.00042
9 <unused2> 7.05123e-05 0.00051
8 <unused1> 7.18236e-05 0.00043
255999 <unused99> 0.000144482 0.00046
7 <unused0> 0.000176907 0.0013

Unreachable tokens

1 entry below threshold of 0.002

token_id token indicator reencoded
158576 ▁ссср 5.22137e-05 941: ▁с, 15497: сс, 235334: р
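The reencoded column suggests how unreachability is tested: decode the token id to text, encode that text again, and see whether the same single id comes back. The sketch below shows that round-trip check with a toy vocabulary and a toy greedy encoder. Both are stand-ins invented for illustration, not the Gemma tokenizer, and the round-trip rule itself is an assumption inferred from the table.

```python
# Toy vocabulary: id 3 ("ba") exists but the toy encoder never emits it.
VOCAB = {0: "a", 1: "b", 2: "ab", 3: "ba"}
EMITTABLE = [2, 0, 1]  # ids the toy encoder can produce, tried in this order

def decode(ids):
    return "".join(VOCAB[i] for i in ids)

def encode(text):
    """Greedy left-to-right encoder over the EMITTABLE ids."""
    ids, pos = [], 0
    while pos < len(text):
        for tid in EMITTABLE:
            piece = VOCAB[tid]
            if text.startswith(piece, pos):
                ids.append(tid)
                pos += len(piece)
                break
        else:
            raise ValueError(f"cannot encode {text[pos:]!r}")
    return ids

def is_unreachable(token_id):
    """True if decoding then re-encoding does not return the same single id."""
    return encode(decode([token_id])) != [token_id]

for tid in VOCAB:
    print(tid, repr(VOCAB[tid]), encode(decode([tid])), is_unreachable(tid))
```

In the toy example id 3 re-encodes as two other ids, much as ▁ссср (158576) re-encodes as the three tokens 941, 15497 and 235334 in the row above.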