Report for google/codegemma-7b
Model Info:
Tied embeddings: True
LM head uses bias: False
Embeddings shape: [256000, 3072]
Tokenizer Info:
Vocab Size: 256000
Tokenizer Class: GemmaTokenizer
Tokenizer Type: BPE
Bytes handling: Byte Fallback
Token for verification prompt building: TouchableOpacity
Token id for verification prompt building: 39886
Indicator summary:
Indicator for under-trained tokens: E_{out} Cosine Distance
Overall distribution: 0.104 +/- 0.044
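The report names the indicator but does not show how it is computed. As a minimal sketch, assuming the indicator is a cosine distance between each output-embedding row and some reference direction (the choice of reference vector, the function names `cosine_distance` and `mean_and_std`, and the toy vectors are illustrative assumptions, not taken from this report), the distance and a mean +/- standard deviation summary could look like this:

```python
import math


def cosine_distance(u, v):
    """Cosine distance: 1 minus the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_and_std(values):
    """Population mean and standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return mean, math.sqrt(variance)


# Toy data (hypothetical): three 2-d "embedding rows" and a reference vector.
reference = [1.0, 0.0]
rows = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
distances = [cosine_distance(row, reference) for row in rows]
mean, std = mean_and_std(distances)
print(f"{mean:.3f} +/- {std:.3f}")
```

A row pointing the same way as the reference gets a distance near 0, which matches the way small indicator values are treated as suspicious throughout this report.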
Detected Token Counts:
Number of tested under-trained tokens: 5117 (5015 non-special); 834 below the p = 0.01 threshold; 259 below the soft indicator threshold
Number of single byte tokens: 380, of which 144 below indicator threshold
Number of special tokens: 1, of which 1 below indicator threshold
Number of non-single-byte unreachable tokens: 1, of which 1 below indicator threshold
Under-trained token indicators plot
Under-trained token verification results
259 entries below threshold of 0.001
| token_id | token | indicator | max_prob | in_other_tokens |
| --- | --- | --- | --- | --- |
| 229433 | ^(@)$_ | 5.33462e-05 | 0.00041 | |
| 164525 | हिंदीखरीदारी | 5.38826e-05 | 0.00038 | |
| 196609 | \u200cآمباردا | 5.72205e-05 | 0.00041 | ▁ویکی\u200cآمباردا |
| 134910 | ſammen | 6.634e-05 | 0.00037 | ▁zuſammen |
| 127237 | ▁coachTry | 7.39098e-05 | 0.00063 | |
| 213138 | ſſung | 7.67708e-05 | 0.00039 | |
| 121349 | ▁AcceptedLoading | 7.75456e-05 | 0.00044 | |
| 59098 | EnglishChoose | 8.11219e-05 | 0.00046 | ▁EnglishChoose |
| 185507 | ▁queſto | 8.22544e-05 | 0.00032 | |
| 222309 | ▁queſta | 8.29697e-05 | 0.00035 | |
| 225573 | ▁Geiſt | 8.29697e-05 | 0.00041 | |
| 158454 | ▁unſer | 8.32677e-05 | 0.00037 | |
| 216622 | ▁Dieſe | 8.5175e-05 | 0.00041 | |
| 91282 | ▁ſelb | 8.86917e-05 | 0.00035 | ▁ſelbſt |
| 227644 | ▁ſeines | 8.92282e-05 | 0.00051 | |
| 220218 | ▁ſehen | 8.97646e-05 | 0.00033 | |
| 184138 | ▁zuſammen | 9.01818e-05 | 0.00051 | |
| 121705 | ▁ſondern | 9.06587e-05 | 0.00037 | |
| 252915 | \uf3f5 | 9.23872e-05 | 0.0005 | |
| 210616 | ▁geweſen | 9.38177e-05 | 0.00037 | |
239 additional entries below threshold
| token_id | token | indicator | max_prob | in_other_tokens |
| --- | --- | --- | --- | --- |
| 255245 | \uf3cc | 9.41753e-05 | 0.0005 | |
| 161080 | ▁ſeyn | 0.000103295 | 0.00041 | |
| 230983 | ▁wiſſen | 0.000107765 | 0.00037 | |
| 123984 | ▁ſeinen | 0.000122726 | 0.0005 | |
| 192547 | ▁erſten | 0.000123262 | 0.00042 | |
| 174176 | ▁ſoll | 0.000125945 | 0.00055 | |
| 203019 | ▁daſs | 0.000127614 | 0.0005 | |
| 148617 | ▁deſſen | 0.000129461 | 0.00037 | |
| 113990 | ▁ſehr | 0.000136435 | 0.00043 | |
| 143114 | ▁ſeinem | 0.000140607 | 0.00046 | |
| 151521 | ▁müſſen | 0.000141859 | 0.00039 | |
| 254455 | \ued90 | 0.000142813 | 0.00065 | |
| 254175 | 𐁘 | 0.000143409 | 0.00054 | |
| 153473 | ▁Menſchen | 0.000144064 | 0.00057 | |
| 173899 | ▁メンテナ | 0.000145495 | 0.00047 | ▁メンテナンス |
| 123221 | >\<^ | 0.000145912 | 0.00051 | |
| 42380 | ▁stockbild | 0.000146747 | 0.00061 | ▁stockbilder |
| 193385 | iſen | 0.000151336 | 0.00032 | |
| 255011 | 𓇠 | 0.000153184 | 0.00063 | |
| 195121 | ▁Waſſer | 0.000155449 | 0.00038 | |
| 224365 | ikusbot | 0.000155807 | 0.00041 | haikusbot |
| 254350 | \uf5ce | 0.000155985 | 0.00077 | |
| 151848 | ▁ſei | 0.000158429 | 0.00034 | ▁ſeines |
| 143473 | )$_. | 0.00015986 | 0.0005 | |
| 233201 | ▁Weiſe | 0.000163913 | 0.00043 | |
| 167982 | ▁stockfotografie | 0.000165462 | 0.00034 | |
| 128625 | ▁dieſem | 0.000167489 | 0.00039 | |
| 254071 | \uef5a | 0.000168681 | 0.00076 | |
| 153064 | ▁stockbilder | 0.000170112 | 0.00042 | |
| 109547 | ▁ſchon | 0.000171363 | 0.00031 | |
| 96098 | ▁ſelbſt | 0.000176013 | 0.0004 | |
| 232866 | ▁stiefe | 0.000176787 | 0.00045 | |
| 45971 | ▁linkCC | 0.00018084 | 0.0008 | |
| 255807 | 𝆣 | 0.000182271 | 0.00039 | |
| 97619 | ▁ſeiner | 0.000183225 | 0.00041 | |
| 195351 | niſſe | 0.000186503 | 0.00039 | |
| 123190 | ſelben | 0.000188887 | 0.00039 | |
| 202616 | ▁erſt | 0.000191152 | 0.00033 | |
| 254591 | \u0e72 | 0.000191629 | 0.00047 | |
| 254908 | 𖧹 | 0.000195503 | 0.00093 | |
| 172465 | iſche | 0.000196278 | 0.00039 | ▁zwiſchen |
| 255645 | \uef0e | 0.000199676 | 0.00097 | |
| 159234 | ſehen | 0.000200033 | 0.00056 | ▁ſehen |
| 136616 | ▁verſch | 0.000207007 | 0.00034 | |
| 75807 | ▁dieſe | 0.000207543 | 0.00032 | ▁dieſer , ▁dieſes , ▁dieſem , ▁dieſen |
| 167630 | ▁PeEn | 0.000211477 | 0.00047 | ▁PeEnEo |
| 255267 | \u0e63 | 0.000211477 | 0.0005 | |
| 125919 | Билгалда | 0.000212669 | 0.00049 | Билгалдахарш |
| 2873 | ICTOGRAM | 0.000216663 | 0.00044 | ▁PICTOGRAM , PICTOGRAM |
| 254944 | ⪜ | 0.000219047 | 0.001 | |
| 199696 | ſicht | 0.000219405 | 0.00036 | |
| 135639 | ▁dieſen | 0.000221372 | 0.00045 | |
| 255510 | \ue51e | 0.000222564 | 0.00061 | |
| 253034 | \uf7a0 | 0.000223279 | 0.00043 | |
| 208438 | ▁ſuo | 0.000227094 | 0.00072 | |
| 155980 | ▁beſch | 0.000231504 | 0.00065 | |
| 255154 | ᦶ | 0.000235736 | 0.00093 | |
| 255647 | \uf35e | 0.000237942 | 0.00046 | |
| 89379 | ▁ſeine | 0.00023824 | 0.00036 | ▁ſeiner , ▁ſeinen , ▁ſeinem , ▁ſeines |
| 255122 | \uf540 | 0.000238776 | 0.00087 | |
| 255849 | ⏔ | 0.000241101 | 0.0011 | |
| 208306 | ▁beſte | 0.000241697 | 0.0005 | |
| 250800 | \u0ba1 | 0.000243902 | 0.0005 | |
| 251499 | ྑ | 0.00024581 | 0.00093 | |
| 206857 | ▁tartalo | 0.000246823 | 0.00045 | ▁tartalomajánló |
| 255795 | \uec4c | 0.000252128 | 0.00069 | |
| 254885 | ꎬ | 0.0002563 | 0.00078 | |
| 118456 | ロウィン | 0.000258029 | 0.00037 | ハロウィン, ▁ハロウィン |
| 108162 | 久しぶ | 0.000259101 | 0.00033 | 久しぶりに, 久しぶり, 久しぶりの |
| 177069 | ▁티즈 | 0.000260115 | 0.00088 | |
| 225539 | isGridAdvEx | 0.000263393 | 0.00059 | |
| 253613 | \U000e0041 | 0.000275612 | 0.0017 | |
| 171300 | rbrakk | 0.000284612 | 0.0018 | |
| 120213 | iſchen | 0.000284791 | 0.00062 | ▁zwiſchen |
| 88138 | ſchaft | 0.00028646 | 0.00036 | |
| 198203 | ▁zwiſchen | 0.000286579 | 0.00035 | |
| 252631 | \uf51a | 0.000288665 | 0.0011 | |
| 114402 | ▁Geſch | 0.00029093 | 0.00057 | |
| 80527 | ▁dieſer | 0.000291049 | 0.00023 | |
| 128951 | ▁laſſen | 0.000295281 | 0.00037 | |
| 200906 | ▁ſua | 0.000299692 | 0.0007 | |
| 255242 | \ue6f0 | 0.000322819 | 0.00084 | |
| 171654 | lbrakk | 0.000323534 | 0.0017 | |
| 254456 | \uefa6 | 0.000326395 | 0.00056 | |
| 214340 | ▁パンチラ | 0.000329971 | 0.00054 | |
| 181784 | ▁་་ | 0.000331998 | 0.00044 | |
| 254549 | ꊥ | 0.000342607 | 0.00042 | |
| 255420 | ⸏ | 0.000347137 | 0.0012 | |
| 255279 | ᦵ | 0.000349283 | 0.0024 | |
| 251525 | \ueae4 | 0.000352442 | 0.00064 | |
| 253441 | \ue984 | 0.00035584 | 0.0015 | |
| 254258 | \ue5d0 | 0.000359654 | 0.0024 | |
| 255790 | \ue734 | 0.000364721 | 0.0034 | |
| 253247 | ྻ | 0.000369549 | 0.0018 | |
| 169039 | ▁ſche | 0.000375509 | 0.00072 | |
| 252436 | ᅝ | 0.000377715 | 0.0014 | |
| 254686 | 𑄮 | 0.000397742 | 0.014 | |
| 252790 | ﮢ | 0.000400186 | 0.0028 | |
| 150747 | ſcher | 0.000401437 | 0.00066 | |
| 207398 | ▁plufieurs | 0.000404358 | 0.00061 | |
| 255271 | ྴ | 0.000405133 | 0.0016 | |
| 253030 | 콯 | 0.000405729 | 0.00034 | |
| 254566 | \ue776 | 0.000413954 | 0.0026 | |
| 68314 | ▁例证 | 0.000421643 | 0.00069 | |
| 167294 | ▁GoogleContinue | 0.000425756 | 0.0016 | |
| 255705 | 䊐 | 0.000433445 | 0.00059 | |
| 209936 | ▁展板 | 0.00043422 | 0.0013 | |
| 255379 | \uf2ba | 0.000443161 | 0.0011 | |
| 253828 | ꪼ | 0.000446856 | 0.00042 | |
| 152266 | ▁imagui | 0.000448346 | 0.00092 | |
| 253758 | ﭥ | 0.000448406 | 0.003 | |
| 249717 | ༞ | 0.000449479 | 0.0013 | |
| 253510 | \uec1d | 0.000449836 | 0.00047 | |
| 220260 | ▁beſti | 0.000458598 | 0.00029 | |
| 253523 | \U000900b0 | 0.000458777 | 0.001 | |
| 255123 | 𑄥 | 0.000458956 | 0.014 | |
| 254486 | ᆤ | 0.000460088 | 0.0007 | |
| 255806 | 𑄠 | 0.000461161 | 0.0085 | |
| 205674 | нгред | 0.000469267 | 0.001 | нгредіє, нгредієнти |
| 220916 | ▁vooz | 0.000470042 | 0.0004 | |
| 182427 | )$_, | 0.000473917 | 0.0014 | |
| 252858 | ྋ | 0.000474215 | 0.0022 | |
| 225065 | bildtitel | 0.000480175 | 0.00099 | |
| 250433 | 㜵 | 0.000481844 | 0.00047 | |
| 255248 | 𐑥 | 0.000482202 | 0.0043 | |
| 253027 | 쎲 | 0.000490725 | 0.00038 | |
| 253927 | ᦺ | 0.000492513 | 0.0005 | |
| 255955 | \ue6ec | 0.000492632 | 0.00083 | |
| 116882 | ▁geſch | 0.000492692 | 0.00041 | |
| 187776 | ▁Verſ | 0.000500202 | 0.00052 | |
| 254349 | \uf412 | 0.000504732 | 0.0053 | |
| 255517 | 𑄝 | 0.00051713 | 0.042 | |
| 253926 | ᥀ | 0.000518322 | 0.00071 | |
| 254574 | 𖡻 | 0.000521541 | 0.00037 | |
| 255792 | \ue762 | 0.000522673 | 0.0097 | |
| 253904 | 𑄣 | 0.000530243 | 0.03 | |
| 255380 | \uf8e0 | 0.000531971 | 0.0032 | |
| 255124 | 𑄪 | 0.000533938 | 0.04 | |
| 253326 | ྚ | 0.00053668 | 0.0042 | |
| 72920 | ▁ſind | 0.000537097 | 0.00062 | |
| 254565 | \ue67b | 0.000541568 | 0.0028 | |
| 255376 | \ueb9a | 0.000542223 | 0.0013 | |
| 254600 | ⅏ | 0.000548542 | 0.0064 | |
| 251632 | 𑄨 | 0.000549674 | 0.045 | |
| 255814 | 𞤑 | 0.000553846 | 0.0016 | |
| 64069 | ディネート | 0.000560284 | 0.00068 | ▁コーディネート , コーディネート |
| 195112 | ▁好文分享 | 0.000562727 | 0.0014 | |
| 251560 | \ue978 | 0.000565708 | 0.012 | |
| 176775 | ▁盗撮 | 0.000566781 | 0.0015 | |
| 254482 | \u0bab | 0.000567257 | 0.00054 | |
| 159995 | ▁剪影 | 0.000569046 | 0.001 | |
| 72182 | ▁版税 | 0.00056982 | 0.0015 | |
| 254213 | ⸄ | 0.000570416 | 0.0067 | |
| 253103 | 𑄚 | 0.000576437 | 0.055 | |
| 255275 | ᔢ | 0.000579834 | 0.0021 | |
| 253104 | 𓆱 | 0.00058198 | 0.0024 | |
| 249784 | ܞ | 0.000586271 | 0.0017 | |
| 249361 | ྪ | 0.000593662 | 0.0028 | |
| 252852 | 🜲 | 0.00059402 | 0.0012 | |
| 248337 | \uf21d | 0.000595629 | 0.0038 | |
| 75991 | ▁indígen | 0.000595808 | 0.00058 | ▁indígenas, ▁indígena |
| 172769 | 征詢我 | 0.000595868 | 0.0013 | |
| 252567 | ﭔ | 0.000598133 | 0.0075 | |
| 254798 | \ufe67 | 0.000598967 | 0.0012 | |
| 206788 | majánló | 0.000610769 | 0.0011 | ▁tartalomajánló |
| 134830 | 往下閱讀 | 0.000616968 | 0.00094 | 請繼續往下閱讀 |
| 25269 | NdEx | 0.000624597 | 0.0026 | iNdEx, ▁iNdEx |
| 171222 | 征詢 | 0.000624597 | 0.0003 | 征詢我 |
| 253706 | ᔡ | 0.00062716 | 0.00083 | |
| 254076 | 𑄟 | 0.000629663 | 0.05 | |
| 196059 | باردا | 0.00063169 | 0.00075 | \u200cآمباردا , ▁ویکی\u200cآمباردا |
| 252966 | 𝆺 | 0.000635505 | 0.00038 | |
| 253052 | \u0bc4 | 0.000639737 | 0.00039 | |
| 254460 | ﮈ | 0.000640333 | 0.0049 | |
| 251496 | \u0ba5 | 0.000642717 | 0.00048 | |
| 248691 | ྰ | 0.000646532 | 0.00096 | |
| 141456 | isOraColElement | 0.000647962 | 0.0005 | |
| 253187 | ݯ | 0.000648081 | 0.003 | |
| 112171 | Diwed | 0.000649452 | 0.00076 | Diwedd, Diweddar , Diweddarwch |
| 255663 | \U000f023b | 0.000651777 | 0.0016 | |
| 131560 | ▁desmotivaciones | 0.000654697 | 0.0014 | |
| 248384 | ྛ | 0.000655651 | 0.0028 | |
| 254114 | ⸅ | 0.000656962 | 0.018 | |
| 255953 | \ue65a | 0.000659168 | 0.0053 | |
| 254911 | 𞤶 | 0.000659883 | 0.0022 | |
| 254075 | 𑄇 | 0.000660419 | 0.027 | |
| 140439 | ▁stockfotos | 0.000660539 | 0.0027 | |
| 255934 | ꘋ | 0.000660837 | 0.0017 | |
| 253841 | 𑄬 | 0.000679135 | 0.034 | |
| 252372 | 𑄢 | 0.000679672 | 0.12 | |
| 253901 | \ue676 | 0.000681221 | 0.0032 | |
| 255389 | 𞤼 | 0.000682354 | 0.003 | |
| 253992 | \ue7b5 | 0.000682712 | 0.0066 | |
| 253371 | 龸 | 0.000684977 | 0.0016 | |
| 254484 | ༸ | 0.000692546 | 0.0024 | |
| 250918 | 䊳 | 0.000697255 | 0.0013 | |
| 247641 | ܇ | 0.000697851 | 0.0084 | |
| 253511 | \uf563 | 0.000710249 | 0.0023 | |
| 136017 | ▁简谱 | 0.000711441 | 0.00084 | |
| 251670 | ྾ | 0.000712276 | 0.007 | |
| 251965 | \u0bc5 | 0.000725448 | 0.00044 | |
| 255382 | ﮗ | 0.000733137 | 0.0082 | |
| 255728 | 琑 | 0.000734329 | 0.00099 | |
| 248911 | \ue5f1 | 0.000735939 | 0.0016 | |
| 35321 | ſchen | 0.000737846 | 0.00069 | iſchen , ▁Menſchen , ▁zwiſchen |
| 254626 | 䡵 | 0.000748098 | 0.0014 | |
| 250185 | ྠ | 0.000757098 | 0.0047 | |
| 254270 | 𞤴 | 0.000760078 | 0.0044 | |
| 254573 | 𑄃 | 0.000764251 | 0.042 | |
| 250887 | ﭜ | 0.000768125 | 0.039 | |
| 252083 | \uf565 | 0.000769138 | 0.0035 | |
| 245817 | \U00071706 | 0.000769556 | 0.00032 | |
| 247780 | ྞ | 0.00077337 | 0.0029 | |
| 254833 | ⧪ | 0.000775993 | 0.0032 | |
| 65939 | \<^ | 0.000787199 | 0.00094 | >\<^ |
| 255018 | 𞥄 | 0.000792086 | 0.0054 | |
| 253460 | \u0b8b | 0.000793457 | 0.00059 | |
| 251778 | ⬮ | 0.000798047 | 0.019 | |
| 253075 | 呌 | 0.000814557 | 0.00083 | |
| 90675 | ▁Geſ | 0.000814974 | 0.00033 | ▁Geſch |
| 251780 | 䊱 | 0.000816703 | 0.0064 | |
| 247445 | ꩻ | 0.000817895 | 0.0058 | |
| 180346 | ſſo | 0.000818133 | 0.00093 | |
| 255439 | 娡 | 0.000823796 | 0.00083 | |
| 248619 | ྶ | 0.000827789 | 0.00059 | |
| 254496 | ⏡ | 0.000828981 | 0.003 | |
| 115459 | ſem | 0.000835717 | 0.00045 | ▁dieſem |
| 129755 | ſam | 0.000837982 | 0.0025 | ſammen , ▁ſame , ▁zuſammen |
| 212547 | ▁Pardavimas | 0.000844717 | 0.0039 | |
| 114373 | ▁témoig | 0.000846148 | 0.0011 | ▁témoignage, ▁témoignages |
| 139931 | Дерекк | 0.000846386 | 0.0084 | Дереккөздер |
| 254927 | ༐ | 0.000846386 | 0.016 | |
| 253723 | 卝 | 0.000847101 | 0.00094 | |
| 252682 | \uf55f | 0.000851274 | 0.0029 | |
| 32602 | ▁ſich | 0.000852525 | 0.00082 | |
| 254903 | \ue66e | 0.000857472 | 0.0016 | |
| 254089 | \u0e6c | 0.000862837 | 0.00068 | |
| 176309 | enablog | 0.000863552 | 0.0054 | hatenablog |
| 115666 | ▁verſ | 0.000863731 | 0.00089 | ▁verſch |
144 single-byte token entries below threshold of 0.002
| token_id | token | indicator | ord | hex | byte_type | reencoded |
| --- | --- | --- | --- | --- | --- | --- |
| 313 | <0x60> | 4.85778e-05 | 96 | 0x60 | ascii | 235376: ` |
| 300 | <0x53> | 4.88758e-05 | 83 | 0x53 | ascii | 235277: S |
| 225 | <0x08> | 4.97699e-05 | 8 | 0x08 | ascii | 245584: \x08 |
| 412 | <0xC3> | 4.98891e-05 | 195 | 0xC3 | utf8 | |
| 466 | <0xF9> | 4.99487e-05 | 249 | 0xF9 | unused_utf8 | |
| 265 | <0x30> | 5.01275e-05 | 48 | 0x30 | ascii | 235276: 0 |
| 317 | <0x64> | 5.02467e-05 | 100 | 0x64 | ascii | 235258: d |
| 278 | <0x3D> | 5.03659e-05 | 61 | 0x3D | ascii | 235293: = |
| 292 | <0x4B> | 5.04851e-05 | 75 | 0x4B | ascii | 235333: K |
| 299 | <0x52> | 5.04851e-05 | 82 | 0x52 | ascii | 235294: R |
| 315 | <0x62> | 5.06639e-05 | 98 | 0x62 | ascii | 235268: b |
| 219 | <0x02> | 5.07236e-05 | 2 | 0x02 | ascii | 247977: \x02 |
| 266 | <0x31> | 5.08428e-05 | 49 | 0x31 | ascii | 235274: 1 |
| 258 | <0x29> | 5.0962e-05 | 41 | 0x29 | ascii | 235275: ) |
| 263 | <0x2E> | 5.10812e-05 | 46 | 0x2E | ascii | 235265: . |
| 284 | <0x43> | 5.11408e-05 | 67 | 0x43 | ascii | 235288: C |
| 230 | <0x0D> | 5.13792e-05 | 13 | 0x0D | ascii | 235316: \r |
| 252 | <0x23> | 5.13792e-05 | 35 | 0x23 | ascii | 235345: # |
| 323 | <0x6A> | 5.14388e-05 | 106 | 0x6A | ascii | 235312: j |
| 248 | <0x1F> | 5.1558e-05 | 31 | 0x1F | ascii | 251698: \x1f |
124 additional entries below threshold
| token_id | token | indicator | ord | hex | byte_type | reencoded |
| --- | --- | --- | --- | --- | --- | --- |
| 325 | <0x6C> | 5.1558e-05 | 108 | 0x6C | ascii | 235257: l |
| 330 | <0x71> | 5.17964e-05 | 113 | 0x71 | ascii | 235317: q |
| 262 | <0x2D> | 5.1856e-05 | 45 | 0x2D | ascii | 235290: - |
| 289 | <0x48> | 5.19156e-05 | 72 | 0x48 | ascii | 235314: H |
| 264 | <0x2F> | 5.19753e-05 | 47 | 0x2F | ascii | 235283: / |
| 307 | <0x5A> | 5.19753e-05 | 90 | 0x5A | ascii | 235382: Z |
| 232 | <0x0F> | 5.20945e-05 | 15 | 0x0F | ascii | 249949: \x0f |
| 310 | <0x5D> | 5.21541e-05 | 93 | 0x5D | ascii | 235307: ] |
| 277 | <0x3C> | 5.22137e-05 | 60 | 0x3C | ascii | 235322: < |
| 282 | <0x41> | 5.22137e-05 | 65 | 0x41 | ascii | 235280: A |
| 236 | <0x13> | 5.23329e-05 | 19 | 0x13 | ascii | 252752: \x13 |
| 316 | <0x63> | 5.23329e-05 | 99 | 0x63 | ascii | 235260: c |
| 257 | <0x28> | 5.23925e-05 | 40 | 0x28 | ascii | 235278: ( |
| 296 | <0x4F> | 5.23925e-05 | 79 | 0x4F | ascii | 235302: O |
| 309 | <0x5C> | 5.23925e-05 | 92 | 0x5C | ascii | 235286: \ |
| 283 | <0x42> | 5.24521e-05 | 66 | 0x42 | ascii | 235305: B |
| 293 | <0x4C> | 5.24521e-05 | 76 | 0x4C | ascii | 235301: L |
| 222 | <0x05> | 5.25117e-05 | 5 | 0x05 | ascii | 250940: \x05 |
| 244 | <0x1B> | 5.25713e-05 | 27 | 0x1B | ascii | 242385: \x1b |
| 270 | <0x35> | 5.26309e-05 | 53 | 0x35 | ascii | 235308: 5 |
| 276 | <0x3B> | 5.26309e-05 | 59 | 0x3B | ascii | 235289: ; |
| 280 | <0x3F> | 5.28097e-05 | 63 | 0x3F | ascii | 235336: ? |
| 312 | <0x5F> | 5.28097e-05 | 95 | 0x5F | ascii | 235298: _ |
| 340 | <0x7B> | 5.28097e-05 | 123 | 0x7B | ascii | 235282: { |
| 301 | <0x54> | 5.28693e-05 | 84 | 0x54 | ascii | 235279: T |
| 333 | <0x74> | 5.29885e-05 | 116 | 0x74 | ascii | 235251: t |
| 250 | <0x21> | 5.30481e-05 | 33 | 0x21 | ascii | 235341: ! |
| 335 | <0x76> | 5.30481e-05 | 118 | 0x76 | ascii | 235272: v |
| 228 | <0x0B> | 5.31077e-05 | 11 | 0x0B | ascii | 249154: \x0b |
| 274 | <0x39> | 5.31673e-05 | 57 | 0x39 | ascii | 235315: 9 |
| 290 | <0x49> | 5.31673e-05 | 73 | 0x49 | ascii | 235285: I |
| 304 | <0x57> | 5.32866e-05 | 87 | 0x57 | ascii | 235325: W |
| 332 | <0x73> | 5.32866e-05 | 115 | 0x73 | ascii | 235256: s |
| 231 | <0x0E> | 5.3525e-05 | 14 | 0x0E | ascii | 252689: \x0e |
| 336 | <0x77> | 5.3525e-05 | 119 | 0x77 | ascii | 235271: w |
| 251 | <0x22> | 5.35846e-05 | 34 | 0x22 | ascii | 235281: " |
| 319 | <0x66> | 5.35846e-05 | 102 | 0x66 | ascii | 235266: f |
| 238 | <0x15> | 5.37038e-05 | 21 | 0x15 | ascii | 253776: \x15 |
| 241 | <0x18> | 5.37038e-05 | 24 | 0x18 | ascii | 250600: \x18 |
| 249 | <0x20> | 5.37038e-05 | 32 | 0x20 | ascii | 235248: ▁ |
| 422 | <0xCD> | 5.37634e-05 | 205 | 0xCD | utf8 | |
| 271 | <0x36> | 5.3823e-05 | 54 | 0x36 | ascii | 235318: 6 |
| 302 | <0x55> | 5.38826e-05 | 85 | 0x55 | ascii | 235327: U |
| 320 | <0x67> | 5.38826e-05 | 103 | 0x67 | ascii | 235264: g |
| 334 | <0x75> | 5.40018e-05 | 117 | 0x75 | ascii | 235261: u |
| 342 | <0x7D> | 5.40018e-05 | 125 | 0x7D | ascii | 235270: } |
| 409 | <0xC0> | 5.40018e-05 | 192 | 0xC0 | unused_utf8 | |
| 295 | <0x4E> | 5.40614e-05 | 78 | 0x4E | ascii | 235300: N |
| 259 | <0x2A> | 5.4121e-05 | 42 | 0x2A | ascii | 235287: * |
| 285 | <0x44> | 5.4121e-05 | 68 | 0x44 | ascii | 235299: D |
| 267 | <0x32> | 5.41806e-05 | 50 | 0x32 | ascii | 235284: 2 |
| 318 | <0x65> | 5.41806e-05 | 101 | 0x65 | ascii | 235249: e |
| 467 | <0xFA> | 5.41806e-05 | 250 | 0xFA | unused_utf8 | |
| 255 | <0x26> | 5.42402e-05 | 38 | 0x26 | ascii | 235343: & |
| 343 | <0x7E> | 5.43594e-05 | 126 | 0x7E | ascii | 235436: ~ |
| 275 | <0x3A> | 5.47171e-05 | 58 | 0x3A | ascii | 235292: : |
| 303 | <0x56> | 5.47171e-05 | 86 | 0x56 | ascii | 235330: V |
| 308 | <0x5B> | 5.47171e-05 | 91 | 0x5B | ascii | 235309: [ |
| 234 | <0x11> | 5.47767e-05 | 17 | 0x11 | ascii | 253614: \x11 |
| 470 | <0xFD> | 5.48363e-05 | 253 | 0xFD | unused_utf8 | |
| 233 | <0x10> | 5.48959e-05 | 16 | 0x10 | ascii | 248775: \x10 |
| 305 | <0x58> | 5.49555e-05 | 88 | 0x58 | ascii | 235356: X |
| 326 | <0x6D> | 5.49555e-05 | 109 | 0x6D | ascii | 235262: m |
| 471 | <0xFE> | 5.49555e-05 | 254 | 0xFE | unused_utf8 | |
| 339 | <0x7A> | 5.50151e-05 | 122 | 0x7A | ascii | 235306: z |
| 414 | <0xC5> | 5.50747e-05 | 197 | 0xC5 | utf8 | |
| 465 | <0xF8> | 5.51343e-05 | 248 | 0xF8 | unused_utf8 | |
| 235 | <0x12> | 5.51939e-05 | 18 | 0x12 | ascii | 252232: \x12 |
| 268 | <0x33> | 5.52535e-05 | 51 | 0x33 | ascii | 235304: 3 |
| 464 | <0xF7> | 5.52535e-05 | 247 | 0xF7 | unused_utf8 | |
| 472 | <0xFF> | 5.53131e-05 | 255 | 0xFF | unused_utf8 | |
| 254 | <0x25> | 5.54323e-05 | 37 | 0x25 | ascii | 235358: % |
| 281 | <0x40> | 5.54323e-05 | 64 | 0x40 | ascii | 235348: @ |
| 227 | <0x0A> | 5.54919e-05 | 10 | 0x0A | ascii | 108: \n |
| 247 | <0x1E> | 5.54919e-05 | 30 | 0x1E | ascii | 253777: \x1e |
| 311 | <0x5E> | 5.54919e-05 | 94 | 0x5E | ascii | 235393: ^ |
| 287 | <0x46> | 5.55515e-05 | 70 | 0x46 | ascii | 235311: F |
| 243 | <0x1A> | 5.56111e-05 | 26 | 0x1A | ascii | 243931: \x1a |
| 298 | <0x51> | 5.56111e-05 | 81 | 0x51 | ascii | 235368: Q |
| 331 | <0x72> | 5.56707e-05 | 114 | 0x72 | ascii | 235255: r |
| 237 | <0x14> | 5.57899e-05 | 20 | 0x14 | ascii | 250861: \x14 |
| 229 | <0x0C> | 5.59092e-05 | 12 | 0x0C | ascii | 238092: \x0c |
| 288 | <0x47> | 5.59092e-05 | 71 | 0x47 | ascii | 235319: G |
| 223 | <0x06> | 5.60284e-05 | 6 | 0x06 | ascii | 251368: \x06 |
| 272 | <0x37> | 5.61476e-05 | 55 | 0x37 | ascii | 235324: 7 |
| 306 | <0x59> | 5.62668e-05 | 89 | 0x59 | ascii | 235342: Y |
| 245 | <0x1C> | 5.63264e-05 | 28 | 0x1C | ascii | 255818: \x1c |
| 337 | <0x78> | 5.63264e-05 | 120 | 0x78 | ascii | 235297: x |
| 279 | <0x3E> | 5.6386e-05 | 62 | 0x3E | ascii | 235313: > |
| 273 | <0x38> | 5.65052e-05 | 56 | 0x38 | ascii | 235321: 8 |
| 468 | <0xFB> | 5.65052e-05 | 251 | 0xFB | unused_utf8 | |
| 220 | <0x03> | 5.65648e-05 | 3 | 0x03 | ascii | 249006: \x03 |
| 253 | <0x24> | 5.65648e-05 | 36 | 0x24 | ascii | 235323: $ |
| 291 | <0x4A> | 5.6684e-05 | 74 | 0x4A | ascii | 235338: J |
| 218 | <0x01> | 5.69224e-05 | 1 | 0x01 | ascii | 238213: \x01 |
| 294 | <0x4D> | 5.71609e-05 | 77 | 0x4D | ascii | 235296: M |
| 322 | <0x69> | 5.72205e-05 | 105 | 0x69 | ascii | 235252: i |
| 341 | <0x7C> | 5.72205e-05 | 124 | 0x7C | ascii | 235371: \| |
| 246 | <0x1D> | 5.72801e-05 | 29 | 0x1D | ascii | 254363: \x1d |
| 411 | <0xC2> | 5.72801e-05 | 194 | 0xC2 | utf8 | |
| 260 | <0x2B> | 5.73993e-05 | 43 | 0x2B | ascii | 235340: + |
| 469 | <0xFC> | 5.76973e-05 | 252 | 0xFC | unused_utf8 | |
| 344 | <0x7F> | 5.77569e-05 | 127 | 0x7F | ascii | 244423: \x7f |
| 462 | <0xF5> | 5.78165e-05 | 245 | 0xF5 | unused_utf8 | |
| 328 | <0x6F> | 5.78761e-05 | 111 | 0x6F | ascii | 235253: o |
| 239 | <0x16> | 5.79953e-05 | 22 | 0x16 | ascii | 254362: \x16 |
| 286 | <0x45> | 5.81145e-05 | 69 | 0x45 | ascii | 235291: E |
| 324 | <0x6B> | 5.81145e-05 | 107 | 0x6B | ascii | 235273: k |
| 421 | <0xCC> | 5.81741e-05 | 204 | 0xCC | utf8 | |
| 242 | <0x19> | 5.84126e-05 | 25 | 0x19 | ascii | 254472: \x19 |
| 410 | <0xC1> | 5.84126e-05 | 193 | 0xC1 | unused_utf8 | |
| 256 | <0x27> | 5.84722e-05 | 39 | 0x27 | ascii | 235303: ' |
| 329 | <0x70> | 5.85318e-05 | 112 | 0x70 | ascii | 235263: p |
| 463 | <0xF6> | 5.8651e-05 | 246 | 0xF6 | unused_utf8 | |
| 338 | <0x79> | 5.87106e-05 | 121 | 0x79 | ascii | 235267: y |
| 269 | <0x34> | 5.9545e-05 | 52 | 0x34 | ascii | 235310: 4 |
| 327 | <0x6E> | 6.01411e-05 | 110 | 0x6E | ascii | 235254: n |
| 224 | <0x07> | 6.02007e-05 | 7 | 0x07 | ascii | 249340: \x07 |
| 297 | <0x50> | 6.03199e-05 | 80 | 0x50 | ascii | 235295: P |
| 314 | <0x61> | 6.04391e-05 | 97 | 0x61 | ascii | 235250: a |
| 221 | <0x04> | 6.04987e-05 | 4 | 0x04 | ascii | 250124: \x04 |
| 261 | <0x2C> | 6.15716e-05 | 44 | 0x2C | ascii | 235269: , |
| 413 | <0xC4> | 6.17504e-05 | 196 | 0xC4 | utf8 | |
| 321 | <0x68> | 6.25849e-05 | 104 | 0x68 | ascii | 235259: h |
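The `ord`, `hex`, and `byte_type` values in the single-byte rows can be derived from the `<0xNN>` token name alone. The following is a minimal sketch, assuming `byte_type` follows the UTF-8 rule that the bytes 0xC0, 0xC1, and 0xF5 through 0xFF never occur in valid UTF-8, which is consistent with every row listed here. Labelling the continuation range 0x80 through 0xBF as `utf8` is a guess, since no such rows appear in this report, and the helper names `classify_byte` and `parse_byte_token` are hypothetical.

```python
def classify_byte(value):
    """Label a byte value the way the byte_type column appears to."""
    if not 0 <= value <= 0xFF:
        raise ValueError(f"not a byte: {value}")
    if value <= 0x7F:
        return "ascii"
    # 0xC0, 0xC1 and 0xF5..0xFF can never appear in well-formed UTF-8.
    if value in (0xC0, 0xC1) or value >= 0xF5:
        return "unused_utf8"
    # Assumption: every other high byte is reported as "utf8".
    return "utf8"


def parse_byte_token(token):
    """Turn a byte-fallback token such as '<0x60>' into (ord, hex, byte_type)."""
    if not (len(token) == 6 and token.startswith("<0x") and token.endswith(">")):
        raise ValueError(f"not a byte-fallback token: {token!r}")
    value = int(token[3:5], 16)
    return value, f"0x{value:02X}", classify_byte(value)


for name in ("<0x60>", "<0xC3>", "<0xF9>"):
    print(name, parse_byte_token(name))
```

Rows with an empty `reencoded` cell are exactly the `utf8` and `unused_utf8` bytes, which cannot be decoded to a character on their own.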
102 special token entries below threshold of 0.002
| token_id | token | indicator | max_prob |
| --- | --- | --- | --- |
| 12 | <unused5> | 4.8995e-05 | 0.00039 |
| 38 | <unused31> | 4.94123e-05 | 0.00042 |
| 85 | <unused78> | 4.94719e-05 | 0.00047 |
| 25 | <unused18> | 4.99487e-05 | 0.00046 |
| 55 | <unused48> | 5.02467e-05 | 0.00039 |
| 88 | <unused81> | 5.06043e-05 | 0.00042 |
| 97 | <unused90> | 5.07832e-05 | 0.00039 |
| 90 | <unused83> | 5.08428e-05 | 0.00033 |
| 11 | <unused4> | 5.10216e-05 | 0.00046 |
| 87 | <unused80> | 5.10216e-05 | 0.00042 |
| 14 | <unused7> | 5.12004e-05 | 0.00047 |
| 31 | <unused24> | 5.126e-05 | 0.0004 |
| 35 | <unused28> | 5.126e-05 | 0.00044 |
| 18 | <unused11> | 5.14388e-05 | 0.00043 |
| 74 | <unused67> | 5.14388e-05 | 0.00042 |
| 76 | <unused69> | 5.14388e-05 | 0.00048 |
| 100 | <unused93> | 5.16176e-05 | 0.00033 |
| 0 | <pad> | 5.17368e-05 | 7e-13 |
| 21 | <unused14> | 5.19156e-05 | 0.00036 |
| 104 | <unused97> | 5.19156e-05 | 0.00038 |
82 additional entries below threshold
| token_id | token | indicator | max_prob |
| --- | --- | --- | --- |
| 66 | <unused59> | 5.20349e-05 | 0.00041 |
| 49 | <unused42> | 5.20945e-05 | 0.00043 |
| 62 | <unused55> | 5.21541e-05 | 0.0004 |
| 72 | <unused65> | 5.22137e-05 | 0.00044 |
| 23 | <unused16> | 5.22733e-05 | 0.00035 |
| 33 | <unused26> | 5.22733e-05 | 0.00043 |
| 91 | <unused84> | 5.22733e-05 | 0.00043 |
| 15 | <unused8> | 5.24521e-05 | 0.00042 |
| 58 | <unused51> | 5.24521e-05 | 0.00039 |
| 102 | <unused95> | 5.24521e-05 | 0.00042 |
| 78 | <unused71> | 5.25117e-05 | 0.00041 |
| 43 | <unused36> | 5.25713e-05 | 0.00044 |
| 75 | <unused68> | 5.25713e-05 | 0.00039 |
| 81 | <unused74> | 5.25713e-05 | 0.0004 |
| 103 | <unused96> | 5.26309e-05 | 0.00036 |
| 80 | <unused73> | 5.27501e-05 | 0.00043 |
| 42 | <unused35> | 5.30481e-05 | 0.00045 |
| 83 | <unused76> | 5.30481e-05 | 0.00042 |
| 92 | <unused85> | 5.32269e-05 | 0.0005 |
| 86 | <unused79> | 5.32866e-05 | 0.00041 |
| 105 | <unused98> | 5.32866e-05 | 0.00046 |
| 34 | <unused27> | 5.36442e-05 | 0.00044 |
| 79 | <unused72> | 5.3823e-05 | 0.00042 |
| 93 | <unused86> | 5.3823e-05 | 0.00031 |
| 27 | <unused20> | 5.40614e-05 | 0.00043 |
| 48 | <unused41> | 5.40614e-05 | 0.00043 |
| 52 | <unused45> | 5.42998e-05 | 0.00045 |
| 24 | <unused17> | 5.46575e-05 | 0.00039 |
| 71 | <unused64> | 5.47767e-05 | 0.00044 |
| 107 | <end_of_turn> | 5.47767e-05 | 0.00038 |
| 17 | <unused10> | 5.48363e-05 | 0.00041 |
| 40 | <unused33> | 5.48363e-05 | 0.00041 |
| 95 | <unused88> | 5.48363e-05 | 0.00046 |
| 22 | <unused15> | 5.48959e-05 | 0.00043 |
| 106 | <start_of_turn> | 5.48959e-05 | 0.00044 |
| 3 | <unk> | 5.49555e-05 | 0.00036 |
| 28 | <unused21> | 5.49555e-05 | 0.00049 |
| 73 | <unused66> | 5.49555e-05 | 0.00035 |
| 99 | <unused92> | 5.49555e-05 | 0.00045 |
| 36 | <unused29> | 5.51343e-05 | 0.00039 |
| 101 | <unused94> | 5.51939e-05 | 0.00054 |
| 13 | <unused6> | 5.52535e-05 | 0.00043 |
| 57 | <unused50> | 5.54919e-05 | 0.00038 |
| 61 | <unused54> | 5.55515e-05 | 0.00048 |
| 84 | <unused77> | 5.55515e-05 | 0.00038 |
| 37 | <unused30> | 5.56111e-05 | 0.00047 |
| 59 | <unused52> | 5.56111e-05 | 0.00041 |
| 94 | <unused87> | 5.56111e-05 | 0.00035 |
| 50 | <unused43> | 5.57303e-05 | 0.00042 |
| 26 | <unused19> | 5.59688e-05 | 0.00045 |
| 56 | <unused49> | 5.59688e-05 | 0.00039 |
| 10 | <unused3> | 5.60284e-05 | 0.00039 |
| 20 | <unused13> | 5.61476e-05 | 0.00044 |
| 98 | <unused91> | 5.62072e-05 | 0.00044 |
| 29 | <unused22> | 5.6386e-05 | 0.00043 |
| 82 | <unused75> | 5.6386e-05 | 0.00039 |
| 19 | <unused12> | 5.65052e-05 | 0.00044 |
| 6 | [@BOS@] | 5.6982e-05 | 0.00043 |
| 39 | <unused32> | 5.6982e-05 | 0.00043 |
| 54 | <unused47> | 5.6982e-05 | 0.00037 |
| 53 | <unused46> | 5.70416e-05 | 0.00043 |
| 65 | <unused58> | 5.70416e-05 | 0.00041 |
| 32 | <unused25> | 5.72205e-05 | 0.00039 |
| 46 | <unused39> | 5.72205e-05 | 0.00044 |
| 30 | <unused23> | 5.72801e-05 | 0.00042 |
| 47 | <unused40> | 5.72801e-05 | 0.00038 |
| 77 | <unused70> | 5.72801e-05 | 0.0004 |
| 16 | <unused9> | 5.73397e-05 | 0.00042 |
| 51 | <unused44> | 5.74589e-05 | 0.00043 |
| 96 | <unused89> | 5.75185e-05 | 0.0004 |
| 44 | <unused37> | 5.78761e-05 | 0.00041 |
| 89 | <unused82> | 5.79357e-05 | 0.00043 |
| 60 | <unused53> | 5.87106e-05 | 0.00047 |
| 45 | <unused38> | 5.9247e-05 | 0.00041 |
| 5 | <2mass> | 5.99623e-05 | 0.00042 |
| 63 | <unused56> | 6.06775e-05 | 0.00047 |
| 64 | <unused57> | 6.07967e-05 | 0.00051 |
| 41 | <unused34> | 6.91414e-05 | 0.00042 |
| 9 | <unused2> | 7.05123e-05 | 0.00051 |
| 8 | <unused1> | 7.18236e-05 | 0.00043 |
| 255999 | <unused99> | 0.000144482 | 0.00046 |
| 7 | <unused0> | 0.000176907 | 0.0013 |
1 unreachable token entry below threshold of 0.002
| token_id | token | indicator | reencoded |
| --- | --- | --- | --- |
| 158576 | ▁ссср | 5.22137e-05 | 941: ▁с, 15497: сс, 235334: р |
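The `reencoded` column suggests how a token ends up flagged as unreachable: encoding the token's own text yields a different id sequence, so the tokenizer never emits that id for that text. A minimal sketch of that check follows, assuming an `encode` callable that maps a string to a list of ids. The name `is_unreachable` and the toy encoder are hypothetical stand-ins for a real tokenizer; the ids for `▁ссср` are copied from the row above, while the single-token result for `▁с` is an assumption made only for the contrast case.

```python
def is_unreachable(token_id, token_text, encode):
    """True when re-encoding the token's text does not give back [token_id]."""
    return encode(token_text) != [token_id]


def toy_encode(text):
    """Toy stand-in for a tokenizer's encode step (not a real tokenizer)."""
    table = {
        "▁ссср": [941, 15497, 235334],  # split reported in this table
        "▁с": [941],                    # assumed for illustration
    }
    return table[text]


print(is_unreachable(158576, "▁ссср", toy_encode))
print(is_unreachable(941, "▁с", toy_encode))
```

With a real tokenizer, `encode` would be whatever call returns ids without added special tokens, since a prepended BOS id would make every token look unreachable.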