Commit 7bf0bf0
feat: add vocabulary quantization (#271)
* remove multiword warning
* add superbpe tokenizers
* fix: pretokenize tokens before checking vocabulary
* feat: add quantization
* wip
* wip
* wip
* fixes
* fixes
* fix issue with mwe
* wip
* wip
* wip
* wip
* wip
* wip
* fixes
* fix: refactor quantization
* fix: refactor quantization
* wip
* wip
* typing
* fixes
* fix typing/linting
* add quantization helper to top
* change init to random
* fix: annotations import
* fix test import
* import Union for 3.9
* fix: union again
* store all relevant info in safetensors
* make weights float in training

1 parent 13095c9 commit 7bf0bf0
File tree
19 files changed
+425 -100 lines changed
- model2vec
- distill
- inference
- tokenizer
- train
- tests