Commit 543cdb3
serialize scales as bf16 and serialize in Named Data Map (pytorch#11031)
XNNPACK currently uses BF16 scales when running GEMMs with groupwise
quantized weights. Today we serialize the scales as FP32 and then
convert them to BF16 before passing them to XNNPACK. We can save both
memory and file size by serializing the scales as BF16 in the first place.
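To illustrate the size saving, here is a minimal sketch of FP32-to-BF16 conversion using only the Python standard library. It is not the code from this commit: the function names are hypothetical, the rounding mode (round-to-nearest-even) is an assumption, and scales are assumed to be finite (NaN is not handled).

```python
import struct


def fp32_to_bf16_bits(x: float) -> int:
    """Round a finite FP32 value to BF16 and return the 16-bit pattern.

    Assumption: round-to-nearest-even on the 16 low bits that BF16 drops.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_fp32(b: int) -> float:
    """Widen a BF16 bit pattern back to FP32 by zero-filling the low 16 bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


def serialize_scales_fp32(scales) -> bytes:
    """Hypothetical FP32 layout: 4 bytes per scale, little-endian."""
    return struct.pack(f"<{len(scales)}f", *scales)


def serialize_scales_bf16(scales) -> bytes:
    """Hypothetical BF16 layout: 2 bytes per scale, little-endian."""
    return b"".join(struct.pack("<H", fp32_to_bf16_bits(s)) for s in scales)


scales = [0.5, 0.0123, 1.0]
fp32_blob = serialize_scales_fp32(scales)
bf16_blob = serialize_scales_bf16(scales)
print(len(fp32_blob), len(bf16_blob))
```

The BF16 blob is half the size of the FP32 blob. Since the scales are consumed as BF16 anyway, serializing them as BF16 should not add precision loss at inference time beyond what the load-time conversion already introduced.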
As an additional step, we move the serialization of scales, for both
channelwise and groupwise quantized weights, into the named data map.
This opens up a potential future feature: because the scales are no
longer tied to the XNNPACK payload, they can be swapped through the
ptd file.
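The decoupling described above can be sketched with a toy model. Everything here is hypothetical (`ToyNamedDataMap`, `ToyPayload`, and the key `"layer0.scales"` are illustrative names, not ExecuTorch or XNNPACK APIs). The point is only that the payload holds a key rather than the scale bytes, so the bytes behind the key can be replaced without touching the payload.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyNamedDataMap:
    """Hypothetical stand-in for a named data map: key -> raw bytes."""
    entries: Dict[str, bytes] = field(default_factory=dict)

    def add(self, key: str, data: bytes) -> None:
        self.entries[key] = data

    def get(self, key: str) -> bytes:
        return self.entries[key]


@dataclass(frozen=True)
class ToyPayload:
    """Hypothetical delegate payload: stores a reference to the scales only."""
    weight_bytes: bytes
    scale_key: str


def load_scales(payload: ToyPayload, data_map: ToyNamedDataMap) -> bytes:
    """Resolve the scales at load time through the named data map."""
    return data_map.get(payload.scale_key)


data_map = ToyNamedDataMap()
data_map.add("layer0.scales", b"\x00\x3f\x80\x3f")  # two BF16 scales: 0.5, 1.0
payload = ToyPayload(weight_bytes=b"\x12\x34", scale_key="layer0.scales")

original = load_scales(payload, data_map)

# Swap the scales by replacing the entry; the payload itself is unchanged.
data_map.add("layer0.scales", b"\x80\x3f\x00\x3f")  # now 1.0, 0.5
swapped = load_scales(payload, data_map)
```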
cc @lucylq for the scale serialization
### Llama Experiments
```
-rw-r--r-- 1 maxren staff 1746392320 May 20 16:49 llama3_fp32_scales.pte
-rw-r--r-- 1 maxren staff 1707798912 May 20 18:47 llama3_bf16_scales.pte
```
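The difference between the two file sizes listed above can be checked directly. This sketch uses only the byte counts from the listing. The final line is a rough estimate that assumes the entire difference comes from halving the scale storage, which may not hold exactly (padding, alignment, or the move into the named data map could also contribute).

```python
fp32_size = 1_746_392_320  # llama3_fp32_scales.pte, bytes
bf16_size = 1_707_798_912  # llama3_bf16_scales.pte, bytes

saved_bytes = fp32_size - bf16_size
saved_mb = saved_bytes / 1e6            # decimal megabytes
saved_mib = saved_bytes / 2**20         # binary mebibytes
saved_pct = 100 * saved_bytes / fp32_size

# Assumption: each scale shrinks from 4 bytes to 2 bytes and nothing else changes.
approx_num_scales = saved_bytes // 2

print(f"{saved_bytes} bytes = {saved_mb:.1f} MB = {saved_mib:.1f} MiB ({saved_pct:.2f}%)")
```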
We see a ~40 MB reduction in model size.

1 parent 4b15029 commit 543cdb3
File tree
5 files changed, +91 -11 lines changed
- backends/xnnpack
  - operators
  - runtime
  - serialization