Commit ebd5d87
serialize scales as bf16 and serialize in Named Data Map (#11031)
Summary:
XNNPACK currently uses BF16 scales when running GEMMs with groupwise quantized weights. Today we serialize the scales as FP32 and then convert them to BF16 before passing them to XNNPACK. We can save both memory and file size by serializing the scales as BF16 in the first place.
As an additional step, we move the serialization of scales for both channelwise and groupwise quantized weights into the named data map. Because the scales are no longer tied to the XNNPACK payload, they could in the future be swapped through the ptd file.
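To make the size argument concrete, here is a minimal sketch of an FP32 to BF16 conversion, assuming NumPy and round-to-nearest-even on the raw bit patterns. The helper names `fp32_to_bf16_bits` and `bf16_bits_to_fp32` are hypothetical and are not taken from this commit; the actual conversion in the backend may differ.

```python
import numpy as np


def fp32_to_bf16_bits(x: np.ndarray) -> np.ndarray:
    """Round float32 values to bfloat16 and return the raw uint16 bit patterns.

    Illustrative only: NaN/Inf handling is omitted, which is acceptable for
    finite quantization scales but not for general-purpose use.
    """
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    # Round to nearest even: add 0x7FFF plus the lowest bit that is kept.
    rounding_bias = ((bits >> 16) & 1) + np.uint32(0x7FFF)
    return ((bits + rounding_bias) >> 16).astype(np.uint16)


def bf16_bits_to_fp32(b: np.ndarray) -> np.ndarray:
    """Widen bfloat16 bit patterns back to float32 (exact, no rounding)."""
    return (b.astype(np.uint32) << 16).view(np.float32)


# Made-up example scales; real scales come from the quantized weights.
scales_fp32 = np.array([0.0123, 0.5, 1.0, 3.14159], dtype=np.float32)
scales_bf16 = fp32_to_bf16_bits(scales_fp32)

# BF16 storage is half the size of FP32 storage.
assert scales_bf16.nbytes * 2 == scales_fp32.nbytes

# BF16 keeps 8 significant bits, so the relative error stays within 2**-8.
restored = bf16_bits_to_fp32(scales_bf16)
assert np.all(np.abs(restored - scales_fp32) <= np.abs(scales_fp32) * 2.0**-8)
```

Each scale shrinks from 4 bytes to 2, so the file-size saving is proportional to the number of serialized scales.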
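The decoupling described above can be pictured with a toy model. This is a sketch under stated assumptions, not the ExecuTorch API: `NamedDataMap`, `QuantParams`, and the key `"linear0.scales"` are hypothetical names invented for illustration. The point is only that the payload stores a key, while the scale bytes live in a separate map.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class NamedDataMap:
    """Hypothetical stand-in for a named data map: key -> raw bytes."""
    entries: dict = field(default_factory=dict)

    def add(self, key: str, data: bytes) -> str:
        self.entries[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self.entries[key]


@dataclass
class QuantParams:
    """Hypothetical payload record: holds a key, not the scale bytes."""
    scale_key: str
    num_scales: int


store = NamedDataMap()

# Four BF16 scales as raw little-endian uint16 bit patterns: 1.0, 0.5, 2.0, 0.25.
scale_bytes = struct.pack("<4H", 0x3F80, 0x3F00, 0x4000, 0x3E80)
params = QuantParams(scale_key=store.add("linear0.scales", scale_bytes), num_scales=4)

# Swapping the scales touches only the map; the payload record is unchanged.
store.add("linear0.scales", struct.pack("<4H", 0x3F00, 0x3F00, 0x3F00, 0x3F00))
assert len(store.get(params.scale_key)) == params.num_scales * 2
```

Because the payload refers to the scales only by key, replacing the bytes behind that key does not require re-serializing the payload itself.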
cc lucylq for the scale serialization
### Llama Experiments
```
-rw-r--r-- 1 maxren staff 1746392320 May 20 16:49 llama3_fp32_scales.pte
-rw-r--r-- 1 maxren staff 1707798912 May 20 18:47 llama3_bf16_scales.pte
```
We see a ~40 MB reduction in model size (38,593,408 bytes, about 2.2%).
Reviewed By: kirklandsign
Differential Revision: D75151974
Pulled By: mcr229
1 parent 1bc36c7, commit ebd5d87
5 files changed (+91, -11 lines) under backends/xnnpack: operators, runtime, serialization.