Commit 9583e3b
[not for land] online fp8 quant with streaming weight post-processing
Summary:
Not for land, just a demo.
1. During weight loading, keep track of how many elements have been loaded.
2. Once all the elements have been loaded, call post-processing.
This lets weight post-processing run in a streaming fashion, which
minimizes GPU memory usage. It only works if we can assume that each
weight chunk is loaded exactly once.
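
The two steps above can be sketched roughly as follows. This is a minimal, framework-free illustration and not the code from this commit: `StreamingParam`, `load_chunk`, and `fake_quantize` are hypothetical names, plain Python lists stand in for GPU tensors, and the "quantization" is a toy per-tensor scaling rather than a real fp8 cast.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StreamingParam:
    """Hypothetical stand-in for a parameter that is filled chunk by chunk."""
    name: str
    numel: int  # total number of elements expected for this weight
    post_process: Callable[["StreamingParam"], None]
    data: List[float] = field(default_factory=list)
    loaded_numel: int = 0  # step 1: running count of loaded elements
    processed: bool = False
    scale: float = 1.0

    def __post_init__(self) -> None:
        # Allocate the full-size buffer up front, like an empty weight tensor.
        self.data = [0.0] * self.numel

    def load_chunk(self, offset: int, chunk: List[float]) -> None:
        """Copy one chunk into place and update the element counter."""
        if offset < 0 or offset + len(chunk) > self.numel:
            raise ValueError("chunk does not fit in parameter")
        self.data[offset:offset + len(chunk)] = chunk
        self.loaded_numel += len(chunk)
        # Step 2: once every element has arrived, post-process immediately.
        # The counter is only meaningful if no chunk is ever loaded twice.
        if self.loaded_numel == self.numel and not self.processed:
            self.post_process(self)
            self.processed = True


def fake_quantize(param: StreamingParam) -> None:
    """Toy per-tensor scaling standing in for online fp8 quantization."""
    amax = max(abs(x) for x in param.data) or 1.0
    param.scale = amax / 448.0  # 448 is the largest finite float8_e4m3fn value
    param.data = [round(x / param.scale) for x in param.data]


if __name__ == "__main__":
    w = StreamingParam("layer0.weight", numel=4, post_process=fake_quantize)
    w.load_chunk(0, [1.0, -2.0])   # half loaded, nothing happens yet
    w.load_chunk(2, [4.0, 0.5])    # last chunk triggers post-processing
    print(w.processed, w.loaded_numel)
```

Because post-processing fires as soon as a weight is complete, only the weights still being loaded need to stay in their unprocessed form. If a loader ever wrote the same chunk twice, the counter would reach `numel` too early, which is why the once-only assumption matters.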
Test Plan:
tested locally with facebook/opt-125m and `fp8` online quantization
Reviewers:
Subscribers:
Tasks:
Tags:
Signed-off-by: <[email protected]>
1 parent b34129b commit 9583e3b
1 file changed, +33 -1 lines changed

| Original file lines | Diff lines | Change |
|---|---|---|
| 437-439 | 437-439 | unchanged context |
| | 440-468 | 29 lines added |
| 440-442 | 469-471 | unchanged context |
| 446-448 | 475-477 | unchanged context |
| 449 | 478 | 1 line removed, 1 line added |
| 450-452 | 479-481 | unchanged context |
| 487-489 | 516-518 | unchanged context |
| | 519-521 | 3 lines added |
| 490-492 | 522-524 | unchanged context |