Commit 12c9995
Nikita Savelyev
[WC] Introduce flexible group size value search (#3556)
### Changes
Introduce flexible group size search logic as part of the mixed precision
algorithm. When enabled, each weight whose channel size is not divisible
by the general group size value is compressed with a newly calculated
group size.
The new group size value is the maximal power of two (i.e., 2^k) such
that:
- channel size is divisible by it;
- it is less than the originally specified group size value;
- it is greater than or equal to `min_flexible_group_size` (16 by
default).
If no value satisfies these requirements, the weight is compressed to
the backup precision. If ratio < 1.0 and some weights have to be
compressed to the backup precision because of group size issues, these
weights do not contribute to the ratio of the backup mode group.
This method is disabled by default.
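
The search rule described above can be sketched in a few lines of Python. This is only an illustration of the three stated conditions, not the code added by this commit: the function name `find_flexible_group_size` and its signature are hypothetical, and returning `None` stands in for "fall back to the backup precision".

```python
from typing import Optional


def find_flexible_group_size(
    channel_size: int,
    group_size: int,
    min_flexible_group_size: int = 16,
) -> Optional[int]:
    """Hypothetical helper illustrating the flexible group size rule.

    Returns the group size to use for a weight, or None when no value
    qualifies (the weight would then go to the backup precision).
    """
    if channel_size < 1 or group_size < 1 or min_flexible_group_size < 1:
        raise ValueError("all sizes must be positive integers")

    # The general group size already fits: nothing to search for.
    if channel_size % group_size == 0:
        return group_size

    # Start from the largest power of two strictly below group_size.
    candidate = 1 << ((group_size - 1).bit_length() - 1)

    # Walk down through powers of two until the lower bound is crossed.
    while candidate >= min_flexible_group_size:
        if channel_size % candidate == 0:
            return candidate
        candidate //= 2

    return None


if __name__ == "__main__":
    for channels in (256, 192, 4304, 100):
        print(channels, find_flexible_group_size(channels, group_size=128))
```

The channel sizes in the usage loop are arbitrary illustrative numbers. Note that when `min_flexible_group_size` equals the general group size, no candidate can be both strictly smaller than the group size and at least the minimum, so every indivisible weight falls back to the backup precision.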
### Reason for changes
Some models may have channel size values that are not divisible by the
default group size. In such cases, a user can now pass the
`nncf.AdvancedCompressionParameters(enable_flexible_group_size=True)`
advanced parameter instead of specifying an ignored scope.
Example models:
- `microsoft/Phi-4-multimodal-instruct` (lm_model and
vision_embeddings_model)
- `HuggingFaceH4/Qwen2.5-Math-1.5B-Instruct-PRM-0.2`
### Metrics
Results for phi4-multimodal are below.
| Language Model Precision | Vision Embed. Model Precision | WWB Similarity | Time of image-to-text request (sec.) | Time of audio-to-text request (sec.) |
|---|---|---|---|---|
| FP16 | FP16 | 99.19% | 31.21 | 17.76 |
| Mixed precision: int4 or bf16 | Mixed precision: int4 or bf16 | 77.51% | 22.37 | 10.93 |
| Mixed precision: int4 or int8 | Mixed precision: int4 or int8 | 79.03% | 19.95 | 9.47 |
| int4 with mixed group size: 128 or 64 | int4 with mixed group size: 128 or 16 | 81.36% | 19.89 | 9.16 |
Last row corresponds to
`nncf.AdvancedCompressionParameters(enable_flexible_group_size=True)`.
Third row corresponds to
`nncf.AdvancedCompressionParameters(enable_flexible_group_size=True,
min_flexible_group_size=128)`.
Second row corresponds to
`nncf.AdvancedCompressionParameters(enable_flexible_group_size=True,
min_flexible_group_size=128)` with `backup_mode="none"`.
The inference time results are as expected. The similarity results are
less so, but there is still no degradation in the group size 16 case.
### Related tickets
167337
### Tests
Added test cases which assert that the expected log messages are
printed.
https://github.com/openvinotoolkit/nncf/actions/runs/15852358755
1 parent 71ae2c1 commit 12c9995
File tree
20 files changed, +408 -178 lines changed
- src/nncf/quantization
- algorithms/weight_compression
- tests
- cross_fw/test_templates
- openvino/native
- quantization
- torch2
- function_hook/quantization
- fx
Lines changed: 102 additions & 12 deletions
Lines changed: 9 additions & 6 deletions
Lines changed: 0 additions & 32 deletions
This file was deleted.
Lines changed: 8 additions & 7 deletions