[WC] Introduce flexible group size value search #3556

nikita-savelyevv · 2025-06-24T10:27:45Z

Changes

Introduce flexible group size search logic as part of the mixed precision algorithm. When enabled, each weight whose
channel size is not divisible by the general group size value is compressed with a newly calculated group size.

The new group size value is the maximal power of two (i.e., 2^k) such that:

- channel size is divisible by it;
- it is less than the originally specified group size value;
- it is greater than or equal to min_flexible_group_size (16 by default).

If no value satisfies these requirements, the weight is compressed to the backup precision. If ratio < 1.0 and some weights have to be compressed to the backup precision because of group size issues, these weights do not contribute to the ratio of the backup mode group.
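The search rule above can be sketched in a few lines of Python. This is only an illustration of the three conditions, not the code from this PR: the function name `find_flexible_group_size` and its signature are hypothetical, and only `min_flexible_group_size` and its default of 16 come from the description.

```python
from typing import Optional


def find_flexible_group_size(
    channel_size: int,
    group_size: int,
    min_flexible_group_size: int = 16,
) -> Optional[int]:
    """Hypothetical helper: return a usable group size, or None for backup precision."""
    # The general group size already fits, so no search is needed.
    if channel_size % group_size == 0:
        return group_size

    # Start from the largest power of two strictly less than group_size.
    candidate = 1
    while candidate * 2 < group_size:
        candidate *= 2

    # Walk down through powers of two until one divides the channel size
    # or the lower bound is crossed.
    while candidate >= min_flexible_group_size:
        if channel_size % candidate == 0:
            return candidate
        candidate //= 2

    # Nothing satisfies the requirements: the caller falls back to backup precision.
    return None
```

Under these assumptions, a channel size of 192 with a group size of 128 gives 64, a channel size of 144 gives 16, and a channel size of 100 gives `None`. Raising `min_flexible_group_size` to 128 leaves no candidates below a group size of 128, so every non-divisible weight falls back to backup precision.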

This method is disabled by default.

Reason for changes

Some models have channel sizes that are not divisible by the default group size. In such cases, a user can now pass the nncf.AdvancedCompressionParameters(enable_flexible_group_size=True) advanced parameter instead of specifying an ignored scope.

Example models:

microsoft/Phi-4-multimodal-instruct (lm_model and vision_embeddings_model)
HuggingFaceH4/Qwen2.5-Math-1.5B-Instruct-PRM-0.2

Metrics

Results for phi4-multimodal are below.

| Language Model Precision | Vision Embed. Model Precision | WWB Similarity | Time of image-to-text request (sec.) | Time of audio-to-text request (sec.) |
|---|---|---|---|---|
| FP16 | FP16 | 99.19% | 31.21 | 17.76 |
| Mixed precision: int4 or bf16 | Mixed precision: int4 or bf16 | 77.51% | 22.37 | 10.93 |
| Mixed precision: int4 or int8 | Mixed precision: int4 or int8 | 79.03% | 19.95 | 9.47 |
| int4 with mixed group size: 128 or 64 | int4 with mixed group size: 128 or 16 | 81.36% | 19.89 | 9.16 |

The last row corresponds to nncf.AdvancedCompressionParameters(enable_flexible_group_size=True).

The third row corresponds to nncf.AdvancedCompressionParameters(enable_flexible_group_size=True, min_flexible_group_size=128).

The second row corresponds to nncf.AdvancedCompressionParameters(enable_flexible_group_size=True, min_flexible_group_size=128) with backup_mode="none".

The inference time results are as expected. The similarity results are less so, but there is still no degradation in the group size 16 case.

Related tickets

167337

Tests

Added test cases which assert that the expected log messages are printed.

https://github.com/openvinotoolkit/nncf/actions/runs/15852358755

nikita-savelyevv · 2025-06-24T13:53:48Z

During integration with optimum-intel, I propose introducing a boolean CLI argument called --flexible-group-size to expose this logic. min_flexible_group_size will only be available through the Python API.

nikita-savelyevv · 2025-06-24T13:59:29Z

nncf/quantization/advanced_parameters.py

+    """
+
+    enable_flexible_group_size: bool = False
+    min_flexible_group_size: int = 16


The value of 16 is open for debate. Possibly, it should be larger.

nncf/quantization/algorithms/weight_compression/algorithm.py

tests/cross_fw/test_templates/template_test_weights_compression.py

nncf/quantization/algorithms/weight_compression/mixed_precision.py

ljaljushkin

LGTM

src/nncf/quantization/advanced_parameters.py

src/nncf/quantization/algorithms/weight_compression/algorithm.py

alexsu52

LGTM

### Changes

- Replaced boolean `enable_flexible_group_size` with a `group_size_fallback_mode` enum. Possible values are ERROR, IGNORE, ADJUST. Meaning:
  - ERROR: raise an exception if the channel size can't be divided by the group size.
  - IGNORE: a node with an invalid group size won't be compressed at all.
  - ADJUST: the same as with `enable_flexible_group_size=True` on develop, i.e. compute a new group size if possible, otherwise compress to backup precision.
- Renamed `min_flexible_group_size` to `min_adjusted_group_size`.

Set `group_size_fallback_mode` to IGNORE by default. Users are informed in the following way depending on the selected fallback mode:

- ERROR: an exception is raised with a suggestion to set `group_size_fallback_mode` to IGNORE or ADJUST.
- IGNORE: an info message is logged that some nodes will be ignored.
- ADJUST: an info message is logged that some nodes will have an adjusted group size value or be compressed to backup mode.

### Reason for changes

UX improvement: now the default behavior won't result in an exception.

### Related tickets

167337

### Tests

Adapted the tests introduced in #3556.
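The three fallback modes can be illustrated with a small self-contained sketch. It is not the NNCF implementation: `FallbackMode`, `resolve_group_size`, and the returned action labels are hypothetical stand-ins, and only the mode names, the IGNORE default, and `min_adjusted_group_size` are taken from the description above.

```python
from enum import Enum
from typing import Optional, Tuple


class FallbackMode(Enum):
    """Local stand-in for the fallback mode enum described above."""
    ERROR = "error"
    IGNORE = "ignore"
    ADJUST = "adjust"


def _adjusted_group_size(channel_size: int, group_size: int, min_size: int) -> Optional[int]:
    # Largest power of two below group_size that divides channel_size and is >= min_size.
    candidate = 1
    while candidate * 2 < group_size:
        candidate *= 2
    while candidate >= min_size:
        if channel_size % candidate == 0:
            return candidate
        candidate //= 2
    return None


def resolve_group_size(
    channel_size: int,
    group_size: int,
    mode: FallbackMode = FallbackMode.IGNORE,
    min_adjusted_group_size: int = 16,
) -> Tuple[str, Optional[int]]:
    """Return a hypothetical (action, group_size) decision for a single weight."""
    if channel_size % group_size == 0:
        return ("compress", group_size)
    if mode is FallbackMode.ERROR:
        raise ValueError(
            f"Channel size {channel_size} is not divisible by group size {group_size}; "
            "consider the IGNORE or ADJUST fallback mode."
        )
    if mode is FallbackMode.IGNORE:
        # The node is left uncompressed.
        return ("skip", None)
    adjusted = _adjusted_group_size(channel_size, group_size, min_adjusted_group_size)
    if adjusted is None:
        # No valid group size exists: compress to backup precision instead.
        return ("backup", None)
    return ("compress", adjusted)
```

With a channel size of 192 and a group size of 128, this sketch skips the weight under IGNORE, compresses it with group size 64 under ADJUST, and raises under ERROR.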

PRs: openvinotoolkit#3556 and openvinotoolkit#3583

Nikita Savelyev added 2 commits May 29, 2025 11:01

Initial commit

c5aa322

Initial commit

37b3fbb

github-actions bot added the NNCF PT, NNCF OpenVINO, NNCF PTQ, and API labels Jun 24, 2025

Nikita Savelyev added 2 commits June 24, 2025 12:52

Create advanced params if None

e67b26c

Added tests

49403a6

nikita-savelyevv changed the title ~~Introduce valid group size value search during weights compression~~ [WC] Introduce flexible group size value search Jun 24, 2025

Add docs

8cc5b9b

nikita-savelyevv commented Jun 24, 2025

nikita-savelyevv marked this pull request as ready for review June 24, 2025 14:42

nikita-savelyevv requested a review from a team as a code owner June 24, 2025 14:42

nikita-savelyevv requested review from andreyanufr and ljaljushkin June 24, 2025 14:43

Tweak docs

63faa90

nikita-savelyevv commented Jun 26, 2025

nncf/quantization/algorithms/weight_compression/algorithm.py

Nikita Savelyev added 2 commits June 26, 2025 17:36

Keep the old exception for now

413ceb5

Add channel size info to exception message

0a90f69

ljaljushkin requested changes Jun 26, 2025

Nikita Savelyev added 6 commits June 30, 2025 17:36

Move flexible group size logic outside of mixed precision algorithm

e62a5d7

Add mixed precision tests

284fc76

Remove unused method

70cbc99

Update comment

151ef3b

Fix tests

35f7197

Fix tests 2

a7abc2d

nikita-savelyevv requested a review from ljaljushkin June 30, 2025 16:52

ljaljushkin reviewed Jul 1, 2025

nncf/quantization/algorithms/weight_compression/mixed_precision.py (outdated)

ljaljushkin reviewed Jul 1, 2025

nncf/quantization/algorithms/weight_compression/algorithm.py

ljaljushkin approved these changes Jul 1, 2025

Nikita Savelyev added 4 commits July 1, 2025 15:32

Add docstring

37cb2eb

Revert accidental changes

6644de5

Tweak

70153f3

Fix linter

ae5b690

github-actions bot removed the NNCF PT and NNCF PTQ labels Jul 2, 2025

Merge branch 'develop' into ns/flexible-group-size-backup

729ab65

andreyanufr approved these changes Jul 4, 2025

nikita-savelyevv assigned ljaljushkin Jul 4, 2025

nikita-savelyevv requested a review from alexsu52 July 4, 2025 09:18

alexsu52 suggested changes Jul 8, 2025

src/nncf/quantization/advanced_parameters.py (outdated)

src/nncf/quantization/algorithms/weight_compression/algorithm.py (outdated)

src/nncf/quantization/algorithms/weight_compression/algorithm.py (outdated)

Nikita Savelyev added 3 commits July 9, 2025 14:37

When cant find group size, do not consider these weights during mixed precision at all

b25ff90

Remove AdvancedGroupSizeParameters

f6fdc5f

Update ratio_defining_params higher in call stack

c6148a9

nikita-savelyevv requested a review from alexsu52 July 9, 2025 13:07

alexsu52 approved these changes Jul 9, 2025

Merge branch 'develop' into ns/flexible-group-size

f3eedb9

alexsu52 merged commit 12c9995 into openvinotoolkit:develop Jul 10, 2025
20 checks passed

nikita-savelyevv mentioned this pull request Jul 10, 2025

[WC] GroupSizeFallbackMode instead of enable_flexible_group_size #3583

Merged

nikita-savelyevv pushed a commit to AlexanderDokuchaev/nncf that referenced this pull request Aug 25, 2025

Update ReleaseNotes.md

587f924

PRs: openvinotoolkit#3556 and openvinotoolkit#3583
