[WC] Introduce flexible group size value search #3556
During integration with optimum-intel I propose to introduce a boolean CLI argument called …
enable_flexible_group_size: bool = False
min_flexible_group_size: int = 16
The value of 16 is open for debate. Possibly, it should be larger.
ljaljushkin
left a comment
LGTM
alexsu52
left a comment
LGTM
### Changes

- Replaced boolean `enable_flexible_group_size` with a `group_size_fallback_mode` enum. Possible values are ERROR, IGNORE, ADJUST. Meaning:
  - ERROR: raise an exception if the channel size can't be divided by the group size.
  - IGNORE: a node with an invalid group size won't be compressed at all.
  - ADJUST: the same as with `enable_flexible_group_size=True` on develop, i.e. compute a new group size if possible, otherwise compress to backup precision.
- Renamed `min_flexible_group_size` to `min_adjusted_group_size`.

Set `group_size_fallback_mode` to IGNORE by default. Users are informed the following way depending on the selected fallback mode:

- ERROR: an exception is raised with a suggestion to set `group_size_fallback_mode` to IGNORE or ADJUST.
- IGNORE: an info message is logged that some nodes will be ignored.
- ADJUST: an info message is logged that some nodes will have an adjusted group size value or be compressed to backup mode.

### Reason for changes

UX improvement: now the default behavior won't result in an exception.

### Related tickets

167337

### Tests

Adapted the tests introduced in #3556.
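The three fallback modes described in this follow-up can be sketched roughly as below. This is only an illustration of the described behavior, not the NNCF implementation: `FallbackMode` and `resolve_group_size` are hypothetical names, the returned action strings are invented for the example, and capping the adjusted value at the requested group size is an assumption.

```python
from enum import Enum
from typing import Optional, Tuple


class FallbackMode(Enum):
    """Hypothetical stand-in for the fallback mode enum named in the text."""
    ERROR = "error"
    IGNORE = "ignore"
    ADJUST = "adjust"


def resolve_group_size(
    channel_size: int,
    group_size: int,
    mode: FallbackMode = FallbackMode.IGNORE,
    min_adjusted_group_size: int = 16,
) -> Tuple[str, Optional[int]]:
    """Return (action, group_size) for one weight; both names are illustrative."""
    if channel_size % group_size == 0:
        return "compress", group_size

    if mode is FallbackMode.ERROR:
        raise ValueError(
            f"Channel size {channel_size} is not divisible by group size {group_size}. "
            "Consider setting the fallback mode to IGNORE or ADJUST."
        )
    if mode is FallbackMode.IGNORE:
        # The node is left uncompressed.
        return "skip", None

    # ADJUST: largest power of two that divides channel_size (lowest set bit).
    candidate = channel_size & -channel_size
    # Assumption: the adjusted value should not exceed the requested group size.
    while candidate > group_size:
        candidate //= 2
    if candidate >= min_adjusted_group_size:
        return "compress", candidate
    # No valid value found: fall back to the backup precision.
    return "backup", None
```

For example, with a channel size of 1120 and a group size of 128, IGNORE skips the node, while ADJUST selects a group size of 32.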
Changes
Introduce flexible group size search logic as a part of the mixed precision algorithm. When enabled, each weight whose channel size is not divisible by the general group size value will be compressed with a newly calculated group size.
The new group size value is the maximal power of two (i.e., 2^k) such that:
- the channel size is divisible by it;
- it is not less than `min_flexible_group_size` (16 by default).

If it's not possible to find a value satisfying these requirements, the weight is compressed to the backup precision. If ratio < 1.0 and some weights have to be compressed to the backup precision because of group size issues, then these weights won't count toward the ratio of the backup mode group.

This method is disabled by default.
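The search described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code from this PR: `find_flexible_group_size` is a hypothetical helper name, and starting the search strictly below the requested group size is an assumption made for the sketch.

```python
from typing import Optional


def find_flexible_group_size(
    channel_size: int,
    group_size: int,
    min_flexible_group_size: int = 16,
) -> Optional[int]:
    """Return a usable group size, or None if the weight must go to backup precision.

    Hypothetical helper; the name and signature are not taken from NNCF.
    """
    if group_size <= 0 or channel_size <= 0:
        raise ValueError("channel_size and group_size must be positive")

    # Nothing to adjust when the requested group size already fits.
    if channel_size % group_size == 0:
        return group_size

    # Assumption: start from the largest power of two strictly below group_size.
    candidate = 1 << ((group_size - 1).bit_length() - 1)

    # Halve the candidate until it divides the channel size or drops below the minimum.
    while candidate >= min_flexible_group_size:
        if channel_size % candidate == 0:
            return candidate
        candidate //= 2

    return None


print(find_flexible_group_size(1120, 128))      # 1120 = 32 * 35
print(find_flexible_group_size(1120, 128, 64))  # no power of two >= 64 divides 1120
```

A `None` result corresponds to the case where the weight is compressed to the backup precision.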
Reason for changes
Some models may have channel size values that are not divisible by the default group size. In such a case, a user can now provide the `nncf.AdvancedCompressionParameters(enable_flexible_group_size=True)` advanced parameter instead of an ignored scope.

Example models:
- microsoft/Phi-4-multimodal-instruct (lm_model and vision_embeddings_model)
- HuggingFaceH4/Qwen2.5-Math-1.5B-Instruct-PRM-0.2

Metrics
Results for phi4-multimodal are below.
- The last row corresponds to `nncf.AdvancedCompressionParameters(enable_flexible_group_size=True)`.
- The third row corresponds to `nncf.AdvancedCompressionParameters(enable_flexible_group_size=True, min_flexible_group_size=128)`.
- The second row corresponds to `nncf.AdvancedCompressionParameters(enable_flexible_group_size=True, min_flexible_group_size=128)` with `backup_mode="none"`.

The inference time results are as expected. The similarity results are less expected, but there is still no degradation for the group size 16 case.
Related tickets
167337
Tests
Added test cases which assert that the expected log messages are printed.
https://github.com/openvinotoolkit/nncf/actions/runs/15852358755