Collaborator

@nikita-savelyevv nikita-savelyevv commented Jun 24, 2025

Changes

Introduce flexible group size search logic as part of the mixed precision algorithm. When enabled, each weight whose
channel size is not divisible by the general group size value will be compressed with a newly calculated group size.

The new group size value is the maximal power of two (i.e., 2^k) such that:

  • channel size is divisible by it;
  • it is less than the originally specified group size value;
  • it is greater than or equal to min_flexible_group_size (16 by default).

If no value satisfies these requirements, the weight is compressed to backup precision. If ratio < 1.0 and some weights have to be compressed to backup precision because of group size issues, these weights won't contribute to the ratio of the backup mode group.
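
The search rule above can be sketched in a few lines of Python. This is a minimal illustration and not the NNCF implementation: the function name `find_flexible_group_size` and its signature are hypothetical, and only the default of 16 for `min_flexible_group_size` comes from the description. Returning `None` stands for the "compress to backup precision" case.

```python
from typing import Optional


def find_flexible_group_size(
    channel_size: int,
    group_size: int,
    min_flexible_group_size: int = 16,
) -> Optional[int]:
    """Return the group size to use, or None if the weight should fall back
    to backup precision.

    Hypothetical helper written for illustration; not part of NNCF.
    """
    if channel_size <= 0 or group_size <= 0 or min_flexible_group_size <= 0:
        raise ValueError("all sizes must be positive integers")

    if channel_size % group_size == 0:
        # The requested group size already fits; nothing to adjust.
        return group_size

    # Largest power of two strictly below the requested group size.
    # group_size >= 2 here, because a group size of 1 divides everything.
    candidate = 1 << ((group_size - 1).bit_length() - 1)

    # Walk down through powers of two until one divides the channel size
    # or the lower bound is crossed.
    while candidate >= min_flexible_group_size:
        if channel_size % candidate == 0:
            return candidate
        candidate //= 2

    return None
```

For example, with a requested group size of 128, a channel size of 192 would get group size 64, a channel size of 48 would get 16, and a channel size of 40 would have no valid value and fall back to backup precision.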

This method is disabled by default.

Reason for changes

Some models have channel sizes that are not divisible by the default group size. In such cases, a user can now pass the nncf.AdvancedCompressionParameters(enable_flexible_group_size=True) advanced parameter instead of specifying an ignored scope.

Example models:

  • microsoft/Phi-4-multimodal-instruct (lm_model and vision_embeddings_model)
  • HuggingFaceH4/Qwen2.5-Math-1.5B-Instruct-PRM-0.2

Metrics

Results for phi4-multimodal are below.

| Language Model Precision | Vision Embed. Model Precision | WWB Similarity | Time of image-to-text request (sec.) | Time of audio-to-text request (sec.) |
|---|---|---|---|---|
| FP16 | FP16 | 99.19% | 31.21 | 17.76 |
| Mixed precision: int4 or bf16 | Mixed precision: int4 or bf16 | 77.51% | 22.37 | 10.93 |
| Mixed precision: int4 or int8 | Mixed precision: int4 or int8 | 79.03% | 19.95 | 9.47 |
| int4 with mixed group size: 128 or 64 | int4 with mixed group size: 128 or 16 | 81.36% | 19.89 | 9.16 |

Last row corresponds to nncf.AdvancedCompressionParameters(enable_flexible_group_size=True).

Third row corresponds to nncf.AdvancedCompressionParameters(enable_flexible_group_size=True, min_flexible_group_size=128).

Second row corresponds to nncf.AdvancedCompressionParameters(enable_flexible_group_size=True, min_flexible_group_size=128) with backup_mode="none".

Inference time results are as expected. The similarity results are less expected, but there is still no degradation in the group size 16 case.

Related tickets

167337

Tests

Added test cases which assert that the expected log messages are printed.

https://github.com/openvinotoolkit/nncf/actions/runs/15852358755

@github-actions github-actions bot added the labels NNCF PT (Pull requests that updates NNCF PyTorch), NNCF OpenVINO (Pull requests that updates NNCF OpenVINO), NNCF PTQ (Pull requests that updates NNCF PTQ), and API (Public API-impacting changes) Jun 24, 2025
@nikita-savelyevv nikita-savelyevv changed the title from "Introduce valid group size value search during weights compression" to "[WC] Introduce flexible group size value search" Jun 24, 2025
Collaborator Author

nikita-savelyevv commented Jun 24, 2025

For the integration with optimum-intel, I propose introducing a boolean CLI argument called --flexible-group-size to expose this logic. min_flexible_group_size will only be available through the Python API.

"""

enable_flexible_group_size: bool = False
min_flexible_group_size: int = 16
Collaborator Author

The value of 16 is open for debate. Possibly, it should be larger.

@nikita-savelyevv nikita-savelyevv marked this pull request as ready for review June 24, 2025 14:42
@nikita-savelyevv nikita-savelyevv requested a review from a team as a code owner June 24, 2025 14:42
Contributor

@ljaljushkin ljaljushkin left a comment

LGTM

@github-actions github-actions bot removed the labels NNCF PT (Pull requests that updates NNCF PyTorch) and NNCF PTQ (Pull requests that updates NNCF PTQ) Jul 2, 2025
@nikita-savelyevv nikita-savelyevv requested a review from alexsu52 July 9, 2025 13:07
@alexsu52 alexsu52 left a comment

LGTM

@alexsu52 alexsu52 merged commit 12c9995 into openvinotoolkit:develop Jul 10, 2025
20 checks passed
alexsu52 pushed a commit that referenced this pull request Jul 15, 2025
### Changes

- Replaced boolean `enable_flexible_group_size` with a
`group_size_fallback_mode` enum. Possible values are ERROR, IGNORE,
ADJUST. Meaning:
  - ERROR: raise an exception if the channel size is not divisible by
    the group size.
  - IGNORE: a node with an invalid group size won't be compressed at all.
  - ADJUST: the same as with `enable_flexible_group_size=True` on develop,
    i.e. compute a new group size if possible, otherwise compress to backup
    precision.
- Renamed `min_flexible_group_size` to `min_adjusted_group_size`.

Set `group_size_fallback_mode` to IGNORE by default.

Users are informed the following way depending on the selected fallback
mode:
- ERROR: exception is raised with a suggestion to set
`group_size_fallback_mode` to IGNORE or ADJUST.
- IGNORE: an info message is logged that some nodes will be ignored.
- ADJUST: an info message is logged that some nodes will have an
adjusted group size value / compressed to backup mode.
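
The three fallback modes can be illustrated with a short, self-contained Python sketch. Everything here is an assumption made for illustration: the `FallbackMode` enum, its string values, the `resolve_group_size` function, and the returned action labels are hypothetical stand-ins and do not reproduce NNCF's classes or messages. Only the mode names, the described behavior, and the `min_adjusted_group_size` parameter name come from the text above.

```python
from enum import Enum
from typing import Optional, Tuple


class FallbackMode(Enum):
    """Hypothetical stand-in for the fallback mode enum described above."""

    ERROR = "error"
    IGNORE = "ignore"
    ADJUST = "adjust"


def resolve_group_size(
    channel_size: int,
    group_size: int,
    mode: FallbackMode,
    min_adjusted_group_size: int = 16,
) -> Tuple[str, Optional[int]]:
    """Return (action, group_size) for one weight.

    action is "compress" (use the returned group size), "ignore" (leave the
    node uncompressed), or "backup" (compress to backup precision).
    """
    if channel_size <= 0 or group_size <= 0 or min_adjusted_group_size <= 0:
        raise ValueError("all sizes must be positive integers")

    if channel_size % group_size == 0:
        return ("compress", group_size)

    if mode is FallbackMode.ERROR:
        raise ValueError(
            f"channel size {channel_size} is not divisible by group size "
            f"{group_size}; consider the IGNORE or ADJUST fallback mode"
        )
    if mode is FallbackMode.IGNORE:
        return ("ignore", None)

    # ADJUST: try powers of two below group_size, largest first.
    candidate = 1 << ((group_size - 1).bit_length() - 1)
    while candidate >= min_adjusted_group_size:
        if channel_size % candidate == 0:
            return ("compress", candidate)
        candidate //= 2
    return ("backup", None)
```

In this sketch, a weight with channel size 192 and requested group size 128 would be skipped under IGNORE, compressed with group size 64 under ADJUST, and would trigger an exception under ERROR.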

### Reason for changes

UX improvement: now the default behavior won't result in an exception.

### Related tickets

167337

### Tests

Adapted the tests introduced in #3556.
nikita-savelyevv pushed a commit to AlexanderDokuchaev/nncf that referenced this pull request Aug 25, 2025
Labels

API (Public API-impacting changes), NNCF OpenVINO (Pull requests that updates NNCF OpenVINO)
