Conversation

@anzr299 anzr299 commented Sep 22, 2025

Changes

Introduced a new API that offers a weight compression algorithm for quantizers defined in torch.ao.
Currently only the OpenVINO Quantizer is supported.

Reason for changes

To support quantizers defined in torch.ao.

Related tickets

169342

@anzr299 anzr299 requested a review from a team as a code owner September 22, 2025 14:43
@github-actions github-actions bot added the API Public API-impacting changes label Sep 22, 2025
@anzr299 anzr299 marked this pull request as draft September 22, 2025 14:56
@daniil-lyakhov daniil-lyakhov self-requested a review September 22, 2025 15:03

@daniil-lyakhov daniil-lyakhov left a comment


Can I see the PR with OpenVINOQuantizer?

from nncf.quantization.algorithms.weight_compression.algorithm import WeightCompression


class WeightsCompressionPT2E(Algorithm):
@daniil-lyakhov

This algorithm is not designed specifically for PT2E; it is an experimental WC algorithm that could be implemented in any backend.

Suggested change
-class WeightsCompressionPT2E(Algorithm):
+class WeightCompression(Algorithm):


@anzr299 anzr299 Sep 24, 2025


Should I rename it to ExperimentalWeightCompression instead, since it could be confused with the original?


@daniil-lyakhov daniil-lyakhov Sep 24, 2025


It is inside the experimental directory; that should be descriptive enough. I suggest the WeightCompression name.

@anzr299

Done


import torch

import nncf # type: ignore[import-untyped]
@daniil-lyakhov

Why # type: ignore[import-untyped] here?

@anzr299

I need to update the type-hint ignores, since I copied them over from my scripts.

@anzr299

Done

Comment on lines 34 to 35
) -> torch.fx.GraphModule:
self._quantizer = quantizer
@daniil-lyakhov

Type hints and a docstring are missing.
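One way the missing annotations and docstring could look, based on the visible diff context (the `Quantizer` type and the wording are assumptions, not the PR's actual code):

```python
class WeightCompression:
    """Experimental weight compression algorithm (sketch)."""

    def __init__(self, quantizer: "Quantizer") -> None:
        """Initialize the algorithm.

        :param quantizer: Quantizer instance that provides weight compression
            parameters for the model (e.g. an OpenVINOQuantizer wrapper).
        """
        self._quantizer = quantizer
```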

 model,
 parameters={
-    "mode": self._mode.value,
+    "mode": self._mode.value if not isinstance(self._mode, str) else self._mode,
@daniil-lyakhov

What is the str mode here? Can we force self._mode to always be an enum param?

@anzr299

Done
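The fix being asked for can be sketched as follows: normalize any incoming string to the enum once, so downstream code can always rely on `.value`. The `CompressWeightsMode` class below is a stand-in for NNCF's mode enum, with illustrative member values:

```python
from enum import Enum


class CompressWeightsMode(Enum):
    # Stand-in for nncf.CompressWeightsMode; member values are illustrative.
    INT8_ASYM = "int8_asym"
    INT4_SYM = "int4_sym"


def normalize_mode(mode) -> CompressWeightsMode:
    """Coerce a string such as "int4_sym" to the enum member once, so the
    rest of the pipeline can use mode.value without isinstance checks."""
    if isinstance(mode, CompressWeightsMode):
        return mode
    return CompressWeightsMode(mode)
```

With this normalization in the constructor, the `isinstance` branch in the `parameters` dict above becomes unnecessary.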


@daniil-lyakhov daniil-lyakhov left a comment


Can we just pass the dataset param to quantizer.get_nncf_weight_compression_parameters and simplify the pipeline? With that, we wouldn't need the get_nodes_to_compress and collect_weight_compression_statistics methods in the WC algorithm.
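A rough sketch of the simplification being proposed, with stand-in types (only `get_nncf_weight_compression_parameters` comes from this review; the signature and return shape are assumptions): the quantizer accepts the dataset directly and returns both the compression parameters and the collected statistics.

```python
from typing import Any, Dict, List, Optional, Tuple


class StubQuantizer:
    """Stand-in quantizer illustrating the proposed single entry point."""

    def get_nncf_weight_compression_parameters(
        self, model: Any, graph: Any, dataset: Optional[Any] = None
    ) -> Tuple[List[Any], Dict[str, Any]]:
        # The quantizer decides which nodes to compress and, when a dataset
        # is given, gathers the statistics itself -- so the WC algorithm
        # needs neither get_nodes_to_compress nor
        # collect_weight_compression_statistics.
        params = ["linear_1.weight"]  # placeholder parameter list
        n_samples = 0 if dataset is None else len(dataset)
        stats = {"linear_1": {"samples": n_samples}}
        return params, stats
```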

def get_quantization_setup(self, model: torch.fx.GraphModule, nncf_graph: NNCFGraph) -> SingleConfigQuantizerSetup:
return self._quantizer.get_nncf_quantization_setup(model, nncf_graph)

def get_weight_compression_setup(
@daniil-lyakhov

Please do not use the word "setup" in the context of WC.

Suggested change
-def get_weight_compression_setup(
+def get_weight_compression_params(

@anzr299

Done

for weight_param in primary_precision_weight_params:
weight_param.compression_config = self._get_primary_config(group_size_values[weight_param.weight_name])

for weight_param in ratio_defining_params:
@daniil-lyakhov

Suggested change
-for weight_param in ratio_defining_params:
+# ratio_defining_params are all in primary precision. Update the parameters that were
+# set to backup precision by the mixed precision algorithm.
+for weight_param in ratio_defining_params:

Comment on lines +961 to +962
):
# Collect statistics for the weights compression
@daniil-lyakhov

Description

-:return: A tuple consisting of a list of weight compression parameters, based on the Weight
+:return: A tuple consisting of a list of all weight compression parameters, based on the Weight
     Compression algorithm configuration, and a mapping of target node names to the
     collected statistics.
@daniil-lyakhov

Please update the docstring as discussed offline.

from nncf.common.utils.backend import BackendType
from nncf.experimental.quantization.quantizer import Quantizer
from nncf.quantization.algorithms.algorithm import Algorithm
from nncf.quantization.algorithms.weight_compression.algorithm import WeightCompression
@daniil-lyakhov

Suggested change
-from nncf.quantization.algorithms.weight_compression.algorithm import WeightCompression
+from nncf.quantization.algorithms.weight_compression.algorithm import WeightCompression as OriginalWeightCompression

@anzr299

Ah yes, good catch! I will change it.
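The alias-import idiom in the suggestion, sketched generically (the classes below are stand-ins, not NNCF code): renaming the original class at import time lets the experimental subclass keep the same public name without shadowing its base.

```python
# In the PR this would read:
#   from nncf.quantization.algorithms.weight_compression.algorithm import (
#       WeightCompression as OriginalWeightCompression,
#   )
class OriginalWeightCompression:
    def apply(self):
        return "original"


class WeightCompression(OriginalWeightCompression):
    """Experimental variant; reuses the name while extending the original."""

    def apply(self):
        return "experimental:" + super().apply()
```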

@github-actions github-actions bot added the NNCF PT Pull requests that updates NNCF PyTorch label Oct 10, 2025
…is is because mixed precision algo has clashes in ranking layer sensitivities with very few samples.