[Torch FX] Compress PT2E Support #3663
base: develop
Conversation
…match in signatures in prepare_pt2e.
src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py (review thread resolved, outdated)
src/nncf/experimental/torch/fx/quantization/quantizer/__init__.py (review thread resolved, outdated)
Can I see the PR with OpenVINOQuantizer?
from nncf.quantization.algorithms.weight_compression.algorithm import WeightCompression


class WeightsCompressionPT2E(Algorithm):
This algorithm is not designed only for PT2E; it is the experimental WC algorithm, which could be implemented in any backend.
Suggested change:
- class WeightsCompressionPT2E(Algorithm):
+ class WeightCompression(Algorithm):
Should I rename it to ExperimentalWeightCompression instead, since it could be confused with the original?
It is inside the experimental directory; that should be descriptive enough. I suggest the WeightCompression name.
Done
import torch

import nncf  # type: ignore[import-untyped]
Why # type: ignore[import-untyped] here?
I need to update the type-hint ignores; I copied them over from my scripts.
Done
) -> torch.fx.GraphModule:
    self._quantizer = quantizer
Type hints and a docstring are missing.
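For illustration, a minimal sketch of what the annotated version could look like; the class body and the exact parameter list are assumptions here, not the PR's actual code:

import torch.fx

from nncf.experimental.quantization.quantizer import Quantizer


class WeightsCompressionPT2E:
    """Illustrative skeleton only, showing the requested annotations."""

    def apply(self, model: torch.fx.GraphModule, quantizer: Quantizer) -> torch.fx.GraphModule:
        """
        Compress the weights of ``model`` according to the quantizer's annotations.

        :param model: Captured torch.fx.GraphModule to compress.
        :param quantizer: Quantizer that selects the weights to compress.
        :return: The model with compressed weights.
        """
        self._quantizer = quantizer
        return model  # placeholder; the real implementation runs the compression pipeline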
    model,
    parameters={
-       "mode": self._mode.value,
+       "mode": self._mode.value if not isinstance(self._mode, str) else self._mode,
What is the str mode here? Can we force self._mode to always be an enum param?
Done
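The fix can be made structural by normalizing at construction time. A sketch, assuming the mode maps onto nncf.CompressWeightsMode:

from typing import Union

from nncf import CompressWeightsMode


def _normalize_mode(mode: Union[CompressWeightsMode, str]) -> CompressWeightsMode:
    # Accept the enum or its string value, but always store the enum, so that
    # downstream code can rely on self._mode.value unconditionally.
    return mode if isinstance(mode, CompressWeightsMode) else CompressWeightsMode(mode)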
src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py (review thread resolved, outdated)
Can we just pass the dataset param to quantizer.get_nncf_weight_compression_parameters and simplify the pipeline? With that, we don't need the get_nodes_to_compress and collect_weight_compression_statistics methods in the WC algorithm.
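A rough sketch of the proposed shape; the signature of get_nncf_weight_compression_parameters with a dataset argument is an assumption drawn from this comment, not confirmed API:

import torch.fx

from nncf import Dataset
from nncf.common.graph import NNCFGraph
from nncf.experimental.quantization.quantizer import Quantizer


class WeightCompression:
    """Illustrative fragment only, not the actual NNCF class."""

    def __init__(self, quantizer: Quantizer) -> None:
        self._quantizer = quantizer

    def get_weight_compression_parameters(
        self, model: torch.fx.GraphModule, graph: NNCFGraph, dataset: Dataset
    ):
        # The quantizer receives the dataset directly and returns both the
        # per-weight compression parameters and the collected statistics, so
        # get_nodes_to_compress() and collect_weight_compression_statistics()
        # are no longer needed on the WC algorithm itself.
        return self._quantizer.get_nncf_weight_compression_parameters(model, graph, dataset)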
def get_quantization_setup(self, model: torch.fx.GraphModule, nncf_graph: NNCFGraph) -> SingleConfigQuantizerSetup:
    return self._quantizer.get_nncf_quantization_setup(model, nncf_graph)

def get_weight_compression_setup(
Please do not use the word setup in the context of WC.
Suggested change:
- def get_weight_compression_setup(
+ def get_weight_compression_params(
Done
src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py (review thread resolved, outdated)
src/nncf/quantization/algorithms/weight_compression/algorithm.py (review thread resolved, outdated)
…e static. Accepts data_aware_mixed_precision and data_aware_algo flags and mixed precision algorithm as input
src/nncf/quantization/algorithms/weight_compression/algorithm.py (review thread resolved, outdated)
for weight_param in primary_precision_weight_params:
    weight_param.compression_config = self._get_primary_config(group_size_values[weight_param.weight_name])

for weight_param in ratio_defining_params:
Suggested change:
+ # ratio_defining_params are all in primary precision. Update the parameters that were set to backup
+ # precision by the mixed precision algorithm.
  for weight_param in ratio_defining_params:
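To make the intent of that comment concrete, a purely illustrative sketch; every name not present in the diff above is hypothetical:

# ratio_defining_params start out in the primary precision; the mixed precision
# algorithm marks a subset of them for the backup precision, and this loop
# applies that decision.
for weight_param in ratio_defining_params:
    backup_config = mixed_precision_decisions.get(weight_param.weight_name)  # hypothetical mapping
    if backup_config is not None:
        weight_param.compression_config = backup_config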
):
    # Collect statistics for the weights compression
A description is missing here.
- :return: A tuple consisting of a list of weight compression parameters, based on the Weight
+ :return: A tuple consisting of a list of all weight compression parameters, based on the Weight
      Compression algorithm configuration, and a mapping of target node names to the
      collected statistics.
Please update the docstring as discussed offline.
from nncf.common.utils.backend import BackendType
from nncf.experimental.quantization.quantizer import Quantizer
from nncf.quantization.algorithms.algorithm import Algorithm
from nncf.quantization.algorithms.weight_compression.algorithm import WeightCompression
Suggested change:
- from nncf.quantization.algorithms.weight_compression.algorithm import WeightCompression
+ from nncf.quantization.algorithms.weight_compression.algorithm import WeightCompression as OriginalWeightCompression
Ah yes, good catch! I will change it.
…is is because mixed precision algo has clashes in ranking layer sensitivities with very few samples.
Changes
Introduced a new API that offers the weight compression algorithm for quantizers defined in torch.ao. Currently only the OpenVINO Quantizer is supported.
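For illustration, end-to-end usage might look like the sketch below. The compress_pt2e entry-point name and the OpenVINOQuantizer import path are assumptions based on this PR's title and description, not confirmed API:

import torch

# Both import paths below are assumptions for illustration only.
from nncf.experimental.torch.fx import OpenVINOQuantizer, compress_pt2e


class TinyModel(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)


model = TinyModel().eval()
example_inputs = (torch.randn(1, 16),)
captured = torch.export.export(model, example_inputs).module()

# Hypothetical entry point introduced by this PR: compress the weights of a
# PT2E-captured model according to the torch.ao-style quantizer's annotations.
compressed = compress_pt2e(captured, OpenVINOQuantizer())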
Reason for changes
To support quantizers defined in torch.ao.
Related tickets
169342