diff --git a/README.md b/README.md index 74ba162c7e5..5b7943f7da8 100644 --- a/README.md +++ b/README.md @@ -53,7 +53,6 @@ learning frameworks. | :-------------------------------------------------------------------------------------------------------------------------------------------- | :----------: | | [Quantization Aware Training](./docs/usage/training_time_compression/quantization_aware_training/Usage.md) | Supported | | [Weight-Only Quantization Aware Training with LoRA and NLS](./docs/usage/training_time_compression/quantization_aware_training_lora/Usage.md) | Supported | -| [Mixed-Precision Quantization](./docs/usage/training_time_compression/other_algorithms/LegacyQuantization.md#mixed-precision-quantization) | Supported | - Automatic, configurable model graph transformation to obtain the compressed model. - Common interface for compression methods. diff --git a/docs/FAQ.md b/docs/FAQ.md index fe52ed3ff91..35e1828a2c2 100644 --- a/docs/FAQ.md +++ b/docs/FAQ.md @@ -119,38 +119,6 @@ See the answer to the above question. Additional parameters are part of the comp Currently NNCF PyTorch can only properly handle models with acyclic execution graphs. RNNs, which inherently have cycles, can behave oddly when processed with NNCF PyTorch, which includes loss of quality, unreproducible results and failure to compress. - - -### I get a `Could not deduce the forward arguments from the initializing dataloader output.` runtime error when executing `create_compressed_model` - -Dataloaders can return anything, and this output may be preprocessed in the rest of the training pipeline before actually ending up in model's `forward` method. -NNCF needs a dataloader already at the compressed model creation stage, e.g. before training, and doesn't in general know about the further preprocessing (turning the output of `v8_dataloader` into actual `forward` args and kwargs. -You have to give NNCF this information by wrapping your dataloader object in an own subclass of a `nncf.torch.initialization.PTInitializingDataLoader` object that properly defines the `get_inputs` and `get_target` abstract methods: - -```python -from nncf.torch.initialization import PTInitializingDataLoader - -class MyInitializingDataLoader(PTInitializingDataLoader): - def get_inputs(self, dataloader_output: Any) -> Tuple[Tuple, Dict]: - # your implementation - `dataloader_output` is what is returned by your dataloader, - # and you have to turn it into a (args, kwargs) tuple that is required by your model - # in this function, for instance, if your dataloader returns dictionaries where - # the input image is under key `"img"`, and your YOLOv8 model accepts the input - # images as 0-th `forward` positional arg, you would do: - return (dataloader_output["img"],), {} - - def get_target(self, dataloader_output: Any) -> Any: - # and in this function you should extract the "ground truth" value from your - # dataloader, so, for instance, if your dataloader output is a dictionary where - # ground truth images are under a "gt" key, then here you would write: - return dataloader_output["gt"] - -init_dataloader = MyInitializingDataLoader(my_dataloader) -# now you pass this wrapped object instead of your original dataloader into the `register_default_init_args` -nncf_config = register_default_init_args(nncf_config, init_dataloader) -# and then call `create_compressed_model` with that config file as usual. 
-``` - ## ONNX *To be filled* diff --git a/docs/usage/post_training_compression/post_training_quantization/Usage.md b/docs/usage/post_training_compression/post_training_quantization/Usage.md index cae45b14656..6ea109a3459 100644 --- a/docs/usage/post_training_compression/post_training_quantization/Usage.md +++ b/docs/usage/post_training_compression/post_training_quantization/Usage.md @@ -2,7 +2,7 @@ Post-Training Quantization is a quantization algorithm that doesn't demand retraining of a quantized model. It utilizes a small subset of the initial dataset to calibrate quantization constants. -Please refer to this [document](/docs/usage/training_time_compression/other_algorithms/LegacyQuantization.md) for details of the implementation. +Please refer to this [document](/docs/usage/training_time_compression/Quantization.md) for details of the implementation. NNCF provides an advanced Post-Training Quantization algorithm, which consists of the following techniques: diff --git a/docs/usage/post_training_compression/weights_compression/Usage.md b/docs/usage/post_training_compression/weights_compression/Usage.md index b17c94b1115..1400cbeff10 100644 --- a/docs/usage/post_training_compression/weights_compression/Usage.md +++ b/docs/usage/post_training_compression/weights_compression/Usage.md @@ -27,8 +27,8 @@ By default, the algorithm applies asymmetric 8-bit integer quantization (INT8_AS | Compression Mode | Element type | Scale type | Granularity | Description | |------------------|--------------|------------|--------------------------|----------------------------| -| INT8_ASYM | INT8 | FP16 | Per-channel | [Asymmetric quantization](/docs/usage/training_time_compression/other_algorithms/LegacyQuantization.md#asymmetric-quantization) | -| INT8_SYM | INT8 | FP16 | Per-channel | [Symmetric quantization](/docs/usage/training_time_compression/other_algorithms/LegacyQuantization.md#symmetric-quantization) | +| INT8_ASYM | INT8 | FP16 | Per-channel | [Asymmetric quantization](/docs/usage/training_time_compression/Quantization.md#asymmetric-quantization) | +| INT8_SYM | INT8 | FP16 | Per-channel | [Symmetric quantization](/docs/usage/training_time_compression/Quantization.md#symmetric-quantization) | #### Mixed precision modes @@ -40,8 +40,8 @@ NNCF can automatically distribute precision assignments based on quantization se | Compression Mode | Element type | Scale type | Granularity | Description | |------------------|--------------|------------|--------------------------|-------------| -| INT4_SYM | INT4 | FP16 | Per-channel / Group-wise | [Symmetric quantization](/docs/usage/training_time_compression/other_algorithms/LegacyQuantization.md#symmetric-quantization) | -| INT4_ASYM | INT4 | FP16 | Per-channel / Group-wise | [Asymmetric quantization](/docs/usage/training_time_compression/other_algorithms/LegacyQuantization.md#asymmetric-quantization) | +| INT4_SYM | INT4 | FP16 | Per-channel / Group-wise | [Symmetric quantization](/docs/usage/training_time_compression/Quantization.md#symmetric-quantization) | +| INT4_ASYM | INT4 | FP16 | Per-channel / Group-wise | [Asymmetric quantization](/docs/usage/training_time_compression/Quantization.md#asymmetric-quantization) | | NF4 | FP32 | FP16 | Per-channel / Group-wise | [NormalFloat-4](https://arxiv.org/pdf/2305.14314v1.pdf) lookup table with 16 FP32 values | | CODEBOOK | Any | FP16 | Per-channel / Group-wise | Arbitrary lookup table (codebook) | | CB4_F8E4M3 | E4M3 | FP16 | Per-channel / Group-wise | A fixed lookup table with 16 E4M3 values based on NF4 
values |
diff --git a/docs/usage/training_time_compression/Quantization.md b/docs/usage/training_time_compression/Quantization.md
new file mode 100644
index 00000000000..7488f63d417
--- /dev/null
+++ b/docs/usage/training_time_compression/Quantization.md
@@ -0,0 +1,158 @@
+# Uniform Quantization with Fine-Tuning
+
+The uniform "fake" quantization method supports an arbitrary number of bits (>= 2) for representing weights and activations.
+The method performs differentiable sampling of the continuous signal (for example, activations or weights) during the forward pass, simulating inference with integer arithmetic.
+
+## Common Quantization Formula
+
+Quantization is parametrized by the clamping range and the number of quantization levels. The sampling formula is the following:
+
+$s = \frac{levels - 1}{input\\_high - input\\_low}$
+
+$ZP = \lfloor - input\\_low * s \rceil$
+
+$clamp(input; input\\_low, input\\_high) = min(max(input, input\\_low), input\\_high)$
+
+$output = \frac{\left\lfloor (clamp(input; input\\_low, input\\_high)-input\\_low) * s- ZP \right\rceil} {s}$
+
+$input\\_low$ and $input\\_high$ represent the quantization range and $\left\lfloor \cdot \right\rceil$ denotes rounding to the nearest integer.
+
+## Symmetric Quantization
+
+During training, we optimize the **scale** parameter that represents the range `[input_low, input_high]` of the original signal using gradient descent:
+
+$input\\_low=scale*\frac{level\\_low}{level\\_high}$
+
+$input\\_high=scale$
+
+In the formula above, $level\\_low$ and $level\\_high$ represent the range of the discrete signal.
+
+- For weights:
+
+  $level\\_low=-2^{bits-1}+1$
+
+  $level\\_high=2^{bits-1}-1$
+
+  $levels=255$
+
+- For unsigned activations:
+
+  $level\\_low=0$
+
+  $level\\_high=2^{bits}-1$
+
+  $levels=256$
+
+- For signed activations:
+
+  $level\\_low=-2^{bits-1}$
+
+  $level\\_high=2^{bits-1}-1$
+
+  $levels=256$
+
+For all the cases listed above, the common quantization formula is simplified after substitution of $input\\_low$, $input\\_high$ and $levels$:
+
+$output = \left\lfloor clamp(input * \frac{level\\_high}{scale}, level\\_low, level\\_high)\right \rceil * \frac{scale}{level\\_high}$
+
+Use the `num_init_samples` parameter from the `initializer` group to initialize the values of `scale` and to determine which activations should be signed or unsigned from the statistics collected over the given number of samples.
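+To make the formulas above concrete, below is a minimal NumPy sketch of the common fake-quantization formula and of its simplified symmetric form for weights. It is an illustration only, not NNCF code; the helper names and sample values are assumptions.
+
+```python
+import numpy as np
+
+def fake_quantize(x, input_low, input_high, levels):
+    # Common formula: compute scale and zero point, clamp, round, rescale.
+    s = (levels - 1) / (input_high - input_low)
+    zp = np.round(-input_low * s)
+    x_clamped = np.clip(x, input_low, input_high)
+    return np.round((x_clamped - input_low) * s - zp) / s
+
+def fake_quantize_weights_symmetric(x, scale, bits=8):
+    # Simplified symmetric form for weights: levels = 255, level_high = 2^(bits-1) - 1.
+    level_high = 2 ** (bits - 1) - 1
+    return np.round(np.clip(x * level_high / scale, -level_high, level_high)) * scale / level_high
+
+w = np.array([-1.2, -0.4, 0.0, 0.3, 0.9])
+print(fake_quantize(w, input_low=-1.0, input_high=1.0, levels=256))
+print(fake_quantize_weights_symmetric(w, scale=1.0))
+```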
+## Asymmetric Quantization
+
+During training, we optimize the `input_low` and `input_range` parameters using gradient descent:
+
+$input\\_high=input\\_low + input\\_range$
+
+$levels=256$
+
+$level\\_low=0$
+
+$level\\_high=2^{bits}-1$
+
+For better accuracy, floating-point zero should lie within the quantization range and be mapped exactly onto an integer quantization level (without rounding). Therefore, the following scheme is applied to the ranges of weight and activation quantizers before the actual quantization:
+
+${input\\_low}' = min(input\\_low, 0)$
+
+${input\\_high}' = max(input\\_high, 0)$
+
+$ZP= \left\lfloor \frac{-{input\\_low}'*(levels-1)}{{input\\_high}'-{input\\_low}'} \right \rceil$
+
+${input\\_high}''=\frac{ZP-levels+1}{ZP}*{input\\_low}'$
+
+${input\\_low}''=\frac{ZP}{ZP-levels+1}*{input\\_high}'$
+
+$$
+\begin{flalign} &
+{input\\_low,input\\_high} = \begin{cases} {input\\_low}',{input\\_high}', \& ZP \in {0,levels-1} \\
+{input\\_low}',{input\\_high}'', \& {input\\_high}'' - {input\\_low}' > {input\\_high}' - {input\\_low}'' \\
+{input\\_low}'',{input\\_high}', \& {input\\_high}'' - {input\\_low}' <= {input\\_high}' - {input\\_low}''
+\end{cases}
+&\end{flalign}
+$$
+
+You can use the `num_init_samples` parameter from the `initializer` group to initialize the values of `input_low` and `input_range` from the statistics collected over the given number of samples.
+
+## Quantizer setup and hardware config files
+
+NNCF can quantize models so that they give the best results on a given Intel hardware type when executed using the OpenVINO runtime.
+To achieve this, the quantizer setup is performed with the following considerations in mind:
+
+1. Every operation that can accept quantized inputs on a given HW (i.e. can be executed using quantized input values) should have its inputs quantized in NNCF.
+2. The quantized inputs should be quantized with a configuration that is supported on the given HW for the given operation (e.g. per-tensor vs. per-channel quantization, or 8 bits vs. 4 bits).
+3. Operations that are agnostic to quantization should be executed on quantized tensors rather than full-precision tensors.
+4. Certain operation sequences will be runtime-optimized to execute in a single kernel call ("fused"), and inserting additional quantizers or simulating quantization within such sequences is detrimental to overall performance.
+
+These requirements are fulfilled by the quantizer propagation algorithm.
+The algorithm first searches the internal NNCF representation of the model's control flow graph for predefined "fusible" patterns and applies the fusing to this internal graph representation.
+Next, the operations in the graph that can be associated with input-quantizable operations on the given target hardware are assigned a single quantizer for each of their quantizable activation inputs, with a number of possible quantizer configurations attached to it (those that are feasible on the target HW).
+The quantizers are then "propagated" against the data flow in the model's control flow graph as far as possible, potentially merging with other quantizers.
+Once all quantizers have reached a standstill in their propagation process, each will have a final (possibly reduced) set of possible quantizer configurations, from which a single one is either chosen manually or selected by a precision initialization algorithm (which accepts the potential quantizer locations and the associated sets of potential quantizer configurations).
+The resulting configuration is then applied as the final quantizer setup.
+
+Note that this algorithm applies to activation quantization only; the weight quantizers do not require propagation.
+However, the possible configurations of the weight quantizers themselves are also sourced from the HW config file definitions.
+
+The HW to target for a given quantization algorithm run can be specified in the NNCF config using the global `"target_device"` option.
+The default corresponds to CPU-friendly quantization.
+`"TRIAL"` corresponds to a configuration that uses the general quantizer propagation algorithm but does not use any HW-specific information about the quantizability of given operation types or the possible quantizer configs for the associated inputs or operation weights.
+Instead, it uses a default, basic 8-bit symmetric per-tensor quantization configuration for each quantizer and quantizes the inputs of a certain default operation set, which at the moment is defined internally in NNCF.
+The quantization configuration in the `"target_device": "TRIAL"` case may be overridden using the regular `"activations"` and `"weights"` sections of the quantization algorithm sub-config.
+
+For all target HW types, parts of the model graph can be marked as non-quantizable by using the `"ignored_scopes"` field: inputs and weights of the matching nodes in the NNCF internal graph representation will not be quantized, and the downstream quantizers will not propagate upwards through such nodes.
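+As an illustration, the same target-device and ignored-scope concepts are also exposed by the post-training `nncf.quantize()` API. The sketch below assumes a PyTorch model and a list of calibration samples; the model, the data, and the regex pattern are placeholders, not recommendations.
+
+```python
+import nncf
+import torch
+
+# Placeholder model and calibration data - replace with your own.
+model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
+calibration_data = [torch.randn(1, 3, 224, 224) for _ in range(300)]
+calibration_dataset = nncf.Dataset(calibration_data)
+
+quantized_model = nncf.quantize(
+    model,
+    calibration_dataset,
+    # Restrict quantizer configurations to those feasible on Intel CPUs.
+    target_device=nncf.TargetDevice.CPU,
+    # Keep nodes whose names match this (placeholder) pattern unquantized.
+    ignored_scope=nncf.IgnoredScope(patterns=[".*attention.*"]),
+)
+```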
+## Quantization Implementation
+
+In our implementation, we use a slightly transformed formula. Up to the order of floating-point operations, it is equivalent to both the simplified symmetric formula and the asymmetric one. The differences are the addition of a small positive number `eps`, which prevents division by zero, and taking the absolute value of the range, since it might become negative during the backward pass:
+
+$output = \frac{clamp(\left\lfloor (input-input\\_low^{*}) *s - ZP \right \rceil, level\\_low, level\\_high)}{s}$
+
+$s = \frac{level\\_high}{|input\\_range^{*}| + eps}$
+
+$ZP = \lfloor-input\\_low * s\rceil$
+
+For asymmetric quantization:
+
+$input\\_low^{*} = input\\_low$
+
+$input\\_range^{*} = input\\_range$
+
+For symmetric quantization:
+
+$input\\_low^{*} = 0$
+
+$input\\_range^{*} = scale$
+
+The most common case of applying quantization is 8-bit uniform quantization.
+
+---
+
+**NOTE**
+
+There is a known issue with AVX2 and AVX512 CPU devices. It appears in 8-bit matrix calculations on tensors whose elements are close to the maximum of the quantization range, i.e. saturated.
+AVX2 and AVX512 use a 16-bit register to store the result of such operations; when the tensors are saturated, this register overflows, which leads to accuracy degradation. For more details on the overflow issue, see [this article](https://www.intel.com/content/www/us/en/developer/articles/technical/lower-numerical-precision-deep-learning-inference-and-training.html).
+
+To work around this issue, NNCF by default quantizes all weight tensors to 8 bits but effectively uses only 7 of them.
+This regime is active when `target_device=TargetDevice.CPU` or `target_device=TargetDevice.ANY` is set. The fix may require longer fine-tuning.
+
+To control the application of the overflow fix, use the `overflow_fix` option of `nncf.AdvancedQuantizationParameters`, e.g. `nncf.AdvancedQuantizationParameters(overflow_fix=OverflowFix.ENABLE)`.
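+For illustration, a hedged sketch of how this option can be passed to the post-training API, reusing the placeholder `model` and `calibration_dataset` from the earlier example and assuming `OverflowFix` is importable from `nncf.quantization.advanced_parameters`:
+
+```python
+import nncf
+from nncf.quantization.advanced_parameters import OverflowFix
+
+# Explicitly keep the overflow fix enabled for all weight quantizers.
+quantized_model = nncf.quantize(
+    model,
+    calibration_dataset,
+    target_device=nncf.TargetDevice.CPU,
+    advanced_parameters=nncf.AdvancedQuantizationParameters(overflow_fix=OverflowFix.ENABLE),
+)
+```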
diff --git a/docs/usage/training_time_compression/other_algorithms/BatchnormAdaptation.md b/docs/usage/training_time_compression/other_algorithms/BatchnormAdaptation.md deleted file mode 100644 index 47df400808c..00000000000 --- a/docs/usage/training_time_compression/other_algorithms/BatchnormAdaptation.md +++ /dev/null @@ -1,48 +0,0 @@ -# Batch-norm statistics adaptation - -After the compression-related changes in the model have been committed, the statistics of the batchnorm layers (per-channel rolling means and variances of activation tensors) can be updated by passing several batches of data through the model before the fine-tuning starts. -This allows to correct the compression-induced bias in the model and reduce the corresponding accuracy drop even before model training. -This option is common for quantization, magnitude sparsity and filter pruning algorithms. -It can be enabled by setting a non-zero value of `num_bn_adaptation_samples` in the `batchnorm_adaptation` section of the `initializer` configuration - see [NNCF config schema](https://openvinotoolkit.github.io/nncf/) for reference. - -Note that in order to use batchnorm adaptation for your model, you must supply to NNCF a data loader using a `register_default_init_args` helper function or by registering a `nncf.config.structures.BNAdaptationInitArgs` structure within the `NNCFConfig` object in your integration code. - -## Example configuration files - ->_For the full list of the algorithm configuration parameters via config file, see the corresponding section in the [NNCF config schema](https://openvinotoolkit.github.io/nncf/)_. - -- Apply batchnorm adaptation for 2048 samples (rounded to nearest batch size multiple) during model quantization: - -```json5 -{ - "input_info": {"sample_size" : [1, 3, 224, 224]}, // the input shape of your model may vary - "compression": { - "algorithm": "quantization", - "initializer": { - "batchnorm_adaptation": { - "num_bn_adaptation_samples": 2048 - } - } - } -} -``` - -- Apply batchnorm adaptation for 32 samples (rounded to nearest batch size multiple) during model magnitude-based sparsification: - -```json5 -{ - "input_info": {"sample_size" : [1, 3, 224, 224]}, // the input shape of your model may vary - "compression": { - "algorithm": "magnitude_sparsity", - "initializer": { - "batchnorm_adaptation": { - "num_bn_adaptation_samples": 32 - } - }, - "params": { - "sparsity_target": 0.5, - "sparsity_target_epoch": 10 - } - } -} -``` diff --git a/docs/usage/training_time_compression/other_algorithms/LegacyQuantization.md b/docs/usage/training_time_compression/other_algorithms/LegacyQuantization.md deleted file mode 100644 index 0c5296b82e4..00000000000 --- a/docs/usage/training_time_compression/other_algorithms/LegacyQuantization.md +++ /dev/null @@ -1,386 +0,0 @@ -# Uniform Quantization with Fine-Tuning - ->_Scroll down for the examples of the JSON configuration files that can be used to apply this algorithm_. - -A uniform "fake" quantization method supports an arbitrary number of bits (>=2) which is used to represent weights and activations. -The method performs differentiable sampling of the continuous signal (for example, activations or weights) during forward pass, simulating inference with integer arithmetic. - -## Common Quantization Formula - -Quantization is parametrized by clamping range and number of quantization levels. 
The sampling formula is the following: - -$ZP = \lfloor - input\\_low * s \rceil$ - -$output = \frac{\left\lfloor (clamp(input; input\\_low, input\\_high)-input\\_low) * s- ZP \right\rceil} {s}$ - -$clamp(input; input\\_low, input\\_high)$ - -$s = \frac{levels - 1}{input\\_high - input\\_low}$ - -$input\\_low$ and $input\\_high$ represent the quantization range and $\left\lfloor \cdot \right\rceil$ denotes rounding to the nearest integer. - -## Symmetric Quantization - -During the training, we optimize the **scale** parameter that represents the range `[input_low, input_range]` of the original signal using gradient descent: - -$input\\_low=scale*\frac{level\\_low}{level\\_high}$ - -$input\\_high=scale$ - -In the formula above, $level\\_low$ and $level\\_high$ represent the range of the discrete signal. - -- For weights: - - $level\\_low=-2^{bits-1}+1$ - - $level\\_high=2^{bits-1}-1$ - - $levels=255$ - -- For unsigned activations: - - $level\\_low=0$ - - $level\\_high=2^{bits}-1$ - - $levels=256$ - -- For signed activations: - - $level\\_low=-2^{bits-1}$ - - $level\\_high=2^{bits-1}-1$ - - $levels=256$ - -For all the cases listed above, the common quantization formula is simplified after substitution of $input\\_low$, $input\\_high$ and $levels$: - -$output = \left\lfloor clamp(input * \frac{level\\_high}{scale}, level\\_low, level\\_high)\right \rceil * \frac{scale}{level\\_high}$ - -Use the `num_init_samples` parameter from the `initializer` group to initialize the values of `scale` and determine which activation should be signed or unsigned from the collected statistics using given number of samples. - -## Asymmetric Quantization - -During the training we optimize the `input_low` and `input_range` parameters using gradient descent: - -$input\\_high=input\\_low + input\\_range$ - -$levels=256$ - -$level\\_low=0$ - -$level\\_high=2^{bits}-1$ - -For better accuracy, floating-point zero should be within quantization range and strictly mapped into quant (without rounding). Therefore, the following scheme is applied to ranges of weight and activation quantizers before applying actual quantization: - -${input\\_low}' = min(input\\_low, 0)$ - -${input\\_high}' = max(input\\_high, 0)$ - -$ZP= \left\lfloor \frac{-{input\\_low}'*(levels-1)}{{input\\_high}'-{input\\_low}'} \right \rceil$ - -${input\\_high}''=\frac{ZP-levels+1}{ZP}*{input\\_low}'$ - -${input\\_low}''=\frac{ZP}{ZP-levels+1}*{input\\_high}'$ - -$$ -\begin{flalign} & -{input\\_low,input\\_high} = \begin{cases} {input\\_low}',{input\\_high}', \& ZP \in {0,levels-1} \\ -{input\\_low}',{input\\_high}'', \& {input\\_high}'' - {input\\_low}' > {input\\_high}' - {input\\_low}'' \\ -{input\\_low}'',{input\\_high}', \& {input\\_high}'' - {input\\_low}' <= {input\\_high}' - {input\\_low}'' -\end{cases} -&\end{flalign} -$$ - -You can use the `num_init_samples` parameter from the `initializer` group to initialize the values of `input_low` and `input_range` from the collected statistics using given number of samples. - -## Quantizer setup and hardware config files - -NNCF allows to quantize models for best results on a given Intel hardware type when executed using OpenVINO runtime. -To achieve this, the quantizer setup should be performed with following considerations in mind: - -1. every operation that can accept quantized inputs on a given HW (i.e. can be executed using quantized input values) should have its inputs quantized in NNCF -2. 
the quantized inputs should be quantized with a configuration that is supported on a given HW for a given operation (e.g. per-tensor vs per-channel quantization, or 8 bits vs. 4 bits) -3. for operations that are agnostic to quantization, the execution should handle quantized tensors rather than full-precision tensors. -4. certain operation sequences will be runtime-optimized to execute in a single kernel call ("fused"), and additional quantizer insertion/quantization simulation within such operation sequences will be detrimental to overall performance - -These requirements are fulfilled by the quantizer propagation algorithm. -The algorithm first searches the internal NNCF representation of the model's control flow graph for predefined patterns that are "fusible", and apply the fusing to the internal graph representation as well. -Next, the operations in the graph that can be associated to input-quantizable operations on a given target hardware are assigned a single quantizer for each its quantizable activation input, with a number of possible quantizer configurations attached to it (that are feasible on target HW). -The quantizers are then "propagated" against the data flow in the model's control flow graph as far as possible, potentially merging with other quantizers. -Once all quantizers have reached a standstill in their propagation process, each will have a final (possibly reduced) set of possible quantizer configurations, from which a single one is either chosen manually, or using a precision initialization algorithm (which accepts the potential quantizer locations and associated potential quantizer configuration sets). -The resulting configuration is then applied as a final quantizer setup configuration. - -Note that this algorithm applies to activation quantization only - the weight quantizers do not require propagation. -However, the possible configurations of weight quantizers themselves are also sourced from the HW config file definitions. - -The HW to target for a given quantization algorithm run can be specified in NNCF config using the global `"target_device"` option. -The default corresponds to CPU-friendly quantization. -`"TRIAL"` corresponds to a configuration that uses the general quantizer propagation algorithm, but does not use any HW-specific information about quantizability of given operation types or possible quantizer configs for associated inputs or operation weights. -Instead it uses a default, basic 8-bit symmetric per-tensor quantization configuration for each quantizer, and quantizes inputs of a certain default operation set, which at the moment is defined internally in NNCF. -The quantization configuration in the `"target_device": "TRIAL"` case may be overridden using the regular `"activations"` and `"weights"` sections in the quantization compression algorithm sub-config, see below. - -For all target HW types, parts of the model graph can be marked as non-quantizable by using the `"ignored_scopes"` field - inputs and weights of matching nodes in the NNCF internal graph representation will not be quantized, and the downstream quantizers will not propagate upwards through such nodes. - -## Quantization Implementation - -In our implementation, we use a slightly transformed formula. It is equivalent by order of floating-point operations to simplified symmetric formula and the asymmetric one. 
The small difference is addition of small positive number `eps` to prevent division by zero and taking absolute value of range, since it might become negative on backward: - -$output = \frac{clamp(\left\lfloor (input-input\\_low^{*}) *s - ZP \right \rceil, level\\_low, level\\_high)}{s}$ - -$s = \frac{level\\_high}{|input\\_range^{*}| + eps}$ - -$ZP = \lfloor-input\\_low * s\rceil$ - -For asymmetric: - -$input\\_low^{*} = input\\_low$ - -$input\\_range^{*} = input\\_range$ - -For symmetric: - -$input\\_low^{*} = 0$ - -$input\\_range^{*} = scale$ - -The most common case of applying quantization is 8-bit uniform quantization. - ---- - -**NOTE** - -There is a known issue with AVX2 and AVX512 CPU devices. The issue appears with 8-bit matrix calculations with tensors which elements are close to the maximum or saturated. -AVX2 and AVX512 utilize a 16-bit register to store the result of operations on tensors. In case when tensors are saturated the buffer overflow happens. -This leads to accuracy degradation. For more details of the overflow issue please refer [here](https://www.intel.com/content/www/us/en/developer/articles/technical/lower-numerical-precision-deep-learning-inference-and-training.html). - -To fix this issue inside NNCF, by default, all weight tensors are quantized in 8 bits but only 7 bits are effectively used. -This regime is used when `"target_device": "CPU"` or `"target_device": "ANY"` set. This fix, potentially, requires longer fine-tuning. - -To control the application of overflow fix, `"overflow_fix"` config option is introduced. The default value is `"overflow_fix": "enable"`. To apply the overflow issue fix only to the first layer, use `"overflow_fix": "first_layer_only"`. To disable the overflow issue fix for all layers, use `"overflow_fix": "disable"`. - ---- - - - -## Mixed-Precision Quantization - -Quantization to lower precisions (e.g. 6, 4, 2 bits) is an efficient way to accelerate inference of neural networks. -Although NNCF supports quantization with an arbitrary number of bits to represent weights and activations values, -choosing ultra-low bitwidth could noticeably affect the model's accuracy. A good trade-off between accuracy and performance is achieved by assigning different precisions to different layers. NNCF provides two automatic precision assignment algorithms, namely **HAWQ** and **AutoQ**. - -### HAWQ - -NNCF utilizes the [HAWQ-v2](https://arxiv.org/pdf/1911.03852.pdf) method to automatically choose optimal mixed-precision -configuration by taking into account the sensitivity of each layer, i.e. how much lower-bit quantization of each layer -decreases the accuracy of model. The most sensitive layers are kept at higher precision. The sensitivity of the i-th layer is -calculated by multiplying the average Hessian trace with the L2 norm of quantization perturbation: - -$\overline{Tr}(H_{i}) * \left \|\| Q(W_{i}) - W_{i} \right \|\|^2_2$ - -The sum of the sensitivities for each layer forms a metric which serves as a proxy to the accuracy of the compressed -model: the lower the metric, the more accurate should be the corresponding mixed precision model on the validation -dataset. - -To find the optimal trade-off between accuracy and performance of the mixed precision model we also compute a -compression ratio - the ratio between **bit complexity** of a fully INT8 model and mixed-precision lower bitwidth one. 
-The bit complexity of the model is a sum of bit complexities for each quantized layer, which are defined as a product -of the layer FLOPS and the quantization bitwidth. The optimal configuration is found by calculating the sensitivity -metric and the compression ratio for all possible bitwidth settings and selecting the one with the minimal metric value -among all configurations with a compression ratio below the specified threshold. - -By default, the compression ratio is 1.5. It should be enough to compress the model with no more than 1% accuracy drop. -But if it doesn't happen, the lower ratio can be set by `compression_ratio` parameter in the `precision` section of -configuration file. E.g. uniformly int8 quantized model is 1 in compression ratio, 2 - for uniform int4 quantization, 0.25 - for uniform int32 quantization. - -To avoid the exponential search procedure, we apply the following restriction: layers with a small average Hessian -trace value are quantized to lower bitwidth and vice versa. - -The Hessian trace is estimated with the randomized [Hutchinson algorithm](https://www.researchgate.net/publication/220432178_Randomized_Algorithms_for_Estimating_the_Trace_of_an_Implicit_Symmetric_Positive_Semi-Definite_Matrix). -Given Rademacher distributed random vector v, the trace of symmetric matrix H is equal to the estimation of a quadratic form: - -$Tr(H) = \mathbb{E}[v^T H v]$ - -The randomized algorithm solves the expectation by Monte Carlo using sampling of v from its distribution, evaluating -the quadratic term, and averaging: - -$Tr(H) \approx \frac{1}{m}\sum\limits_{i=1}^{m}[v_i^T H v_i]$ - -Evaluation of the quadratic term happens by computing ![Hv](https://latex.codecogs.com/png.latex?Hv) - the result -of multiplication of the Hessian matrix with a given random vector v, without the explicit formation of the Hessian operator. -For gradient of the loss with respect to the i-th block ![g_i](https://latex.codecogs.com/png.latex?g_i) and for -a random vector v, which is independent of ![W_i](https://latex.codecogs.com/png.latex?W_i), we have the equation: - -$\frac{\partial(g_i^T v)}{\partial W_i} = H_i v$ - -where $H_i$ is the Hessian matrix of loss with respect to -$W_i$. Hence $Hv$ can be -computed by 2 backpropagation passes: first - with respect to the loss and second - with respect to the product of the -gradients and a random vector. - -The aforementioned procedure sets bitwidth for weight quantizers only. Bitwidth for activation quantizers is assigned -on the next step in two ways: strict or liberal. All quantizers between modules with quantizable inputs have the same -bitwidth in the strict mode. Liberal mode allows different precisions within the group. For both cases, bitwidth is -assigned based on the rules of the hardware config. If multiple variants are possible the minimal compatible bitwidth -is chosen. By default, liberal mode is used as it does not reject a large number of possible bitwidth settings. -The `bitwidth_assignment_mode` parameter can override it to the strict one. 
- -For automatic mixed-precision selection it's recommended to use the following template of configuration file: - -```json - "optimizer": { - "base_lr": 3.1e-4, - "schedule_type": "plateau", - "type": "Adam", - "schedule_params": { - "threshold": 0.1, - "cooldown": 3 - }, - "weight_decay": 1e-05 - }, - "compression": { - "algorithm": "quantization", - "initializer": { - "precision": { - "type": "hawq", - "bits": [4,8] - "compression_ratio": 1.5 - } - } - } -``` - -Note, optimizer parameters are model specific, this template contains optimal ones for ResNet-like models. - -This template uses `plateau` scheduler. Though it usually leads to a lot of epochs of tuning for achieving a good -model's accuracy, this is the most reliable way. Staged quantization is an alternative approach and can be more than -two times faster, but it may require tweaking of hyper-parameters for each model. Please refer to configuration files -ending by `*_staged` for an example of this method. - -The manual mode of mixed-precision quantization is also available by explicitly setting the bitwidth per layer - through `bitwidth_per_scope` parameter. - ---- -**NOTE** - -Precision initialization overrides bits settings specified in `weights` and `activations` sections of configuration -file. - ---- - -## Example configuration files - ->_For the full list of the algorithm configuration parameters via config file, see the corresponding section in the [NNCF config schema](https://openvinotoolkit.github.io/nncf/)_. - -- Quantize a model using default algorithm settings (8-bit, quantizers configuration chosen to be compatible with all Intel target HW types): - -```json5 -{ - "input_info": { "sample_size": [1, 3, 224, 224] }, // the input shape of your model may vary - "compression": { - "algorithm": "quantization" - } -} -``` - -- Quantize a model to 8-bit precision targeted for Intel CPUs, with additional constraints of symmetric weight quantization and asymmetric activation quantization: - -```json5 -{ - "input_info": { "sample_size": [1, 3, 32, 32] }, // the input shape of your model may vary - "compression": { - "algorithm": "quantization", - "weights": {"mode": "symmetric"}, - "activations": {"mode": "asymmetric"} - }, - "target_device": "CPU" -} -``` - -- Quantize a model with fully symmetric INT8 quantization and increased number of quantizer range initialization samples (make sure to supply a corresponding data loader in code via `nncf.config.structures.QuantizationRangeInitArgs` or the `register_default_init_args` helper function): - -```json5 -{ - "input_info": { "sample_size": [1, 3, 224, 224] }, // the input shape of your model may vary - "compression": { - "algorithm": "quantization", - "mode": "symmetric", - "initializer": { - "range": { "num_init_samples": 5000 } - } - } -} -``` - -- Quantize a model using 4-bit per-channel quantization for experimentation/trial purposes (end-to-end performance and/or compatibility with OpenVINO Inference Engine not guaranteed) - -```json5 -{ - "input_info": { "sample_size": [1, 3, 32, 32] }, // the input shape of your model may vary - "compression": { - "algorithm": "quantization", - "bits": 4, - "per_channel": true - }, - "target_device": "TRIAL" -} -``` - -- Quantize a multi-input model to 8-bit precision targeted for Intel CPUs, with a range initialization performed using percentile statistics (empirically known to be better for NLP models, for example) and excluding some parts of the model from quantization: - -```json5 -{ - "input_info": [ - { - "keyword": "input_ids", 
- "sample_size": [1, 128], - "type": "long", - "filler": "ones" - }, - { - "keyword": "attention_mask", - "sample_size": [1, 128], - "type": "long", - "filler": "ones" - } - ], // the input shape of your model may vary - "compression": { - "algorithm": "quantization", - "initializer": { - "range": { - "num_init_samples": 64, - "type": "percentile", - "params": { - "min_percentile": 0.01, - "max_percentile": 99.99 - } - } - }, - "ignored_scopes": ["{re}BertSelfAttention\\[self\\]/__add___0", - "RobertaForSequenceClassification/RobertaClassificationHead[classifier]/Linear[out_proj]", - "RobertaForSequenceClassification/RobertaClassificationHead[classifier]/Linear[dense]" - ] - }, - "target_device": "TRIAL" -} -``` - -- Quantize a model to variable bit width using 300 iterations of the AutoQ algorithm, with a target model size (w.r.t the effective parameter storage size) set to 15% of the FP32 model and possible quantizer bitwidths limited to INT2, INT4 or INT8. - -```json5 -{ - "input_info": { "sample_size": [1, 3, 224, 224] }, // the input shape of your model may vary - "compression": { - "algorithm": "quantization", - "initializer": { - "precision": { - "type": "autoq", // or "type": "hawq" - "bits": [2, 4, 8], - "compression_ratio": 0.15, - "iter_number": 300 - } - } - }, - "target_device": "TRIAL" -} -``` diff --git a/docs/usage/training_time_compression/quantization_aware_training/Usage.md b/docs/usage/training_time_compression/quantization_aware_training/Usage.md index eb60b988e05..717eb8ad73b 100644 --- a/docs/usage/training_time_compression/quantization_aware_training/Usage.md +++ b/docs/usage/training_time_compression/quantization_aware_training/Usage.md @@ -3,7 +3,7 @@ This is a step-by-step tutorial on how to integrate the NNCF package into the existing PyTorch projects. The use case implies that the user already has a training pipeline that reproduces training of the model in the floating point precision and pretrained model. The task is to prepare this model for accelerated inference by simulating the compression at train time. -Please refer to this [document](/docs/usage/training_time_compression/other_algorithms/LegacyQuantization.md) for details of the implementation. +Please refer to this [document](/docs/usage/training_time_compression/Quantization.md) for details of the implementation. ## Basic usage diff --git a/src/nncf/torch/__init__.py b/src/nncf/torch/__init__.py index 331227d436d..bd0da5416d5 100644 --- a/src/nncf/torch/__init__.py +++ b/src/nncf/torch/__init__.py @@ -34,13 +34,9 @@ warn_bkc_version_mismatch("torch", BKC_TORCH_SPEC, torch.__version__) -# Required for correct COMPRESSION_ALGORITHMS registry functioning -from nncf.torch.quantization import algo as quantization_algo - # Functions most commonly used in integrating NNCF into training pipelines are # listed below for importing convenience -from nncf.torch.model_creation import create_compressed_model from nncf.torch.model_creation import is_wrapped_model from nncf.torch.model_creation import wrap_model from nncf.torch.model_creation import load_from_config diff --git a/src/nncf/torch/nncf_network.py b/src/nncf/torch/nncf_network.py index 392602577b1..56117db3438 100644 --- a/src/nncf/torch/nncf_network.py +++ b/src/nncf/torch/nncf_network.py @@ -958,12 +958,6 @@ def strip(self, do_copy: bool = True, strip_format: StripFormat = StripFormat.NA :param strip format: Describes the format in which model is saved after strip. :return: The stripped model. 
""" - if self.compression_controller is None: - # PTQ algorithm does not set compressed controller - from nncf.torch.quantization.strip import strip_quantized_model - - model = deepcopy(self._model_ref) if do_copy else self._model_ref - return strip_quantized_model(model, strip_format=strip_format) return self.compression_controller.strip(do_copy, strip_format=strip_format) def get_reused_parameters(self): diff --git a/src/nncf/torch/quantization/adjust_padding.py b/src/nncf/torch/quantization/adjust_padding.py deleted file mode 100644 index b19435ed23d..00000000000 --- a/src/nncf/torch/quantization/adjust_padding.py +++ /dev/null @@ -1,87 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -from collections import namedtuple -from typing import NamedTuple - -import networkx as nx -import torch - -import nncf -from nncf.common.graph import NNCFNodeName -from nncf.common.quantization.structs import QuantizationScheme as QuantizationMode -from nncf.torch.layers import NNCFConv2d -from nncf.torch.module_operations import UpdatePaddingValue -from nncf.torch.nncf_network import NNCFNetwork -from nncf.torch.quantization.layers import BaseQuantizer -from nncf.torch.quantization.layers import QuantizerConfig -from nncf.torch.quantization.layers import SymmetricQuantizer - - -class AdjustPaddingArgs(NamedTuple): - weight_bitwidth: int - activation_quantizer: BaseQuantizer - module_op_node_name: NNCFNodeName - - -class CalculatePaddingAdjustment: - """ - Calculates padding value to perform a workaround for U4 support on NPU. - NPU supports only i4 for weights and activations with zero-point=0 and padding=0. This imposes some limitations on - the quantization scheme we can apply. In case of unsigned input for a quantizer (e.g. output of ReLU) half of - i4 range (8 values) is insufficient to preserve the accuracy. To overcome the problem it is proposed - to transform u4 to i4 in the NPU plugin by shifting the input by half of the quantization range to left. Padding - value should be shifted as well. And to make it zero after the shift (non-zero padding values are not - supported), the model should be trained with padding value equal to the half of the quantization range. 
- """ - - def __init__(self, activation_quantizer: SymmetricQuantizer): - if not isinstance(activation_quantizer, SymmetricQuantizer): - msg = "Padding adjustment is not supported for not symmetric quantization" - raise nncf.InternalError(msg) - self._activation_quantizer = activation_quantizer - self._is_enabled = True - - def __call__(self, previous_padding_value) -> torch.Tensor: - if self._is_enabled: - scale = self._activation_quantizer.scale - eps = self._activation_quantizer.eps - safe_scale = abs(scale) + eps - return safe_scale / 2 - return previous_padding_value - - @staticmethod - def is_config_applicable(qconfig: QuantizerConfig): - return ( - not qconfig.per_channel - and qconfig.num_bits == 4 - and not qconfig.signedness_to_force - and qconfig.mode == QuantizationMode.SYMMETRIC - ) - - -def add_adjust_padding_nodes(bitwidth_graph: nx.DiGraph, model: NNCFNetwork) -> nx.DiGraph(): - NewNodeArgs = namedtuple("NewNodeArgs", ("node_key", "attr", "parent_node_key")) - nncf_graph = model.nncf.get_graph() - args = [] - for node_key in bitwidth_graph.nodes: - node = nncf_graph.get_node_by_key(node_key) - module = model.nncf.get_containing_module(node.node_name) - if isinstance(module, NNCFConv2d): - adjust_padding_ops = filter(lambda x: isinstance(x, UpdatePaddingValue), module.pre_ops.values()) - for _ in adjust_padding_ops: - new_node_key = f"{node_key}_apad" - attr = dict(type="", label="adjust_padding_value", style="filled", color="yellow") - args.append(NewNodeArgs(new_node_key, attr, node_key)) - - for arg in args: - bitwidth_graph.add_node(arg.node_key, **arg.attr) - bitwidth_graph.add_edge(arg.node_key, arg.parent_node_key) - return bitwidth_graph diff --git a/src/nncf/torch/quantization/algo.py b/src/nncf/torch/quantization/algo.py deleted file mode 100644 index 7aadff199c1..00000000000 --- a/src/nncf/torch/quantization/algo.py +++ /dev/null @@ -1,1638 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -""" -Contains builder and controller class definitions for the quantization algorithm. 
-""" - -from collections import Counter -from collections import OrderedDict -from copy import deepcopy -from enum import IntEnum -from string import Template -from typing import Any, Optional - -import torch - -import nncf -from nncf.api.compression import CompressionLoss -from nncf.api.compression import CompressionScheduler -from nncf.api.compression import CompressionStage -from nncf.common.deprecation import warning_deprecated -from nncf.common.graph import NNCFGraph -from nncf.common.graph import NNCFNode -from nncf.common.graph.definitions import MODEL_INPUT_OP_NAME -from nncf.common.graph.layer_attributes import ConvolutionLayerAttributes -from nncf.common.graph.layer_attributes import WeightedLayerAttributes -from nncf.common.graph.patterns.manager import PatternsManager -from nncf.common.graph.patterns.manager import TargetDevice -from nncf.common.graph.transformations.commands import TargetType -from nncf.common.graph.utils import get_first_nodes_of_type -from nncf.common.graph.utils import get_target_dim_for_compression_legacy -from nncf.common.graph.utils import get_weight_shape_legacy -from nncf.common.hardware.config import HWConfig -from nncf.common.hardware.config import get_hw_config_type -from nncf.common.initialization.batchnorm_adaptation import BatchnormAdaptationAlgorithm -from nncf.common.logging import nncf_logger -from nncf.common.quantization.config_assignment import assign_qconfig_lists_to_modules -from nncf.common.quantization.quantizer_propagation.structs import IgnoreReason -from nncf.common.quantization.quantizer_setup import DEFAULT_QUANTIZER_CONFIG -from nncf.common.quantization.quantizer_setup import MultiConfigQuantizerSetup -from nncf.common.quantization.quantizer_setup import QuantizationPointId -from nncf.common.quantization.quantizer_setup import QuantizerSetupBase -from nncf.common.quantization.quantizer_setup import SingleConfigQuantizerSetup -from nncf.common.quantization.structs import NonWeightQuantizerId -from nncf.common.quantization.structs import QuantizableWeightedLayerNode -from nncf.common.quantization.structs import QuantizationConstraints -from nncf.common.quantization.structs import QuantizationPreset -from nncf.common.quantization.structs import QuantizerGroup -from nncf.common.quantization.structs import QuantizerId -from nncf.common.quantization.structs import WeightQuantizerId -from nncf.common.schedulers import BaseCompressionScheduler -from nncf.common.statistics import NNCFStatistics -from nncf.common.tensor_statistics.collectors import ReductionAxes -from nncf.common.utils.api_marker import api -from nncf.common.utils.backend import BackendType -from nncf.common.utils.backend import copy_model -from nncf.common.utils.debug import is_debug -from nncf.config import NNCFConfig -from nncf.config.extractors import extract_algo_specific_config -from nncf.config.extractors import extract_bn_adaptation_init_params -from nncf.config.extractors import extract_range_init_params -from nncf.config.schemata.algo.quantization import PRECISION_INIT_TYPES_VS_DESCRIPTION -from nncf.config.schemata.defaults import QUANTIZATION_EXPORT_TO_ONNX_STANDARD_OPS -from nncf.config.schemata.defaults import QUANTIZATION_LOGARITHM_SCALE -from nncf.config.schemata.defaults import QUANTIZATION_OVERFLOW_FIX -from nncf.config.schemata.defaults import QUANTIZATION_PRESET -from nncf.config.schemata.defaults import QUANTIZE_INPUTS -from nncf.config.schemata.defaults import QUANTIZE_OUTPUTS -from nncf.experimental.common.tensor_statistics.statistics import 
MinMaxTensorStatistic -from nncf.experimental.common.tensor_statistics.statistics import TensorStatistic -from nncf.parameters import StripFormat -from nncf.torch.algo_selector import PT_COMPRESSION_ALGORITHMS -from nncf.torch.algo_selector import ZeroCompressionLoss -from nncf.torch.compression_method_api import PTCompressionAlgorithmBuilder -from nncf.torch.compression_method_api import PTCompressionAlgorithmController -from nncf.torch.graph.graph import PTNNCFGraph -from nncf.torch.graph.operator_metatypes import ELEMENTWISE_OPERATIONS -from nncf.torch.graph.operator_metatypes import UNIFICATION_PRODUCING_METATYPES -from nncf.torch.graph.operator_metatypes import PTCatMetatype -from nncf.torch.graph.operator_metatypes import PTModuleConv2dMetatype -from nncf.torch.graph.operator_metatypes import PTModuleDepthwiseConv2dSubtype -from nncf.torch.graph.transformations.commands import ExtraCompressionModuleType -from nncf.torch.graph.transformations.commands import PTInsertionCommand -from nncf.torch.graph.transformations.commands import PTTargetPoint -from nncf.torch.graph.transformations.commands import TransformationPriority -from nncf.torch.graph.transformations.layout import PTTransformationLayout -from nncf.torch.hardware.config import PTHWConfig -from nncf.torch.initialization import SimpleDataLoaderRunner -from nncf.torch.module_operations import UpdatePaddingValue -from nncf.torch.nncf_network import LoadStateListener -from nncf.torch.nncf_network import NNCFNetwork -from nncf.torch.quantization.adjust_padding import AdjustPaddingArgs -from nncf.torch.quantization.adjust_padding import CalculatePaddingAdjustment -from nncf.torch.quantization.base_ctrl import QuantizationControllerBase -from nncf.torch.quantization.debug_interface import QuantizationDebugInterface -from nncf.torch.quantization.default_quantization import DEFAULT_PT_QUANT_TRAIT_TO_OP_DICT -from nncf.torch.quantization.default_quantization import QUANTIZATION_LAYER_METATYPES -from nncf.torch.quantization.external_quantizer import ExternalQuantizerCallHook -from nncf.torch.quantization.init_precision import PrecisionInitializerFactory -from nncf.torch.quantization.init_range import DataLoaderRangeInitializeRunner -from nncf.torch.quantization.init_range import PTRangeInitParams -from nncf.torch.quantization.init_range import StatCollectorGenerator -from nncf.torch.quantization.layers import QUANTIZATION_MODULES -from nncf.torch.quantization.layers import BaseQuantizer -from nncf.torch.quantization.layers import PTQuantizationPoint -from nncf.torch.quantization.layers import PTQuantizerSetup -from nncf.torch.quantization.layers import PTQuantizerSpec -from nncf.torch.quantization.layers import QuantizerConfig -from nncf.torch.quantization.layers import QuantizerExportMode -from nncf.torch.quantization.layers import QuantizersSwitcher -from nncf.torch.quantization.layers import SymmetricQuantizer -from nncf.torch.quantization.layers import get_scale_shape -from nncf.torch.quantization.metrics import MemoryConsumptionStatisticsCollector -from nncf.torch.quantization.metrics import PTQuantizationStatisticsCollector -from nncf.torch.quantization.metrics import QuantizationShareBuildTimeInfo -from nncf.torch.quantization.metrics import ShareEdgesQuantizedDataPathStatisticsCollector -from nncf.torch.quantization.precision_constraints import HardwareQuantizationConstraints -from nncf.torch.quantization.precision_init.adjacent_quantizers import GroupsOfAdjacentQuantizers -from nncf.torch.quantization.precision_init.base_init 
import BasePrecisionInitializer -from nncf.torch.quantization.precision_init.base_init import BasePrecisionInitParams -from nncf.torch.quantization.precision_init.hawq_init import HAWQPrecisionInitParams -from nncf.torch.quantization.precision_init.manual_init import ManualPrecisionInitParams -from nncf.torch.quantization.schedulers import QUANTIZATION_SCHEDULERS -from nncf.torch.quantization.strip import strip_quantized_model -from nncf.torch.quantization.structs import NonWeightQuantizerInfo -from nncf.torch.quantization.structs import WeightQuantizerInfo -from nncf.torch.quantization.translator import PTTargetPointTranslator -from nncf.torch.structures import QuantizationPrecisionInitArgs -from nncf.torch.tensor_statistics.algo import TensorStatisticsCollectionBuilder -from nncf.torch.tensor_statistics.statistics import pt_convert_stat_to_min_max_tensor_stat -from nncf.torch.utils import get_model_device -from nncf.torch.utils import get_model_dtype -from nncf.torch.utils import get_state_dict_names_with_modules -from nncf.torch.utils import is_main_process -from nncf.torch.utils import training_mode_switcher - -QUANTIZER_BUILDER_STATE_VERSION_SAVE_NAME = "version" - - -class QuantizerBuilderStateVersion(IntEnum): - # In Quantization builder state SingleConfigQuantizerSetup is being saved as quantizer setup. - v1 = 1 - # In Quantization builder state PTQuantizerSetup is being saved as quantizer setup. - v2 = 2 - - @staticmethod - def from_compression_state(compression_state): - if QUANTIZER_BUILDER_STATE_VERSION_SAVE_NAME in compression_state: - return compression_state.get(QUANTIZER_BUILDER_STATE_VERSION_SAVE_NAME) - return QuantizerBuilderStateVersion.v1 - - -class QuantizerSetupGeneratorBase: - def __init__( - self, - quant_config: dict, - target_model: NNCFNetwork, - precision_init_type: str = None, - precision_init_params: BasePrecisionInitParams = None, - range_init_params: PTRangeInitParams = None, - hw_config: HWConfig = None, - ): - self._target_model = target_model - self._quantization_config = quant_config - self.hw_config = hw_config - self._target_device = None if hw_config is None else hw_config.target_device - self._quantize_inputs = self._quantization_config.get("quantize_inputs", QUANTIZE_INPUTS) - self._quantize_outputs = self._quantization_config.get("quantize_outputs", QUANTIZE_OUTPUTS) - - self.ignored_scopes = self._quantization_config.get("ignored_scopes") - self.target_scopes = self._quantization_config.get("target_scopes") - - self.global_quantizer_constraints: dict[QuantizerGroup, QuantizationConstraints] = {} - self._ignored_scopes_per_group: dict[QuantizerGroup, list[str]] = {} - self._target_scopes_per_group: dict[QuantizerGroup, list[str]] = {} - - for quantizer_group in QuantizerGroup: - self._parse_group_params(self._quantization_config, quantizer_group) - - self._precision_init_type = precision_init_type - self._precision_init_params = precision_init_params - self._range_init_params = range_init_params - self._num_potential_quantized_weights = len(self._target_model.nncf.get_nncf_modules()) - - def generate_setup(self) -> SingleConfigQuantizerSetup: - raise NotImplementedError - - def get_build_time_metric_infos(self): - raise NotImplementedError - - def _parse_group_params(self, quant_config: dict, quantizer_group: QuantizerGroup): - group_name = quantizer_group.value - params_dict = {} - params_dict_from_config = quant_config.get(group_name, {}) - preset = quant_config.get("preset") - if self._target_device in ["ANY", "CPU", "GPU"] or 
(self._target_device is None and preset is not None): - preset = QuantizationPreset(quant_config.get("preset", QUANTIZATION_PRESET)) - params_dict = preset.get_params_configured_by_preset(quantizer_group) - overridden_params = params_dict.keys() & params_dict_from_config.keys() - if overridden_params: - nncf_logger.info(f"Preset quantizer parameters {overridden_params} explicitly overridden by config.") - params_dict.update(params_dict_from_config) - self.global_quantizer_constraints[quantizer_group] = QuantizationConstraints.from_config_dict(params_dict) - self._ignored_scopes_per_group[quantizer_group] = params_dict_from_config.get("ignored_scopes", []) - if self.ignored_scopes is not None: - self._ignored_scopes_per_group[quantizer_group] += self.ignored_scopes - target_scopes = params_dict_from_config.get("target_scopes") - if target_scopes is None and self.target_scopes is not None: - self._target_scopes_per_group[quantizer_group] = self.target_scopes - else: - self._target_scopes_per_group[quantizer_group] = target_scopes - - def _get_default_qconfig(self, constraints: QuantizationConstraints = None): - qconfig = deepcopy(DEFAULT_QUANTIZER_CONFIG) - if constraints is not None: - qconfig = constraints.apply_constraints_to(qconfig) - return qconfig - - def _filter_by_ignored_algo(self, nodes: list[NNCFNode]) -> list[NNCFNode]: - retval = [] - for node in nodes: - if "quantization" in node.ignored_algorithms: - continue - retval.append(node) - return retval - - def _assign_qconfig_lists_to_modules(self, weighted_nodes: list[NNCFNode]) -> dict[NNCFNode, list[QuantizerConfig]]: - raise NotImplementedError - - def get_quantizable_module_nodes(self) -> list[QuantizableWeightedLayerNode]: - weighted_nodes = self._target_model.nncf.get_original_graph().get_nodes_by_metatypes( - QUANTIZATION_LAYER_METATYPES - ) - quantized_modules_with_potential_qconfig = [] - - weighted_nodes = self._filter_by_ignored_algo(weighted_nodes) - weighted_node_vs_qconfig_list = self._assign_qconfig_lists_to_modules(weighted_nodes) - - for node, qconfig_list in weighted_node_vs_qconfig_list.items(): - if qconfig_list is not None: - qconfig_list_copy = deepcopy(qconfig_list) - quantized_modules_with_potential_qconfig.append(QuantizableWeightedLayerNode(node, qconfig_list_copy)) - return quantized_modules_with_potential_qconfig - - -class IQuantizerSetupDisambiguator: - def select_final_quantizer_setup(self, multi_config_setup: MultiConfigQuantizerSetup) -> SingleConfigQuantizerSetup: - raise NotImplementedError - - -class DefaultQuantizerSetupDisambiguator(IQuantizerSetupDisambiguator): - def __init__( - self, - target_model: NNCFNetwork, - precision_init_type: str = None, - precision_init_params: BasePrecisionInitParams = None, - range_init_params: PTRangeInitParams = None, - override_bit_options_with_precision_init: bool = False, - hw_config: HWConfig = None, - ): - self._precision_init_type = precision_init_type - self._precision_init_params = precision_init_params - self._range_init_params = range_init_params - self._target_model = target_model - self._override_bit_options_with_precision_init = override_bit_options_with_precision_init - self.hw_config = hw_config - - @staticmethod - def select_first_qconfig_with_bitwidth_variants_for_each_point( - multi_config_setup: MultiConfigQuantizerSetup, - ) -> MultiConfigQuantizerSetup: - new_setup = deepcopy(multi_config_setup) - for qp_id, qp in multi_config_setup.quantization_points.items(): - main_qconfig = qp.possible_qconfigs[0] - constrained_qconfig_list = 
[main_qconfig] - if len(qp.possible_qconfigs) > 1: - constrained_qconfig_list += list(filter(main_qconfig.is_a_bitwidth_variant, qp.possible_qconfigs[1:])) - new_setup.quantization_points[qp_id].possible_qconfigs = constrained_qconfig_list - return new_setup - - def select_final_quantizer_setup(self, multi_config_setup: MultiConfigQuantizerSetup) -> SingleConfigQuantizerSetup: - if self._precision_init_type is not None: - with self._target_model.nncf.temporary_clean_view() as intermediate_model: - stats = QuantizationBuilder.get_statistics_for_quantizer_setup( - intermediate_model, multi_config_setup, self._range_init_params - ) - bitwidth_varying_only_multi_setup = self.select_first_qconfig_with_bitwidth_variants_for_each_point( - multi_config_setup - ) - - init_setup = bitwidth_varying_only_multi_setup.select_first_qconfig_for_each_point() - intermediate_builder = ExperimentalQuantizationBuilder( - bitwidth_varying_only_multi_setup, init_setup, stats, hw_config=self.hw_config - ) - intermediate_builder.apply_to(intermediate_model) - intermediate_ctrl = intermediate_builder.build_controller(intermediate_model) - - # intermediate_ctrl.init_range() - hw_constraints = HardwareQuantizationConstraints() - if not self._override_bit_options_with_precision_init: - for qp_id, qp in multi_config_setup.quantization_points.items(): - quantizer_module_id = intermediate_ctrl.setup_to_module_id_translation_dict[qp_id] - hw_constraints.add(quantizer_module_id, qp.possible_qconfigs) - final_quantizer_setup = intermediate_ctrl.init_precision( - self._precision_init_type, self._precision_init_params, hw_constraints - ) - else: - final_quantizer_setup = multi_config_setup.select_first_qconfig_for_each_point() - return final_quantizer_setup - - -class PropagationBasedQuantizerSetupGenerator(QuantizerSetupGeneratorBase): - def __init__( - self, - quant_config: dict, - target_model: NNCFNetwork, - hw_config: HWConfig = None, - device: TargetDevice = None, - precision_init_type: str = None, - precision_init_params: BasePrecisionInitParams = None, - range_init_params: PTRangeInitParams = None, - debug_interface: "QuantizationDebugInterface" = None, - ): - super().__init__( - quant_config, target_model, precision_init_type, precision_init_params, range_init_params, hw_config - ) - - self._pattern_fusing_graph = PatternsManager.get_full_hw_pattern_graph(backend=BackendType.TORCH, device=device) - - self._hw_precision_constraints = HardwareQuantizationConstraints() - self._debug_interface = debug_interface - self._num_potential_quantized_activations = 0 - - act_config = quant_config.get(QuantizerGroup.ACTIVATIONS.value, {}) - self._unified_scale_ops = act_config.get("unified_scale_ops") - - def generate_setup(self) -> SingleConfigQuantizerSetup: - quantizable_module_nodes = self.get_quantizable_module_nodes() - - insertion_point_graph = self._target_model.nncf.get_original_insertion_point_graph() - if self._debug_interface: - self._debug_interface.visualize_insertion_point_graph(insertion_point_graph) - from nncf.common.quantization.quantizer_propagation.solver import QuantizerPropagationSolver - - scales_unification_map = {PTCatMetatype: UNIFICATION_PRODUCING_METATYPES} - ignored_scopes_for_solver = { - name: IgnoreReason.USER_REQUESTED for name in self._ignored_scopes_per_group[QuantizerGroup.ACTIVATIONS] - } - prop_graph_solver = QuantizerPropagationSolver( - activation_ignored_scopes=ignored_scopes_for_solver, - weight_ignored_scopes=self._ignored_scopes_per_group[QuantizerGroup.WEIGHTS], - 
activation_target_scopes=self._target_scopes_per_group[QuantizerGroup.ACTIVATIONS], - weight_target_scopes=self._target_scopes_per_group[QuantizerGroup.WEIGHTS], - hw_config=self.hw_config, - default_trait_to_metatype_map=DEFAULT_PT_QUANT_TRAIT_TO_OP_DICT, - default_qconfig_list=[ - self._get_default_qconfig(constraints=self.global_quantizer_constraints[QuantizerGroup.ACTIVATIONS]) - ], - quantizable_layer_nodes=quantizable_module_nodes, - scope_overrides=self._quantization_config.get("scope_overrides", {}), - global_constraints=self.global_quantizer_constraints, - additional_unified_scale_op_scopes=self._unified_scale_ops, - quantize_outputs=self._quantize_outputs, - scales_unification_map=scales_unification_map, - ) - - merged_ip_graph = insertion_point_graph.get_ip_graph_with_merged_hw_optimized_operations( - self._pattern_fusing_graph - ) - quantization_proposal = prop_graph_solver.run_on_ip_graph(merged_ip_graph, ELEMENTWISE_OPERATIONS) - self._num_potential_quantized_activations = prop_graph_solver.get_num_potential_quantized_activations() - - quantizer_setup = deepcopy(quantization_proposal.quantizer_setup) - quantization_proposal.quantizer_setup = quantizer_setup - - disambiguator = DefaultQuantizerSetupDisambiguator( - self._target_model, - self._precision_init_type, - self._precision_init_params, - self._range_init_params, - override_bit_options_with_precision_init=self.hw_config is None, - hw_config=self.hw_config, - ) - - single_config_quantizer_setup = disambiguator.select_final_quantizer_setup( - quantization_proposal.quantizer_setup - ) - - finalized_proposal = quantization_proposal.finalize( - single_config_quantizer_setup, strict=self.hw_config is not None - ) - finalized_quantizer_setup = prop_graph_solver.get_final_quantizer_setup(finalized_proposal) - finalized_quantizer_setup = self._handle_quantize_inputs_option(finalized_quantizer_setup) - return finalized_quantizer_setup - - def _assign_qconfig_lists_to_modules(self, weighted_nodes: list[NNCFNode]) -> dict[NNCFNode, list[QuantizerConfig]]: - global_constraints = self.global_quantizer_constraints[QuantizerGroup.WEIGHTS] - scope_overrides_dict = self._quantization_config.get("scope_overrides", {}) - return assign_qconfig_lists_to_modules( - weighted_nodes, self._get_default_qconfig(), global_constraints, scope_overrides_dict, self.hw_config - ) - - def _handle_quantize_inputs_option(self, quantizer_setup: SingleConfigQuantizerSetup) -> SingleConfigQuantizerSetup: - nncf_graph = self._target_model.nncf.get_original_graph() - qp_ids_to_discard = [] - for qp_id, qp in quantizer_setup.quantization_points.items(): - if qp.is_activation_quantization_point(): - insertion_point = qp.insertion_point - target_node = nncf_graph.get_node_by_name(insertion_point.target_node_name) - if not self._quantize_inputs and target_node.node_type == MODEL_INPUT_OP_NAME: - qp_ids_to_discard.append(qp_id) - for qp_id in qp_ids_to_discard: - quantizer_setup.discard(qp_id, keep_shared_input_qps=True) - return quantizer_setup - - def get_build_time_metric_infos(self): - return QuantizationShareBuildTimeInfo( - self._num_potential_quantized_activations, self._num_potential_quantized_weights - ) - - -class QBuilderStateNames: - BUILD_TIME_METRIC_INFOS = "build_time_metric_infos" - QUANTIZER_SETUP = "quantizer_setup" - - -@PT_COMPRESSION_ALGORITHMS.register("quantization") -class QuantizationBuilder(PTCompressionAlgorithmBuilder): - _state_names = QBuilderStateNames - - def __init__(self, config, should_init: bool = True): - super().__init__(config, 
should_init) - self._debug_interface = QuantizationDebugInterface() if is_debug() else None - self._weight_quantizers = OrderedDict() # Quantizers applied via UpdateWeights - self._non_weight_quantizers = OrderedDict() # All the other quantizers - self._quantizers_input_shapes = OrderedDict() - self._processed_insertion_points: set[PTTargetPoint] = set() - self._groups_of_adjacent_quantizers: GroupsOfAdjacentQuantizers = GroupsOfAdjacentQuantizers() - self._setup_to_module_id_translation_dict: dict[QuantizationPointId, QuantizerId] = {} - self.eval_ops_exec_ctx = [] - self._build_time_metric_infos: Optional[QuantizationShareBuildTimeInfo] = None - self.hw_config = None - self._legacy_single_config_quantizer_setup_from_comp_state: Optional[SingleConfigQuantizerSetup] = None - self._pt_quantizer_setup: Optional[PTQuantizerSetup] = None - self._minmax_values_for_range_init: Optional[dict[QuantizationPointId, MinMaxTensorStatistic]] = {} - - # can be False to disable setting of adjust padding operations on precision init, because it may add unnecessary - # noise on model evaluation (e.g. in AutoQ) - self._should_setup_adjust_pad_ops = True - hw_config_type = None - self._target_device = self.config.get("target_device", "ANY") - hw_config_type = get_hw_config_type(self._target_device) - if hw_config_type is not None: - hw_config_path = PTHWConfig.get_path_to_hw_config(hw_config_type) - self.hw_config = PTHWConfig.from_json(hw_config_path) - - algo_config = self._get_algo_specific_config_section() - if self._target_device == "NPU" and "preset" in algo_config: - msg = "The NPU target device does not support presets." - raise nncf.InternalError(msg) - if self._target_device == "CPU_SPR": - msg = "The CPU_SPR target device does not supported." - raise nncf.InternalError(msg) - - self._range_init_params = None - self._precision_init_type = None - self._precision_init_params = None - if self.should_init: - self._parse_init_params() - - self._use_logarithm_scale_per_group: dict[QuantizerGroup, bool] = {} - - for quantizer_group in QuantizerGroup: - group_name = quantizer_group.value - params_dict = self._algo_config.get(group_name, {}) - self._use_logarithm_scale_per_group[quantizer_group] = params_dict.get( - "logarithm_scale", QUANTIZATION_LOGARITHM_SCALE - ) - - self._overflow_fix = self._algo_config.get("overflow_fix", QUANTIZATION_OVERFLOW_FIX) - self._device_for_callable_obj_creation = "cpu" - - def _load_state_without_name(self, state_without_name: dict[str, Any]): - """ - Initializes object from the state. - - :param state_without_name: Output of `get_state()` method. - """ - quantizer_setup_state = state_without_name[self._state_names.QUANTIZER_SETUP] - version = state_without_name.get(QUANTIZER_BUILDER_STATE_VERSION_SAVE_NAME, QuantizerBuilderStateVersion.v1) - if version == QuantizerBuilderStateVersion.v1: - self._legacy_single_config_quantizer_setup_from_comp_state = SingleConfigQuantizerSetup.from_state( - quantizer_setup_state - ) - else: - self._pt_quantizer_setup = PTQuantizerSetup.from_state(quantizer_setup_state) - self._build_time_metric_infos = QuantizationShareBuildTimeInfo.from_state( - state_without_name[self._state_names.BUILD_TIME_METRIC_INFOS] - ) - - def _get_state_without_name(self) -> dict[str, Any]: - """ - Returns a dictionary with Python data structures (dict, list, tuple, str, int, float, True, False, None) that - represents state of the object. 
- - :return: state of the object - """ - build_time_metric_infos_state = {} - if self._build_time_metric_infos: - build_time_metric_infos_state = self._build_time_metric_infos.get_state() - quantizer_setup_state = {} - if self._pt_quantizer_setup: - quantizer_setup_state = self._pt_quantizer_setup.get_state() - return { - self._state_names.QUANTIZER_SETUP: quantizer_setup_state, - self._state_names.BUILD_TIME_METRIC_INFOS: build_time_metric_infos_state, - QUANTIZER_BUILDER_STATE_VERSION_SAVE_NAME: max(QuantizerBuilderStateVersion).value, - } - - def _parse_init_params(self): - self._range_init_params = self._parse_range_init_params() - self._precision_init_type, self._precision_init_params = self._parse_precision_init_params( - self._algo_config.get("initializer", {}) - ) - - def _parse_range_init_params(self) -> Optional[PTRangeInitParams]: - range_init_params = extract_range_init_params(self.config) - return PTRangeInitParams(**range_init_params) if range_init_params is not None else None - - def _parse_precision_init_params(self, initializer_config: dict) -> tuple[str, BasePrecisionInitParams]: - init_precision_config = initializer_config.get("precision") - if not init_precision_config: - return None, None - precision_init_type = init_precision_config.get("type", "manual") - if precision_init_type not in PRECISION_INIT_TYPES_VS_DESCRIPTION: - msg = f"Unrecognized precision init type: {precision_init_type}" - raise nncf.InternalError(msg) - if precision_init_type == "hawq": - try: - precision_init_args = self.config.get_extra_struct(QuantizationPrecisionInitArgs) - except KeyError as e: - msg = ( - "Specified non-manual precision initialization in the NNCF config, " - "but the initializing data loader and loss criterion are not provided as an extra struct. 
" - "Refer to `NNCFConfig.register_extra_structs` and the `QuantizationPrecisionInitArgs` " - "class" - ) - raise ValueError(msg) from e - precision_init_params = HAWQPrecisionInitParams.from_config(init_precision_config, precision_init_args) - elif precision_init_type == "manual": - precision_init_params = ManualPrecisionInitParams.from_config(init_precision_config) - else: - msg = f"Unhandled precision init type: {precision_init_type}" - raise ValueError(msg) - return precision_init_type, precision_init_params - - def _get_minmax_values_for_quantizer_locations( - self, - quantizer_setup: SingleConfigQuantizerSetup, - tensor_statistics: dict[PTTargetPoint, dict[ReductionAxes, TensorStatistic]], - target_model_graph: PTNNCFGraph, - ) -> dict[QuantizationPointId, MinMaxTensorStatistic]: - retval = {} - for qp_id, qp in quantizer_setup.quantization_points.items(): - qip = qp.insertion_point - tp = PTTargetPointTranslator.translate(qip) - if tp not in tensor_statistics: - nncf_logger.debug(f"TP {tp} not found in tensor statistics") - retval[qp_id] = None - else: - target_node = target_model_graph.get_node_by_name(tp.target_node_name) - if qp.is_weight_quantization_point(): - layer_attrs = target_node.layer_attributes - assert isinstance(layer_attrs, WeightedLayerAttributes) - input_shape = get_weight_shape_legacy(layer_attrs) - channel_idx = get_target_dim_for_compression_legacy(layer_attrs) - else: - input_shape = target_model_graph.get_input_shape_for_insertion_point(qp.insertion_point) - channel_idx = 1 # channel dim for activations - scale_shape = tuple( - get_scale_shape(input_shape, qp.is_weight_quantization_point(), qp.qconfig.per_channel, channel_idx) - ) - - if scale_shape not in tensor_statistics[tp]: - nncf_logger.debug(f"Did not collect tensor statistics at {tp} for shape {scale_shape}") - retval[qp_id] = None - else: - minmax_stat = pt_convert_stat_to_min_max_tensor_stat(tensor_statistics[tp][scale_shape]) - retval[qp_id] = minmax_stat - return retval - - def _get_transformation_layout(self, target_model: NNCFNetwork) -> PTTransformationLayout: - # TODO (vshampor): a simpler solution would be to always create callables on CPU and - # to move these to model-specific device upon actual application, but would this impact - # the time required to create a compressed model? - self._device_for_callable_obj_creation = get_model_device(target_model) - target_model.nncf.register_compression_module_type(ExtraCompressionModuleType.EXTERNAL_QUANTIZER) - if self._pt_quantizer_setup is None: - self._pt_quantizer_setup = self._get_quantizer_setup(target_model) - - ( - insertion_commands, - setup_to_module_id_translation_dict, - ) = self._build_insertion_commands_list_for_quantizer_setup( - self._pt_quantizer_setup, target_model, self._minmax_values_for_range_init - ) - - transformation_layout = PTTransformationLayout() - for command in insertion_commands: - transformation_layout.register(command) - - self._setup_to_module_id_translation_dict = setup_to_module_id_translation_dict - all_quantizations = {} - all_quantizations.update({k: v.quantizer_module_ref for k, v in self._weight_quantizers.items()}) - all_quantizations.update({k: v.quantizer_module_ref for k, v in self._non_weight_quantizers.items()}) - self._groups_of_adjacent_quantizers.parse_from_quantizer_setup( - all_quantizations, self._pt_quantizer_setup, setup_to_module_id_translation_dict - ) - - # NOTE: Order of activations must be the same to correctly broadcast parameters (e.g. 
scales) in distributed - # mode (see call of `_dist_broadcast_coalesced` in torch/nn/parallel/distributed.py for more details) - - target_model.nncf.sort_compression_modules(ExtraCompressionModuleType.EXTERNAL_QUANTIZER) - - if self._debug_interface is not None: - target_model.nncf.debug_interface.add_interface(self._debug_interface) - - quantization_types = [class_type.__name__ for class_type in QUANTIZATION_MODULES.registry_dict.values()] - all_quantizations = get_state_dict_names_with_modules(target_model, quantization_types) - target_model.nncf._load_listener = LoadStateListener(target_model, all_quantizations) - - return transformation_layout - - @staticmethod - def get_statistics_for_quantizer_setup( - target_model: NNCFNetwork, quantizer_setup: QuantizerSetupBase, range_init_params: PTRangeInitParams - ) -> dict[PTTargetPoint, dict[ReductionAxes, TensorStatistic]]: - if range_init_params is None: - return {} - observation_points_vs_collectors_dict = ( - StatCollectorGenerator.generate_collectors_for_range_init_statistics_collection( - target_model.nncf.get_original_graph(), quantizer_setup, range_init_params - ) - ) - - with target_model.nncf.temporary_clean_view() as intermediate_model: - stat_builder = TensorStatisticsCollectionBuilder(NNCFConfig(), observation_points_vs_collectors_dict) - stat_builder.apply_to(intermediate_model) - stat_ctrl = stat_builder.build_controller(intermediate_model) - runner = SimpleDataLoaderRunner(intermediate_model, range_init_params.device) - runner.progressbar_description = "Collecting tensor statistics" - with training_mode_switcher(intermediate_model, is_training=False): - # Run statistics collection in eval mode, otherwise it may fail because graph was built in eval mode - runner.run(range_init_params.init_range_data_loader, range_init_params.get_max_num_init_steps()) - - retval = {} - for ip, rs_vs_collector in stat_ctrl.ip_vs_collector_dict.items(): - retval[ip] = {rs: collector.get_statistics() for rs, collector in rs_vs_collector.items()} - return retval - - def _get_statistics_for_final_range_init( - self, target_model: NNCFNetwork, quantizer_setup: QuantizerSetupBase, range_init_params: PTRangeInitParams - ) -> dict[PTTargetPoint, dict[ReductionAxes, TensorStatistic]]: - return self.get_statistics_for_quantizer_setup(target_model, quantizer_setup, range_init_params) - - def _get_single_config_quantizer_setup(self, target_model) -> SingleConfigQuantizerSetup: - setup_generator = PropagationBasedQuantizerSetupGenerator( - self._algo_config, - target_model, - self.hw_config, - self._target_device, - self._precision_init_type, - self._precision_init_params, - self._range_init_params, - self._debug_interface, - ) - single_config_quantizer_setup = setup_generator.generate_setup() - self._build_time_metric_infos = setup_generator.get_build_time_metric_infos() - return single_config_quantizer_setup - - def _get_quantizer_setup(self, target_model: NNCFNetwork) -> PTQuantizerSetup: - if self._legacy_single_config_quantizer_setup_from_comp_state is None: - single_config_quantizer_setup = self._get_single_config_quantizer_setup(target_model) - else: - single_config_quantizer_setup = self._legacy_single_config_quantizer_setup_from_comp_state - - target_model_graph = target_model.nncf.get_original_graph() - - if is_main_process() and self.should_init: - stats_for_range_init = self._get_statistics_for_final_range_init( - target_model, single_config_quantizer_setup, self._range_init_params - ) - self._minmax_values_for_range_init = 
self._get_minmax_values_for_quantizer_locations( - single_config_quantizer_setup, stats_for_range_init, target_model_graph - ) - - self._check_and_log_missing_stats_for_setup( - single_config_quantizer_setup, self._minmax_values_for_range_init - ) - - bitwidth_per_scope = BasePrecisionInitializer.get_bitwidth_per_scope(single_config_quantizer_setup) - str_bw = [str(element) for element in bitwidth_per_scope] - nncf_logger.debug("\n".join(['\n"bitwidth_per_scope": [', ",\n".join(str_bw), "]"])) - - setup = PTQuantizerSetup( - single_config_quantizer_setup.unified_scale_groups, - single_config_quantizer_setup.shared_input_operation_set_groups, - ) - - for qp_id, qp in single_config_quantizer_setup.quantization_points.items(): - qconfig = qp.qconfig - insertion_point = qp.insertion_point # QuantizationInsertionPointBase - - compression_lr_multiplier = self._get_compression_lr_multiplier() - - half_range = False - if self.hw_config and qp.is_weight_quantization_point(): - target_node = target_model_graph.get_node_by_name(insertion_point.target_node_name) - if self.hw_config.target_device in ["CPU", "ANY"] and qconfig.num_bits == 8: - if self._overflow_fix == "enable": - half_range = True - quantizers_with_overflow_fix_str = "all weight quantizers" - elif self._overflow_fix == "first_layer_only": - if target_node in get_first_nodes_of_type(target_model_graph, ["conv2d", "conv3d"]): - half_range = True - quantizers_with_overflow_fix_str = "first convolution weight quantizers" - elif self._overflow_fix != "disable": - msg = f"Unknown overflow fix type: {self._overflow_fix}" - raise nncf.InternalError(msg) - if half_range: - nncf_logger.debug(f"Overflow issue fix will be applied to {quantizers_with_overflow_fix_str}") - - if qp.is_weight_quantization_point(): - use_logarithm_scale = self._use_logarithm_scale_per_group[QuantizerGroup.WEIGHTS] - narrow_range = qconfig.num_bits == 8 and not half_range - else: - use_logarithm_scale = self._use_logarithm_scale_per_group[QuantizerGroup.ACTIVATIONS] - narrow_range = False - - if qp.is_weight_quantization_point(): - target_node = target_model_graph.get_node_by_name(insertion_point.target_node_name) - layer_attributes = target_node.layer_attributes - assert isinstance(layer_attributes, WeightedLayerAttributes) - scale_shape = get_scale_shape( - get_weight_shape_legacy(layer_attributes), - is_weights=True, - per_channel=qconfig.per_channel, - channel_idx=get_target_dim_for_compression_legacy(layer_attributes), - ) - else: - input_shape = target_model_graph.get_input_shape_for_insertion_point(insertion_point) - scale_shape = get_scale_shape(list(input_shape), is_weights=False, per_channel=qconfig.per_channel) - - qspec = PTQuantizerSpec.from_config( - qconfig, - narrow_range=narrow_range, - scale_shape=tuple(scale_shape), - logarithm_scale=use_logarithm_scale, - half_range=half_range, - is_quantized_on_export=qp.is_weight_quantization_point(), - compression_lr_multiplier=compression_lr_multiplier, - ) - pt_qp = PTQuantizationPoint( - qspec, PTTargetPointTranslator.translate(insertion_point), qp.directly_quantized_operator_node_names - ) - setup.add_quantization_point(qp_id, pt_qp) - - return setup - - def _build_controller(self, model: NNCFNetwork) -> PTCompressionAlgorithmController: - return QuantizationController( - model, - self.config, - self._debug_interface, - self._weight_quantizers, - self._non_weight_quantizers, - self._groups_of_adjacent_quantizers, - self._quantizers_input_shapes, - build_time_metric_info=self._build_time_metric_infos, - 
build_time_range_init_params=self._range_init_params, - ) - - def __create_quantize_module(self, quantizer_spec: PTQuantizerSpec): - quantizer_cls = QUANTIZATION_MODULES.get(quantizer_spec.mode) - return quantizer_cls(quantizer_spec) - - @staticmethod - def _get_adjust_padding_args( - target_model_graph: NNCFGraph, - quantization_point: PTQuantizationPoint, - activation_quantizer: BaseQuantizer, - quantization_points: list[PTQuantizationPoint], - ) -> list[AdjustPaddingArgs]: - result = [] - for op_node_name in quantization_point.directly_quantized_operator_node_names: - weight_bitwidth = None - for qp in quantization_points: - is_weight = qp.is_weight_quantization_point() - if is_weight and (qp.target_point.target_node_name == op_node_name): - weight_bitwidth = qp.qspec.num_bits - break - if weight_bitwidth: - is_applicable = False - target_node = target_model_graph.get_node_by_name(op_node_name) - if target_node.metatype in [PTModuleConv2dMetatype, PTModuleDepthwiseConv2dSubtype]: - layer_attrs = target_node.layer_attributes - assert isinstance(layer_attrs, ConvolutionLayerAttributes) - padding_values = set(layer_attrs.padding_values) - padding_enabled = len(padding_values) >= 1 and padding_values.pop() - if padding_enabled: - symmetric = isinstance(activation_quantizer, SymmetricQuantizer) - per_tensor = not activation_quantizer.per_channel - a_int4 = activation_quantizer.num_bits == 4 - w_int24 = weight_bitwidth <= 4 - unsigned = not activation_quantizer.signed - is_applicable = symmetric and per_tensor and a_int4 and w_int24 and unsigned - if is_applicable: - result.append(AdjustPaddingArgs(weight_bitwidth, activation_quantizer, op_node_name)) - return result - - def _add_adjust_padding_ops(self, adjust_padding_args: list[AdjustPaddingArgs]): - commands = [] - for args in adjust_padding_args: - ap = CalculatePaddingAdjustment(args.activation_quantizer) - op = UpdatePaddingValue(ap).to(self._device_for_callable_obj_creation) - insertion_point = PTTargetPoint( - target_type=TargetType.PRE_LAYER_OPERATION, target_node_name=args.module_op_node_name - ) - nncf_logger.debug_once(f"Padding will be adjusted for {args.module_op_node_name}") - commands.append(PTInsertionCommand(insertion_point, op, TransformationPriority.DEFAULT_PRIORITY)) - return commands - - @staticmethod - def _check_and_log_missing_stats_for_setup( - quantizer_setup: SingleConfigQuantizerSetup, - minmax_values_for_range_init: dict[QuantizationPointId, MinMaxTensorStatistic], - ): - tps_with_uncollected_stats = set() - for qp_id in quantizer_setup.quantization_points: - if qp_id not in minmax_values_for_range_init: - tps_with_uncollected_stats.add(quantizer_setup.quantization_points[qp_id].insertion_point) - if tps_with_uncollected_stats: - nncf_logger.error("Tensor statistics for the following locations were not collected:") - for tp in tps_with_uncollected_stats: - nncf_logger.error(f"\t{tp}") - nncf_logger.error( - "The corresponding quantizer range will not be initialized! If the model has " - "data-dependent control flow branches, make sure that your initializing data loader is " - "producing data that allows the model cover to all of these branches. If this is not the " - "case, consider adding the corresponding nodes to `ignored_scopes`." 
- ) - - def _build_insertion_commands_list_for_quantizer_setup( - self, - quantizer_setup: PTQuantizerSetup, - target_model: NNCFNetwork, - minmax_values_for_range_init: dict[QuantizationPointId, MinMaxTensorStatistic], - ) -> tuple[list[PTInsertionCommand], dict[QuantizationPointId, QuantizerId]]: - insertion_commands = [] - qp_id_vs_quant_module_id_dict: dict[QuantizationPointId, QuantizerId] = {} - target_model_graph = target_model.nncf.get_original_graph() - non_unified_scales_quantization_point_ids = set(quantizer_setup.quantization_points.keys()) - already_weight_quantized_shared_layers: dict[str, QuantizerId] = {} - - for unified_scales_group in quantizer_setup.unified_scale_groups.values(): - for us_qp_id in unified_scales_group: - non_unified_scales_quantization_point_ids.discard(us_qp_id) - - ( - filtered_unified_scales_group, - shared_weight_quantized_layers_in_group, - ) = self._remove_shared_layer_weight_quantization_point_duplicates( - unified_scales_group, quantizer_setup, target_model_graph - ) - - quant_module_id, commands = self._build_commands_for_single_unified_scale_group( - target_model, quantizer_setup, filtered_unified_scales_group, minmax_values_for_range_init - ) - - for layer_name in shared_weight_quantized_layers_in_group: - if layer_name in already_weight_quantized_shared_layers: - msg = ( - "Attempted to assign a unified-scale quantizer to a shared layer node that has " - "already had its weights quantized by another unified-scale quantizer!" - ) - raise nncf.InternalError(msg) - already_weight_quantized_shared_layers[layer_name] = quant_module_id - - for us_qp_id in unified_scales_group: - qp_id_vs_quant_module_id_dict[us_qp_id] = quant_module_id - insertion_commands += commands - - for qp_id in non_unified_scales_quantization_point_ids: - qp = quantizer_setup.quantization_points[qp_id] - nncf_node = target_model_graph.get_node_by_name(qp.target_point.target_node_name) - if qp.is_weight_quantization_point() and nncf_node.is_shared(): - layer_name = nncf_node.layer_name - if layer_name in already_weight_quantized_shared_layers: - nncf_logger.debug_once( - f"Filtering a regular weight quantization point {qp_id} - " - f"already quantized as a shared layer {nncf_node.layer_name}", - ) - qp_id_vs_quant_module_id_dict[qp_id] = already_weight_quantized_shared_layers[layer_name] - continue - - qspec = quantizer_setup.quantization_points[qp_id].qspec - tp = quantizer_setup.quantization_points[qp_id].target_point - - range_init_minmax_values = None - if minmax_values_for_range_init: - minmax_stat = minmax_values_for_range_init.get(qp_id) - if minmax_stat is not None: - range_init_minmax_values = (minmax_stat.min_values, minmax_stat.max_values) - - quantizer_module_id, commands = self._quantize_at_points_by_single_module( - target_model, - [ - tp, - ], - qspec, - range_init_minmax_values, - ) - - if ( - qp.is_weight_quantization_point() - and nncf_node.is_shared() - and nncf_node.layer_name not in already_weight_quantized_shared_layers - ): - already_weight_quantized_shared_layers[nncf_node.layer_name] = quantizer_module_id - - qp_id_vs_quant_module_id_dict[qp_id] = quantizer_module_id - insertion_commands += commands - - adjust_padding_args = self._collect_adjust_padding_args( - non_unified_scales_quantization_point_ids, - qp_id_vs_quant_module_id_dict, - quantizer_setup, - target_model_graph, - ) - - commands = self._add_adjust_padding_ops(adjust_padding_args) - if commands: - insertion_commands += commands - - return insertion_commands, qp_id_vs_quant_module_id_dict 
- - def _remove_shared_layer_weight_quantization_point_duplicates( - self, - unified_scales_group: set[QuantizationPointId], - quantizer_setup: PTQuantizerSetup, - target_model_graph: NNCFGraph, - ) -> tuple[set[QuantizationPointId], set[str]]: - observed_shared_layer_names = set() - retval = set() - for us_qp_id in unified_scales_group: - qp = quantizer_setup.quantization_points[us_qp_id] - if qp.is_weight_quantization_point(): - nncf_node = target_model_graph.get_node_by_name(qp.target_point.target_node_name) - if nncf_node.is_shared(): - if nncf_node.layer_name not in observed_shared_layer_names: - observed_shared_layer_names.add(nncf_node.layer_name) - else: - nncf_logger.debug_once( - f"Filtering a unified-scale weight quantization point {us_qp_id} " - f"- already quantized as a shared layer {nncf_node.layer_name}", - ) - continue - retval.add(us_qp_id) - return retval, observed_shared_layer_names - - def _collect_adjust_padding_args( - self, - non_unified_scales_quantization_point_ids: set[QuantizationPointId], - qp_id_vs_quant_module_id_dict: dict[QuantizationPointId, QuantizerId], - quantizer_setup: PTQuantizerSetup, - target_model_graph: NNCFGraph, - ) -> list[AdjustPaddingArgs]: - def weight_qp_filter_fn(qp_id_): - qp_ = quantizer_setup.quantization_points[qp_id_] - return qp_.is_weight_quantization_point() - - weight_qps = list(filter(weight_qp_filter_fn, non_unified_scales_quantization_point_ids)) - adjust_padding_args = [] - adjust_padding_operation_set = set() - if self.hw_config is not None: - adjust_padding_operation_set = self.hw_config.get_operations_with_adjusted_paddings() - for wqp_id in weight_qps: - wqp = quantizer_setup.quantization_points[wqp_id] - tp = quantizer_setup.quantization_points[wqp_id].target_point - target_node = target_model_graph.get_node_by_name(tp.target_node_name) - - op_type = target_node.metatype - is_adjust_padding_applicable = op_type in adjust_padding_operation_set - if self._should_setup_adjust_pad_ops and is_adjust_padding_applicable: - gid = quantizer_setup.get_shared_inputs_group_id(wqp_id) - shared_input_group = quantizer_setup.shared_input_operation_set_groups[gid] - - def is_qp_quantizing_same_op_as_wqp(qp_id_): - qp_ = quantizer_setup.quantization_points[qp_id_] - node_matched = target_node.node_name in qp_.directly_quantized_operator_node_names # noqa: B023 - return qp_.is_activation_quantization_point() and node_matched - - for qp_id in filter(is_qp_quantizing_same_op_as_wqp, shared_input_group): - quantizer_module_id = qp_id_vs_quant_module_id_dict[qp_id] - activation_quantizer = self._non_weight_quantizers[quantizer_module_id].quantizer_module_ref - args = self._get_adjust_padding_args( - target_model_graph, - wqp, - activation_quantizer, - list(quantizer_setup.quantization_points.values()), - ) - if args: - adjust_padding_args.extend(args) - return adjust_padding_args - - def _build_commands_for_single_unified_scale_group( - self, - target_model: NNCFNetwork, - quantizer_setup: PTQuantizerSetup, - unified_scales_group: set[QuantizationPointId], - minmax_values_for_range_init: dict[QuantizationPointId, MinMaxTensorStatistic], - ) -> tuple[QuantizerId, list[PTInsertionCommand]]: - qp_ids_list_for_current_group = list(unified_scales_group) - - # The primary insertion point (to be associated with the actual quantizer module, not just hooks to it) - # will be determined based on the string representation of said insertion point, to avoid random selection. - # Weight insertion points are given priority. 
- weight_qp_ids = [ - qp_id - for qp_id in qp_ids_list_for_current_group - if quantizer_setup.quantization_points[qp_id].is_weight_quantization_point() - ] - act_qp_ids = [ - qp_id - for qp_id in qp_ids_list_for_current_group - if quantizer_setup.quantization_points[qp_id].is_activation_quantization_point() - ] - - def ip_str_repr_key_lambda(x): - return str(quantizer_setup.quantization_points[x].target_point.target_node_name) - - sorted_wqp_ids = sorted(weight_qp_ids, key=ip_str_repr_key_lambda) - sorted_aqp_ids = sorted(act_qp_ids, key=ip_str_repr_key_lambda) - sorted_qp_ids = sorted_wqp_ids + sorted_aqp_ids - - primary_qp_id = sorted_qp_ids[0] - linked_qp_ids = sorted_qp_ids[1:] - qspec = quantizer_setup.quantization_points[primary_qp_id].qspec - linked_qspecs = [quantizer_setup.quantization_points[qp_id].qspec for qp_id in linked_qp_ids] - for linked_qspec in linked_qspecs: - if qspec != linked_qspec: - msg = "The qspecs for unified scale quantization points should be identical!" - raise nncf.InternalError(msg) - - range_init_minmax_values = None - if minmax_values_for_range_init: - # Hopefully this will suffice. - # TODO: gather unified statistic by linking stat collectors_and_modules_to_init instead - min_values = None - max_values = None - for qp_id in sorted_qp_ids: - minmax_stat = minmax_values_for_range_init.get(qp_id) - if minmax_stat is None: - continue - - if min_values is None: - min_values = minmax_stat.min_values.data - else: - min_values = torch.min(min_values, minmax_stat.min_values.data) - - if max_values is None: - max_values = minmax_stat.max_values.data - else: - max_values = torch.max(max_values, minmax_stat.max_values.data) - if min_values is not None and max_values is not None: - range_init_minmax_values = min_values, max_values - - target_points = [quantizer_setup.quantization_points[qp_id].target_point for qp_id in sorted_qp_ids] - quantizer_module_id, commands = self._quantize_at_points_by_single_module( - target_model, target_points, qspec, range_init_minmax_values - ) - return quantizer_module_id, commands - - def _select_final_qconfig(self, quantizer_config_list: list[QuantizerConfig]) -> QuantizerConfig: - # Quantizer config list entries should arrive in the same order as they are listed - # in the HW config, where they are sorted by descending order of priority - return quantizer_config_list[0] - - def _quantize_at_points_by_single_module( - self, - target_model: NNCFNetwork, - insertion_points: list[PTTargetPoint], - qspec: PTQuantizerSpec, - range_init_minmax_values: tuple[torch.Tensor, torch.Tensor] = None, - ) -> tuple[QuantizerId, list[PTInsertionCommand]]: - """ - Will generate insertion commands for quantization at possibly multiple points - in the network using one and the same trainable quantizer module. The trainable - quantizer module will be saved either inside the weightable module which weights - it quantizes (for single-point weight quantization), or in a NNCFNetwork wrapper - module (i.e. in a storage external to the original module). - :param: target_model - the model to be quantized. 
- :param: insertion_points - a list of target points for quantization using one - quantizer module - :param: qconfig - the QuantizerConfig for the resulting quantizer module - :param: range_init_minmax_values - a pair of minimum and maximum values of input statistics - for initializing the quantizer's trainable parameters - :return: A tuple with the identifier of the new quantizer module and a list of - insertion commands registering this module for quantization at spots described by - insertion_points. - """ - target_model_graph = target_model.nncf.get_original_graph() - if not insertion_points: - msg = "No insertion points to put quantizers into!" - raise nncf.InternalError(msg) - - def is_weights(ip: PTTargetPoint) -> bool: - return ip.target_type is TargetType.OPERATION_WITH_WEIGHTS - - primary_ip = insertion_points[0] - - quantizer = self.__create_quantize_module(qspec).to(self._device_for_callable_obj_creation) - if range_init_minmax_values is not None: - # Need to cast to the model's current dtype since the statistics could have been gathered in an - # AMP autocast model (and therefore be FP16 since AMP autocast switches precision of activations - # at forward pass time) - own_type = get_model_dtype(target_model) - min_values = range_init_minmax_values[0].data.type(own_type) - max_values = range_init_minmax_values[1].data.type(own_type) - - quantizer.apply_minmax_init(min_values=min_values, max_values=max_values, log_module_name=str(primary_ip)) - - qids: list[QuantizerId] = [] - for ip in insertion_points: - if is_weights(ip): - qids.append(WeightQuantizerId(ip.target_node_name)) - else: - qids.append(NonWeightQuantizerId(ip.target_node_name, ip.input_port_id)) - - serialized_insertions_list = [str(x) for x in qids] - external_quantizer_storage_key = ";".join(serialized_insertions_list) - if len(insertion_points) > 1: - linked_quantizers_str = "\n".join(serialized_insertions_list) - nncf_logger.info_once(f"Scales will be unified for quantizer group:\n{linked_quantizers_str}\n") - - if is_weights(primary_ip): - primary_qid = WeightQuantizerId(primary_ip.target_node_name) - self._weight_quantizers[primary_qid] = WeightQuantizerInfo( - quantizer, insertion_points, target_model.nncf.get_containing_module(primary_ip.target_node_name) - ) - module_node = target_model_graph.get_node_by_name(primary_ip.target_node_name) - layer_attributes = module_node.layer_attributes - input_shape = get_weight_shape_legacy(layer_attributes) - self._quantizers_input_shapes[primary_qid] = tuple(input_shape) - else: - primary_qid = NonWeightQuantizerId(primary_ip.target_node_name, primary_ip.input_port_id) - self._non_weight_quantizers[primary_qid] = NonWeightQuantizerInfo(quantizer, insertion_points) - input_shape = target_model_graph.get_input_shape_for_insertion_point(insertion_points[0]) - self._quantizers_input_shapes[primary_qid] = input_shape - - if not (is_weights(primary_ip) and len(insertion_points) == 1): - assert external_quantizer_storage_key not in target_model.nncf.get_compression_modules_by_type( - ExtraCompressionModuleType.EXTERNAL_QUANTIZER - ) - - target_model.nncf.add_compression_module( - external_quantizer_storage_key, quantizer, ExtraCompressionModuleType.EXTERNAL_QUANTIZER - ) - - insertion_commands = [] - for curr_insertion_point in insertion_points: - if curr_insertion_point in self._processed_insertion_points: - msg = f"Insertion point {str(curr_insertion_point)} already quantized!" 
- raise nncf.InternalError(msg) - self._processed_insertion_points.add(curr_insertion_point) - - if is_weights(curr_insertion_point): - if len(insertion_points) == 1: - # For backward compatibility, if only one weight is quantized by a single quantizer, - # insert UpdateWeight ops with a genuine quantizer module - callable_obj = quantizer - else: - # Otherwise use external quantizer module storage since the quantization points will have to - # share the single module and this would be impossible for multiple weight quantizer sharing if - # the corresponding UpdateWeights operations contained real modules (these would simply get copied - # by PyTorch internals) - callable_obj = ExternalQuantizerCallHook(external_quantizer_storage_key, self._debug_interface) - else: - # Hooks will be identical for each affected op_address in the linked scenario - # - will call one and the same quantizer - callable_obj = ExternalQuantizerCallHook(external_quantizer_storage_key, self._debug_interface) - - nncf_logger.debug_once( - f"Performing " - f"{'signed' if quantizer.signed else 'unsigned'} " - f"{'logarithm_scale' if quantizer.is_using_log_scale_storage else ''} " - f"{'weight' if is_weights(curr_insertion_point) else 'activation'} " - f"quantization for: {str(curr_insertion_point)}", - ) - - insertion_commands.append( - PTInsertionCommand(curr_insertion_point, callable_obj, TransformationPriority.QUANTIZATION_PRIORITY) - ) - return primary_qid, insertion_commands - - def _are_frozen_layers_allowed(self) -> tuple[bool, str]: - message_template = Template("Frozen layers are$denial allowed for $algo_prefix quantization") - bits = set() - bits.update({wq.quantizer_module_ref.num_bits for wq in self._weight_quantizers.values()}) - bits.update({nwq.quantizer_module_ref.num_bits for nwq in self._non_weight_quantizers.values()}) - - if self._precision_init_params or len(bits) > 1: - return False, message_template.substitute(denial=" not", algo_prefix="mixed precision") - - if len(bits) == 1: - bitwidth = bits.pop() - algo_prefix = f"INT{bitwidth}" - if bitwidth == 8: - return True, message_template.substitute(denial="", algo_prefix=algo_prefix) - return False, message_template.substitute(denial=" not", algo_prefix=algo_prefix) - return True, message_template.substitute(denial="", algo_prefix="empty") - - def _get_compression_lr_multiplier(self) -> Optional[float]: - return self.config.get_redefinable_global_param_value_for_algo("compression_lr_multiplier", self.name) - - def initialize(self, model: NNCFNetwork) -> None: - if is_main_process() and self.should_init: - bn_adapt_params = self._parse_bn_adapt_params() - if bn_adapt_params is not None: - bn_adaptation = BatchnormAdaptationAlgorithm( - **extract_bn_adaptation_init_params(self.config, "quantization") - ) - bn_adaptation.run(model) - - -@api() -class QuantizationController(QuantizationControllerBase): - """ - Controller for the quantization algorithm in PT. 
- """ - - def __init__( - self, - target_model: NNCFNetwork, - config: NNCFConfig, - debug_interface: "QuantizationDebugInterface", - weight_quantizers: dict[WeightQuantizerId, WeightQuantizerInfo], - non_weight_quantizers: dict[NonWeightQuantizerId, NonWeightQuantizerInfo], - groups_of_adjacent_quantizers: GroupsOfAdjacentQuantizers, - quantizers_input_shapes: dict[QuantizerId, tuple[int]], - build_time_metric_info: QuantizationShareBuildTimeInfo = None, - build_time_range_init_params: PTRangeInitParams = None, - ): - super().__init__(target_model) - self._loss = ZeroCompressionLoss(get_model_device(target_model)) - self._scheduler = BaseCompressionScheduler() - self.debug_interface = debug_interface - self.config = config - algo_config = self._get_algo_config() - self._build_time_range_init_params = build_time_range_init_params - - self.weight_quantizers: dict[WeightQuantizerId, WeightQuantizerInfo] = weight_quantizers - self.non_weight_quantizers: dict[NonWeightQuantizerId, NonWeightQuantizerInfo] = non_weight_quantizers - self.all_quantizations: dict[QuantizerId, BaseQuantizer] = OrderedDict() - self.all_quantizations.update({k: v.quantizer_module_ref for k, v in self.weight_quantizers.items()}) - self.all_quantizations.update({k: v.quantizer_module_ref for k, v in self.non_weight_quantizers.items()}) - self._quantizers_input_shapes = quantizers_input_shapes - self._distributed = False - self._groups_of_adjacent_quantizers = groups_of_adjacent_quantizers - self._bn_adaptation = None - self._build_time_metric_info = build_time_metric_info - self._target_device = self.config.get("target_device", "ANY") - - should_export_to_onnx_qdq = algo_config.get( - "export_to_onnx_standard_ops", QUANTIZATION_EXPORT_TO_ONNX_STANDARD_OPS - ) - if should_export_to_onnx_qdq: - warning_deprecated( - "The config option `export_to_onnx_standard_ops` is deprecated and will be removed " - "in a future version. Please use the `nncf.strip(quantized_model)` method before export to ONNX " - "to get model with QuantizeLinear-DequantizeLinear node pairs." 
- ) - export_mode = QuantizerExportMode.ONNX_QUANTIZE_DEQUANTIZE_PAIRS - else: - export_mode = QuantizerExportMode.FAKE_QUANTIZE - - for quantizer in self.all_quantizations.values(): - quantizer.set_export_mode(export_mode) - - params = algo_config.get("params", None) - self.is_staged_scheduler = bool(params) - - # Staged scheduler must be created after initialized to prevent extra logic with disabled quantizations - if self.is_staged_scheduler: - scheduler_cls = QUANTIZATION_SCHEDULERS.get("staged") - self._scheduler = scheduler_cls(self, params) - - @property - def scheduler(self) -> CompressionScheduler: - return self._scheduler - - @property - def loss(self) -> CompressionLoss: - return self._loss - - @property - def groups_of_adjacent_quantizers(self) -> GroupsOfAdjacentQuantizers: - return self._groups_of_adjacent_quantizers - - def prepare_for_export(self): - for quantizer_id, quantizer in self.all_quantizations.items(): - if not quantizer.is_enabled_quantization(): - nncf_logger.debug(f"Disabled quantization on export to ONNX: {quantizer_id}") - - def distributed(self): - self._distributed = True - self._broadcast_initialized_params_for_each_quantizer() - - def _get_algo_config(self) -> dict: - return extract_algo_specific_config(self.config, "quantization") - - def _broadcast_initialized_params_for_each_quantizer(self): - # NOTE: Order of quantization modules must be the same on GPUs to correctly broadcast num_bits - sorted_quantizers = OrderedDict(sorted(self.all_quantizations.items(), key=lambda x: str(x[0]))) - for quantizer in sorted_quantizers.values(): - quantizer.broadcast_initialized_params() - - def _do_runtime_range_init(self, range_init_params: PTRangeInitParams): - modules_to_init = OrderedDict() - for wq_id, wq_info in self.weight_quantizers.items(): - group = QuantizerGroup.WEIGHTS - init_config = range_init_params.get_init_config_for_scope_and_group(wq_id, group) - is_weights = True - modules_to_init[str(wq_id)] = ( - wq_info.quantizer_module_ref, - init_config, - is_weights, - self._quantizers_input_shapes[wq_id], - ) - - for aq_id, aq_info in self.non_weight_quantizers.items(): - group = QuantizerGroup.ACTIVATIONS - init_config = range_init_params.get_init_config_for_scope_and_group(aq_id, group) - is_weights = False - modules_to_init[str(aq_id)] = ( - aq_info.quantizer_module_ref, - init_config, - is_weights, - self._quantizers_input_shapes[aq_id], - ) - - # NOTE: Order of modules must be the same to correctly broadcast parameters (e.g. 
input_low - # and input_range) - modules_to_init = OrderedDict(sorted(modules_to_init.items())) - self.modules_to_range_init = modules_to_init - runner = DataLoaderRangeInitializeRunner(self._model, modules_to_init, range_init_params.device) - - quantizers = [module for module, config, is_weights, input_shape in modules_to_init.values()] - quantizers_switcher = QuantizersSwitcher(quantizers) - # bypass quantization to collect statistics from floating point model - quantizers_switcher.disable_quantizers() - with training_mode_switcher(self._model, is_training=False): - # Statistics should be collected in eval mode because the model in train mode may behave differently - runner.run(range_init_params.init_range_data_loader, range_init_params.get_max_num_init_steps()) - quantizers_switcher.enable_quantizers() - - self._model.nncf.rebuild_graph() - - def compression_stage(self) -> CompressionStage: - if self.is_staged_scheduler: - return self.scheduler.compression_stage() - return CompressionStage.FULLY_COMPRESSED - - def init_precision( - self, - precision_init_type: str, - precision_init_params: BasePrecisionInitParams, - precision_constraints: HardwareQuantizationConstraints, - ) -> SingleConfigQuantizerSetup: - """ - Precision initialization happens based on an measure of layer sensitivity to perturbations. The measure is - calculated by average Hessian trace estimation for each layer using Hutchinson algorithm. - """ - init_impl = PrecisionInitializerFactory.create(precision_init_type) - initializer = init_impl(self, precision_init_params, precision_constraints) - nncf_logger.info("Initializing quantizer precisions...") - return initializer.apply_init() - - def init_range(self, range_init_params: PTRangeInitParams = None): - """ - Tracks input statistics for quantizers in the model and sets ranges of the quantizers to correspond to - minimum and maximum input tensor levels observed. - :param range_init_params: specifies parameters for this range initialization call; if None, the parameters - that were used during compressed model creation will be used. - """ - if range_init_params is None: - if self._build_time_range_init_params is None: - nncf_logger.error( - "Requested a quantization controller to do range initialization without " - "`range_init_params` function parameter supplied, but the build time range " - "initialization was not supplied with params as well. " - "Range initialization will not be done." 
- ) - return - range_init_params = self._build_time_range_init_params - - self._do_runtime_range_init(range_init_params) - - if self._distributed: - self._broadcast_initialized_params_for_each_quantizer() - - def enable_activation_quantization(self): - for m in self.non_weight_quantizers.values(): - m.quantizer_module_ref.enable_quantization() - - def enable_weight_quantization(self): - for m in self.weight_quantizers.values(): - m.quantizer_module_ref.enable_quantization() - - def disable_activation_quantization(self): - for m in self.non_weight_quantizers.values(): - m.quantizer_module_ref.disable_quantization() - - def disable_weight_quantization(self): - for m in self.weight_quantizers.values(): - m.quantizer_module_ref.disable_quantization() - - def statistics(self, quickly_collected_only=False) -> NNCFStatistics: - if not quickly_collected_only and is_debug(): - stats = MemoryConsumptionStatisticsCollector( - self.model, self.weight_quantizers, self.non_weight_quantizers - ).collect() - nncf_logger.debug(stats.to_str()) - - stats = ShareEdgesQuantizedDataPathStatisticsCollector(self.model, self, self._target_device).collect() - nncf_logger.debug(stats.to_str()) - - collector = PTQuantizationStatisticsCollector( - self.weight_quantizers, self.non_weight_quantizers, self._build_time_metric_info - ) - stats = collector.collect() - - nncf_stats = NNCFStatistics() - nncf_stats.register("quantization", stats) - return nncf_stats - - def strip_model( - self, model: NNCFNetwork, do_copy: bool = False, strip_format: StripFormat = StripFormat.NATIVE - ) -> NNCFNetwork: - if do_copy: - model = copy_model(model) - model = strip_quantized_model(model, strip_format) - return model - - -class ExperimentalQuantizationBuilder(QuantizationBuilder): - def __init__( - self, - quantizer_setup: MultiConfigQuantizerSetup, - initial_quantizer_setup: SingleConfigQuantizerSetup, - tensor_stats_for_all_setup_variations: dict[PTTargetPoint, dict[ReductionAxes, TensorStatistic]], - hw_config: HWConfig = None, - ): - should_init = bool(tensor_stats_for_all_setup_variations) - super().__init__(NNCFConfig(), should_init=should_init) - self._initial_quantizer_setup = initial_quantizer_setup - self._quantizer_setup = quantizer_setup - self._tensor_stats = tensor_stats_for_all_setup_variations - self._should_setup_adjust_pad_ops = False - self.hw_config = hw_config - - def _handle_frozen_layers(self, target_model: NNCFNetwork): - pass - - def _get_single_config_quantizer_setup(self, target_model) -> SingleConfigQuantizerSetup: - return self._initial_quantizer_setup - - def _get_statistics_for_final_range_init( - self, target_model: NNCFNetwork, quantizer_setup: QuantizerSetupBase, range_init_params: PTRangeInitParams - ) -> dict[PTTargetPoint, dict[ReductionAxes, TensorStatistic]]: - return self._tensor_stats - - def _build_controller(self, model: NNCFNetwork) -> "ExperimentalQuantizationController": - groups_of_adjacent_quantizers = GroupsOfAdjacentQuantizers() - all_quantizations: dict[QuantizerId, BaseQuantizer] = {} - all_quantizations.update({k: v.quantizer_module_ref for k, v in self._weight_quantizers.items()}) - all_quantizations.update({k: v.quantizer_module_ref for k, v in self._non_weight_quantizers.items()}) - - groups_of_adjacent_quantizers.parse_from_quantizer_setup( - all_quantizations, self._pt_quantizer_setup, self._setup_to_module_id_translation_dict - ) - - build_time_metric_infos = QuantizationShareBuildTimeInfo( - len(self._non_weight_quantizers), len(self._weight_quantizers) - ) - - return 
ExperimentalQuantizationController( - model, - self._weight_quantizers, - self._non_weight_quantizers, - groups_of_adjacent_quantizers, - self._quantizers_input_shapes, - self._quantizer_setup, - self._initial_quantizer_setup, - self._setup_to_module_id_translation_dict, - self._tensor_stats, - build_time_metric_infos, - self._should_setup_adjust_pad_ops, - self.hw_config, - ) - - def initialize(self, model: NNCFNetwork) -> None: - pass - - def _get_algo_specific_config_section(self) -> dict: - return {} - - def _parse_range_init_params(self) -> Optional[PTRangeInitParams]: - return None - - def _get_compression_lr_multiplier(self) -> Optional[float]: - return None - - -class ExperimentalQuantizationController(QuantizationController): - def __init__( - self, - target_model: NNCFNetwork, - weight_quantizers: dict[WeightQuantizerId, WeightQuantizerInfo], - non_weight_quantizers: dict[NonWeightQuantizerId, NonWeightQuantizerInfo], - groups_of_adjacent_quantizers: GroupsOfAdjacentQuantizers, - quantizers_input_shapes: dict[QuantizerId, tuple[int]], - quantizer_setup: MultiConfigQuantizerSetup, - initial_quantizer_setup: SingleConfigQuantizerSetup, - setup_to_module_id_translation_dict: dict[QuantizationPointId, QuantizerId], - tensor_stats: dict[PTTargetPoint, dict[ReductionAxes, TensorStatistic]], - build_time_metric_info: QuantizationShareBuildTimeInfo, - should_setup_adjust_pad_ops=False, - hw_config: HWConfig = None, - ): - super().__init__( - target_model, - NNCFConfig(), - debug_interface=None, - weight_quantizers=weight_quantizers, - non_weight_quantizers=non_weight_quantizers, - groups_of_adjacent_quantizers=groups_of_adjacent_quantizers, - quantizers_input_shapes=quantizers_input_shapes, - build_time_metric_info=build_time_metric_info, - ) - self._target_model_ref = target_model - self._should_setup_adjust_pad_ops = should_setup_adjust_pad_ops - self._quantizer_setup = quantizer_setup - self._initial_quantizer_setup = initial_quantizer_setup - self._tensor_stats = tensor_stats - self.setup_to_module_id_translation_dict = setup_to_module_id_translation_dict - self.module_id_to_qp_id_translation_dict: dict[QuantizerId, set[QuantizationPointId]] = {} - for qp_id, qid in self.setup_to_module_id_translation_dict.items(): - if qid in self.module_id_to_qp_id_translation_dict: - self.module_id_to_qp_id_translation_dict[qid].add(qp_id) - else: - self.module_id_to_qp_id_translation_dict[qid] = {qp_id} - self.hw_config = hw_config - - @property - def loss(self) -> CompressionLoss: - return self._loss - - @property - def scheduler(self) -> CompressionScheduler: - return self._scheduler - - def get_quantizer_setup_for_current_state(self) -> SingleConfigQuantizerSetup: - qpid_vs_selected_qconfig = {} - for qp_id in self._initial_quantizer_setup.quantization_points: - quant_module_id = self.setup_to_module_id_translation_dict[qp_id] - quant_module = self.all_quantizations[quant_module_id] - qconfig = quant_module.get_quantizer_config() - qpid_vs_selected_qconfig[qp_id] = qconfig - return self._quantizer_setup.select_qconfigs(qpid_vs_selected_qconfig, strict=False) - - def is_new_setup_requires_regeneration(self, quantizer_setup: SingleConfigQuantizerSetup) -> bool: - current_setup = self.get_quantizer_setup_for_current_state() - if Counter(current_setup.quantization_points.keys()) != Counter(quantizer_setup.quantization_points.keys()): - msg = "The new setup is inconsistent with the original parameter space!" 
- raise ValueError(msg) - for qp_id, qp in quantizer_setup.quantization_points.items(): - current_qconfig = current_setup.quantization_points[qp_id].qconfig - new_qconfig = quantizer_setup.quantization_points[qp_id].qconfig - new_padding_adjust_applicable = CalculatePaddingAdjustment.is_config_applicable(new_qconfig) - current_padding_adjust_applicable = CalculatePaddingAdjustment.is_config_applicable(current_qconfig) - need_padding_regeneration = ( - self._should_setup_adjust_pad_ops - and qp.is_activation_quantization_point() - and new_padding_adjust_applicable != current_padding_adjust_applicable - ) - if ( - current_qconfig.per_channel != new_qconfig.per_channel - or ( - new_qconfig.signedness_to_force is not None - and current_qconfig.signedness_to_force != new_qconfig.signedness_to_force - ) - or current_qconfig.mode != new_qconfig.mode - or need_padding_regeneration - ): - return True - return False - - def apply_new_quantizer_setup( - self, quantizer_setup: SingleConfigQuantizerSetup - ) -> tuple["ExperimentalQuantizationController", NNCFNetwork]: - if not self.is_new_setup_requires_regeneration(quantizer_setup): - for qp_id, qp in quantizer_setup.quantization_points.items(): - quant_module_id = self.setup_to_module_id_translation_dict[qp_id] - quant_module = self.all_quantizations[quant_module_id] - quant_module.num_bits = qp.qconfig.num_bits - return self, self._target_model_ref - new_model = self._target_model_ref.nncf.get_clean_shallow_copy() - new_builder = ExperimentalQuantizationBuilder( - self._quantizer_setup, - initial_quantizer_setup=quantizer_setup, - tensor_stats_for_all_setup_variations=self._tensor_stats, - hw_config=self.hw_config, - ) - new_builder.apply_to(new_model) - new_ctrl = new_builder.build_controller(new_model) - return new_ctrl, new_model - - def _get_algo_config(self) -> dict: - return {} diff --git a/src/nncf/torch/quantization/base_ctrl.py b/src/nncf/torch/quantization/base_ctrl.py deleted file mode 100644 index b2d1dc17090..00000000000 --- a/src/nncf/torch/quantization/base_ctrl.py +++ /dev/null @@ -1,32 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -from nncf.torch.compression_method_api import PTCompressionAlgorithmController - - -class QuantizationControllerBase(PTCompressionAlgorithmController): - """ - Base controller class for the quantization controllers in PT. 
- """ - - def enable_activation_quantization(self): - raise NotImplementedError - - def enable_weight_quantization(self): - raise NotImplementedError - - def disable_activation_quantization(self): - raise NotImplementedError - - def disable_weight_quantization(self): - raise NotImplementedError - - def init_range(self): - raise NotImplementedError diff --git a/src/nncf/torch/quantization/hessian_trace.py b/src/nncf/torch/quantization/hessian_trace.py deleted file mode 100644 index 5e4f24d1a3e..00000000000 --- a/src/nncf/torch/quantization/hessian_trace.py +++ /dev/null @@ -1,167 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -from functools import partial -from typing import Any, Callable, Union - -import torch -from torch import Tensor -from torch import nn -from torch.nn import Parameter -from torch.nn.modules.loss import _Loss -from torch.utils.data import DataLoader - -from nncf.common.logging import nncf_logger -from nncf.torch.initialization import PTInitializingDataLoader -from nncf.torch.initialization import wrap_dataloader_for_init -from nncf.torch.nested_objects_traversal import objwalk -from nncf.torch.utils import get_model_device -from nncf.torch.utils import is_tensor - - -class ParameterHandler: - def __init__(self, parameters: list[Parameter], device: str): - self._device = device - self._parameters = parameters - - @property - def parameters(self) -> list[Parameter]: - return self._parameters - - def get_gradients(self) -> list[Union[Tensor, float]]: - gradients = [] - for parameter in self.parameters: - gradients.append(0.0 if parameter.grad is None else parameter.grad + 0.0) - return gradients - - def sample_rademacher_like_params(self) -> list[Tensor]: - def sample(parameter): - r = torch.randint_like(parameter, high=2, device=self._device) - return r.masked_fill_(r == 0, -1) - - return [sample(p) for p in self.parameters] - - def sample_normal_like_params(self) -> list[Tensor]: - return [torch.randn(p.size(), device=self._device) for p in self.parameters] - - -class GradientsCalculator: - def __init__( - self, - model: nn.Module, - criterion_fn: Callable[[Any, Any, _Loss], torch.Tensor], - criterion: _Loss, - data_loader: PTInitializingDataLoader, - num_data_iter: int, - parameter_handler: ParameterHandler, - ): - self._model = model - self._criterion_fn = criterion_fn - self._criterion = criterion - self._data_loader = data_loader - self._num_data_iter = num_data_iter - self._parameter_handler = parameter_handler - self.num_iter = 0 - - def __iter__(self): - self.data_loader_iter = iter(self._data_loader) - self.num_iter = 0 - return self - - def __next__(self): - if self.num_iter >= self._num_data_iter: - raise StopIteration - self.num_iter += 1 - dataloader_output = next(self.data_loader_iter) - - device = get_model_device(self._model) - to_device_fn = partial(torch.Tensor.to, device=device) - dataloader_output = objwalk(dataloader_output, is_tensor, to_device_fn) - args, kwargs = self._data_loader.get_inputs(dataloader_output) - 
- self._model.zero_grad() - - target = self._data_loader.get_target(dataloader_output) - outputs = self._model(*args, **kwargs) - loss = self._criterion_fn(outputs, target, self._criterion) - - loss.backward(create_graph=True) - grads = self._parameter_handler.get_gradients() - self._model.zero_grad() - return grads - - -class HessianTraceEstimator: - """ - Performs estimation of Hessian Trace based on Hutchinson algorithm. - """ - - def __init__( - self, - model: nn.Module, - criterion_fn: Callable[[Any, Any, _Loss], torch.Tensor], - criterion: _Loss, - device: str, - data_loader: DataLoader, - num_data_points: int, - ): - self._model = model - parameters = [p for p in model.parameters() if p.requires_grad] - self._parameter_handler = ParameterHandler(parameters, device) - self._batch_size = data_loader.batch_size - data_loader = wrap_dataloader_for_init(data_loader) - self._num_data_iter = num_data_points // self._batch_size if num_data_points >= self._batch_size else 1 - self._gradients_calculator = GradientsCalculator( - self._model, criterion_fn, criterion, data_loader, self._num_data_iter, self._parameter_handler - ) - self._diff_eps = 1e-6 - - def get_average_traces(self, max_iter=500, tolerance=1e-5) -> Tensor: - """ - Estimates average hessian trace for each parameter - :param max_iter: maximum number of iterations for Hutchinson algorithm - :param tolerance: - minimum relative tolerance for stopping the algorithm. - It's calculated between mean average trace from previous iteration and current one. - :return: Tensor with average hessian trace per parameter - """ - avg_total_trace = 0.0 - avg_traces_per_iter: list[Tensor] = [] - mean_avg_traces_per_param = None - - for i in range(max_iter): - avg_traces_per_iter.append(self._calc_avg_traces_per_param()) - - mean_avg_traces_per_param = self._get_mean(avg_traces_per_iter) - mean_avg_total_trace = torch.sum(mean_avg_traces_per_param) - - diff_avg = abs(mean_avg_total_trace - avg_total_trace) / (abs(avg_total_trace) + self._diff_eps) - if diff_avg < tolerance: - return mean_avg_traces_per_param - avg_total_trace = mean_avg_total_trace - nncf_logger.debug(f"{i}# difference_avg={diff_avg} avg_trace={avg_total_trace}") - - return mean_avg_traces_per_param - - def _calc_avg_traces_per_param(self) -> Tensor: - v = self._parameter_handler.sample_rademacher_like_params() - vhp = self._parameter_handler.sample_normal_like_params() - num_all_data = self._num_data_iter * self._batch_size - for gradients in self._gradients_calculator: - vhp_curr = torch.autograd.grad( - gradients, self._parameter_handler.parameters, grad_outputs=v, only_inputs=True, retain_graph=False - ) - vhp = [a + b * float(self._batch_size) + 0.0 for a, b in zip(vhp, vhp_curr)] - vhp = [a / float(num_all_data) for a in vhp] - avg_traces_per_param = torch.stack([torch.sum(a * b) / a.size().numel() for (a, b) in zip(vhp, v)]) - return avg_traces_per_param - - @staticmethod - def _get_mean(data: list[Tensor]) -> Tensor: - return torch.mean(torch.stack(data), dim=0) diff --git a/src/nncf/torch/quantization/init_precision.py b/src/nncf/torch/quantization/init_precision.py deleted file mode 100644 index 931245bdb1b..00000000000 --- a/src/nncf/torch/quantization/init_precision.py +++ /dev/null @@ -1,25 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - - -from nncf.torch.quantization.precision_init.base_init import BasePrecisionInitializer -from nncf.torch.quantization.precision_init.hawq_init import HAWQPrecisionInitializer -from nncf.torch.quantization.precision_init.manual_init import ManualPrecisionInitializer - - -class PrecisionInitializerFactory: - @staticmethod - def create(init_type: str) -> type[BasePrecisionInitializer]: - if init_type == "manual": - return ManualPrecisionInitializer - if init_type == "hawq": - return HAWQPrecisionInitializer - raise NotImplementedError diff --git a/src/nncf/torch/quantization/init_range.py b/src/nncf/torch/quantization/init_range.py deleted file mode 100644 index 52496b5ce88..00000000000 --- a/src/nncf/torch/quantization/init_range.py +++ /dev/null @@ -1,326 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from collections import OrderedDict -from copy import deepcopy -from typing import Callable - -import numpy as np -import torch - -import nncf -from nncf.common.graph.layer_attributes import WeightedLayerAttributes -from nncf.common.graph.utils import get_target_dim_for_compression_legacy -from nncf.common.graph.utils import get_weight_shape_legacy -from nncf.common.quantization.initialization.range import RangeInitCollectorParams -from nncf.common.quantization.initialization.range import RangeInitConfig -from nncf.common.quantization.initialization.range import RangeInitParams -from nncf.common.quantization.quantizer_setup import QuantizationPointBase -from nncf.common.quantization.quantizer_setup import QuantizerSetupBase -from nncf.common.quantization.structs import NonWeightQuantizerId -from nncf.common.quantization.structs import QuantizationScheme -from nncf.common.quantization.structs import QuantizerGroup -from nncf.common.quantization.structs import QuantizerId -from nncf.common.quantization.structs import WeightQuantizerId -from nncf.common.scopes import should_consider_scope -from nncf.common.tensor_statistics.collectors import ReductionAxes -from nncf.common.tensor_statistics.collectors import TensorStatisticCollectorBase -from nncf.config.schemata.algo.quantization import RANGE_INIT_TYPES_VS_DESCRIPTIONS -from nncf.experimental.common.tensor_statistics.collectors import AggregationAxes -from nncf.torch.graph.graph import PTNNCFGraph -from nncf.torch.initialization import DataLoaderBaseRunner -from nncf.torch.nncf_network import NNCFNetwork -from nncf.torch.quantization.layers import BaseQuantizer -from nncf.torch.quantization.layers import SymmetricQuantizer -from nncf.torch.quantization.layers import get_scale_shape -from 
nncf.torch.quantization.translator import PTTargetPointTranslator -from nncf.torch.tensor_statistics.algo import TensorStatisticObservationPoint -from nncf.torch.tensor_statistics.algo import create_register_input_hook -from nncf.torch.tensor_statistics.collectors import get_mean_percentile_statistic_collector -from nncf.torch.tensor_statistics.collectors import get_median_mad_statistic_collector -from nncf.torch.tensor_statistics.collectors import get_min_max_statistic_collector -from nncf.torch.tensor_statistics.collectors import get_mixed_min_max_statistic_collector -from nncf.torch.tensor_statistics.collectors import get_percentile_tensor_collector -from nncf.torch.tensor_statistics.statistics import pt_convert_stat_to_min_max_tensor_stat - - -class PTRangeInitParams(RangeInitParams): - def get_max_num_init_steps(self) -> int: - steps = [] - if self.global_init_config is not None: - steps.append(self.global_init_config.num_init_samples) - for pl_config in self.per_layer_range_init_configs: - steps.append(pl_config.num_init_samples) - batch_size = self.init_range_data_loader.batch_size - return int(np.ceil(max(steps) / batch_size)) - - def get_init_config_for_quantization_point(self, qp: QuantizationPointBase) -> RangeInitConfig: - if qp.is_weight_quantization_point(): - qid = WeightQuantizerId(qp.insertion_point.target_node_name) - group = QuantizerGroup.WEIGHTS - else: - qid = NonWeightQuantizerId(qp.insertion_point.target_node_name, qp.insertion_point.input_port_id) - group = QuantizerGroup.ACTIVATIONS - return self.get_init_config_for_scope_and_group(qid, group) - - def get_init_config_for_scope_and_group(self, qid: QuantizerId, group: QuantizerGroup) -> RangeInitConfig: - matches: list[RangeInitConfig] = [] - for pl_config in self.per_layer_range_init_configs: - should_be_considered = should_consider_scope(qid, pl_config.ignored_scopes, pl_config.target_scopes) - if should_be_considered and (group == pl_config.target_group or pl_config.target_group is None): - matches.append( - RangeInitConfig( - pl_config.init_type, pl_config.num_init_samples, pl_config.init_type_specific_params - ) - ) - if len(matches) > 1: - msg = f"Location {str(qid)} matches more than one per-layer initialization parameter definition!" - raise ValueError(msg) - if len(matches) == 1: - return matches[0] - if not matches and self.global_init_config is not None: - return deepcopy(self.global_init_config) - - msg = f"Location {str(qid)} does not match any per-layer initialization parameter definition!" - raise ValueError(msg) - - -class PTRangeInitCollectorParams(RangeInitCollectorParams): - def __init__( - self, is_weights: bool, scheme: QuantizationScheme, per_channel: bool, input_shape: tuple, channel_idx: int - ): - """ - :param is_weights: Boolean that defines tensor type. True for Weights, False for Activations. - :param scheme: Quantization scheme: symmetric or asymmetric. - :param input_shape: Shape of the input tensor. - :param channel_idx: Channel dimension. 
- """ - super().__init__(is_weights, scheme, per_channel) - self._input_shape = input_shape - self._channel_idx = channel_idx - - def get_reduction_aggregation_axes(self, is_per_sample: bool) -> tuple[ReductionAxes, AggregationAxes]: - if self.is_per_channel: - return super().get_reduction_aggregation_axes(self._input_shape, (self._channel_idx,), is_per_sample) - return super().get_reduction_aggregation_axes(self._input_shape, (), is_per_sample) - - -class StatCollectorGenerator: - @staticmethod - def generate_collectors_for_range_init_statistics_collection( - target_model_graph: PTNNCFGraph, quantizer_setup: QuantizerSetupBase, range_init_params: PTRangeInitParams - ) -> dict[TensorStatisticObservationPoint, dict[ReductionAxes, TensorStatisticCollectorBase]]: - retval = {} - for qp in quantizer_setup.quantization_points.values(): - init_config = range_init_params.get_init_config_for_quantization_point(qp) - is_weights = qp.is_weight_quantization_point() - num_batches = int( - np.ceil(init_config.num_init_samples / range_init_params.init_range_data_loader.batch_size) - ) - if is_weights: - # No need to store extra statistics in memory since weights won't change during range init - num_batches = 1 - - tp = PTTargetPointTranslator.translate(qp.insertion_point) - scale_shapes_vs_params = StatCollectorGenerator.get_all_scale_shapes_with_params(qp, target_model_graph) - - obs_p = TensorStatisticObservationPoint(tp, reduction_shapes=set(scale_shapes_vs_params.keys())) - - retval[obs_p] = {} - for scale_shape in obs_p.reduction_shapes: - collector_params = scale_shapes_vs_params[scale_shape] - collector = StatCollectorGenerator.generate_stat_collector_for_range_init_config( - init_config, scale_shape, collector_params, num_samples_to_collect_override=num_batches - ) - retval[obs_p][scale_shape] = collector - - return retval - - @staticmethod - def generate_stat_collector_for_range_init_config( - init_config: RangeInitConfig, - scale_shape: ReductionAxes = None, - collector_params: PTRangeInitCollectorParams = None, - num_samples_to_collect_override: int = None, - ) -> TensorStatisticCollectorBase: - num_samples = init_config.num_init_samples - if num_samples_to_collect_override is not None: - num_samples = num_samples_to_collect_override - if init_config.init_type not in RANGE_INIT_TYPES_VS_DESCRIPTIONS: - msg = f"Unknown range init type: {init_config.init_type}" - raise nncf.InternalError(msg) - - use_per_sample_stats = collector_params.use_per_sample_stats(init_config.init_type == "mixed_min_max") - reduction_axes, aggregation_axes = collector_params.get_reduction_aggregation_axes(use_per_sample_stats) - if init_config.init_type == "min_max": - return get_min_max_statistic_collector( - use_abs_max=collector_params.use_abs_max, - reduction_axes=reduction_axes, - aggregation_axes=aggregation_axes, - scale_shape=scale_shape, - num_samples=num_samples, - ) - if init_config.init_type == "mixed_min_max": - return get_mixed_min_max_statistic_collector( - use_abs_max=collector_params.use_abs_max, - reduction_axes=reduction_axes, - aggregation_axes=aggregation_axes, - scale_shape=scale_shape, - use_means_of_mins=collector_params.use_means_of_mins, - use_means_of_maxs=collector_params.use_means_of_maxs, - num_samples=num_samples, - ) - if init_config.init_type == "mean_min_max": - return get_mixed_min_max_statistic_collector( - use_abs_max=collector_params.use_abs_max, - reduction_axes=reduction_axes, - aggregation_axes=aggregation_axes, - scale_shape=scale_shape, - use_means_of_mins=True, - 
use_means_of_maxs=True, - num_samples=num_samples, - ) - if init_config.init_type == "threesigma": - return get_median_mad_statistic_collector( - reduction_axes=reduction_axes, - aggregation_axes=aggregation_axes, - scale_shape=scale_shape, - num_samples=num_samples, - ) - if init_config.init_type == "percentile": - min_percentile = init_config.init_type_specific_params.get("min_percentile", 0.1) - max_percentile = init_config.init_type_specific_params.get("max_percentile", 99.9) - return get_percentile_tensor_collector( - percentiles_to_collect=(min_percentile, max_percentile), - reduction_axes=reduction_axes, - aggregation_axes=aggregation_axes, - scale_shape=scale_shape, - num_samples=num_samples, - ) - - if init_config.init_type == "mean_percentile": - min_percentile = init_config.init_type_specific_params.get("min_percentile", 0.1) - max_percentile = init_config.init_type_specific_params.get("max_percentile", 99.9) - return get_mean_percentile_statistic_collector( - percentiles_to_collect=(min_percentile, max_percentile), - reduction_axes=reduction_axes, - aggregation_axes=aggregation_axes, - scale_shape=scale_shape, - num_samples=num_samples, - ) - msg = "Range init type not handled!" - raise ValueError(msg) - - @classmethod - def get_all_scale_shapes_with_params( - cls, qp: QuantizationPointBase, target_nncf_graph: PTNNCFGraph - ) -> dict[ReductionAxes, PTRangeInitCollectorParams]: - qconfigs = qp.get_all_configs_list() - if qp.is_weight_quantization_point(): - module_node = target_nncf_graph.get_node_by_name(qp.insertion_point.target_node_name) - layer_attributes = module_node.layer_attributes - assert isinstance(layer_attributes, WeightedLayerAttributes) - input_shape = get_weight_shape_legacy(layer_attributes) - channel_idx = get_target_dim_for_compression_legacy(layer_attributes) - else: - input_shape = target_nncf_graph.get_input_shape_for_insertion_point(qp.insertion_point) - channel_idx = 1 # channel dim for activations - - retval = {} - for qconfig in qconfigs: - is_weights = qp.is_weight_quantization_point() - scale_shape = tuple( - get_scale_shape( - input_shape, is_weights=is_weights, per_channel=qconfig.per_channel, channel_idx=channel_idx - ) - ) - - if scale_shape not in retval: - retval[scale_shape] = PTRangeInitCollectorParams( - is_weights, qconfig.mode, qconfig.per_channel, input_shape, channel_idx - ) - return retval - - -class DataLoaderRangeInitializeRunner(DataLoaderBaseRunner): - def __init__( - self, - model: NNCFNetwork, - modules_to_init_vs_init_configs: dict[str, tuple[BaseQuantizer, RangeInitConfig, bool, tuple[int]]], - init_device: str, - batch_size: int = None, - ): - super().__init__(model, init_device) - self.modules_to_init = modules_to_init_vs_init_configs - self.progressbar_description = "Range parameters initialization" - - self.collectors_and_modules_to_init: dict[str, tuple[TensorStatisticCollectorBase, BaseQuantizer]] = ( - OrderedDict() - ) - self.hook_handles = [] - self.batch_size = batch_size - - def _get_fwd_hook( - self, collector: TensorStatisticCollectorBase - ) -> Callable[["torch.Module", torch.Tensor, torch.Tensor], torch.Tensor]: - hook = create_register_input_hook(collector=collector) - - def fwd_hook(module, input_, output): - hook(input_[0]) - - return fwd_hook - - def _prepare_initialization(self): - for name, data in self.modules_to_init.items(): - quantizer_module, init_config, is_weights, input_shape = data - num_samples_override = None - if self.batch_size is not None: - num_batches = np.ceil(init_config.num_init_samples / 
self.batch_size) - num_samples_override = num_batches - - if isinstance(quantizer_module, SymmetricQuantizer): - mode = QuantizationScheme.SYMMETRIC - else: - mode = QuantizationScheme.ASYMMETRIC - - shape = quantizer_module.scale_shape - if shape == (1,): # Per-tensor - channel_idx = None - elif len(shape) > 1 and all(item == 1 for item in shape): - channel_idx = 0 # (1, 1, 1, 1) - does not matter which dim is channel_idx - else: - if not is_weights: - channel_idx = 1 # channel dim for activations - else: - channel_idx = [i for i, val in enumerate(shape) if val != 1][0] - - collector_params = PTRangeInitCollectorParams( - is_weights, mode, quantizer_module.per_channel, input_shape, channel_idx - ) - - collector = StatCollectorGenerator.generate_stat_collector_for_range_init_config( - init_config, tuple(quantizer_module.scale_shape), collector_params, num_samples_override - ) - - self.collectors_and_modules_to_init[name] = collector, quantizer_module - - self.hook_handles.append(quantizer_module.register_forward_hook(self._get_fwd_hook(collector))) - - def _apply_initializers(self): - for handle in self.hook_handles: - handle.remove() - for scope_str, collector_and_module in self.collectors_and_modules_to_init.items(): - collector, quantizer_module = collector_and_module - target_stat = collector.get_statistics() - minmax_stats = pt_convert_stat_to_min_max_tensor_stat(target_stat) - quantizer_module.apply_minmax_init( - minmax_stats.min_values.data, minmax_stats.max_values.data, log_module_name=scope_str - ) diff --git a/src/nncf/torch/quantization/metrics.py b/src/nncf/torch/quantization/metrics.py deleted file mode 100644 index 82c1d39bcba..00000000000 --- a/src/nncf/torch/quantization/metrics.py +++ /dev/null @@ -1,384 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License.
- -from collections import deque -from copy import deepcopy -from itertools import chain -from typing import Any, Optional - -import networkx as nx -import numpy as np -import torch - -from nncf.common.collector import StatisticsCollector -from nncf.common.graph import NNCFGraph -from nncf.common.graph.graph import NNCFNode -from nncf.common.graph.graph_matching import find_subgraphs_matching_pattern -from nncf.common.graph.patterns.manager import PatternsManager -from nncf.common.graph.patterns.manager import TargetDevice -from nncf.common.quantization.collectors import QuantizationStatisticsCollector -from nncf.common.quantization.collectors import QuantizerDescription -from nncf.common.quantization.quantizer_propagation.structs import QuantizationTrait -from nncf.common.quantization.structs import NonWeightQuantizerId -from nncf.common.quantization.structs import WeightQuantizerId -from nncf.common.utils.backend import BackendType -from nncf.common.utils.debug import is_debug -from nncf.torch.nncf_module_replacement import is_nncf_module -from nncf.torch.nncf_network import NNCFNetwork -from nncf.torch.nncf_network import PTNNCFGraph -from nncf.torch.quantization.default_quantization import DEFAULT_PT_QUANT_TRAIT_TO_OP_DICT -from nncf.torch.quantization.layers import BaseQuantizer -from nncf.torch.quantization.layers import SymmetricQuantizer -from nncf.torch.quantization.statistics import MemoryConsumptionStatistics -from nncf.torch.quantization.statistics import QuantizationConfigurationStatistics -from nncf.torch.quantization.structs import NonWeightQuantizerInfo -from nncf.torch.quantization.structs import WeightQuantizerInfo - - -class QuantizationShareBuildTimeInfo: - def __init__(self, aq_potential_num: int, wq_potential_num: int): - self.aq_potential_num = aq_potential_num - self.wq_potential_num = wq_potential_num - - def get_state(self) -> dict[str, Any]: - """ - Returns a dictionary with Python data structures (dict, list, tuple, str, int, float, True, False, None) that - represents state of the object. - - :return: state of the object - """ - return {"aq_potential_num": self.aq_potential_num, "wq_potential_num": self.wq_potential_num} - - @classmethod - def from_state(cls, state: dict[str, Any]) -> "QuantizationShareBuildTimeInfo": - """ - Creates the object from its state. - - :param state: Output of `get_state()` method. - """ - return cls(**state) - - -class PTQuantizationStatisticsCollector(QuantizationStatisticsCollector): - """ - Implementation of the quantization statistics collector for the PyTorch backend. - """ - - def __init__( - self, - weight_quantizers: dict[WeightQuantizerId, WeightQuantizerInfo], - non_weight_quantizers: dict[NonWeightQuantizerId, NonWeightQuantizerInfo], - build_time_info: QuantizationShareBuildTimeInfo, - ): - """ - Initializes a collector of the quantization statistics. - """ - self._weight_quantizers = {k: v.quantizer_module_ref for k, v in weight_quantizers.items()} - self._non_weight_quantizers = {k: v.quantizer_module_ref for k, v in non_weight_quantizers.items()} - self._info = build_time_info - - def _collect_quantizers_descriptions(self) -> list[QuantizerDescription]: - """ - Collects descriptions of the quantizers. - - :return: Descriptions of the quantizers. - """ - # `True` for weight quantizer, `False` otherwise. 
- quantizers = chain( - map(lambda x: (True, x), self._weight_quantizers.values()), - map(lambda x: (False, x), self._non_weight_quantizers.values()), - ) - - quantizers_descriptions = [] - for is_weight_quantizer, q in quantizers: - is_symmetric = isinstance(q, SymmetricQuantizer) - - quantizers_descriptions.append( - QuantizerDescription( - q.num_bits, q.per_channel, q.signed, is_symmetric, is_weight_quantizer, q.is_enabled_quantization() - ) - ) - - return quantizers_descriptions - - def _get_potential_quantizers_num(self) -> tuple[int, int]: - """ - Returns a potential number of quantizers for weights and activations. - - :return: A tuple (wq_potential_num, aq_potential_num) where - - `wq_potential_num` is a potential number of quantizers for weights. - - `aq_potential_num` is a potential number of quantizers for activations. - """ - aq_potential_num = self._info.aq_potential_num if is_debug() else None - return self._info.wq_potential_num, aq_potential_num - - -class MemoryConsumptionStatisticsCollector(StatisticsCollector): - """ - This metric considers: - - how many times memory consumption for network weights will decrease. - - how many times memory consumption* for activations tensor will decrease. - - * Reflects host memory consumption, assuming only the final low-precision output activation tensors are stored - in host memory (i.e. assuming intermediate accumulation results are only stored in device memory) - """ - - def __init__( - self, - compressed_model: NNCFNetwork, - weight_quantizers: dict[WeightQuantizerId, WeightQuantizerInfo], - non_weight_quantizers: dict[NonWeightQuantizerId, NonWeightQuantizerInfo], - ): - """ - Initializes collector of the memory consumption statistics. - """ - self._compressed_model = compressed_model - self._weight_quantizers = weight_quantizers - self._non_weight_quantizers = non_weight_quantizers - - def collect(self) -> MemoryConsumptionStatistics: - stats = MemoryConsumptionStatistics() - - fp_num_bits = 32 - nncf_modules = self._compressed_model.nncf.get_nncf_modules() - for nncf_module in nncf_modules: - count_el = np.prod(nncf_module.weight.shape) - stats.fp32_weight_size += count_el * fp_num_bits - quantizer = self._get_weight_quantizer_for_module(nncf_module) - if quantizer is not None: - num_bits = quantizer.num_bits - stats.quantized_weight_size += count_el * num_bits - else: - stats.quantized_weight_size += count_el * fp_num_bits - - try: - stats.weight_memory_consumption_decrease = stats.fp32_weight_size / stats.quantized_weight_size - except ZeroDivisionError: - stats.weight_memory_consumption_decrease = 0 - - stats.quantized_weight_size /= 2**23 - stats.fp32_weight_size /= 2**23 - - original_graph = deepcopy(self._compressed_model.nncf.get_original_graph()) - - memory_consumption_fp_model = {} - memory_consumption_compressed_model = {} - - original_nx_graph = original_graph._nx_graph - nx.set_edge_attributes(original_nx_graph, 32, "precision") - - for u, v in original_nx_graph.edges: - shape = original_nx_graph.edges[u, v][NNCFGraph.ACTIVATION_SHAPE_EDGE_ATTR] - num_bits = self._get_precision_for_activation_tensor(u, v, original_nx_graph) - original_nx_graph.edges[u, v]["precision"] = num_bits - u_node_name = original_nx_graph.nodes[u][NNCFNode.NODE_NAME_ATTR] - memory_consumption_fp_model[u_node_name] = np.prod(shape) * fp_num_bits - memory_consumption_compressed_model[u_node_name] = np.prod(shape) * num_bits - try: - stats.max_fp32_activation_size = max(memory_consumption_fp_model.values()) / 2**23 - 
stats.max_compressed_activation_size = max(memory_consumption_compressed_model.values()) / 2**23 - except ValueError: - stats.max_fp32_activation_size = 0 - stats.max_compressed_activation_size = 0 - return stats - - def _get_precision_for_activation_tensor(self, u_node: str, v_node: str, original_nx_graph: nx.DiGraph) -> int: - pred_u_nodes = original_nx_graph._pred[u_node] - precision_enter_activation_tensor = max( - [0] + [original_nx_graph.edges[pred_u_node, u_node]["precision"] for pred_u_node in pred_u_nodes] - ) - u_node_name = original_nx_graph.nodes[u_node][NNCFNode.NODE_NAME_ATTR] - module = self._compressed_model.nncf.get_containing_module(u_node_name) - if is_nncf_module(module): - quantizer = self._get_weight_quantizer_for_module(module) - if quantizer is not None: - precision = max(quantizer.num_bits, precision_enter_activation_tensor) - else: - precision = 32 - return precision - - for aq_id, aq in self._non_weight_quantizers.items(): - if u_node_name == aq_id.target_node_name: - precision = aq.quantizer_module_ref.num_bits - break - else: - precision = precision_enter_activation_tensor - return precision - - def _get_weight_quantizer_for_module(self, module: torch.nn.Module) -> Optional[BaseQuantizer]: - for wq_info in self._weight_quantizers.values(): - if wq_info.quantized_module is module: - return wq_info.quantizer_module_ref - return None - - -class ShareEdgesQuantizedDataPathStatisticsCollector(StatisticsCollector): - """ - This metric calculates the percentage of quantized edges relative to the total number of edges - in the original network graph. "Quantized edge" is an edge representing a quantized activation tensor. - """ - - QUANTIZED_EDGES_ATTR = "quantized" - PASSED_EDGES_ATTR = "passed" - NODES_GRAPH_ATTR = "nodes" - IS_MERGED_GRAPH_ATTR = "is_merged" - - def __init__( - self, - compressed_model: NNCFNetwork, - qctrl: "QuantizationController", # noqa: F821 - target_device: TargetDevice, - ): # noqa: E501, F821 - self._compressed_model = compressed_model - self._qctrl = qctrl - self.stats = QuantizationConfigurationStatistics(0, 0) - self._target_device = target_device - - def collect(self) -> QuantizationConfigurationStatistics: - merged_original_graph = self.get_merged_original_graph_with_patterns( - self._compressed_model.nncf.get_original_graph() - ) - self.stats.quantized_edges_in_cfg = 0 - nx.set_edge_attributes(merged_original_graph, False, self.QUANTIZED_EDGES_ATTR) - nx.set_edge_attributes(merged_original_graph, False, self.PASSED_EDGES_ATTR) - - input_nodes = [node for node in merged_original_graph.nodes if len(merged_original_graph._pred[node]) == 0] - queue = deque() - for input_node in input_nodes: - next_nodes = merged_original_graph._succ[input_node] - for next_node_key in next_nodes: - edge = merged_original_graph.edges[input_node, next_node_key] - edge[self.PASSED_EDGES_ATTR] = True - edge[self.QUANTIZED_EDGES_ATTR] = True - self.stats.quantized_edges_in_cfg += 1 - queue.appendleft(next_node_key) - visited_nodes = {} - - while len(queue) != 0: - node_key = queue.pop() - if node_key in visited_nodes: - continue - if self._all_enter_edges_in_node_of_type(merged_original_graph, node_key, self.PASSED_EDGES_ATTR): - visited_nodes[node_key] = True - node = merged_original_graph.nodes[node_key] - if node[self.IS_MERGED_GRAPH_ATTR]: - last_node = node[self.NODES_GRAPH_ATTR][-1] - node_name = str(last_node[NNCFNode.NODE_NAME_ATTR]) - matched = False - for aq_info in self._qctrl.non_weight_quantizers.values(): - for target_point in 
aq_info.affected_insertions: - if node_name == target_point.target_node_name: - matched = True - break - if matched: - self._marking_edges(merged_original_graph, node_key, queue) - else: - self._marking_edges(merged_original_graph, node_key, queue, False) - else: - node_name = str(node[NNCFNode.NODE_NAME_ATTR]) - - matched = False - for aq_key in self._compressed_model.nncf.external_quantizers: - if node_name in aq_key: - matched = True - break - if matched: - self._marking_edges(merged_original_graph, node_key, queue) - else: - is_op_non_change_precision_activation_tensor = True - node_metatype = node[NNCFNode.METATYPE_ATTR] - is_op_non_change_precision_activation_tensor = ( - node_metatype not in DEFAULT_PT_QUANT_TRAIT_TO_OP_DICT[QuantizationTrait.INPUTS_QUANTIZABLE] - ) - - status = is_op_non_change_precision_activation_tensor and self._all_enter_edges_in_node_of_type( - merged_original_graph, node_key, self.QUANTIZED_EDGES_ATTR - ) - self._marking_edges(merged_original_graph, node_key, queue, status) - else: - queue.appendleft(node_key) - self.num_merged_original_graph_edges = len(merged_original_graph.edges) - self.stats.total_edges_in_cfg = self.num_merged_original_graph_edges - return self.stats - - def _all_enter_edges_in_node_of_type(self, graph, node_key, type_edge): - prev_nodes = graph._pred[node_key] - retval = True - for prev_node_key in prev_nodes: - edge = graph.edges[prev_node_key, node_key] - if not edge[type_edge]: - retval = False - break - return retval - - def _marking_edges(self, graph, node_key, queue, mark=True): - next_nodes = graph._succ[node_key] - for next_node_key in next_nodes: - edge = graph.edges[node_key, next_node_key] - edge[self.QUANTIZED_EDGES_ATTR] = mark - edge[self.PASSED_EDGES_ATTR] = True - queue.appendleft(next_node_key) - if mark: - self.stats.quantized_edges_in_cfg += 1 - - def get_merged_original_graph_with_patterns(self, original_graph: PTNNCFGraph): - pattern = PatternsManager.get_full_hw_pattern_graph(backend=BackendType.TORCH, device=self._target_device) - - matches = find_subgraphs_matching_pattern(original_graph._nx_graph, pattern) - merged_graph = deepcopy(original_graph._nx_graph) - nx.set_node_attributes(merged_graph, False, self.IS_MERGED_GRAPH_ATTR) - for match in matches: - if len(match) == 1: - continue - - input_node_key = match[0] - output_node_key = match[-1] - in_edges = list(merged_graph.in_edges(input_node_key)) - out_edges = list(merged_graph.out_edges(output_node_key)) - - in_edge_copies_dict = {} - for in_edge_key in in_edges: - in_edge_copies_dict[in_edge_key] = deepcopy(merged_graph.edges[in_edge_key]) - out_edge_copies_dict = {} - for out_edge_key in out_edges: - out_edge_copies_dict[out_edge_key] = deepcopy(merged_graph.edges[out_edge_key]) - - merged_node_key = "" - merged_nodes = [] - for node_key in match: - merged_node_key += node_key + "\n" - - merged_nodes.append(original_graph._nx_graph.nodes[node_key]) - merged_graph.remove_node(node_key) - merged_node_attrs = { - NNCFNode.KEY_NODE_ATTR: merged_node_key, - self.NODES_GRAPH_ATTR: merged_nodes, - self.IS_MERGED_GRAPH_ATTR: True, - } - merged_graph.add_node(merged_node_key, **merged_node_attrs) - for in_edge_key, in_edge_attrs in in_edge_copies_dict.items(): - merged_graph.add_edge(in_edge_key[0], merged_node_key, **in_edge_attrs) - for out_edge_key, out_edge_attrs in out_edge_copies_dict.items(): - merged_graph.add_edge(merged_node_key, out_edge_key[1], **out_edge_attrs) - - return merged_graph - - @staticmethod - def 
visualize_marked_graph(merged_original_graph): - out_graph = nx.DiGraph() - for node_key in merged_original_graph.nodes: - out_graph.add_node(node_key) - for u, v in merged_original_graph.edges: - edge = merged_original_graph.edges[u, v] - if edge[ShareEdgesQuantizedDataPathStatisticsCollector.QUANTIZED_EDGES_ATTR]: - attrs = {"color": "blue"} - out_graph.add_edge(u, v, **attrs) - return out_graph diff --git a/src/nncf/torch/quantization/precision_init/__init__.py b/src/nncf/torch/quantization/precision_init/__init__.py deleted file mode 100644 index e5a42efc0ef..00000000000 --- a/src/nncf/torch/quantization/precision_init/__init__.py +++ /dev/null @@ -1,10 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. diff --git a/src/nncf/torch/quantization/precision_init/adjacent_quantizers.py b/src/nncf/torch/quantization/precision_init/adjacent_quantizers.py deleted file mode 100644 index 61163105855..00000000000 --- a/src/nncf/torch/quantization/precision_init/adjacent_quantizers.py +++ /dev/null @@ -1,107 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -from typing import NamedTuple - -from nncf.common.graph import NNCFNodeName -from nncf.common.logging import nncf_logger -from nncf.common.quantization.quantizer_setup import QuantizationPointId -from nncf.common.quantization.quantizer_setup import QuantizerSetupBase -from nncf.common.quantization.structs import QuantizerId -from nncf.torch.quantization.layers import BaseQuantizer - - -class AdjacentQuantizers(NamedTuple): - """ - Combines activation and weight quantizers so that each quantizer is in the same group as the operation that it is - affecting. Each quantizer that does not affect any node (e.g. if it only affects other quantizers as a topmost - quantizer in a requantization scenario) will be placed in a separate group. - :param: activation_quantizers list of pairs of activation quantizers with their ids - :param: weight_quantizers list of pairs of weight quantizers with their ids - """ - - activation_quantizers: list[tuple[QuantizerId, BaseQuantizer]] - weight_quantizers: list[tuple[QuantizerId, BaseQuantizer]] - - -class GroupsOfAdjacentQuantizers: - """ - Contains groups of adjacent quantizers - :param: weight_qp_id_per_activation_qp_id gives a single activation quantizer for a given weight quantizer - that directly quantize a weightable module (e.g. 
conv or linear) - """ - - def __init__(self): - self.weight_qp_id_per_activation_qp_id: dict[QuantizationPointId, QuantizationPointId] = {} - self._quantizer_per_group_id = {} - self._groups_of_adjacent_quantizers: list[AdjacentQuantizers] = [] - - def get_group_id_for_quantizer(self, quantizer_id: QuantizerId): - return self._quantizer_per_group_id.get(quantizer_id, None) - - def get_adjacent_quantizers_by_group_id(self, group_id): - return ( - self._groups_of_adjacent_quantizers[group_id].weight_quantizers - + self._groups_of_adjacent_quantizers[group_id].activation_quantizers - ) - - def __iter__(self): - return iter(self._groups_of_adjacent_quantizers) - - def __bool__(self): - return bool(self._groups_of_adjacent_quantizers) and bool(self._quantizer_per_group_id) - - def __getitem__(self, group_id): - return self._groups_of_adjacent_quantizers[group_id] - - def parse_from_quantizer_setup( - self, - all_quantizations: dict[QuantizerId, BaseQuantizer], - quantizer_setup: QuantizerSetupBase, - quantization_point_id_vs_quantizer_id: dict[QuantizationPointId, QuantizerId], - ): - for group_idx, group in quantizer_setup.shared_input_operation_set_groups.items(): - act_quant_tuples: list[tuple[QuantizerId, BaseQuantizer]] = [] - wt_quant_tuples: list[tuple[QuantizerId, BaseQuantizer]] = [] - - quantized_node_per_activation_qp_id: dict[NNCFNodeName, QuantizationPointId] = {} - module_scope_per_weight_qp_id: dict[NNCFNodeName, QuantizationPointId] = {} - - for qp_id in group: - qp = quantizer_setup.quantization_points[qp_id] - quant_id = quantization_point_id_vs_quantizer_id[qp_id] - quantizer_module = all_quantizations[quant_id] - resulting_tuple = (quant_id, quantizer_module) - if qp.is_weight_quantization_point(): - wt_quant_tuples.append(resulting_tuple) - weight_quantized_module_node_name = qp.target_point.target_node_name - module_scope_per_weight_qp_id[weight_quantized_module_node_name] = qp_id - elif qp.is_activation_quantization_point(): - act_quant_tuples.append(resulting_tuple) - quantized_node_names = qp.directly_quantized_operator_node_names - quantized_node_per_activation_qp_id.update({node_name: qp_id for node_name in quantized_node_names}) - self._quantizer_per_group_id[quant_id] = group_idx - - for weight_quantized_module_node_name, w_qp_id in module_scope_per_weight_qp_id.items(): - if weight_quantized_module_node_name not in quantized_node_per_activation_qp_id: - nncf_logger.debug( - f"Module {weight_quantized_module_node_name} has quantized weights and no quantized inputs!" - ) - continue - a_qp_id = quantized_node_per_activation_qp_id[weight_quantized_module_node_name] - if w_qp_id in self.weight_qp_id_per_activation_qp_id: - nncf_logger.debug( - f"Multiple weight quantizers per activation quantizer for {weight_quantized_module_node_name}" - ) - continue - self.weight_qp_id_per_activation_qp_id[w_qp_id] = a_qp_id - - adj_quants = AdjacentQuantizers(act_quant_tuples, wt_quant_tuples) - self._groups_of_adjacent_quantizers.append(adj_quants) diff --git a/src/nncf/torch/quantization/precision_init/base_init.py b/src/nncf/torch/quantization/precision_init/base_init.py deleted file mode 100644 index 9674c58fcfd..00000000000 --- a/src/nncf/torch/quantization/precision_init/base_init.py +++ /dev/null @@ -1,140 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from collections import OrderedDict -from copy import deepcopy -from typing import Union - -from nncf.common.graph import NNCFNodeName -from nncf.common.quantization.quantizer_setup import SingleConfigQuantizerSetup -from nncf.common.quantization.structs import QuantizerId -from nncf.common.quantization.structs import WeightQuantizerId -from nncf.torch.dynamic_graph.scope import Scope -from nncf.torch.graph.transformations.commands import ExtraCompressionModuleType -from nncf.torch.module_operations import UpdateWeight -from nncf.torch.nncf_network import NNCFNetwork -from nncf.torch.quantization.layers import QUANTIZATION_MODULES -from nncf.torch.quantization.layers import BaseQuantizer -from nncf.torch.quantization.precision_constraints import HardwareQuantizationConstraints -from nncf.torch.quantization.structs import WeightQuantizerInfo -from nncf.torch.structures import NNCFExtraConfigStruct -from nncf.torch.utils import get_all_modules_by_type - - -class BasePrecisionInitParams: # noqa: B903 - def __init__(self, user_init_args: NNCFExtraConfigStruct = None): - self.user_init_args = user_init_args - - -class BasePrecisionInitializer: - def __init__( - self, - algo: "ExperimentalQuantizationController", # noqa: F821 - params: BasePrecisionInitParams, - hw_precision_constraints: HardwareQuantizationConstraints = None, - ): - self._algo = algo - self._model: NNCFNetwork = self._algo._model - all_quantizers = algo.all_quantizations - self._hw_precision_constraints = hw_precision_constraints - self.original_precisions = {q_id: quantizer.num_bits for q_id, quantizer in all_quantizers.items()} - self._quantizers_handler = WeightQuantizersHandler( - self._model, self._algo.weight_quantizers, self._hw_precision_constraints - ) - quantization_types = [class_type.__name__ for class_type in QUANTIZATION_MODULES.registry_dict.values()] - self._weight_quantizations_by_execution_order = ( - self._quantizers_handler.get_weight_quantizers_in_execution_order_per_id() - ) - - self._all_quantizers_per_scope = get_all_modules_by_type( - self._model.nncf.get_compression_modules_by_type(ExtraCompressionModuleType.EXTERNAL_QUANTIZER), - quantization_types, - ) - self._all_quantizers_per_scope.update( - self._quantizers_handler.get_all_weight_quantizers_in_execution_order_per_scope() - ) - - def apply_init(self) -> SingleConfigQuantizerSetup: - raise NotImplementedError - - @staticmethod - def get_bitwidth_per_scope(quantizer_setup: SingleConfigQuantizerSetup) -> list[list[Union[int, str]]]: - scope_vs_bitwidth = {} - for qp in quantizer_setup.quantization_points.values(): - scope_vs_bitwidth[str(qp.insertion_point)] = qp.qconfig.num_bits - sorted_scope_vs_bitwidth = OrderedDict(sorted(scope_vs_bitwidth.items(), key=lambda x: x[0])) - full_bitwidth_per_scope = [] - for scope, bitwidth in sorted_scope_vs_bitwidth.items(): - full_bitwidth_per_scope.append([bitwidth, scope]) - return full_bitwidth_per_scope - - -class WeightQuantizersHandler: - """ - Defines weight quantizers for precision initialization in the order of execution. 
- """ - - def is_wq_scope(self, scope: Scope) -> bool: - return scope[-2].calling_module_class_name == UpdateWeight.__name__ - - @staticmethod - def get_owning_module_scope_from_wq_scope(wq_scope: Scope) -> Scope: - retval = deepcopy(wq_scope) - retval.pop() - retval.pop() - retval.pop() - return retval - - def __init__( - self, - model: NNCFNetwork, - weight_quantizers: dict[WeightQuantizerId, WeightQuantizerInfo], - constraints: HardwareQuantizationConstraints, - ): - self._wq_affected_module_node_name_vs_qid_dict = {k.target_node_name: k for k in weight_quantizers} - self._quantizer_module_scope_vs_qid_dict: dict[Scope, WeightQuantizerId] = {} - self._skipped_quantized_weight_node_names = [] - self._skipped_weight_quantizers: dict[WeightQuantizerId, BaseQuantizer] = {} - self._weight_quantizers_in_execution_order_per_scope: dict[Scope, BaseQuantizer] = OrderedDict() - self._weight_quantizers_in_execution_order: dict[WeightQuantizerId, BaseQuantizer] = OrderedDict() - - quantization_types = [class_type.__name__ for class_type in QUANTIZATION_MODULES.registry_dict.values()] - weight_module_dict = model - quantizers_in_execution_order_per_scope = get_all_modules_by_type(weight_module_dict, quantization_types) - - for scope, quantizer in quantizers_in_execution_order_per_scope.items(): - if self.is_wq_scope(scope): - affected_module_scope = self.get_owning_module_scope_from_wq_scope(scope) - affected_module_node = model.nncf.get_original_graph().get_op_nodes_in_scope(affected_module_scope)[0] - if affected_module_node.node_name in self._wq_affected_module_node_name_vs_qid_dict: - qid = self._wq_affected_module_node_name_vs_qid_dict[affected_module_node.node_name] - if len(constraints.get_all_unique_bitwidths(qid)) != 1: - self._weight_quantizers_in_execution_order_per_scope[scope] = quantizer - self._weight_quantizers_in_execution_order[qid] = quantizer - else: - self._skipped_quantized_weight_node_names.append(affected_module_node.node_name) - self._skipped_weight_quantizers[qid] = quantizer - - def get_skipped_quantized_weight_node_names(self) -> list[NNCFNodeName]: - return self._skipped_quantized_weight_node_names - - def get_all_weight_quantizers_in_execution_order_per_scope(self) -> dict[Scope, BaseQuantizer]: - return self._weight_quantizers_in_execution_order_per_scope - - def get_weight_quantizers_in_execution_order_per_id(self) -> dict[WeightQuantizerId, BaseQuantizer]: - return self._weight_quantizers_in_execution_order - - def get_quantizer_id_by_scope(self, scope: Scope) -> QuantizerId: - affected_module_scope = self.get_owning_module_scope_from_wq_scope(scope) - return self._wq_affected_module_node_name_vs_qid_dict[affected_module_scope] - - def get_skipped_weight_quantizers_per_id(self) -> dict[QuantizerId, BaseQuantizer]: - return self._skipped_weight_quantizers diff --git a/src/nncf/torch/quantization/precision_init/bitwidth_graph.py b/src/nncf/torch/quantization/precision_init/bitwidth_graph.py deleted file mode 100644 index a3b4fbe4daa..00000000000 --- a/src/nncf/torch/quantization/precision_init/bitwidth_graph.py +++ /dev/null @@ -1,175 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from collections import defaultdict - -import networkx as nx - -from nncf.common.graph import NNCFGraph -from nncf.common.graph import NNCFNode -from nncf.common.logging import nncf_logger -from nncf.common.quantization.structs import NonWeightQuantizerId -from nncf.torch.layers import NNCFConv2d -from nncf.torch.nncf_network import NNCFNetwork -from nncf.torch.quantization.algo import QuantizationController -from nncf.torch.quantization.precision_init.adjacent_quantizers import GroupsOfAdjacentQuantizers -from nncf.torch.quantization.structs import NonWeightQuantizerInfo - - -class BitwidthGraph: - def __init__( - self, - algo_ctrl: QuantizationController, - model: NNCFNetwork, - groups_of_adjacent_quantizers: GroupsOfAdjacentQuantizers, - add_flops=False, - ): - nncf_graph = model.nncf.get_graph() - self._nx_graph = nncf_graph.get_graph_for_structure_analysis() - if add_flops: - flops_per_module = model.nncf.get_flops_per_module() - - flops_vs_node_group: dict[int, tuple[int, set[NNCFNode]]] = defaultdict(set) - for idx, module_node_name_and_flops in enumerate(flops_per_module.items()): - module_node_name, flops = module_node_name_and_flops - node_set = set(nncf_graph.get_op_nodes_in_scope(nncf_graph.get_scope_by_node_name(module_node_name))) - flops_vs_node_group[idx] = (flops, node_set) - - grouped_mode = bool(groups_of_adjacent_quantizers) - for node_key, node in nncf_graph.nodes.items(): - color = "" - operator_name = node.node_type - module = model.nncf.get_containing_module(node.node_name) - if isinstance(module, NNCFConv2d): - color = "lightblue" - if module.groups == module.in_channels and module.in_channels > 1: - operator_name = "DW_Conv2d" - color = "purple" - kernel_size = "x".join(map(str, module.kernel_size)) - operator_name += f"_k{kernel_size}" - padding_values = set(module.padding) - padding_enabled = len(padding_values) >= 1 and padding_values.pop() - if padding_enabled: - operator_name += "_PAD" - if add_flops: - matches = [ - f_nodes_tpl for idx, f_nodes_tpl in flops_vs_node_group.items() if node in f_nodes_tpl[1] - ] - assert len(matches) == 1 - flops, affected_nodes = next(iter(matches)) - operator_name += f"_FLOPS:{str(flops)}" - if len(affected_nodes) > 1: - node_ids = sorted([n.node_id for n in affected_nodes]) - operator_name += "(shared among nodes {})".format( - ",".join([str(node_id) for node_id in node_ids]) - ) - operator_name += f"_#{node.node_id}" - target_node_to_draw = self._nx_graph.nodes[node_key] - target_node_to_draw["label"] = operator_name - target_node_to_draw["style"] = "filled" - if color: - target_node_to_draw["color"] = color - - non_weight_quantizers = algo_ctrl.non_weight_quantizers - bitwidth_color_map = {2: "purple", 4: "red", 8: "green", 6: "orange"} - for quantizer_id, quantizer_info in non_weight_quantizers.items(): - self._paint_activation_quantizer_node( - nncf_graph, quantizer_id, quantizer_info, bitwidth_color_map, groups_of_adjacent_quantizers - ) - for wq_id, wq_info in algo_ctrl.weight_quantizers.items(): - nodes = [nncf_graph.get_node_by_name(tp.target_node_name) for tp in wq_info.affected_insertions] - if not nodes: - msg 
= f"Failed to get affected nodes for quantized module node: {wq_id.target_node_name}" - raise AttributeError(msg) - preds = [nncf_graph.get_previous_nodes(node) for node in nodes] - wq_nodes = [] - for pred_list in preds: - for pred_node in pred_list: - if "UpdateWeight" in pred_node.node_name: - wq_nodes.append(pred_node) - assert len(wq_nodes) == 1 - - node = wq_nodes[0] - node_id = node.node_id - key = nncf_graph.get_node_key_by_id(node_id) - nx_node_to_draw_upon = self._nx_graph.nodes[key] - quantizer = wq_info.quantizer_module_ref - bitwidths = quantizer.num_bits - nx_node_to_draw_upon["label"] = f"WFQ_[{quantizer.get_quantizer_config()}]_#{str(node_id)}" - if grouped_mode: - group_id_str = "UNDEFINED" - group_id = groups_of_adjacent_quantizers.get_group_id_for_quantizer(wq_id) - if group_id is None: - nncf_logger.debug(f"No group for weight quantizer for: {wq_id}") - else: - group_id_str = str(group_id) - nx_node_to_draw_upon["label"] += "_G" + group_id_str - nx_node_to_draw_upon["color"] = bitwidth_color_map[bitwidths] - nx_node_to_draw_upon["style"] = "filled" - - def _paint_activation_quantizer_node( - self, - nncf_graph: NNCFGraph, - quantizer_id: NonWeightQuantizerId, - quantizer_info: NonWeightQuantizerInfo, - bitwidth_color_map: dict[int, str], - groups_of_adjacent_quantizers: GroupsOfAdjacentQuantizers, - ): - affected_insertion_points_list = quantizer_info.affected_insertions - - for target_point in affected_insertion_points_list: - nncf_node_name = target_point.target_node_name - nncf_node = nncf_graph.get_node_by_name(nncf_node_name) - node_id = nncf_node.node_id - - input_port_id = target_point.input_port_id - - if input_port_id is None: - # Post-hooking used for activation quantization - # Currently only a single post-hook can immediately follow an operation - succs = list(nncf_graph.get_next_nodes(nncf_node)) - assert len(succs) == 1 - target_nncf_node_key = nncf_graph.get_node_key_by_id(succs[0].node_id) - else: - # Pre-hooking used for activation quantization - previous_nodes = nncf_graph.get_previous_nodes(nncf_node) - target_node = None - for prev_node in previous_nodes: - prev_edge = nncf_graph.get_nx_edge(prev_node, nncf_node) - if prev_edge[NNCFGraph.INPUT_PORT_ID_EDGE_ATTR] == input_port_id: - target_node = prev_node - break - - assert target_node is not None, "Could not find a pre-hook quantizer node for a specific input port!" 
- target_nncf_node_id = target_node.node_id - target_nncf_node_key = nncf_graph.get_node_key_by_id(target_nncf_node_id) - - activation_fq_node = self._nx_graph.nodes[target_nncf_node_key] - bitwidth = quantizer_info.quantizer_module_ref.num_bits - activation_fq_node["color"] = bitwidth_color_map[bitwidth] - activation_fq_node["style"] = "filled" - node_id = activation_fq_node[NNCFNode.ID_NODE_ATTR] - - activation_fq_node["label"] = ( - f"AFQ_[{quantizer_info.quantizer_module_ref.get_quantizer_config()}]_#{str(node_id)}" - ) - grouped_mode = bool(groups_of_adjacent_quantizers) - if grouped_mode: - group_id_str = "UNDEFINED" - group_id = groups_of_adjacent_quantizers.get_group_id_for_quantizer(quantizer_id) - if node_id is None: - nncf_logger.debug(f"No group for activation quantizer: {target_nncf_node_key}") - else: - group_id_str = str(group_id) - activation_fq_node["label"] += "_G" + group_id_str - - def get(self) -> nx.DiGraph: - return self._nx_graph diff --git a/src/nncf/torch/quantization/precision_init/compression_ratio.py b/src/nncf/torch/quantization/precision_init/compression_ratio.py deleted file mode 100644 index c428b3725db..00000000000 --- a/src/nncf/torch/quantization/precision_init/compression_ratio.py +++ /dev/null @@ -1,57 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from nncf.common.graph import NNCFNodeName -from nncf.common.quantization.quantizer_setup import QuantizationPointId -from nncf.common.quantization.quantizer_setup import SingleConfigQuantizerSetup - - -class CompressionRatioCalculator: - """ - Calculates compression ratio - ratio between bits complexity of fully INT8 model and mixed-precision lower-bit one. - Bit complexity of the model is a sum of bit complexities for each quantized layer, which are a multiplication of - FLOPS for the layer by number of bits for its quantization. The compression ratio can be used for estimation of - performance boost for quantized model. 
- """ - - DEFAULT_NUMBER_OF_BITS = 8 - - def __init__( - self, - flops_per_weighted_module_node: dict[NNCFNodeName, int], - quantizer_setup: SingleConfigQuantizerSetup, - weight_qp_id_per_activation_qp_id: dict[QuantizationPointId, QuantizationPointId], - ): - self._weight_qp_id_per_activation_qp_id = weight_qp_id_per_activation_qp_id - self._flops_per_weight_qp_id: dict[QuantizationPointId, float] = {} - for qp_id, qp in quantizer_setup.quantization_points.items(): - if qp.is_weight_quantization_point(): - target_node_name = qp.insertion_point.target_node_name - self._flops_per_weight_qp_id[qp_id] = flops_per_weighted_module_node[target_node_name] - self.maximum_bits_complexity = sum(self._flops_per_weight_qp_id.values()) * self.DEFAULT_NUMBER_OF_BITS - - def run_for_quantizer_setup(self, quantizer_setup: SingleConfigQuantizerSetup) -> float: - """ - Calculates compression ratio for a given quantizer setup with - :param: quantizer_setup: setup with information quantization points - :returns: compression ratio of mixed-precision model by relation to fully INT8 - """ - quantization_points = quantizer_setup.quantization_points - weight_qps = list(filter(lambda pair: pair[1].is_weight_quantization_point(), quantization_points.items())) - bits_complexity = 0 - for w_qp_id, w_qp in weight_qps: - wq_num_bits = w_qp.qconfig.num_bits - a_qp_id = self._weight_qp_id_per_activation_qp_id[w_qp_id] - a_qp = quantization_points[a_qp_id] - aq_num_bits = a_qp.qconfig.num_bits - num_bits = max(wq_num_bits, aq_num_bits) - bits_complexity += num_bits * self._flops_per_weight_qp_id[w_qp_id] - return self.maximum_bits_complexity / bits_complexity diff --git a/src/nncf/torch/quantization/precision_init/definitions.py b/src/nncf/torch/quantization/precision_init/definitions.py deleted file mode 100644 index 66e9837f97c..00000000000 --- a/src/nncf/torch/quantization/precision_init/definitions.py +++ /dev/null @@ -1,15 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from nncf.common.quantization.structs import QuantizerConfig - -QConfigSequenceForHAWQToEvaluate = list[QuantizerConfig] -CoveringQConfigSequenceForQuantNoiseCalculation = list[QuantizerConfig] diff --git a/src/nncf/torch/quantization/precision_init/hawq_debug.py b/src/nncf/torch/quantization/precision_init/hawq_debug.py deleted file mode 100644 index bd88fde56a4..00000000000 --- a/src/nncf/torch/quantization/precision_init/hawq_debug.py +++ /dev/null @@ -1,224 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -import os -from collections import OrderedDict -from pathlib import Path - -import torch -from torch import Tensor - -from nncf.common.logging import nncf_logger -from nncf.common.utils.decorators import skip_if_dependency_unavailable -from nncf.common.utils.dot_file_rw import write_dot_graph -from nncf.torch.graph.transformations.commands import ExtraCompressionModuleType -from nncf.torch.nncf_network import NNCFNetwork -from nncf.torch.quantization.adjust_padding import add_adjust_padding_nodes -from nncf.torch.quantization.layers import QUANTIZATION_MODULES -from nncf.torch.quantization.precision_init.adjacent_quantizers import GroupsOfAdjacentQuantizers -from nncf.torch.quantization.precision_init.definitions import QConfigSequenceForHAWQToEvaluate -from nncf.torch.quantization.precision_init.perturbations import PerturbationObserver -from nncf.torch.quantization.precision_init.perturbations import Perturbations -from nncf.torch.quantization.precision_init.traces_order import TracesPerLayer -from nncf.torch.utils import get_all_modules_by_type - - -class HAWQDebugger: - def __init__( - self, - weight_qconfig_sequences_in_trace_order: list["QConfigSequenceForHAWQToEvaluate"], - perturbations: Perturbations, - weight_observers_for_each_covering_configuration: list[list[PerturbationObserver]], - traces_per_layer: TracesPerLayer, - bitwidths: list[int], - ): - self._weight_qconfig_sequences_in_trace_order = weight_qconfig_sequences_in_trace_order - self._num_weights = len(traces_per_layer.traces_order) - self._perturbations = perturbations - - from nncf.common.utils.debug import DEBUG_LOG_DIR - - self._dump_dir = Path(DEBUG_LOG_DIR) / Path("hawq_dumps") - self._dump_dir.mkdir(parents=True, exist_ok=True) - - self._traces_order = traces_per_layer.traces_order - self._traces_per_layer = traces_per_layer.get_all() - - num_of_weights = [] - norm_of_weights = [] - for i in range(self._num_weights): - trace_index = self._traces_order.get_execution_index_by_traces_index(i) - num_of_weights.append(weight_observers_for_each_covering_configuration[0][trace_index].get_numels()) - norm_of_weights.append(weight_observers_for_each_covering_configuration[0][trace_index].get_input_norm()) - self._num_weights_per_layer = torch.Tensor(num_of_weights) - self._norm_weights_per_layer = torch.Tensor(norm_of_weights) - - bits_in_megabyte = 2**23 - self._model_sizes = [] - for qconfig_sequence in self._weight_qconfig_sequences_in_trace_order: - size = ( - torch.sum( - torch.Tensor([qconfig.num_bits for qconfig in qconfig_sequence]) * self._num_weights_per_layer - ).item() - / bits_in_megabyte - ) - self._model_sizes.append(size) - self._bitwidths = bitwidths - - @staticmethod - def get_all_quantizers_per_full_scope(model): - all_quantizations = OrderedDict() - for class_type in QUANTIZATION_MODULES.registry_dict.values(): - quantization_type = class_type.__name__ - all_quantizations.update( - get_all_modules_by_type( - model.nncf.get_compression_modules_by_type(ExtraCompressionModuleType.EXTERNAL_QUANTIZER), - quantization_type, - ) - ) - all_quantizations.update(get_all_modules_by_type(model, quantization_type)) - all_quantizations = OrderedDict(sorted(all_quantizations.items(), key=lambda x: str(x[0]))) - return all_quantizations - - @skip_if_dependency_unavailable(dependencies=["matplotlib.pyplot"]) - def dump_avg_traces(self): - import matplotlib.pyplot as plt - - dump_file = 
os.path.join(self._dump_dir, "avg_traces_per_layer") - torch.save(self._traces_per_layer, dump_file) - fig = plt.figure() - fig.suptitle("Average Hessian Trace") - ax = fig.add_subplot(2, 1, 1) - ax.set_yscale("log") - ax.set_xlabel("weight quantizers") - ax.set_ylabel("average hessian trace") - ax.plot(self._traces_per_layer.cpu().numpy()) - plt.savefig(dump_file) - - @skip_if_dependency_unavailable(dependencies=["matplotlib.pyplot"]) - def dump_metric_MB(self, metric_per_qconfig_sequence: list[Tensor]): - import matplotlib.pyplot as plt - - list_to_plot = [cm.item() for cm in metric_per_qconfig_sequence] - fig = plt.figure() - fig.suptitle("Pareto Frontier") - ax = fig.add_subplot(2, 1, 1) - ax.set_yscale("log") - ax.set_xlabel("Model Size (MB)") - ax.set_ylabel("Metric value (total perturbation)") - ax.scatter(self._model_sizes, list_to_plot, s=20, facecolors="none", edgecolors="r") - cm = torch.Tensor(metric_per_qconfig_sequence) - cm_m = cm.median().item() - qconfig_index = metric_per_qconfig_sequence.index(cm_m) - ms_m = self._model_sizes[qconfig_index] - ax.scatter(ms_m, cm_m, s=30, facecolors="none", edgecolors="b", label="median from all metrics") - ax.legend() - plt.savefig(os.path.join(self._dump_dir, "Pareto_Frontier")) - nncf_logger.debug( - f"Distribution of HAWQ metrics: " - f"min_value={cm.min().item():.3f}, " - f"max_value={cm.max().item():.3f}, " - f"median_value={cm_m:.3f}, " - f"median_index={qconfig_index}, " - f"total_number={len(metric_per_qconfig_sequence)}" - ) - - @skip_if_dependency_unavailable(dependencies=["matplotlib.pyplot"]) - def dump_metric_flops( - self, metric_per_qconfig_sequence: list[Tensor], flops_per_config: list[float], chosen_qconfig_index: int - ): - import matplotlib.pyplot as plt - - list_to_plot = [cm.item() for cm in metric_per_qconfig_sequence] - fig = plt.figure() - fig.suptitle("Pareto Frontier") - ax = fig.add_subplot(1, 1, 1) - ax.set_xlabel("Compression ratio: total INT8 Bits Complexity / total MIXED INT Bits Complexity") - ax.set_ylabel("Metric value (total perturbation)") - ax.scatter(flops_per_config, list_to_plot, s=10, alpha=0.3) # s=20, facecolors='none', edgecolors='r') - flops_per_config = [torch.Tensor([v]) for v in flops_per_config] - cm = torch.Tensor(flops_per_config) - cm_m = cm.median().item() - configuration_index = flops_per_config.index(cm_m) - ms_m = metric_per_qconfig_sequence[configuration_index].item() - ax.scatter(cm_m, ms_m, s=30, facecolors="none", edgecolors="b", label="median from all metrics") - cm_c = metric_per_qconfig_sequence[chosen_qconfig_index].item() - fpc_c = flops_per_config[chosen_qconfig_index].item() - ax.scatter(fpc_c, cm_c, s=30, facecolors="none", edgecolors="r", label="chosen config") - - ax.legend() - plt.savefig(os.path.join(self._dump_dir, "Pareto_Frontier_compress_ratio")) - - @skip_if_dependency_unavailable(dependencies=["matplotlib.pyplot"]) - def dump_density_of_quantization_noise(self): - noise_per_config: list[Tensor] = [] - for qconfig_sequence in self._weight_qconfig_sequences_in_trace_order: - qnoise = 0 - for i in range(self._num_weights): - execution_index = self._traces_order.get_execution_index_by_traces_index(i) - qnoise += self._perturbations.get(layer_id=execution_index, qconfig=qconfig_sequence[i]) - noise_per_config.append(qnoise) - - list_to_plot = [cm.item() for cm in noise_per_config] - import matplotlib.pyplot as plt - - fig = plt.figure() - fig.suptitle("Density of quantization noise") - ax = fig.add_subplot(2, 1, 1) - ax.set_yscale("log") - ax.set_xlabel("Blocks") - 
ax.set_ylabel("Noise value") - ax.scatter(self._model_sizes, list_to_plot, s=20, alpha=0.3) - ax.legend() - plt.savefig(os.path.join(self._dump_dir, "Density_of_quantization_noise")) - - @skip_if_dependency_unavailable(dependencies=["matplotlib.pyplot"]) - def dump_perturbations_ratio(self): - import matplotlib.pyplot as plt - - fig = plt.figure() - fig.suptitle("Quantization noise vs Average Trace") - ax = fig.add_subplot(2, 1, 1) - ax.set_xlabel("Blocks") - ax.set_yscale("log") - perturbations_per_layer_id = list(self._perturbations.get_all().values()) - perturb = [] - max_bitwidths = [] - for perturbations_for_all_observed_qconfig_sequence_in_current_layer in perturbations_per_layer_id: - qconfig_sequence = perturbations_for_all_observed_qconfig_sequence_in_current_layer.keys() - max_bitwidth_qconfig = max(qconfig_sequence, key=lambda x: x.num_bits) - perturb.append(perturbations_for_all_observed_qconfig_sequence_in_current_layer[max_bitwidth_qconfig]) - max_bitwidths.append(max_bitwidth_qconfig.num_bits) - ax.plot( - [ - (p / m / n).cpu().numpy() - for p, m, n in zip(perturb, self._num_weights_per_layer, self._norm_weights_per_layer) - ], - label="normalized n-bit noise", - ) - ax.plot([x.cpu().numpy() for x in perturb], label="n-bit noise") - ax.plot(max_bitwidths, label="n") - ax.plot(self._traces_per_layer.cpu().numpy(), label="trace") - ax.plot([(n * p).cpu().numpy() for n, p in zip(self._traces_per_layer, perturb)], label="trace * noise") - ax.legend() - plt.savefig(os.path.join(self._dump_dir, "Quantization_noise_vs_Average_Trace")) - - def dump_bitwidth_graph( - self, - algo_ctrl: "QuantizationController", # noqa: F821 - model: NNCFNetwork, - groups_of_adjacent_quantizers: GroupsOfAdjacentQuantizers, - ): - from nncf.torch.quantization.precision_init.bitwidth_graph import BitwidthGraph - - bw_graph = BitwidthGraph(algo_ctrl, model, groups_of_adjacent_quantizers).get() - nx_graph = add_adjust_padding_nodes(bw_graph, model) - write_dot_graph(nx_graph, self._dump_dir / Path("bitwidth_graph.dot")) diff --git a/src/nncf/torch/quantization/precision_init/hawq_init.py b/src/nncf/torch/quantization/precision_init/hawq_init.py deleted file mode 100644 index 0800609363e..00000000000 --- a/src/nncf/torch/quantization/precision_init/hawq_init.py +++ /dev/null @@ -1,823 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-import itertools -import json -from bisect import bisect_left -from collections import OrderedDict -from copy import deepcopy -from enum import Enum -from operator import itemgetter -from pathlib import Path -from typing import Any, Callable, NamedTuple - -import torch -from torch import Tensor -from torch import nn -from torch.nn.modules.loss import _Loss - -import nncf -from nncf.common.graph import NNCFNodeName -from nncf.common.logging import nncf_logger -from nncf.common.quantization.quantizer_setup import QuantizationPointId -from nncf.common.quantization.quantizer_setup import SingleConfigQuantizerSetup -from nncf.common.quantization.structs import QuantizerConfig -from nncf.common.quantization.structs import QuantizerId -from nncf.common.quantization.structs import WeightQuantizerId -from nncf.common.utils.debug import is_debug -from nncf.common.utils.os import safe_open -from nncf.config.schemata.defaults import HAWQ_COMPRESSION_RATIO -from nncf.config.schemata.defaults import HAWQ_DUMP_INIT_PRECISION_DATA -from nncf.config.schemata.defaults import HAWQ_ITER_NUMBER -from nncf.config.schemata.defaults import HAWQ_NUM_DATA_POINTS -from nncf.config.schemata.defaults import HAWQ_TOLERANCE -from nncf.config.schemata.defaults import PRECISION_INIT_BITWIDTHS -from nncf.torch.quantization.hessian_trace import HessianTraceEstimator -from nncf.torch.quantization.layers import QuantizersSwitcher -from nncf.torch.quantization.precision_constraints import HardwareQuantizationConstraints -from nncf.torch.quantization.precision_init.adjacent_quantizers import GroupsOfAdjacentQuantizers -from nncf.torch.quantization.precision_init.base_init import BasePrecisionInitializer -from nncf.torch.quantization.precision_init.base_init import BasePrecisionInitParams -from nncf.torch.quantization.precision_init.compression_ratio import CompressionRatioCalculator -from nncf.torch.quantization.precision_init.definitions import CoveringQConfigSequenceForQuantNoiseCalculation -from nncf.torch.quantization.precision_init.definitions import QConfigSequenceForHAWQToEvaluate -from nncf.torch.quantization.precision_init.hawq_debug import HAWQDebugger -from nncf.torch.quantization.precision_init.perturbations import PerturbationObserver -from nncf.torch.quantization.precision_init.perturbations import Perturbations -from nncf.torch.quantization.precision_init.traces_order import TracesOrder -from nncf.torch.quantization.precision_init.traces_order import TracesPerLayer -from nncf.torch.quantization.structs import WeightQuantizerInfo -from nncf.torch.structures import QuantizationPrecisionInitArgs -from nncf.torch.utils import get_model_device - - -class BitwidthAssignmentMode(Enum): - STRICT = "strict" - LIBERAL = "liberal" - - -class HAWQPrecisionInitParams(BasePrecisionInitParams): - def __init__( - self, - user_init_args: QuantizationPrecisionInitArgs, - bitwidths: list[int] = None, - bitwidth_per_scope: list[list] = None, - traces_per_layer_path: str = None, - num_data_points: int = None, - iter_number: int = None, - tolerance: float = None, - compression_ratio: float = None, - dump_hawq_data: bool = None, - bitwidth_assignment_mode: BitwidthAssignmentMode = None, - ): - super().__init__(user_init_args) - self.bitwidths = bitwidths - self.bitwidth_per_scope = bitwidth_per_scope - self.traces_per_layer_path = traces_per_layer_path - self.num_data_points = num_data_points - self.iter_number = iter_number - self.tolerance = tolerance - self.compression_ratio = compression_ratio - self.dump_hawq_data = 
dump_hawq_data - self.bitwidth_assignment_mode = bitwidth_assignment_mode - - @classmethod - def from_config( - cls, hawq_init_config_dict: dict, user_init_args: QuantizationPrecisionInitArgs - ) -> "HAWQPrecisionInitParams": - return cls( - user_init_args=user_init_args, - bitwidths=hawq_init_config_dict.get("bits", PRECISION_INIT_BITWIDTHS), - traces_per_layer_path=hawq_init_config_dict.get("traces_per_layer_path"), - num_data_points=hawq_init_config_dict.get("num_data_points", HAWQ_NUM_DATA_POINTS), - iter_number=hawq_init_config_dict.get("iter_number", HAWQ_ITER_NUMBER), - tolerance=hawq_init_config_dict.get("tolerance", HAWQ_TOLERANCE), - compression_ratio=hawq_init_config_dict.get("compression_ratio", HAWQ_COMPRESSION_RATIO), - dump_hawq_data=hawq_init_config_dict.get("dump_init_precision_data", HAWQ_DUMP_INIT_PRECISION_DATA), - bitwidth_assignment_mode=BitwidthAssignmentMode( - hawq_init_config_dict.get("bitwidth_assignment_mode", BitwidthAssignmentMode.LIBERAL.value) - ), - ) - - -class TraceOrderBitwidthMatcher: - def __init__(self, available_bitwidths: list[int], traces_order: TracesOrder): - self._available_bitwidths = available_bitwidths - self._traces_order = traces_order - self._bitwidth_sequences = self.get_all_non_decreasing_bitwidth_sequences() - - def get_all_non_decreasing_bitwidth_sequences(self) -> list[list[int]]: - sequences = [] - bitwidths_ = deepcopy(self._available_bitwidths) - seq_len = len(self._traces_order) - if seq_len == 0: - return sequences - bitwidths = sorted(bitwidths_) - m = len(bitwidths) - L = seq_len - for j in range(1, m + 1): - for combo_bitwidths in itertools.combinations(bitwidths, j): - for combo_partitions in itertools.combinations(list(range(1, L)), j - 1): - bit_config = [] - prev_p = 0 - for p, b in zip(combo_partitions + (L,), combo_bitwidths): - bit_config += [b] * (p - prev_p) - prev_p = p - sequences.append(bit_config) - return sequences - - @staticmethod - def _select_first_closest_bitwidth_qconfig( - qconf_list: list[QuantizerConfig], target_bitwidth: int - ) -> QuantizerConfig: - bw_diffs = [abs(qc.num_bits - target_bitwidth) for qc in qconf_list] - _, min_idx = min((val, idx) for (idx, val) in enumerate(bw_diffs)) - return qconf_list[min_idx] - - def _deduplicate( - self, qconf_sequences_to_search: list[QConfigSequenceForHAWQToEvaluate] - ) -> list[QConfigSequenceForHAWQToEvaluate]: - tupled_sequence = [tuple(seq) for seq in qconf_sequences_to_search] - odict = OrderedDict.fromkeys(tupled_sequence) - deduped_tupled_sequence = list(odict.keys()) - return [list(tup) for tup in deduped_tupled_sequence] - - @staticmethod - def _generate_covering_qconfig_sequences(observed_qconfs: list[dict[QuantizerConfig, QuantizerConfig]]): - covering_qconfig_sequences: list[CoveringQConfigSequenceForQuantNoiseCalculation] = [] - # For each index, put the largest qconf subset that only varies in bitwidth on top - # so that the associated covering configurations would not require model regeneration - optimized_observed_qconfs: list[list[QuantizerConfig]] = [] - for qconf_observed_set in observed_qconfs: - variants: list[list[QuantizerConfig]] = [] - for qconf in qconf_observed_set: - variants.append(list(filter(qconf.is_a_bitwidth_variant, qconf_observed_set.keys()))) - max_bw_varying_variant = max(variants, key=len) - other_qconfs = list(filter(lambda x: x not in max_bw_varying_variant, qconf_observed_set.keys())) - optimized_observed_qconfs.append(max_bw_varying_variant + other_qconfs) - - max_depth = max([len(qconfs_for_trace_idx) for 
qconfs_for_trace_idx in optimized_observed_qconfs]) - for i in range(max_depth): - covering_conf: CoveringQConfigSequenceForQuantNoiseCalculation = [] - for qconfs_for_trace_idx in optimized_observed_qconfs: - if i < len(qconfs_for_trace_idx): - covering_conf.append(qconfs_for_trace_idx[i]) - else: - covering_conf.append(qconfs_for_trace_idx[-1]) - covering_qconfig_sequences.append(covering_conf) - return covering_qconfig_sequences - - def get_qconfig_sequences_constrained_by_trace_order( - self, - possible_qconfigs_sequence_in_trace_order: list[list[QuantizerConfig]], - indices_for_bitwidth_adjustment_only: set[int], - ) -> tuple[list[QConfigSequenceForHAWQToEvaluate], list[CoveringQConfigSequenceForQuantNoiseCalculation]]: - """ - The 'constraint' is so that the each qconfig sequence should have non-decreasing bitwidths. It - might be impossible to apply this constraint for a given qconfig space (consider [[2], [6, 8], [4]]). - In such a case, for trace order index positions where it was impossible to select a bitwidth so that the entire - sequence is non-decreasing, the bitwidth closest to this target will be chosen instead. - """ - if len(possible_qconfigs_sequence_in_trace_order) != len(self._traces_order): - msg = "The size of the qconfig space and the traces do not match!" - raise ValueError(msg) - retval: list[QConfigSequenceForHAWQToEvaluate] = [] - observed_qconfs_in_retval = [OrderedDict() for _ in range(len(self._traces_order))] - for bitwidth_sequence in self._bitwidth_sequences: - current_qconfig_sequence_in_trace_order: QConfigSequenceForHAWQToEvaluate = [] - for trace_idx, bitwidth in enumerate(bitwidth_sequence): - if trace_idx in indices_for_bitwidth_adjustment_only: - bitwidth_adjusted_default_qconfig = deepcopy( - possible_qconfigs_sequence_in_trace_order[trace_idx][0] - ) - bitwidth_adjusted_default_qconfig.num_bits = bitwidth - qconfig = bitwidth_adjusted_default_qconfig - else: - # TODO: do a selection based on strategy ("exhaustive" = add all available configurations, - # "preset" = do a selection based on a certain preset, "first" = select first match (as below), - # "custom" = use a custom selection function to be passed as arg to the HAWQ initializer - # OR: do non-bitwidth disambiguation higher up the stack, make sure that the qconfig - # space at this spot only has 1 qconfig option for each bitwidth. 
- possible_qconfigs_for_current_trace_idx = possible_qconfigs_sequence_in_trace_order[trace_idx] - first_closest_qconfig = self._select_first_closest_bitwidth_qconfig( - possible_qconfigs_for_current_trace_idx, bitwidth - ) - qconfig = deepcopy(first_closest_qconfig) - - current_qconfig_sequence_in_trace_order.append(qconfig) - observed_qconfs_in_retval[trace_idx][qconfig] = qconfig - retval.append(current_qconfig_sequence_in_trace_order) - return self._deduplicate(retval), self._generate_covering_qconfig_sequences(observed_qconfs_in_retval) - - -class HAWQPrecisionInitializer(BasePrecisionInitializer): - def __init__( - self, - algo: "ExperimentalQuantizationController", # noqa: F821 - params: HAWQPrecisionInitParams, - hw_precision_constraints: HardwareQuantizationConstraints, - ): - self._groups_of_adjacent_quantizers = algo.groups_of_adjacent_quantizers - self._bitwidth_assignment_mode = params.bitwidth_assignment_mode - if self._bitwidth_assignment_mode == BitwidthAssignmentMode.STRICT: - hw_precision_constraints = self._merge_constraints_for_adjacent_quantizers( - self._groups_of_adjacent_quantizers, hw_precision_constraints - ) - super().__init__(algo, params, hw_precision_constraints) - init_args = params.user_init_args - self._criterion_fn = init_args.criterion_fn - self._criterion = init_args.criterion - self._data_loader = init_args.data_loader - self._traces_per_layer_path = params.traces_per_layer_path - self._num_data_points = params.num_data_points - self._iter_number = params.iter_number - self._tolerance = params.tolerance - self._compression_ratio = params.compression_ratio - self._bitwidths = ( - self._hw_precision_constraints.get_all_unique_bitwidths() - if self._hw_precision_constraints - else params.bitwidths - ) - self._init_device = init_args.device - if self._init_device is None: - self._init_device = get_model_device(self._model) - current_quantizer_setup = self._algo.get_quantizer_setup_for_current_state() - flops_per_module = self._model.nncf.get_flops_per_module() - self._compression_ratio_calculator = CompressionRatioCalculator( - flops_per_module, - current_quantizer_setup, - self._groups_of_adjacent_quantizers.weight_qp_id_per_activation_qp_id, - ) - self._dump_hawq_data = params.dump_hawq_data - self._original_qp_id_vs_quantizer_module_id_dict = deepcopy(algo.setup_to_module_id_translation_dict) - - def apply_init(self) -> SingleConfigQuantizerSetup: - if not self._weight_quantizations_by_execution_order: - return self._algo.get_quantizer_setup_for_current_state() - - original_device = get_model_device(self._model) - self._model.to(self._init_device) - - traces_per_layer = self._calc_traces(self._criterion_fn, self._criterion, self._iter_number, self._tolerance) - if not traces_per_layer: - msg = "Failed to calculate hessian traces!" 
- raise nncf.InternalError(msg) - - traces_order = traces_per_layer.traces_order - ( - weight_qconfig_sequences_in_trace_order, - covering_qconfig_sequences, - ) = self.get_qconfig_sequences_constrained_by_traces_order(traces_order) - - weight_quantizer_ids_in_execution_order = list(self._weight_quantizations_by_execution_order.keys()) - - if not weight_qconfig_sequences_in_trace_order: - nncf_logger.error("All bitwidths configurations are incompatible with HW Config!") - return None - - weight_qconfig_sequences_in_trace_order = self._filter_qconfig_sequences_by_excessive_bitwidth( - weight_qconfig_sequences_in_trace_order - ) - - if self._bitwidth_assignment_mode == BitwidthAssignmentMode.STRICT: - weight_qconfig_sequences_in_trace_order = self._filter_qconfig_sequences_by_grouped_weight_quantizers( - weight_qconfig_sequences_in_trace_order, - weight_quantizer_ids_in_execution_order, - self._groups_of_adjacent_quantizers, - traces_order, - ) - if not weight_qconfig_sequences_in_trace_order: - nncf_logger.error( - "No bitwidths configurations are left after removing inconsistent groups of " - "weight quantizers with adjacent activation quantizers!" - ) - return self._algo.get_quantizer_setup_for_current_state() - - compression_ratio_per_qconfig = self.get_compression_ratio_per_qconfig_sequence( - weight_qconfig_sequences_in_trace_order, traces_order - ) - min_ratio = min(compression_ratio_per_qconfig) - max_ratio = max(compression_ratio_per_qconfig) - if not min_ratio <= self._compression_ratio <= max_ratio: - msg = ( - f"Invalid compression ratio={self._compression_ratio}." - f" Should be within range [{min_ratio:.3f}, {max_ratio:.3f}]" - ) - raise AttributeError(msg) - - perturbations, weight_observers = self.calc_quantization_noise(covering_qconfig_sequences, traces_order) - - metric_per_qconfig_sequence = self.calc_hawq_metric_per_qconfig_sequence( - weight_qconfig_sequences_in_trace_order, perturbations, traces_per_layer, self._init_device - ) - - qconfig_sequence_index = self.choose_qconfig_sequence( - metric_per_qconfig_sequence, compression_ratio_per_qconfig, self._compression_ratio - ) - chosen_qconfig_sequence_in_traces_order = weight_qconfig_sequences_in_trace_order[qconfig_sequence_index] - chosen_qconfig_sequence_in_execution_order = traces_order.get_execution_order_configs( - chosen_qconfig_sequence_in_traces_order - ) - bitwidth_sequence = [qconfig.num_bits for qconfig in chosen_qconfig_sequence_in_execution_order] - nncf_logger.info( - f"Chosen HAWQ bitwidth sequence with ratio={compression_ratio_per_qconfig[qconfig_sequence_index]:.2f}, " - f"bitwidth per weightable layer={bitwidth_sequence}" - ) - nncf_logger.debug( - f"Order of the weightable layers in the HAWQ bitwidth sequence " - f"(in descending order of average Hessian traces) = {traces_order}" - ) - - final_quantizer_setup = self.get_quantizer_setup_for_qconfig_sequence( - chosen_qconfig_sequence_in_traces_order, traces_order - ) - if is_debug() or self._dump_hawq_data: - hawq_debugger = HAWQDebugger( - weight_qconfig_sequences_in_trace_order, - perturbations, - weight_observers, - traces_per_layer, - self._bitwidths, - ) - hawq_debugger.dump_metric_MB(metric_per_qconfig_sequence) - hawq_debugger.dump_metric_flops( - metric_per_qconfig_sequence, compression_ratio_per_qconfig, qconfig_sequence_index - ) - hawq_debugger.dump_avg_traces() - hawq_debugger.dump_density_of_quantization_noise() - hawq_debugger.dump_perturbations_ratio() - new_ctrl, new_model = self._algo.apply_new_quantizer_setup(final_quantizer_setup) - 
groups_of_adjacent_quantizers = new_ctrl.groups_of_adjacent_quantizers - hawq_debugger.dump_bitwidth_graph(new_ctrl, new_model, groups_of_adjacent_quantizers) - bitwidth_per_scope = self.get_bitwidth_per_scope(final_quantizer_setup) - from nncf.common.utils.debug import DEBUG_LOG_DIR - - Path(DEBUG_LOG_DIR).mkdir(parents=True, exist_ok=True) - with safe_open(Path(DEBUG_LOG_DIR) / "bitwidth_per_scope.json", "w") as outfile: - json.dump({"bitwidth_per_scope": bitwidth_per_scope}, outfile, indent=4, sort_keys=False) - self._model.to(original_device) - return final_quantizer_setup - - @staticmethod - def _merge_constraints_for_adjacent_quantizers( - groups_of_adjacent_quantizers: GroupsOfAdjacentQuantizers, - hw_precision_constraints: HardwareQuantizationConstraints, - ) -> HardwareQuantizationConstraints: - if not hw_precision_constraints: - return None - retval = deepcopy(hw_precision_constraints) - for group in groups_of_adjacent_quantizers: - all_bitwidths_sets = [] - quantizer_ids = [] - all_quantizers = group.weight_quantizers + group.activation_quantizers - for quantizer_id, _ in all_quantizers: - bitwidths_vs_qconfig_sequence = retval.get_bitwidth_vs_qconfigs_dict(quantizer_id) - bitwidths = set(bitwidths_vs_qconfig_sequence.keys()) - all_bitwidths_sets.append(bitwidths) - quantizer_ids.append(quantizer_id) - minimal_set_bitwidths = set.intersection(*all_bitwidths_sets) - if not minimal_set_bitwidths: - msg = ( - "No bitwidths configurations are left after removing inconsistent groups of weight quantizers" - " with adjacent activation quantizers!" - ) - raise nncf.InternalError(msg) - for quantizer_id in quantizer_ids: - qconfig_sequence = retval.get(quantizer_id) - filtered_qconfig_sequence = [] - for qconf in qconfig_sequence: - if qconf.num_bits in minimal_set_bitwidths: - filtered_qconfig_sequence.append(qconf) - retval.replace(quantizer_id, filtered_qconfig_sequence) - return retval - - def get_compression_ratio_per_qconfig_sequence( - self, qconfig_sequences_in_trace_order: list[QConfigSequenceForHAWQToEvaluate], traces_order: TracesOrder - ) -> list[float]: - compression_ratio_per_qconfig = [] - for qconfig_sequence in qconfig_sequences_in_trace_order: - quantizer_setup = self.get_quantizer_setup_for_qconfig_sequence(qconfig_sequence, traces_order) - compression_ratio = self._compression_ratio_calculator.run_for_quantizer_setup(quantizer_setup) - compression_ratio_per_qconfig.append(compression_ratio) - return compression_ratio_per_qconfig - - class ParamsToRestore(NamedTuple): - originally_disabled_gradients: list[str] - skipped_gradients_to_enable: list[tuple[nn.Module, str]] - - @staticmethod - def disable_all_gradients_except_weights_of_quantized_modules( - quantizers_switcher: QuantizersSwitcher, - weight_quantizers: dict[WeightQuantizerId, WeightQuantizerInfo], - model: nn.Module, - skipped_quantized_weight_node_names: list[NNCFNodeName] = None, - ) -> ParamsToRestore: - """ - Disables gradients of all parameters, except for layers that have quantizers for weights, which wasn't skipped - because of single precision constraints. 
- :param quantizers_switcher: object that is responsible for enabling and disabling quantizers - :param weight_quantizers: modules with quantized weights per scope - :param model: model to access all parameters - :param skipped_quantized_weight_node_names: list of weighted nodes that have a single precision - constraint and which weights should be skipped from bitwidth initialization - :return: list of names of the parameters that were originally disabled - """ - originally_disabled_gradients = [] - skipped_gradients_to_enable = [] - - # Some quantizers can be disabled in a staged scenario on creation of staged scheduler - # Need to save originally disabled quantizers for restoring their state after initialization - quantizers_switcher.disable_quantizers() - - # remember gradients of quantized modules that were enabled - gradients_to_enable = [] - for wq_id, wq_info in weight_quantizers.items(): - quantized_module = wq_info.quantized_module - target_node_name = wq_id.target_node_name - is_skipped = False - for skipped_wt_node_name in skipped_quantized_weight_node_names: - if skipped_wt_node_name == target_node_name: - is_skipped = True - break - for param_name, param in quantized_module.named_parameters(): - if param.requires_grad: - # disable gradients for skipped module for optimization of Hessian Trace search - if is_skipped: - skipped_gradients_to_enable.append((quantized_module, param_name)) - param.requires_grad = False - else: - gradients_to_enable.append((quantized_module, param_name)) - - # disable all gradients, except already disabled - for param_name, param in model.named_parameters(): - if not param.requires_grad: - originally_disabled_gradients.append(param_name) - else: - param.requires_grad = False - - # enable gradients of quantized modules that were disabled - for wq_id in weight_quantizers.values(): - quantized_module = wq_id.quantized_module - for param_name, param in quantized_module.named_parameters(): - if (quantized_module, param_name) in gradients_to_enable and "bias" not in param_name: - param.requires_grad = True - return HAWQPrecisionInitializer.ParamsToRestore(originally_disabled_gradients, skipped_gradients_to_enable) - - def _calc_traces( - self, - criterion_fn: Callable[[Any, Any, _Loss], torch.Tensor], - criterion: _Loss, - iter_number: int, - tolerance: float, - ) -> TracesPerLayer: - if self._traces_per_layer_path: - return TracesPerLayer(torch.load(self._traces_per_layer_path, weights_only=False).to(self._init_device)) - - quantizers_switcher = QuantizersSwitcher(list(self._all_quantizers_per_scope.values())) - params_to_restore = self.disable_all_gradients_except_weights_of_quantized_modules( - quantizers_switcher, - self._algo.weight_quantizers, - self._model, - self._quantizers_handler.get_skipped_quantized_weight_node_names(), - ) - - trace_estimator = HessianTraceEstimator( - self._model, criterion_fn, criterion, self._init_device, self._data_loader, self._num_data_points - ) - try: - avg_traces = trace_estimator.get_average_traces(max_iter=iter_number, tolerance=tolerance) - except RuntimeError as error: - if "cuda out of memory" in error.args[0].lower(): - msg = ( - "Failed to estimate average Hessian traces within precision initialization. Specify " - "a smaller batch size via --batch-size-init option in the NNCF samples or register " - "a data loader with a smaller batch size. 
Refer to " - "`NNCFConfig.register_extra_structs` and the `QuantizationPrecisionInitArgs`" - " class" - ) - raise nncf.InternalError(msg) from error - raise error - - self.restore_disabled_gradients( - quantizers_switcher, self._model, self._algo.weight_quantizers, params_to_restore - ) - - return TracesPerLayer(avg_traces) - - @staticmethod - def restore_disabled_gradients( - quantizers_switcher: QuantizersSwitcher, - model: nn.Module, - weight_quantizers: dict[WeightQuantizerId, WeightQuantizerInfo], - params_to_restore: ParamsToRestore, - ): - """ - Restore requires_grad property of all parameters back, except for ones that were originally disabled - :param quantizers_switcher: object that is responsible for enabling and disabling quantizers - :param model: model to access all parameters - :param weight_quantizers: modules with quantized weights per scope - :param params_to_restore: storage names of the parameters that should restore requires_grad property - """ - for wq_info in weight_quantizers.values(): - quantized_module = wq_info.quantized_module - for param_name, param in quantized_module.named_parameters(): - if (quantized_module, param_name) in params_to_restore.skipped_gradients_to_enable: - param.requires_grad = True - - for param_name, param in model.named_parameters(): - if param_name not in params_to_restore.originally_disabled_gradients: - param.requires_grad = True - quantizers_switcher.enable_quantizers() - - def get_qconfig_sequences_constrained_by_traces_order( - self, traces_order: TracesOrder - ) -> tuple[list[QConfigSequenceForHAWQToEvaluate], list[CoveringQConfigSequenceForQuantNoiseCalculation]]: - possible_qconfigs_sequence_in_trace_order: list[list[QuantizerConfig]] = [] - trace_order_indices_of_defaulted_qconfig_sequence: set[int] = set() - quantizer_ids_in_exec_order = list(self._weight_quantizations_by_execution_order.keys()) - assert len(quantizer_ids_in_exec_order) == len(traces_order) - for trace_idx in range(len(traces_order)): - exec_idx = traces_order.get_execution_index_by_traces_index(trace_idx) - qid = quantizer_ids_in_exec_order[exec_idx] - default_qconfig = self._weight_quantizations_by_execution_order[qid].get_quantizer_config() - qconfig_constraints = [] - if self._hw_precision_constraints: - qconfig_constraints = self._hw_precision_constraints.get(qid) - if qconfig_constraints: - possible_qconfigs_sequence_in_trace_order.append(qconfig_constraints) - else: - possible_qconfigs_sequence_in_trace_order.append([default_qconfig]) - trace_order_indices_of_defaulted_qconfig_sequence.add(trace_idx) - - matcher = TraceOrderBitwidthMatcher(self._bitwidths, traces_order) - return matcher.get_qconfig_sequences_constrained_by_trace_order( - possible_qconfigs_sequence_in_trace_order, trace_order_indices_of_defaulted_qconfig_sequence - ) - - def _get_weight_qp_ids_in_trace_order(self, traces_order: TracesOrder) -> list[set[QuantizationPointId]]: - quant_module_ids = list(self._weight_quantizations_by_execution_order.keys()) - qp_ids_in_trace_order = [] - for trace_idx in range(len(traces_order)): - exec_idx = traces_order.get_execution_index_by_traces_index(trace_idx) - quant_module_id = quant_module_ids[exec_idx] - qp_ids_in_trace_order.append(self._algo.module_id_to_qp_id_translation_dict[quant_module_id]) - return qp_ids_in_trace_order - - @staticmethod - def _apply_qconfig_sequence_to_quantizer_setup( - qconfig_sequence: CoveringQConfigSequenceForQuantNoiseCalculation, - qp_ids_in_trace_order: list[set[QuantizationPointId]], - quantizer_setup: 
SingleConfigQuantizerSetup, - ) -> SingleConfigQuantizerSetup: - retval = deepcopy(quantizer_setup) - assert len(qconfig_sequence) == len(qp_ids_in_trace_order) - for trace_idx, qp_id_set in enumerate(qp_ids_in_trace_order): - for qp_id in qp_id_set: - retval.quantization_points[qp_id].qconfig = deepcopy(qconfig_sequence[trace_idx]) - return retval - - def calc_quantization_noise( - self, qconfig_sequences_to_run: list[CoveringQConfigSequenceForQuantNoiseCalculation], traces_order: TracesOrder - ) -> tuple[Perturbations, list[list[PerturbationObserver]]]: - perturbations = Perturbations() - qp_ids_in_trace_order = self._get_weight_qp_ids_in_trace_order(traces_order) - ctrl = self._algo - observers_for_all_qconfig_sequences: list[list[PerturbationObserver]] = [] - for qconfig_sequence in qconfig_sequences_to_run: - quantizer_setup_to_run = self._apply_qconfig_sequence_to_quantizer_setup( - qconfig_sequence, qp_ids_in_trace_order, ctrl.get_quantizer_setup_for_current_state() - ) - ctrl, model = ctrl.apply_new_quantizer_setup(quantizer_setup_to_run) - - hook_handles = [] - observers = [] - for qp_id_set in qp_ids_in_trace_order: - for qp_id in qp_id_set: - wq_id = ctrl.setup_to_module_id_translation_dict[qp_id] - wq_module = ctrl.weight_quantizers[wq_id].quantizer_module_ref - observer = PerturbationObserver(self._init_device) - hook_handles.append(wq_module.register_forward_hook(observer.calc_perturbation)) - observers.append(observer) - - model.nncf.do_dummy_forward(force_eval=True) - - for i, observer in enumerate(observers): - perturbations.add( - layer_id=traces_order.get_execution_index_by_traces_index(i), - qconfig=qconfig_sequence[i], - perturbation=observer.get_observation().to(self._init_device), - ) - - for handle in hook_handles: - handle.remove() - observers_for_all_qconfig_sequences.append(observers) - - return perturbations, observers_for_all_qconfig_sequences - - @staticmethod - def calc_hawq_metric_per_qconfig_sequence( - qconfig_sequences_in_trace_order: list[QConfigSequenceForHAWQToEvaluate], - perturbations: Perturbations, - traces_per_layer: TracesPerLayer, - device, - ) -> list[Tensor]: - metric_per_qconfig_sequence = [] - for qconfig_sequence_in_trace_order in qconfig_sequences_in_trace_order: - hawq_metric = torch.Tensor([0]).to(device) - for trace_index, qconfig in enumerate(qconfig_sequence_in_trace_order): - execution_index = traces_per_layer.traces_order.get_execution_index_by_traces_index(trace_index) - hawq_metric += traces_per_layer.get_by_trace_index(trace_index) * perturbations.get( - layer_id=execution_index, qconfig=qconfig - ) - metric_per_qconfig_sequence.append(hawq_metric) - return metric_per_qconfig_sequence - - @staticmethod - def choose_qconfig_sequence( - metric_per_qconfig_sequences: list[Tensor], compression_ratio_per_qconfig: list[float], compression_ratio - ) -> int: - num_qconfig_sequences = len(metric_per_qconfig_sequences) - - sorted_compression_ratio_per_qconfig = sorted(compression_ratio_per_qconfig) - indexes_of_sorted_compression_ratio = [ - x[0] for x in sorted(enumerate(compression_ratio_per_qconfig), reverse=False, key=lambda x: x[1]) - ] - - boundary_index = bisect_left(sorted_compression_ratio_per_qconfig, compression_ratio) - indexes_to_check = [ - indexes_of_sorted_compression_ratio[i] for i in range(boundary_index, num_qconfig_sequences) - ] - best_metric = min(list(itemgetter(*indexes_to_check)(metric_per_qconfig_sequences))) - best_qconfig_sequence_index = metric_per_qconfig_sequences.index(best_metric) - return 
best_qconfig_sequence_index - - def get_quantizer_setup_for_qconfig_sequence( - self, qconfig_sequence_in_traces_order: QConfigSequenceForHAWQToEvaluate, traces_order: TracesOrder - ) -> SingleConfigQuantizerSetup: - wqp_ids_in_trace_order = self._get_weight_qp_ids_in_trace_order(traces_order) - ctrl = self._algo - - quantizer_setup_to_set = self._apply_qconfig_sequence_to_quantizer_setup( - qconfig_sequence_in_traces_order, wqp_ids_in_trace_order, ctrl.get_quantizer_setup_for_current_state() - ) - - assert quantizer_setup_to_set.shared_input_operation_set_groups - for group in quantizer_setup_to_set.shared_input_operation_set_groups.values(): - weight_qp_ids = [] - act_qp_ids = [] - for qp_id in group: - qp = quantizer_setup_to_set.quantization_points[qp_id] - if qp.is_weight_quantization_point(): - weight_qp_ids.append(qp_id) - elif qp.is_activation_quantization_point(): - act_qp_ids.append(qp_id) - weight_qps = [quantizer_setup_to_set.quantization_points[qp_id] for qp_id in weight_qp_ids] - weight_bitwidth_set = {weight_qp.qconfig.num_bits for weight_qp in weight_qps} - - if self._bitwidth_assignment_mode == BitwidthAssignmentMode.STRICT: - quantizer_setup_to_set = self._set_activations_bitwidth_strictly( - quantizer_setup_to_set, act_qp_ids, weight_bitwidth_set - ) - else: - quantizer_setup_to_set = self._set_activation_bitwidth_liberally( - quantizer_setup_to_set, act_qp_ids, weight_bitwidth_set - ) - - return quantizer_setup_to_set - - def _set_activation_bitwidth_liberally( - self, - quantizer_setup_to_set: SingleConfigQuantizerSetup, - act_qp_ids: list[QuantizationPointId], - weight_bitwidth_set: set[int], - ) -> SingleConfigQuantizerSetup: - for act_qp_id in act_qp_ids: - original_quant_module_id = self._original_qp_id_vs_quantizer_module_id_dict[act_qp_id] - activation_bitwidths_vs_qconfig_sequence = self._hw_precision_constraints.get_bitwidth_vs_qconfigs_dict( - original_quant_module_id - ) - activation_bitwidth_set = set(activation_bitwidths_vs_qconfig_sequence.keys()) - intersection = activation_bitwidth_set.intersection(weight_bitwidth_set) - target_qp = quantizer_setup_to_set.quantization_points[act_qp_id] - if activation_bitwidth_set.__len__() == 1: - target_bitwidth = activation_bitwidth_set.pop() - elif intersection: - target_bitwidth = min(intersection) - elif activation_bitwidth_set: - target_bitwidth = min(activation_bitwidth_set) - elif weight_bitwidth_set: - target_bitwidth = min(weight_bitwidth_set) - else: - continue - - if activation_bitwidths_vs_qconfig_sequence: - target_qp.qconfig = deepcopy(activation_bitwidths_vs_qconfig_sequence[target_bitwidth][0]) - else: - # The activation has no constraints, so the config in the setup was defaulted - # and we can simply adjust the bitwidth - target_qp.qconfig.num_bits = target_bitwidth - - return quantizer_setup_to_set - - def _set_activations_bitwidth_strictly( - self, - quantizer_setup_to_set: SingleConfigQuantizerSetup, - act_qp_ids: list[QuantizationPointId], - weight_bitwidth_set: set[int], - ) -> SingleConfigQuantizerSetup: - if len(weight_bitwidth_set) > 1: - msg = "Invalid grouping of weight quantizers" - raise nncf.InternalError(msg) - all_constraints = set() - original_quant_module_ids = [ - self._original_qp_id_vs_quantizer_module_id_dict[act_qp_id] for act_qp_id in act_qp_ids - ] - for act_quant_module_id in original_quant_module_ids: - all_constraints.update(self._hw_precision_constraints.get_all_unique_bitwidths(act_quant_module_id)) - common_constraints = set(all_constraints) - for act_quant_module_id in 
original_quant_module_ids: - constraint = self._hw_precision_constraints.get_all_unique_bitwidths(act_quant_module_id) - common_constraints = common_constraints.intersection(constraint) - if weight_bitwidth_set: - common_constraints = common_constraints.intersection(weight_bitwidth_set) - if not common_constraints: - msg = "No hardware compatible bitwidth for activation quantizers" - raise nncf.InternalError(msg) - for act_qp_id in act_qp_ids: - quant_id = self._original_qp_id_vs_quantizer_module_id_dict[act_qp_id] - target_bitwidth = sorted(list(common_constraints))[0] - bitwidths_vs_qconfig_sequence = self._hw_precision_constraints.get_bitwidth_vs_qconfigs_dict(quant_id) - qconfig_to_select = bitwidths_vs_qconfig_sequence[target_bitwidth][0] - quantizer_setup_to_set.quantization_points[act_qp_id].qconfig = qconfig_to_select - - return quantizer_setup_to_set - - @staticmethod - def _filter_qconfig_sequences_by_grouped_weight_quantizers( - trace_ordered_qconfig_sequences: list[QConfigSequenceForHAWQToEvaluate], - weight_quantization_ids_by_execution_order: list[QuantizerId], - groups_of_adjacent_quantizers: GroupsOfAdjacentQuantizers, - traces_order: TracesOrder, - ) -> list[QConfigSequenceForHAWQToEvaluate]: - """ - Removes configs where adjacent weight quantizers have different bitwidth. Adjacency is defined by common - activation quantizers - """ - filtered_qconfig_sequences = [] - all_grouped_indexes = [] - for group_of_adjacent_quantizers in groups_of_adjacent_quantizers: - wqs = group_of_adjacent_quantizers.weight_quantizers - if len(wqs) > 1: - indexes_of_grouped_wq = [] - for quantizer_id, _ in wqs: - if quantizer_id in weight_quantization_ids_by_execution_order: - index_by_execution_order = weight_quantization_ids_by_execution_order.index(quantizer_id) - indexes_of_grouped_wq.append(index_by_execution_order) - all_grouped_indexes.append(indexes_of_grouped_wq) - - if not all_grouped_indexes: - return trace_ordered_qconfig_sequences - - for qconfig_sequence in trace_ordered_qconfig_sequences: - execution_ordered_qconfig_sequence = traces_order.get_execution_order_configs(qconfig_sequence) - bitwidth_sequence = [qconfig.num_bits for qconfig in execution_ordered_qconfig_sequence] - keep_config = True - for indexes_of_grouped_wq in all_grouped_indexes: - grouped_bits = [bitwidth_sequence[index] for index in indexes_of_grouped_wq] - if grouped_bits[1:] != grouped_bits[:-1]: - keep_config = False - break - if keep_config: - filtered_qconfig_sequences.append(qconfig_sequence) - - return filtered_qconfig_sequences - - def _filter_qconfig_sequences_by_excessive_bitwidth( - self, weight_qconfig_sequences_in_trace_order: list[QConfigSequenceForHAWQToEvaluate] - ) -> list[QConfigSequenceForHAWQToEvaluate]: - result = weight_qconfig_sequences_in_trace_order - if self._hw_precision_constraints: - all_weight_bitwidths = set() - for wq_id in self._algo.weight_quantizers: - all_weight_bitwidths.update(self._hw_precision_constraints.get_all_unique_bitwidths(wq_id)) - - all_activation_bitwidths = set() - for aq_id in self._algo.non_weight_quantizers: - all_activation_bitwidths.update(self._hw_precision_constraints.get_all_unique_bitwidths(aq_id)) - - excessive_weight_bitwidths = all_weight_bitwidths - all_activation_bitwidths - - def filter_fn(qconfig_sequence: QConfigSequenceForHAWQToEvaluate): - all_qconfig_bitwidths = set(map(lambda qconfig: qconfig.num_bits, qconfig_sequence)) - return any(map(lambda x: x not in all_qconfig_bitwidths, excessive_weight_bitwidths)) - - if 
excessive_weight_bitwidths: - result = list(filter(filter_fn, weight_qconfig_sequences_in_trace_order)) - return result diff --git a/src/nncf/torch/quantization/precision_init/manual_init.py b/src/nncf/torch/quantization/precision_init/manual_init.py deleted file mode 100644 index e4d8e6802cf..00000000000 --- a/src/nncf/torch/quantization/precision_init/manual_init.py +++ /dev/null @@ -1,68 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from nncf.common.quantization.quantizer_setup import SingleConfigQuantizerSetup -from nncf.torch.quantization.precision_constraints import HardwareQuantizationConstraints -from nncf.torch.quantization.precision_init.base_init import BasePrecisionInitializer -from nncf.torch.quantization.precision_init.base_init import BasePrecisionInitParams -from nncf.torch.structures import QuantizationPrecisionInitArgs - - -class ManualPrecisionInitParams(BasePrecisionInitParams): - def __init__(self, user_init_args: QuantizationPrecisionInitArgs = None, bitwidth_per_scope: list[list] = None): - super().__init__(user_init_args) - self.bitwidth_per_scope = bitwidth_per_scope - - @classmethod - def from_config(cls, manual_init_params_dict: dict): - return cls(user_init_args=None, bitwidth_per_scope=manual_init_params_dict.get("bitwidth_per_scope", [])) - - -class ManualPrecisionInitializer(BasePrecisionInitializer): - def __init__( - self, - algo: "ExperimentalQuantizationController", # noqa: F821 - params: ManualPrecisionInitParams, - hw_precision_constraints: HardwareQuantizationConstraints = None, - ): - super().__init__(algo, params, hw_precision_constraints) - self._bitwidth_per_scope = params.bitwidth_per_scope - - def apply_init(self) -> SingleConfigQuantizerSetup: - quantizer_setup = self._algo.get_quantizer_setup_for_current_state() - for pair in self._bitwidth_per_scope: - bitwidth, scope_name = pair - is_matched = False - msg = ( - "Failed to assign bitwidth={} to `{}`,\n" - "because it is incompatible for the specified target hardware\n" - "Supported quantization configs: {}" - ) - for qp_id, qp in quantizer_setup.quantization_points.items(): - if scope_name in str(qp.insertion_point): - if self._hw_precision_constraints: - q_id = self._algo.setup_to_module_id_translation_dict[qp_id] - q_configs = self._hw_precision_constraints.get(q_id) - matched_q_configs = list(filter(lambda x: x.num_bits == bitwidth, q_configs)) - if not matched_q_configs: - raise ValueError(msg.format(bitwidth, scope_name, list(map(str, q_configs)))) - qp.qconfig = matched_q_configs[0] - else: - qp.qconfig.num_bits = bitwidth - is_matched = True - break - if not is_matched: - msg = ( - f"Could not find a quantization point at scope name `{scope_name}`," - f" failed to assign bitwidth {bitwidth} to it" - ) - raise ValueError(msg) - return quantizer_setup diff --git a/src/nncf/torch/quantization/precision_init/perturbations.py b/src/nncf/torch/quantization/precision_init/perturbations.py deleted file mode 100644 index ee99ca23a8f..00000000000 --- 
a/src/nncf/torch/quantization/precision_init/perturbations.py +++ /dev/null @@ -1,62 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import torch -from torch import Tensor - -from nncf.common.quantization.structs import QuantizerConfig -from nncf.torch.dynamic_graph.context import no_nncf_trace - - -class PerturbationObserver: - def __init__(self, device): - super().__init__() - self.device = device - self.perturbation = None - self.numels = None - - def calc_perturbation(self, module, inputs: Tensor, output: Tensor): - input_ = inputs[0] if isinstance(inputs, tuple) else inputs - with no_nncf_trace(): - self.perturbation = torch.norm(input_ - output, p=2) ** 2 - self.numels = input_.size().numel() - self.input_norm = torch.norm(input_, p=2) ** 2 - - def reset(self): - self.perturbation = None - self.numels = None - - def get_observation(self): - return self.perturbation - - def get_numels(self): - return self.numels - - def get_input_norm(self): - return self.input_norm - - -class Perturbations: - def __init__(self): - self._perturbations: dict[int, dict[QuantizerConfig, Tensor]] = {} - - def add(self, layer_id: int, qconfig: QuantizerConfig, perturbation: Tensor): - if layer_id in self._perturbations: - self._perturbations[layer_id].update({qconfig: perturbation}) - else: - self._perturbations[layer_id] = {qconfig: perturbation} - - def get(self, layer_id: int, qconfig: QuantizerConfig) -> Tensor: - layer_perturbations = self._perturbations[layer_id] - return layer_perturbations[qconfig] - - def get_all(self) -> dict[int, dict[QuantizerConfig, Tensor]]: - return self._perturbations diff --git a/src/nncf/torch/quantization/precision_init/traces_order.py b/src/nncf/torch/quantization/precision_init/traces_order.py deleted file mode 100644 index 7b133cce408..00000000000 --- a/src/nncf/torch/quantization/precision_init/traces_order.py +++ /dev/null @@ -1,70 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from torch import Tensor - - -class TracesOrder: - def __init__(self, execution_indexes_of_weights_ordered_by_traces: list[int]): - self._index_by_traces_to_execution_index = execution_indexes_of_weights_ordered_by_traces - self._num_weights = len(execution_indexes_of_weights_ordered_by_traces) - self._index_by_execution_to_index_by_traces = [ - execution_indexes_of_weights_ordered_by_traces.index(i) for i in range(self._num_weights) - ] - - def get_execution_order_configs(self, trace_ordered_configuration: list) -> list: - if len(trace_ordered_configuration) != self._num_weights: - msg = "Incompatible configuration size!" - raise ValueError(msg) - execution_order_config = [None] * self._num_weights - for i, config in enumerate(trace_ordered_configuration): - execution_order_config[self._index_by_traces_to_execution_index[i]] = config - return execution_order_config - - def get_traces_order_configs(self, execution_ordered_configuration: list) -> list: - if len(execution_ordered_configuration) != self._num_weights: - msg = "Incompatible configuration size!" - raise ValueError(msg) - traces_order_config = [None] * self._num_weights - for i, config in enumerate(execution_ordered_configuration): - traces_order_config[self._index_by_execution_to_index_by_traces[i]] = config - return traces_order_config - - def get_execution_index_by_traces_index(self, traces_index: int): - return self._index_by_traces_to_execution_index[traces_index] - - def __bool__(self): - return bool(self._index_by_traces_to_execution_index) - - def __len__(self): - return len(self._index_by_execution_to_index_by_traces) - - -class TracesPerLayer: - def __init__(self, traces_per_layer_by_execution: Tensor): - self._traces_per_layer_by_execution = traces_per_layer_by_execution - execution_indexes_of_weights_in_descending_order_of_traces = [ - i[0] for i in sorted(enumerate(traces_per_layer_by_execution), reverse=False, key=lambda x: x[1]) - ] - self.traces_order = TracesOrder(execution_indexes_of_weights_in_descending_order_of_traces) - - def get_by_execution_index(self, execution_index: int) -> Tensor: - return self._traces_per_layer_by_execution[execution_index] - - def get_by_trace_index(self, trace_index: int) -> Tensor: - execution_index = self.traces_order.get_execution_index_by_traces_index(trace_index) - return self._traces_per_layer_by_execution[execution_index] - - def get_all(self) -> Tensor: - return self._traces_per_layer_by_execution - - def __bool__(self): - return bool(self.traces_order) diff --git a/src/nncf/torch/quantization/schedulers.py b/src/nncf/torch/quantization/schedulers.py deleted file mode 100644 index 2c8dfdf5506..00000000000 --- a/src/nncf/torch/quantization/schedulers.py +++ /dev/null @@ -1,81 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-import logging - -from nncf.api.compression import CompressionStage -from nncf.common.schedulers import BaseCompressionScheduler -from nncf.common.utils.registry import Registry -from nncf.config.schemata.defaults import ACTIVATIONS_QUANT_START_EPOCH -from nncf.config.schemata.defaults import WEIGHTS_QUANT_START_EPOCH - -logger = logging.getLogger(__name__) - -QUANTIZATION_SCHEDULERS = Registry("quantization_schedulers") - - -@QUANTIZATION_SCHEDULERS.register("staged") -class StagedQuantizationScheduler(BaseCompressionScheduler): - def __init__(self, quantization_ctrl: "QuantizationController", params=None): # noqa: F821 - super().__init__() - if params is None: - params = {} - self.algo = quantization_ctrl - self.activations_quant_start_epoch = params.get("activations_quant_start_epoch", ACTIVATIONS_QUANT_START_EPOCH) - self.weights_quant_start_epoch = params.get("weights_quant_start_epoch", WEIGHTS_QUANT_START_EPOCH) - self._set_quantization_status() - - def epoch_step(self, next_epoch=None): - super().epoch_step(next_epoch) - should_call_init = False - if self.current_epoch == self.activations_quant_start_epoch: - logger.info("Enabled quantization of activations") - self.algo.enable_activation_quantization() - should_call_init = True - - if self.current_epoch == self.weights_quant_start_epoch: - logger.info("Enabled quantization of weights") - self.algo.enable_weight_quantization() - should_call_init = True - - if should_call_init: - self.algo.init_range() - - def load_state(self, state): - super().load_state(state) - # Just enables/disables quantizers without calling initialization of ranges, because it's called on epoch_step - # in the end of previous epoch before saving the scheduler's state dict. - self._set_quantization_status() - - def _set_quantization_status(self): - if max(self.current_epoch, 0) >= self.activations_quant_start_epoch: - self.algo.enable_activation_quantization() - logger.info("Enabled quantization of activations") - else: - self.algo.disable_activation_quantization() - logger.info("Disabled quantization of activations") - if max(self.current_epoch, 0) >= self.weights_quant_start_epoch: - self.algo.enable_weight_quantization() - logger.info("Enabled quantization of weights") - else: - self.algo.disable_weight_quantization() - logger.info("Disabled quantization of weights") - - def _calc_density_level(self): - raise NotImplementedError - - def compression_stage(self): - is_activations_enabled = self.current_epoch >= self.activations_quant_start_epoch - is_weights_enabled = self.current_epoch >= self.weights_quant_start_epoch - if is_activations_enabled and is_weights_enabled: - return CompressionStage.FULLY_COMPRESSED - if not is_activations_enabled and not is_weights_enabled: - return CompressionStage.UNCOMPRESSED - return CompressionStage.PARTIALLY_COMPRESSED diff --git a/src/nncf/torch/quantization/strip.py b/src/nncf/torch/quantization/strip.py index 40d7bd0f511..6b46c537e56 100644 --- a/src/nncf/torch/quantization/strip.py +++ b/src/nncf/torch/quantization/strip.py @@ -17,17 +17,6 @@ from torch.quantization.fake_quantize import FakeQuantize import nncf -from nncf.common.graph.transformations.commands import TargetType -from nncf.common.graph.transformations.layout import TransformationLayout -from nncf.parameters import StripFormat -from nncf.torch.graph.transformations.commands import ExtraCompressionModuleType -from nncf.torch.graph.transformations.commands import PTSharedFnInsertionCommand -from nncf.torch.graph.transformations.commands import 
PTTargetPoint -from nncf.torch.model_graph_manager import get_const_data -from nncf.torch.model_graph_manager import get_module_by_name -from nncf.torch.model_graph_manager import split_const_name -from nncf.torch.model_transformer import PTModelTransformer -from nncf.torch.nncf_network import NNCFNetwork from nncf.torch.quantization.layers import AsymmetricQuantizer from nncf.torch.quantization.layers import BaseQuantizer from nncf.torch.quantization.layers import INT4AsymmetricWeightsDecompressor @@ -40,50 +29,6 @@ SUPPORTED_NUM_BITS_FOR_STRIP_MODEL = [8] -def replace_quantizer_to_torch_native_module(model: NNCFNetwork) -> NNCFNetwork: - """ - Replace NNCF quantizer modules to PyTorch FakeQuantizer module and remove unused quantizer operators. - - :param model: Target model. - :return: The modified NNCF network. - """ - compression_module_type = ExtraCompressionModuleType.EXTERNAL_QUANTIZER - if model.nncf.is_compression_module_registered(compression_module_type): - external_quantizers = model.nncf.get_compression_modules_by_type(compression_module_type) - for key in external_quantizers: - if external_quantizers[key].is_enabled_quantization(): - external_quantizers[key] = convert_to_torch_fakequantizer(external_quantizers[key]) - - for node in model.nncf.get_original_graph().get_all_nodes(): - if node.node_type in ["nncf_model_input", "nncf_model_output"]: - continue - - nncf_module = model.nncf.get_containing_module(node.node_name) - - if hasattr(nncf_module, "pre_ops"): - for key in list(nncf_module.pre_ops.keys()): - op = nncf_module.get_pre_op(key) - if isinstance(op.op, BaseQuantizer) and op.op.is_enabled_quantization(): - if op.op.is_half_range or op.op.narrow_range: - # Half range and narrow_range require to clamp weights of module - # Note: Half range and narrow_range used only for weight. - input_low, input_high = op.op.get_input_low_input_high() - - data = nncf_module.weight.data - data = torch.min(torch.max(data, input_low), input_high) - data = op.op.quantize(data, execute_traced_op_as_identity=False) - nncf_module.weight.data = data - op.op = convert_to_torch_fakequantizer(op.op) - - if hasattr(nncf_module, "post_ops"): - for key in list(nncf_module.post_ops.keys()): - op = nncf_module.get_post_ops(key) - if isinstance(op.op, BaseQuantizer) and op.op.is_enabled_quantization(): - op.op = convert_to_torch_fakequantizer(op.op) - - return model - - def convert_to_torch_fakequantizer(nncf_quantizer: BaseQuantizer) -> FakeQuantize: """ Convert BaseQuantizer module to FakeQuantize. @@ -140,66 +85,6 @@ def convert_to_torch_fakequantizer(nncf_quantizer: BaseQuantizer) -> FakeQuantiz return fakequantizer -def remove_disabled_quantizers(model: NNCFNetwork) -> NNCFNetwork: - """ - Remove all unused quantizer operators from the model. - - :param model: Compressed model. - :return: The modified NNCF network. 
- """ - compression_module_type = ExtraCompressionModuleType.EXTERNAL_QUANTIZER - if model.nncf.is_compression_module_registered(compression_module_type): - external_quantizers = model.nncf.get_compression_modules_by_type(compression_module_type) - for key in list(external_quantizers.keys()): - op = external_quantizers[key] - if isinstance(op, BaseQuantizer) and not op.is_enabled_quantization(): - external_quantizers.pop(key) - - if not model.nncf.replace_modules: - return model - - for node in model.nncf.get_original_graph().get_all_nodes(): - if node.node_type in ["nncf_model_input", "nncf_model_output"]: - continue - - nncf_module = model.nncf.get_containing_module(node.node_name) - - if hasattr(nncf_module, "pre_ops"): - for key in list(nncf_module.pre_ops.keys()): - op = nncf_module.get_pre_op(key) - if isinstance(op, BaseQuantizer) and not op.is_enabled_quantization(): - nncf_module.remove_pre_forward_operation(key) - - if hasattr(nncf_module, "post_ops"): - for key in list(nncf_module.post_ops.keys()): - op = nncf_module.post_ops(key) - if isinstance(op, BaseQuantizer) and not op.is_enabled_quantization(): - nncf_module.remove_post_forward_operation(key) - - return model - - -def strip_quantized_model(model: NNCFNetwork, strip_format: StripFormat = StripFormat.NATIVE): - """ - Removes auxiliary layers and operations added during the quantization process, - resulting in a clean quantized model ready for deployment. The functionality of the model object is still preserved - as a compressed model. - - :param model: Compressed model. - :param strip format: Describes the format in which model is saved after strip. - :return: The modified NNCF network. - """ - if strip_format == StripFormat.DQ: - model = replace_with_decompressors(model) - elif strip_format == StripFormat.NATIVE: - model = replace_quantizer_to_torch_native_module(model) - model = remove_disabled_quantizers(model) - else: - msg = f"Unsupported strip format: {strip_format}" - raise nncf.ParameterNotSupportedError(msg) - return model - - def asym_fq_to_decompressor( quantizer: AsymmetricQuantizer, weight: torch.Tensor ) -> tuple[Union[INT8AsymmetricWeightsDecompressor, INT4AsymmetricWeightsDecompressor], torch.Tensor]: @@ -301,79 +186,3 @@ def sym_fq_to_decompressor( result_dtype=weight_dtype, ) return decompressor, q_weight - - -def replace_with_decompressors(model: NNCFNetwork) -> NNCFNetwork: - """ - Performs transformation from fake quantize format (FQ) to dequantization one (DQ). - The former takes floating-point input, quantizes and dequantizes, and returns a floating-point value, - while the latter takes a quantized integer representation, dequantizes it, and outputs a floating-point result. - - Mathematically, both methods lead to the same outcome, but due to differences in the order of operations and - rounding errors, the actual results may differ. In particular, this error can occur for values - that are located in the midpoint between two quantized values ("quants"). - - The FQ format may round these values to one "quant", while the DQ format rounds them to another "quant". - To avoid these issues, the compressed representation should be provided not by directly quantizing the input, - but by quantizing a pre-processed, fake-quantized, floating-point representation. - - :param model: Compressed model with Decompressors. - :return: The modified NNCF network. 
- """ - transformation_layout = TransformationLayout() - transformations = model.nncf.transformation_layout().transformations - model = model.nncf.get_clean_shallow_copy() - graph = model.nncf.get_graph() - for command in transformations: - quantizer = command.fn - if not isinstance(quantizer, (SymmetricQuantizer, AsymmetricQuantizer)): - # strip is only applied to Fake Quantizers, skip all other modules, e.g. SQMultiply for AWQ - transformation_layout.register(command) - continue - - msg = "" - if quantizer._qspec.half_range or quantizer._qspec.narrow_range: - msg += "Unexpected parameters of quantizers on strip: half_range and narrow_range should be False.\n" - if quantizer.num_bits not in [4, 8]: - msg += f"Unsupported number of bits {quantizer.num_bits} for the quantizer {quantizer}.\n" - if len(command.target_points) > 1: - msg += "Command contains more than one target point." - if msg: - raise nncf.ValidationError(msg) - - tp = command.target_points[0] - weight_node = graph.get_node_by_name(tp.target_node_name) - if weight_node is None: - msg = "FQ is not assigned to weight. Strip to DQ format is not supported for FQ on activation." - raise nncf.UnsupportedModelError(msg) - weight_name = weight_node.layer_attributes.name - weight = get_const_data(weight_node, model) - - convert_fn = asym_fq_to_decompressor if isinstance(quantizer, AsymmetricQuantizer) else sym_fq_to_decompressor - decompressor, q_weight = convert_fn(quantizer, weight) - - packed_tensor = decompressor.pack_weight(q_weight) - - # sets compressed tensor - # TODO:(AlexanderDokuchaev): update set_const_data - module_name, weight_attr_name = split_const_name(weight_name) - module = get_module_by_name(module_name, model) - weight = getattr(module, weight_attr_name) - - if not isinstance(weight, torch.nn.Parameter): - msg = f"Weight is not a torch.nn.Parameter in the model by name {weight_name}." - raise nncf.InternalError(msg) - - weight.requires_grad = False - weight.data = packed_tensor - - decompressor_name = f"weights_decompressor_{weight_node.node_name.replace('.', '_')}" - transformation_layout.register( - PTSharedFnInsertionCommand( - [PTTargetPoint(TargetType.OPERATOR_POST_HOOK, target_node_name=weight_node.node_name)], - decompressor, - decompressor_name, - ) - ) - - return PTModelTransformer(model).transform(transformation_layout) diff --git a/src/nncf/torch/quantization/structs.py b/src/nncf/torch/quantization/structs.py deleted file mode 100644 index dcf02a5148a..00000000000 --- a/src/nncf/torch/quantization/structs.py +++ /dev/null @@ -1,32 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-from dataclasses import dataclass - -import torch - -from nncf.torch.graph.transformations.commands import PTTargetPoint -from nncf.torch.quantization.layers import BaseQuantizer - - -@dataclass -class QuantizerInfo: - quantizer_module_ref: BaseQuantizer - affected_insertions: list[PTTargetPoint] - - -@dataclass -class NonWeightQuantizerInfo(QuantizerInfo): - pass - - -@dataclass -class WeightQuantizerInfo(QuantizerInfo): - quantized_module: torch.nn.Module diff --git a/src/nncf/torch/strip.py b/src/nncf/torch/strip.py index b1d5efef132..5e71694552b 100644 --- a/src/nncf/torch/strip.py +++ b/src/nncf/torch/strip.py @@ -15,7 +15,6 @@ from torch import nn -from nncf.common.check_features import is_torch_tracing_by_patching from nncf.parameters import StripFormat TModel = TypeVar("TModel", bound=nn.Module) @@ -37,9 +36,6 @@ def strip( :param example_input: An example input tensor to be used for tracing the model. :return: The stripped model. """ - if is_torch_tracing_by_patching(): - return model.nncf.strip(do_copy, strip_format) - from nncf.torch.function_hook.strip import strip_model model = deepcopy(model) if do_copy else model diff --git a/tests/cross_fw/install/install_checks_torch.py b/tests/cross_fw/install/install_checks_torch.py index 72756773f06..4b8267bc486 100644 --- a/tests/cross_fw/install/install_checks_torch.py +++ b/tests/cross_fw/install/install_checks_torch.py @@ -26,7 +26,7 @@ import nncf # noqa: F401, E402 -from nncf.torch import create_compressed_model # noqa: F401, E402 +import nncf.torch # noqa: F401, E402 input_low_tensor = torch.zeros([1]) input_tensor = torch.ones([1, 1, 1, 1]) diff --git a/tests/torch/accuracy_aware_training/test_accuracy_aware_config.py b/tests/torch/accuracy_aware_training/test_accuracy_aware_config.py deleted file mode 100644 index 3b0de8ab711..00000000000 --- a/tests/torch/accuracy_aware_training/test_accuracy_aware_config.py +++ /dev/null @@ -1,69 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import pytest - -import nncf -from nncf.common.accuracy_aware_training import create_accuracy_aware_training_loop -from nncf.torch.initialization import register_default_init_args -from tests.torch.helpers import LeNet -from tests.torch.helpers import create_compressed_model_and_algo_for_test -from tests.torch.helpers import create_ones_mock_dataloader -from tests.torch.quantization.quantization_helpers import get_quantization_config_without_range_init - -pytestmark = pytest.mark.legacy - - -@pytest.mark.parametrize( - ("aa_config", "must_raise"), - ( - ( - { - "compression": { - "algorithm": "quantization", - }, - }, - True, - ), - ), -) -def test_accuracy_aware_config(aa_config, must_raise): - def mock_validate_fn(model): - pass - - config = get_quantization_config_without_range_init(LeNet.INPUT_SIZE[-1]) - - config.update( - { - "accuracy_aware_training": { - "mode": "adaptive_compression_level", - "params": { - "maximal_relative_accuracy_degradation": 1, - "initial_training_phase_epochs": 1, - "patience_epochs": 10, - }, - } - } - ) - - config.update(aa_config) - - train_loader = create_ones_mock_dataloader(config, num_samples=10) - model = LeNet() - - config = register_default_init_args(config, train_loader=train_loader, model_eval_fn=mock_validate_fn) - model, compression_ctrl = create_compressed_model_and_algo_for_test(model, config) - - if must_raise: - with pytest.raises(nncf.ValidationError): - _ = create_accuracy_aware_training_loop(config, compression_ctrl, 0, dump_checkpoints=False) - else: - _ = create_accuracy_aware_training_loop(config, compression_ctrl, 0, dump_checkpoints=False) diff --git a/tests/torch/helpers.py b/tests/torch/helpers.py index 24d87f7e2e1..518ccbc799e 100644 --- a/tests/torch/helpers.py +++ b/tests/torch/helpers.py @@ -14,8 +14,6 @@ from abc import ABC from abc import abstractmethod from collections import defaultdict -from copy import deepcopy -from pathlib import Path from typing import Any, Callable, TypeVar, Union import numpy as np @@ -42,10 +40,8 @@ from nncf.torch.graph.transformations.commands import PTInsertionCommand from nncf.torch.graph.transformations.commands import PTSharedFnInsertionCommand from nncf.torch.initialization import PTInitializingDataLoader -from nncf.torch.initialization import register_default_init_args from nncf.torch.layer_utils import StatefulModuleInterface from nncf.torch.layers import NNCF_MODULES_MAP -from nncf.torch.model_creation import create_compressed_model from nncf.torch.module_operations import UpdateWeight from nncf.torch.nncf_module_replacement import get_original_module_scope_from_nncf_module_scope from nncf.torch.nncf_network import NNCFNetwork @@ -401,27 +397,6 @@ def _to_numpy(cls, tensor: TensorType) -> Union[np.ndarray, numbers.Number]: raise Exception(msg) -def create_compressed_model_and_algo_for_test( - model: Module, - config: NNCFConfig = None, - dummy_forward_fn: Callable[[Module], Any] = None, - wrap_inputs_fn: Callable[[tuple, dict], tuple[tuple, dict]] = None, - compression_state: dict[str, Any] = None, -) -> tuple[NNCFNetwork, PTCompressionAlgorithmController]: - if config is not None: - assert isinstance(config, NNCFConfig) - NNCFConfig.validate(config) - algo, model = create_compressed_model( - model, - config, - dump_graphs=False, - dummy_forward_fn=dummy_forward_fn, - wrap_inputs_fn=wrap_inputs_fn, - compression_state=compression_state, - ) - return model, algo - - def create_nncf_model_and_single_algo_builder( model: Module, config: NNCFConfig, @@ -453,12 +428,6 @@ def 
create_nncf_model_and_single_algo_builder( return compressed_model, builder -def create_initialized_compressed_model(model: nn.Module, config: NNCFConfig, train_loader: DataLoader) -> nn.Module: - config = register_default_init_args(deepcopy(config), train_loader, nn.MSELoss) - model, _compression_ctrl = create_compressed_model_and_algo_for_test(model, config) - return model - - class MockModel(nn.Module): def __init__(self): super().__init__() @@ -658,16 +627,6 @@ def create_dataloader_object_detection(*args, **kwargs): return create_dataloader_object_detection -def load_exported_onnx_version( - nncf_config: NNCFConfig, model: torch.nn.Module, path_to_storage_dir: Path, save_format: str = None -) -> onnx.ModelProto: - _, compression_ctrl = create_compressed_model_and_algo_for_test(model, nncf_config) - onnx_checkpoint_path = path_to_storage_dir / "model.onnx" - compression_ctrl.export_model(str(onnx_checkpoint_path), save_format=save_format) - model_proto = onnx.load_model(str(onnx_checkpoint_path)) - return model_proto - - HookType = TypeVar("HookType") diff --git a/tests/torch/modules/test_rnn.py b/tests/torch/modules/test_rnn.py deleted file mode 100644 index a8262c4ba27..00000000000 --- a/tests/torch/modules/test_rnn.py +++ /dev/null @@ -1,789 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-import copy -import logging -import os -import sys -from collections import namedtuple -from functools import partial - -import onnx -import pytest -import torch -import torch.nn.functional as F -from torch import nn -from torch.autograd import Variable -from torch.nn.utils.rnn import PackedSequence - -from nncf.torch import nncf_model_input -from nncf.torch.dynamic_graph.context import TracingContext -from nncf.torch.dynamic_graph.io_handling import wrap_nncf_model_outputs_with_objwalk -from nncf.torch.layers import ITERATION_MODULES -from nncf.torch.layers import NNCF_RNN -from nncf.torch.layers import LSTMCellNNCF -from nncf.torch.model_creation import create_compressed_model -from nncf.torch.nncf_module_replacement import collect_modules_and_scopes_by_predicate -from nncf.torch.utils import get_model_device -from tests.torch.helpers import create_compressed_model_and_algo_for_test -from tests.torch.helpers import get_empty_config -from tests.torch.helpers import get_grads -from tests.torch.helpers import register_bn_adaptation_init_args -from tests.torch.modules.seq2seq.gnmt import GNMT - -pytestmark = pytest.mark.legacy - - -def replace_lstm(model): - def replace_fn(module_): - if not isinstance(module_, nn.LSTM): - return module_ - device = get_model_device(module_) - custom_lstm = NNCF_RNN( - "LSTM", - input_size=module_.input_size, - hidden_size=module_.hidden_size, - num_layers=module_.num_layers, - bidirectional=module_.bidirectional, - batch_first=module_.batch_first, - dropout=module_.dropout, - bias=module_.bias, - ) - - def get_param_names(bias: bool) -> list[str]: - suffixes = ["ih", "hh"] - names = ["weight_" + suffix for suffix in suffixes] - if bias: - names += ["bias_" + suffix for suffix in suffixes] - return names - - for layer_idx in range(custom_lstm.num_layers): - for d in range(custom_lstm.num_directions): - for name in get_param_names(custom_lstm.bias): - suffix = "_reverse" if d == 1 else "" - param_name = name + f"_l{layer_idx}{suffix}" - param = getattr(module_, param_name) - getattr(custom_lstm, param_name).data.copy_(param.data) - custom_lstm.to(device) - return custom_lstm - - lstm_modules = collect_modules_and_scopes_by_predicate(model, lambda x: isinstance(x, nn.LSTM)) - if isinstance(model, nn.LSTM): - return replace_fn(model) - from nncf.torch.nncf_module_replacement import _replace_module_by_scope - - for module, scope_set in lstm_modules.items(): - replaced_module = replace_fn(module) - for scope in scope_set: - _replace_module_by_scope(model, scope, replaced_module) - return model - - -def clone_test_data(data_list) -> list[torch.Tensor]: - results = [] - x = data_list[0] - result = x if isinstance(x, PackedSequence) else x.clone() - results.append(result) - for tensor_list in data_list[1:]: - result = () - for tensor in tensor_list: - if isinstance(tensor, Variable): - sub_result = tensor.data.clone() - sub_result = Variable(sub_result, requires_grad=True) - else: - sub_result = tensor.clone() - result += (sub_result,) - results.append(result) - return results - - -LSTMTestSizes = namedtuple("LSTMTestSizes", ["input_size", "hidden_size", "batch", "seq_length"]) -LSTMTestData = namedtuple("LSTMTestData", ["x", "h0", "c0", "weight_ih", "weight_hh", "bias_ih", "bias_hh"]) - - -@pytest.mark.parametrize( - "sizes", - [LSTMTestSizes(512, 768, 128, 50), LSTMTestSizes(3, 3, 3, 3), LSTMTestSizes(1, 1, 1, 1)], - ids=lambda val: "[{}]".format("-".join([str(v) for v in val])), -) -class TestLSTMCell: - @staticmethod - def generate_lstm_data( - p: LSTMTestSizes, 
- num_layers: int = 1, - num_directions: int = 1, - variable_length: bool = False, - sorted_: bool = True, - batch_first: bool = True, - use_cuda: bool = False, - bias: bool = True, - empty_initial: bool = False, - is_backward: bool = False, - ) -> LSTMTestData: - num_chunks = 4 - seq_list = [] - if variable_length: - seq_lens = torch.IntTensor(p.batch).random_(1, p.seq_length + 1) - if sorted_: - seq_lens = torch.sort(seq_lens, descending=True).values - for seq_size in seq_lens: - seq_list.append(torch.randn(seq_size.item(), p.input_size)) - padded_seq_batch = torch.nn.utils.rnn.pad_sequence(seq_list, batch_first=batch_first) - x_data = torch.nn.utils.rnn.pack_padded_sequence( - padded_seq_batch, lengths=seq_lens, batch_first=batch_first, enforce_sorted=sorted_ - ) - - else: - size = (p.seq_length, p.batch, p.input_size) - if batch_first: - size = (p.batch, p.seq_length, p.input_size) - x_data = torch.randn(*size) - - def wrap_tensor(tensor): - wrapped = tensor - if use_cuda: - wrapped = wrapped.cuda() - if is_backward: - wrapped = Variable(wrapped, requires_grad=True) - return wrapped - - if use_cuda: - x_data = x_data.cuda() - h0, c0, wih, whh, bih, bhh = ([] for _ in range(6)) - for layer_ in range(num_layers): - for _ in range(num_directions): - layer_input_size = p.input_size if layer_ == 0 else p.hidden_size * num_directions - if not empty_initial: - h0.append(wrap_tensor(torch.randn(p.batch, p.hidden_size))) - c0.append(wrap_tensor(torch.randn(p.batch, p.hidden_size))) - wih.append(wrap_tensor(torch.rand(num_chunks * p.hidden_size, layer_input_size))) - whh.append(wrap_tensor(torch.rand(num_chunks * p.hidden_size, p.hidden_size))) - if bias: - bih.append(wrap_tensor(torch.rand(num_chunks * p.hidden_size))) - bhh.append(wrap_tensor(torch.rand(num_chunks * p.hidden_size))) - result = LSTMTestData(x_data, h0, c0, wih, whh, bih, bhh) - return result - - @staticmethod - def set_weights(cell: nn.LSTMCell, data: LSTMTestData): - for name in TestLSTM.get_param_names(bias=True): - param = getattr(data, name) - if param: - getattr(cell, name).data.copy_(param[0].data) - - def test_forward_lstm_cell(self, sizes, _seed): - p = sizes - ref_data = TestLSTMCell.generate_lstm_data(p, batch_first=False) - test_data = LSTMTestData(*clone_test_data(ref_data)) - - ref_rnn = nn.LSTMCell(p.input_size, p.hidden_size) - TestLSTMCell.set_weights(ref_rnn, ref_data) - test_rnn = LSTMCellNNCF(p.input_size, p.hidden_size) - TestLSTMCell.set_weights(test_rnn, test_data) - - for i in range(p.seq_length): - ref_result = ref_rnn(ref_data.x[i], (ref_data.h0[0], ref_data.c0[0])) - test_result = test_rnn(test_data.x[i], (test_data.h0[0], test_data.c0[0])) - for ref, test in list(zip(ref_result, test_result)): - torch.testing.assert_close(test, ref) - - def test_backward_lstm_cell(self, sizes, _seed): - p = sizes - ref_data = TestLSTMCell.generate_lstm_data(p, batch_first=False, is_backward=True) - with torch.no_grad(): - test_data = LSTMTestData(*clone_test_data(ref_data)) - - ref_rnn = nn.LSTMCell(p.input_size, p.hidden_size) - TestLSTMCell.set_weights(ref_rnn, ref_data) - test_rnn = LSTMCellNNCF(p.input_size, p.hidden_size) - TestLSTMCell.set_weights(test_rnn, test_data) - - for i in range(p.seq_length): - ref_result = ref_rnn(ref_data.x[i], (ref_data.h0[0], ref_data.c0[0])) - test_result = test_rnn(test_data.x[i], (test_data.h0[0], test_data.c0[0])) - ref_result[0].sum().backward() - test_result[0].sum().backward() - ref_grads = get_grads([ref_data.h0[0], ref_data.c0[0]]) - ref_grads += 
get_grads([ref_rnn.weight_ih, ref_rnn.weight_hh, ref_rnn.bias_ih, ref_rnn.bias_hh]) - test_grads = get_grads([ref_data.h0[0], ref_data.c0[0]]) - test_grads += get_grads([test_rnn.weight_ih, test_rnn.weight_hh, test_rnn.bias_ih, test_rnn.bias_hh]) - for ref, test in list(zip(test_grads, ref_grads)): - torch.testing.assert_close(test, ref) - - -def test_export_lstm_cell(tmp_path): - config = get_empty_config(model_size=1, input_sample_sizes=[1, 1]) - config["compression"] = {"algorithm": "quantization"} - register_bn_adaptation_init_args(config) - - model, algo = create_compressed_model_and_algo_for_test(LSTMCellNNCF(1, 1), config) - - test_path = str(tmp_path.joinpath("test.onnx")) - # Exporting the operator ::chunk to ONNX opset version 9 is not supported. - # Support for this operator was added in version 11 - algo.export_model(test_path, save_format="onnx_11") - assert os.path.exists(test_path) - - onnx_num = 0 - model = onnx.load(test_path) - - for node in model.graph.node: - if node.op_type == "FakeQuantize": - onnx_num += 1 - assert onnx_num == 11 - - -@pytest.mark.parametrize( - "sizes", - [LSTMTestSizes(512, 324, 128, 50), LSTMTestSizes(3, 3, 3, 3), LSTMTestSizes(1, 1, 1, 1)], - ids=lambda val: "[{}]".format("-".join([str(v) for v in val])), -) -@pytest.mark.parametrize("bidirectional", (True, False), ids=("bi", "uni")) -@pytest.mark.parametrize("bias", [True, False], ids=["bias", "no_bias"]) -@pytest.mark.parametrize("num_layers", [1, 2], ids=["single_layer", "stacked"]) -@pytest.mark.parametrize("batch_first", [True, False], ids=["batch_first", "seq_first"]) -@pytest.mark.parametrize( - ("variable_length", "sorted_"), - ([True, True], [True, False], [False, False]), - ids=["packed_sorted", "packed_unsorted", "not_packed"], -) -@pytest.mark.parametrize("empty_initial", [True, False], ids=["no_initial", "with_initial"]) -# TODO: dropout gives different result. 
Looks like different random seed on CPU -# @pytest.mark.parametrize('dropout', [0, 0.9], ids=['no_dropout', 'with_dropout']) -@pytest.mark.parametrize("dropout", [0], ids=["no_dropout"]) -class TestLSTM: - def test_forward_lstm( - self, - sizes, - bidirectional, - num_layers, - bias, - batch_first, - variable_length, - sorted_, - use_cuda, - empty_initial, - dropout, - _seed, - ): - if not torch.cuda.is_available() and use_cuda is True: - pytest.skip("Skipping CUDA test cases for CPU only setups") - num_directions = 2 if bidirectional else 1 - p = sizes - - ref_data = TestLSTMCell.generate_lstm_data( - p, num_layers, num_directions, variable_length, sorted_, batch_first, use_cuda, bias, empty_initial - ) - - ref_rnn = nn.LSTM( - input_size=p.input_size, - hidden_size=p.hidden_size, - num_layers=num_layers, - bidirectional=bidirectional, - batch_first=batch_first, - bias=bias, - dropout=dropout, - ) - self.set_ref_lstm_weights(ref_data, ref_rnn, num_layers, num_directions, bias) - ref_hidden = None if empty_initial else self.get_ref_lstm_hidden(ref_data) - - test_data = LSTMTestData(*clone_test_data(ref_data)) - - class ModelWrapper(nn.Module): - def __init__(self, lstm): - super().__init__() - self.lstm = lstm - - def forward(self, *input_): - return self.lstm(*input_) - - wrapped_ref_rnn = ModelWrapper(ref_rnn) - wrapped_test_rnn = replace_lstm(copy.deepcopy(wrapped_ref_rnn)) - test_rnn = wrapped_test_rnn.lstm - test_hidden = None if empty_initial else self.get_test_lstm_hidden(test_data) - - if use_cuda: - ref_rnn.cuda() - test_rnn.cuda() - ref_output, (ref_hn, ref_cn) = ref_rnn(ref_data.x, ref_hidden) - test_output, (test_hn, test_cn) = test_rnn(test_data.x, test_hidden) - - torch.testing.assert_close(test_hn[0], ref_hn[0], rtol=1e-3, atol=1e-4) - torch.testing.assert_close(test_cn[0], ref_cn[0], rtol=1e-3, atol=1e-4) - if variable_length: - torch.testing.assert_close(test_output.batch_sizes, ref_output.batch_sizes) - torch.testing.assert_close(test_output.data, ref_output.data, rtol=1e-2, atol=1e-3) - if not sorted_: - torch.testing.assert_close(test_output.sorted_indices, ref_output.sorted_indices) - torch.testing.assert_close(test_output.unsorted_indices, ref_output.unsorted_indices) - else: - torch.testing.assert_close(test_output, ref_output, rtol=9e-2, atol=15e-4) - - def test_backward_lstm( - self, - sizes, - bidirectional, - num_layers, - bias, - batch_first, - variable_length, - sorted_, - use_cuda, - empty_initial, - dropout, - _seed, - ): - if not torch.cuda.is_available() and use_cuda is True: - pytest.skip("Skipping CUDA test cases for CPU only setups") - num_directions = 2 if bidirectional else 1 - - p = sizes - - ref_data = TestLSTMCell.generate_lstm_data( - p, num_layers, num_directions, variable_length, sorted_, batch_first, use_cuda, bias, empty_initial, True - ) - - ref_rnn = nn.LSTM( - input_size=p.input_size, - hidden_size=p.hidden_size, - num_layers=num_layers, - bidirectional=bidirectional, - batch_first=batch_first, - bias=bias, - dropout=dropout, - ) - self.set_ref_lstm_weights(ref_data, ref_rnn, num_layers, num_directions, bias) - ref_hidden = None if empty_initial else self.get_ref_lstm_hidden(ref_data) - - test_data = LSTMTestData(*clone_test_data(ref_data)) - test_rnn = replace_lstm(copy.deepcopy(ref_rnn)) - test_hidden = None if empty_initial else self.get_test_lstm_hidden(test_data) - - if use_cuda: - ref_rnn.cuda() - test_rnn.cuda() - - ref_output, _ = ref_rnn(ref_data.x, ref_hidden) - test_output, _ = test_rnn(test_data.x, test_hidden) - - 
ref_output[0].sum().backward() - test_output[0].sum().backward() - - ref_grads = get_grads(self.flatten_nested_lists(ref_rnn.all_weights)) - test_grads = get_grads(self.flatten_nested_lists(test_rnn.all_weights)) - if not empty_initial: - # TODO: compare gradient of all hidden - ref_grads += get_grads([ref_data.h0[0], ref_data.c0[0]]) - test_grads += get_grads([test_hidden[0][0], test_hidden[1][0]]) - for ref, test in list(zip(test_grads, ref_grads)): - torch.testing.assert_close(test, ref, rtol=1e-1, atol=1e-1) - - @classmethod - def flatten_nested_lists(cls, nested_list: list) -> list[torch.Tensor]: - return [tensor for tensor_tuple in nested_list for tensor in tensor_tuple] - - @classmethod - def get_test_lstm_hidden(cls, data: LSTMTestData) -> list[tuple[torch.Tensor, ...]]: - result = [] - hidden_names = ["h0", "c0"] - for name in hidden_names: - hidden_list = getattr(data, name) - element = () - num_hidden = len(hidden_list) - for i in range(num_hidden): - element += (hidden_list[i],) - result.append(element) - return result - - @classmethod - def get_ref_lstm_hidden(cls, data: LSTMTestData) -> tuple[torch.Tensor, torch.Tensor]: - hidden = cls.get_test_lstm_hidden(data) - hidden_states = [torch.unsqueeze(tensor, dim=0) for tensor in hidden[0]] - cell_states = [torch.unsqueeze(tensor, dim=0) for tensor in hidden[1]] - return (torch.cat(hidden_states, dim=0), torch.cat(cell_states, dim=0)) - - @classmethod - def set_ref_lstm_weights( - cls, data: LSTMTestData, nn_lstm: nn.LSTM, num_layers: int, num_directions: int, bias: bool - ): - for layer_idx in range(num_layers): - for d in range(num_directions): - i = layer_idx * num_directions + d - for name in cls.get_param_names(bias): - suffix = "_reverse" if d == 1 else "" - param = getattr(data, name) - param_name = name + f"_l{layer_idx}{suffix}" - getattr(nn_lstm, param_name).data.copy_(param[i].data) - - @classmethod - def get_param_names(cls, bias: bool) -> list[str]: - suffixes = ["ih", "hh"] - names = ["weight_" + suffix for suffix in suffixes] - if bias: - names += ["bias_" + suffix for suffix in suffixes] - return names - - -def test_export_stacked_bi_lstm(tmp_path): - p = LSTMTestSizes(3, 3, 3, 3) - config = get_empty_config(input_sample_sizes=[1, p.hidden_size, p.input_size]) - config["compression"] = {"algorithm": "quantization"} - register_bn_adaptation_init_args(config) - - # TODO: batch_first=True fails with building graph: ambiguous call to mul or sigmoid - test_rnn = NNCF_RNN( - "LSTM", input_size=p.input_size, hidden_size=p.hidden_size, num_layers=2, bidirectional=True, batch_first=False - ) - model, algo = create_compressed_model_and_algo_for_test(test_rnn, config) - - test_path = str(tmp_path.joinpath("test.onnx")) - # Exporting the operator ::chunk to ONNX opset version 9 is not supported. 
- # Support for this operator was added in version 11 - algo.export_model(test_path, save_format="onnx_11") - assert os.path.exists(test_path) - - onnx_num = 0 - - model = onnx.load(test_path) - for node in model.graph.node: - if node.op_type == "FakeQuantize": - onnx_num += 1 - assert onnx_num == 42 - - -class TestNumberOfNodes: - logging.basicConfig(level=logging.INFO, stream=sys.stdout) - - def test_number_of_calling_fq_for_lstm(self): - p = LSTMTestSizes(1, 1, 1, 5) - num_layers = 2 - bidirectional = True - num_directions = 2 if bidirectional else 1 - bias = True - batch_first = False - config = get_empty_config(input_sample_sizes=[p.seq_length, p.batch, p.input_size]) - config["compression"] = {"algorithm": "quantization", "quantize_inputs": True} - register_bn_adaptation_init_args(config) - - test_data = TestLSTMCell.generate_lstm_data(p, num_layers, num_directions, bias=bias, batch_first=batch_first) - - test_rnn = NNCF_RNN( - "LSTM", - input_size=p.input_size, - hidden_size=p.hidden_size, - num_layers=num_layers, - bidirectional=bidirectional, - bias=bias, - batch_first=batch_first, - ) - TestLSTM.set_ref_lstm_weights(test_data, test_rnn, num_layers, num_directions, bias) - test_hidden = TestLSTM.get_test_lstm_hidden(test_data) - - model, algo = create_compressed_model_and_algo_for_test(test_rnn, config) - - class Counter: - def __init__(self): - self.count = 0 - - def next(self): - self.count += 1 - - def hook(model, input_, counter): - counter.next() - - counters = {} - counter_for_input_quantizer = None - inter_layer_reset_point_post_aq_counters = {} - for name, quantizer in algo.all_quantizations.items(): - counter = Counter() - quantizer.register_forward_pre_hook(partial(hook, counter=counter)) - if str(name) == "/nncf_model_input_0|OUTPUT": - counter_for_input_quantizer = counter - continue - if "RNNResetPoint" in str(name): - inter_layer_reset_point_post_aq_counters[name] = counter - continue - counters[name] = counter - _ = model(test_data.x, test_hidden) - - # NB: below may always fail in debug due to superfluous 'cat' nodes - assert model.nncf.get_graph().get_nodes_count() == 120 - assert len(counters) + 2 == 42 # 8 WQ + 32 AQ + 1 input AQ + 1 reset point AQ - for counter in counters.values(): - assert counter.count == p.seq_length - assert counter_for_input_quantizer.count == 1 - for counter in inter_layer_reset_point_post_aq_counters.values(): - assert counter.count == 1 - - @pytest.mark.skip(reason="Sporadic failures") - def test_number_of_calling_fq_for_gnmt(self): - if torch.cuda.is_available(): - torch.cuda.set_device(0) - device = torch.device("cuda") - else: - device = torch.device("cpu") - batch_first = False - vocab_size = 32000 - model_config = { - "hidden_size": 100, - "vocab_size": vocab_size, - "num_layers": 4, - "dropout": 0.2, - "batch_first": batch_first, - "share_embedding": True, - } - batch_size = 128 - sequence_size = 50 - input_sample_size = [batch_size, sequence_size] if batch_first else [sequence_size, batch_size] - config = get_empty_config(input_sample_sizes=input_sample_size) - config["compression"] = {"algorithm": "quantization", "quantize_inputs": True} - config["scopes_without_shape_matching"] = [ - "GNMT/ResidualRecurrentDecoder[decoder]/RecurrentAttention[att_rnn]/BahdanauAttention[attn]", - ] - register_bn_adaptation_init_args(config) - - model = GNMT(**model_config) - model = replace_lstm(model) - model.to(device) - - def dummy_forward_fn(model, seq_len=sequence_size): - def gen_packed_sequence(): - seq_list = [] - seq_lens = 
torch.LongTensor(batch_size).random_(1, seq_len + 1) - seq_lens = torch.sort(seq_lens, descending=True).values - for seq_size in seq_lens: - seq_list.append(torch.LongTensor(seq_size.item()).random_(1, vocab_size).to(device)) - padded_seq_batch = torch.nn.utils.rnn.pad_sequence(seq_list, batch_first=batch_first) - return padded_seq_batch, seq_lens - - x_data, seq_lens = gen_packed_sequence() - input_encoder = x_data - input_enc_len = seq_lens.to(device) - input_decoder = gen_packed_sequence()[0] - wrap_nncf_model_outputs_with_objwalk(model(input_encoder, input_enc_len, input_decoder)) - - def gnmt_wrap_inputs_fn(model_args, model_kwargs): - # Assuming 3 args to wrap: input_encoder, input_enc_len, input_decoder, and 0 kwargs to wrap - model_args = ( - nncf_model_input(model_args[0]), - nncf_model_input(model_args[1]), - nncf_model_input(model_args[2]), - ) - return model_args, model_kwargs - - algo, model = create_compressed_model( - model, config, dummy_forward_fn=dummy_forward_fn, wrap_inputs_fn=gnmt_wrap_inputs_fn, dump_graphs=False - ) - model.to(device) - - class Counter: - def __init__(self): - self.count = 0 - - def next(self): - self.count += 1 - - def hook(model, input_, counter): - counter.next() - - counters = {} - for name, quantizer in algo.all_quantizations.items(): - counter = Counter() - counters[str(name)] = counter - quantizer.register_forward_pre_hook(partial(hook, counter=counter)) - dummy_forward_fn(model) - - assert ( - model.nncf.get_graph().get_nodes_count() == 370 - ) # NB: may always fail in debug due to superfluous 'cat' nodes - assert len(counters) == 136 - ref_call_counts = { - "cell": sequence_size, - "LSTMCellForwardNNCF": sequence_size, - # embedding module is shared between the decoder and encoder, - # associated weight quantizer will be called twice - "embedding": 2, - # unified scales for 4 FQ - "NNCF_RNN[0]/StackedRNN[rnn_impl]/StackedRNNResetPoint/cat_0|OUTPUT": 4, - } - for name, counter in counters.items(): - print(name, counter.count) - for ref_key, ref_count in ref_call_counts.items(): - if ref_key in name: - assert counter.count == ref_count, name - break - new_seq_len = int(sequence_size / 2) - dummy_forward_fn(model, new_seq_len) - - ref_call_counts = { - "cell": sequence_size + new_seq_len, - "LSTMCellForwardNNCF": sequence_size + new_seq_len, - "embedding": 4, - "NNCF_RNN[0]/StackedRNN[rnn_impl]/StackedRNNResetPoint/cat_0|OUTPUT": 8, - } - assert model.nncf.get_graph().get_nodes_count() == 370 - assert len(counters) == 136 - for name, counter in counters.items(): - for ref_key, ref_count in ref_call_counts.items(): - if ref_key in name: - assert counter.count == ref_count, name - break - - def test_number_of_nodes_for_module_in_loop(self): - num_iter = 5 - - class LoopModule(nn.Module): - @ITERATION_MODULES.register("Inner") - class Inner(nn.Module): - def __init__(self): - super().__init__() - self.operator1 = torch.sigmoid - self.operator2 = torch.tanh - - def forward(self, x): - s = self.operator1(x) - t = self.operator2(x) - result = t + s - return result - - @staticmethod - def nodes_number(): - return 3 - - def __init__(self): - super().__init__() - self.inner = self.Inner() - - def forward(self, x): - for _ in range(num_iter): - x = self.inner(x) - return x - - def nodes_number(self): - return self.inner.nodes_number() - - test_module = LoopModule() - context = TracingContext() - context.enable_trace_dynamic_graph() - with context as ctx: - _ = test_module(torch.zeros(1)) - assert ctx.graph.get_nodes_count() == test_module.nodes_number() 
- - def test_number_of_nodes_for_module_in_loop__not_input_node(self): - num_iter = 5 - - class LoopModule(nn.Module): - class Inner(nn.Module): - def forward(self, x): - s = F.sigmoid(x) - t = F.tanh(x) - result = F.sigmoid(x) * t + F.tanh(x) * s - return result - - @staticmethod - def nodes_number(): - return 7 - - def __init__(self): - super().__init__() - self.inner = self.Inner() - - def forward(self, x): - for _ in range(num_iter): - x = self.inner(F.relu(x)) - return x - - def nodes_number(self): - return self.inner.nodes_number() + num_iter - - test_module = LoopModule() - context = TracingContext() - context.enable_trace_dynamic_graph() - with context as ctx: - _ = test_module(torch.zeros(1)) - assert ctx.graph.get_nodes_count() == test_module.nodes_number() - - def test_number_of_nodes_for_module_with_nested_loops(self): - num_iter = 5 - - class TestIterModule(nn.Module): - @ITERATION_MODULES.register() - class TestIterModule_ResetPoint(nn.Module): - def __init__(self, loop_module): - super().__init__() - self.loop_module = loop_module - - def forward(self, x): - return self.loop_module(F.relu(x)) - - def __init__(self): - super().__init__() - self.loop_module = self.LoopModule2() - self.reset_point = self.TestIterModule_ResetPoint(self.loop_module) - - def forward(self, x): - for _ in range(num_iter): - x = self.reset_point(x) - return x - - class LoopModule2(nn.Module): - @ITERATION_MODULES.register() - class LoopModule2_ResetPoint(nn.Module): - def __init__(self, inner): - super().__init__() - self.inner = inner - - def forward(self, x): - return self.inner(F.relu(x)) - - def __init__(self): - super().__init__() - self.inner = self.Inner() - self.reset_helper = self.LoopModule2_ResetPoint(self.inner) - - def forward(self, x): - for _ in range(num_iter): - self.reset_helper(x) - return x - - class Inner(nn.Module): - def forward(self, x): - s = F.sigmoid(x) - t = F.tanh(x) - result = t + s - return result - - test_module = TestIterModule() - context = TracingContext() - context.enable_trace_dynamic_graph() - with context as ctx: - _ = test_module(torch.zeros(1)) - assert ctx.graph.get_nodes_count() == num_iter - - def test_number_of_nodes_for_repeated_module(self): - class LoopModule(nn.Module): - def __init__(self): - super().__init__() - self.operator = F.relu - self.layers = nn.ModuleList([nn.Conv2d(1, 1, 1), nn.Conv2d(1, 1, 1)]) - - def forward(self, x): - for layer in self.layers: - x = F.relu(layer(x)) - return x - - test_module = LoopModule() - context = TracingContext() - context.enable_trace_dynamic_graph() - with context as ctx: - x = test_module(torch.zeros(1, 1, 1, 1)) - assert ctx.graph.get_nodes_count() == 4 # NB: may always fail in debug due to superfluous 'cat' nodes - _ = test_module(x) - assert ctx.graph.get_nodes_count() == 8 # NB: may always fail in debug due to superfluous 'cat' nodes diff --git a/tests/torch/nncf_network/test_nncf_network.py b/tests/torch/nncf_network/test_nncf_network.py index cedf444941f..426bcfa50e4 100644 --- a/tests/torch/nncf_network/test_nncf_network.py +++ b/tests/torch/nncf_network/test_nncf_network.py @@ -22,7 +22,6 @@ from torch import nn from torch.nn.utils import weight_norm -import nncf from nncf import nncf_logger from nncf.common.graph import NNCFNode from nncf.common.graph.operator_metatypes import UnknownMetatype @@ -31,8 +30,6 @@ from nncf.torch.dynamic_graph.io_handling import ExampleInputInfo from nncf.torch.dynamic_graph.io_handling import FillerInputElement from nncf.torch.dynamic_graph.io_handling import 
FillerInputInfo -from nncf.torch.dynamic_graph.operation_address import OperationAddress -from nncf.torch.dynamic_graph.scope import Scope from nncf.torch.dynamic_graph.trace_tensor import TracedTensor from nncf.torch.graph.graph import PTNNCFGraph from nncf.torch.graph.graph_builder import GraphBuilder @@ -45,7 +42,6 @@ from nncf.torch.nncf_module_replacement import replace_modules_by_nncf_modules from nncf.torch.nncf_network import NNCFNetwork from nncf.torch.nncf_network import PTInsertionPoint -from nncf.torch.nncf_network import PTInsertionType from tests.torch.helpers import BasicConvTestModel from tests.torch.helpers import TwoConvTestModel from tests.torch.helpers import check_correct_nncf_modules_replacement @@ -466,15 +462,6 @@ def forward(self, x): return x1, x2 -def test_insertion_point_target_point_translation(): - op_address = OperationAddress("dummy", Scope(), 0) - for target_type in [PTInsertionType.NNCF_MODULE_POST_OP, TargetType.AFTER_LAYER]: - with pytest.raises(nncf.InternalError): - PTInsertionPoint(target_type, op_address) - target_type = TargetType.POST_LAYER_OPERATION - assert PTInsertionPoint(target_type, op_address).insertion_type == PTInsertionType.NNCF_MODULE_POST_OP - - class IndirectModuleCaller(nn.Module): def __init__(self, module_for_indirection: torch.nn.Module): super().__init__() diff --git a/tests/torch/pytorch_patch_isolated.py b/tests/torch/pytorch_patch_isolated.py deleted file mode 100644 index 3929edd9dd5..00000000000 --- a/tests/torch/pytorch_patch_isolated.py +++ /dev/null @@ -1,115 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import contextlib -import inspect -import os -import re - -import pytest -import torch - -from tests.cross_fw.shared.isolation_runner import ISOLATION_RUN_ENV_VAR - - -def clean_source_code(code_source): - # clean source code from comments and annotation - patterns = [ - r"\s*#.*", - r": Callable\[P, R\]", - r" -> Callable\[P, R\]", - r": P.args", - r": P.kwargs", - r" -> R", - ] - for pattern in patterns: - code_source = re.sub(pattern, "", code_source) - # remove empty lines - code_source = re.sub(r"\n\s*\n", "\n", code_source, flags=re.MULTILINE) - return code_source - - -@pytest.mark.skipif(ISOLATION_RUN_ENV_VAR not in os.environ, reason="Should be run via isolation proxy") -def test_jit_if_tracing_script_source_equals(): - # Get original torch.jit._script_if_tracing source - torch_source = clean_source_code(inspect.getsource(torch.jit._script_if_tracing)) - - import nncf.torch # noqa: F401 - - # Get torch.jit._script_if_tracing source after patching was performed - nncf_source = clean_source_code(inspect.getsource(torch.jit._script_if_tracing)) - - # Check that the two versions are essentially the same - nncf_source_corrected = nncf_source.replace("def torch_jit_script_if_tracing", "def _script_if_tracing").replace( - "torch.jit.script", "script" - ) - assert torch_source == nncf_source_corrected - - -class DummyModel(torch.nn.Module): - def forward(self, x): - return x - - -@pytest.mark.skipif(ISOLATION_RUN_ENV_VAR not in os.environ, reason="Should be run via isolation proxy") -def test_jit_script_exception_preserves_patching_isolated(): - from nncf import NNCFConfig - from nncf.torch import create_compressed_model - - _, compressed_model = create_compressed_model( - DummyModel(), - NNCFConfig.from_dict( - {"input_info": {"sample_size": [1, 3, 32, 32]}, "compression": {"algorithm": "quantization"}} - ), - ) - - with contextlib.suppress(Exception): - torch.jit.script(compressed_model) # supposed to fail since torch.jit.script does not support NNCF models - - # torch.nn.Module.__call__ is one of the fundamental patched functions, if the code object points to NNCF code, - # then it means patching is still present - assert "nncf" in torch.nn.Module.__call__.__code__.co_filename - - -def compile_and_run_test_model(compile_forward: bool) -> torch.Tensor: - class TestModel(torch.nn.Module): - def __init__(self): - super().__init__() - self.conv = torch.nn.Conv2d(3, 3, 3) - - def forward(self, x): - return self.conv(x) - - model = TestModel() - - torch.manual_seed(0) - state_dict = {} - for k, v in model.state_dict().items(): - state_dict[k] = torch.rand(v.shape) - model.load_state_dict(state_dict) - - if compile_forward: - compiled_model = model - compiled_model.forward = torch.compile(model.forward) - else: - compiled_model = torch.compile(model) - assert "_torchdynamo_orig_callable" in compiled_model.forward.__dict__ - return compiled_model(torch.rand([1, 3, 5, 5])) - - -@pytest.mark.skipif(ISOLATION_RUN_ENV_VAR not in os.environ, reason="Should be run via isolation proxy") -def test_compile(): - compile_forward = os.environ.get("COMPILE_FORWARD", None) == "1" - before_nncf = compile_and_run_test_model(compile_forward) - import nncf.torch # noqa: F401 - - after_nncf = compile_and_run_test_model(compile_forward) - assert torch.allclose(before_nncf, after_nncf) diff --git a/tests/torch/quantization/test_adjust_padding.py b/tests/torch/quantization/test_adjust_padding.py deleted file mode 100644 index 83a19a62557..00000000000 --- a/tests/torch/quantization/test_adjust_padding.py +++ 
/dev/null @@ -1,237 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import os - -import pytest -import torch - -from nncf.common.quantization.quantizer_propagation.solver import QuantizerPropagationRule -from nncf.common.quantization.quantizer_propagation.solver import QuantizerPropagationSolver -from nncf.torch.hardware.config import PTHWConfig -from nncf.torch.layers import NNCFConv2d -from nncf.torch.module_operations import UpdatePaddingValue -from nncf.torch.module_operations import UpdateWeight -from nncf.torch.quantization.adjust_padding import CalculatePaddingAdjustment -from nncf.torch.quantization.layers import SymmetricQuantizer -from tests.torch.helpers import create_compressed_model_and_algo_for_test -from tests.torch.helpers import create_conv -from tests.torch.helpers import get_empty_config -from tests.torch.helpers import load_exported_onnx_version -from tests.torch.helpers import register_bn_adaptation_init_args -from tests.torch.quantization.test_hawq_precision_init import check_bitwidth_graph -from tests.torch.test_compressed_graph import GeneralModelDesc -from tests.torch.test_models.synthetic import MultiBranchesModel - -pytestmark = pytest.mark.legacy - - -class MultiBranchesModelDesc(GeneralModelDesc): - NUM_WEIGHTS = 5 - NUM_ACTIVATIONS = 2 - - def __init__(self, name: str): - super().__init__(input_sample_sizes=[2, 3, 4, 4], model_name=name, model_builder=MultiBranchesModel) - self._config = get_empty_config(input_sample_sizes=self.input_sample_sizes) - self._config_update = { - "compression": { - "algorithm": "quantization", - "scope_overrides": { - "activations": {"MultiBranchesModel/NNCFConv2d[conv_a]/conv2d_0": {"per_channel": True}} - }, - } - } - self._hw_config = False - self.custom_hw_config_dict = None - self.propagation_strategy = QuantizerPropagationRule.MERGE_ALL_IN_ONE - - def requant_prop_strategy(self): - self.propagation_strategy = QuantizerPropagationRule.MERGE_WITH_POTENTIAL_REQUANTIZATION - return self - - @staticmethod - def _get_scopes(): - w_scopes = [ - "MultiBranchesModel/NNCFConv2d[conv_a]/conv2d_0|WEIGHT", - "MultiBranchesModel/NNCFConv2d[conv_b]/conv2d_0|WEIGHT", - "MultiBranchesModel/NNCFConv2d[conv_c]/conv2d_0|WEIGHT", - "MultiBranchesModel/NNCFConv2d[conv_d]/conv2d_0|WEIGHT", - ] - a_scopes = [ - "MultiBranchesModel/NNCFConv2d[conv_a]/conv2d_0|INPUT0", - "MultiBranchesModel/MaxPool2d[max_pool_b]/max_pool2d_0|INPUT0", - "MultiBranchesModel/NNCFConv2d[conv_c]/conv2d_0|INPUT0", - "MultiBranchesModel/NNCFConv2d[conv_d]/conv2d_0|INPUT0", - ] - return w_scopes, a_scopes - - def trial(self, num_bits_for_weights: int = 8, num_bits_for_activations: int = 8): - self._config_update["target_device"] = "TRIAL" - trial_config = { - "activations": { - "mode": "symmetric", - "bits": num_bits_for_activations, - "per_channel": False, - }, - "weights": { - "mode": "symmetric", - "bits": num_bits_for_weights, - "per_channel": False, - }, - } - self._config_update["compression"].update(trial_config) - return self - - 
def npu(self): - self._config_update["target_device"] = "NPU" - return self - - def custom_hw(self): - custom_hw_config_dict = { - "target_device": "NPU", - "config": { - "quantization": { - "q4": {"bits": 4, "mode": "symmetric", "granularity": "pertensor", "narrow_range": False}, - } - }, - "operations": [ - {"type": "Convolution", "quantization": {"activations": "q4", "weights": "q4"}}, - { - "type": "DepthWiseConvolution", - "attributes": {"adjust_padding": True}, - "quantization": {"activations": "q4", "weights": "q4"}, - }, - ], - } - self.custom_hw_config_dict = custom_hw_config_dict - # The common scope overrides conflict with the custom HW config here: - del self._config_update["compression"]["scope_overrides"] - return self - - def manual_precision(self, num_bits_for_weights: list[int], num_bits_for_activations: list[int]): - scopes_factory = self._get_scopes - w_scopes, a_scopes = scopes_factory() - bitwidth_per_scope = list(map(list, zip(num_bits_for_weights, w_scopes))) - bitwidth_per_scope.extend(list(map(list, zip(num_bits_for_activations, a_scopes)))) - init_config = {"initializer": {"precision": {"type": "manual", "bitwidth_per_scope": bitwidth_per_scope}}} - self._config_update["compression"].update(init_config) - return self - - def get_config(self): - self._config.update(self._config_update) - self._config["compression"].update() - return self._config - - -ADJUST_PAD_DESC_LIST = [ - MultiBranchesModelDesc(name="npu_all_int8").npu(), - MultiBranchesModelDesc(name="npu_all_weights_int8").npu().manual_precision([8, 8, 8, 8], [8, 4, 4, 4]), - MultiBranchesModelDesc(name="npu_all_activations_int8").npu().manual_precision([8, 4, 4, 4], [8, 8, 8, 4]), - MultiBranchesModelDesc(name="npu_bd_int8").npu().manual_precision([4, 4, 4, 4], [8, 8, 4, 8]), - MultiBranchesModelDesc(name="npu_max_int4").npu().manual_precision([4, 4, 4, 4], [8, 4, 4, 4]), - MultiBranchesModelDesc(name="npu_all_int8_requnt").npu().requant_prop_strategy(), - MultiBranchesModelDesc(name="npu_all_weights_int8_requnt") - .npu() - .manual_precision([8, 8, 8, 8], [8, 4, 4, 4]) - .requant_prop_strategy(), - MultiBranchesModelDesc(name="npu_all_activations_int8_requnt") - .npu() - .manual_precision([8, 4, 4, 4], [8, 8, 8, 4]) - .requant_prop_strategy(), - MultiBranchesModelDesc(name="npu_bd_int8_requnt") - .npu() - .manual_precision([4, 4, 4, 4], [8, 8, 4, 8]) - .requant_prop_strategy(), - MultiBranchesModelDesc(name="npu_max_int4_requnt") - .npu() - .manual_precision([4, 4, 4, 4], [8, 4, 4, 4]) - .requant_prop_strategy(), - MultiBranchesModelDesc(name="custom").custom_hw(), -] - - -@pytest.mark.parametrize("desc", ADJUST_PAD_DESC_LIST, ids=[m.model_name for m in ADJUST_PAD_DESC_LIST]) -def test_adjust_padding_on_synthetic_models(desc: MultiBranchesModelDesc, mocker, monkeypatch): - if desc.propagation_strategy == QuantizerPropagationRule.MERGE_WITH_POTENTIAL_REQUANTIZATION: - pytest.xfail(reason="Ticket: 175018") - model = desc.get_model() - config = desc.get_config() - register_bn_adaptation_init_args(config) - - if desc.custom_hw_config_dict: - hw_config_from_json = mocker.patch("nncf.common.hardware.config.HWConfig.from_json") - hw_config_from_json.return_value = PTHWConfig.from_dict(desc.custom_hw_config_dict) - - monkeypatch.setattr(QuantizerPropagationSolver, "DEFAULT_PROPAGATION_STRATEGY", desc.propagation_strategy) - - model, algo_ctrl = create_compressed_model_and_algo_for_test(model, config) - - check_bitwidth_graph(algo_ctrl, model, desc.get_dot_filename(), os.path.join("quantized", "adjust_paddings")) - 
- -def test_onnx_export_to_fake_quantize_with_adjust_pad(tmp_path): - desc = MultiBranchesModelDesc(name="npu_max_int4").npu().manual_precision([4, 4, 4, 4], [8, 4, 4, 4]) - model = desc.get_model() - nncf_config = desc.get_config() - register_bn_adaptation_init_args(nncf_config) - - onnx_model_proto = load_exported_onnx_version( - nncf_config, model, path_to_storage_dir=tmp_path, save_format="onnx_10" - ) - num_fq = 0 - num_model_nodes = 0 - num_adjust_pad_nodes = 0 - num_other_nodes = 0 - - for node in onnx_model_proto.graph.node: - op_type = node.op_type - if op_type == "FakeQuantize": - num_fq += 1 - elif op_type in ["Conv", "Constant", "Relu", "MaxPool"]: - num_model_nodes += 1 - elif op_type in ["Pad"]: - pad_value_attr = node.attribute[2] - assert pad_value_attr.f == 0.5 - num_adjust_pad_nodes += 1 - else: - num_other_nodes += 1 - print(op_type) - assert num_fq == 8 - assert num_other_nodes == 0 - - -def test_adjust_padding_via_mixin_module(mocker): - input_ = torch.ones([1, 1, 1, 1]) - ref_output_without_pre_ops = torch.Tensor([[[[4]]]]) - ref_output_with_update_weight = torch.Tensor([[[[3]]]]) - ref_output_with_update_weight_and_pad = torch.Tensor([[[[23]]]]) - - conv = create_conv(in_channels=1, out_channels=1, kernel_size=3, weight_init=1, bias_init=2, padding=1) - nncf_conv = NNCFConv2d.from_module(conv) - assert nncf_conv.get_padding_value_ref().item() == 0 - - act_output = nncf_conv(input_) - assert torch.all(torch.eq(act_output, ref_output_without_pre_ops)) - - uw = UpdateWeight(lambda x: torch.ones([1, 1, 3, 3])) - nncf_conv.register_pre_forward_operation(uw) - act_output = nncf_conv(input_) - assert torch.all(torch.eq(act_output, ref_output_with_update_weight)) - - quantizer_stub = mocker.MagicMock(spec=SymmetricQuantizer) - quantizer_stub.scale = torch.Tensor([4]) - quantizer_stub.eps = torch.Tensor([1]) - ap = CalculatePaddingAdjustment(quantizer_stub) - upv = UpdatePaddingValue(ap) - nncf_conv.register_pre_forward_operation(upv) - act_output = nncf_conv(input_) - assert nncf_conv.get_padding_value_ref().item() == 2.5 - assert torch.all(torch.eq(act_output, ref_output_with_update_weight_and_pad)) diff --git a/tests/torch/quantization/test_algo_quantization.py b/tests/torch/quantization/test_algo_quantization.py deleted file mode 100644 index 6196ebd6499..00000000000 --- a/tests/torch/quantization/test_algo_quantization.py +++ /dev/null @@ -1,898 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-from collections import Counter -from copy import deepcopy - -import pytest -import torch -import torch.nn.functional as F -import torch.utils.data -from torch import autocast -from torch import nn -from torchvision.models import resnet50 -from torchvision.models import squeezenet1_1 - -from nncf import NNCFConfig -from nncf.api.compression import CompressionScheduler -from nncf.common.hardware.config import HWConfigType -from nncf.common.quantization.structs import NonWeightQuantizerId -from nncf.common.quantization.structs import QuantizationScheme as QuantizationMode -from nncf.common.quantization.structs import WeightQuantizerId -from nncf.common.utils.debug import nncf_debug -from nncf.torch import create_compressed_model -from nncf.torch import register_default_init_args -from nncf.torch import register_module -from nncf.torch.checkpoint_loading import load_state -from nncf.torch.compression_method_api import PTCompressionLoss -from nncf.torch.dynamic_graph.scope import Scope -from nncf.torch.dynamic_graph.scope import ScopeElement -from nncf.torch.graph.transformations.commands import ExtraCompressionModuleType -from nncf.torch.layers import NNCFConv2d -from nncf.torch.model_creation import create_compression_algorithm_builder -from nncf.torch.module_operations import UpdateInputs -from nncf.torch.module_operations import UpdateWeight -from nncf.torch.quantization.algo import QuantizationBuilder -from nncf.torch.quantization.algo import QuantizationController -from nncf.torch.quantization.layers import QUANTIZATION_MODULES -from nncf.torch.quantization.layers import AsymmetricQuantizer -from nncf.torch.quantization.layers import BaseQuantizer -from nncf.torch.quantization.layers import PTQuantizerSpec -from nncf.torch.quantization.layers import SymmetricQuantizer -from nncf.torch.utils import get_all_modules_by_type -from nncf.torch.utils import get_model_device -from tests.torch.helpers import BasicConvTestModel -from tests.torch.helpers import LeNet -from tests.torch.helpers import TwoConvTestModel -from tests.torch.helpers import create_compressed_model_and_algo_for_test -from tests.torch.helpers import create_ones_mock_dataloader -from tests.torch.helpers import create_random_mock_dataloader -from tests.torch.helpers import get_empty_config -from tests.torch.helpers import register_bn_adaptation_init_args -from tests.torch.quantization.quantization_helpers import get_quantization_config_without_range_init -from tests.torch.quantization.quantization_helpers import get_squeezenet_quantization_config - -pytestmark = pytest.mark.legacy - - -def compare_qspecs(qspec: PTQuantizerSpec, quantizer: BaseQuantizer): - assert qspec.narrow_range == quantizer.narrow_range - assert qspec.num_bits == quantizer.num_bits - assert isinstance(quantizer, QUANTIZATION_MODULES.get(qspec.mode)) - assert qspec.scale_shape == quantizer.scale_shape - - assert qspec.signedness_to_force == quantizer._signedness_to_force - - -def test_quantization_configs__with_defaults(): - model = BasicConvTestModel() - config = get_quantization_config_without_range_init() - config["compression"]["overflow_fix"] = "disable" - register_bn_adaptation_init_args(config) - _, compression_ctrl = create_compressed_model_and_algo_for_test(model, config) - - assert isinstance(compression_ctrl, QuantizationController) - weight_quantizers = compression_ctrl.weight_quantizers - activation_quantizer_infos = compression_ctrl.non_weight_quantizers - - ref_weight_qspec = PTQuantizerSpec( - num_bits=8, - mode=QuantizationMode.SYMMETRIC, - 
signedness_to_force=True, - narrow_range=True, - half_range=False, - scale_shape=model.wq_scale_shape_per_channel, - logarithm_scale=False, - ) - for wq_info in weight_quantizers.values(): - compare_qspecs(ref_weight_qspec, wq_info.quantizer_module_ref) - - ref_activation_qspec = PTQuantizerSpec( - num_bits=8, - mode=QuantizationMode.SYMMETRIC, - signedness_to_force=None, - narrow_range=False, - half_range=False, - scale_shape=(1,), - logarithm_scale=False, - ) - for aq_info in activation_quantizer_infos.values(): - compare_qspecs(ref_activation_qspec, aq_info.quantizer_module_ref) - - -def test_quantization_configs__custom(): - model = BasicConvTestModel() - - config = get_quantization_config_without_range_init() - config["compression"].update( - { - "weights": {"mode": "asymmetric", "per_channel": True, "bits": 4}, - "activations": { - "mode": "asymmetric", - "bits": 4, - "signed": True, - }, - } - ) - config["target_device"] = "TRIAL" - register_bn_adaptation_init_args(config) - _, compression_ctrl = create_compressed_model_and_algo_for_test(model, config) - - assert isinstance(compression_ctrl, QuantizationController) - weight_quantizers = compression_ctrl.weight_quantizers - activation_quantizer_infos = compression_ctrl.non_weight_quantizers - - ref_weight_qspec = PTQuantizerSpec( - num_bits=4, - mode=QuantizationMode.ASYMMETRIC, - signedness_to_force=None, - scale_shape=model.wq_scale_shape_per_channel, - narrow_range=False, - half_range=False, - logarithm_scale=False, - ) - for wq_info in weight_quantizers.values(): - compare_qspecs(ref_weight_qspec, wq_info.quantizer_module_ref) - - ref_activation_qspec = PTQuantizerSpec( - num_bits=4, - mode=QuantizationMode.ASYMMETRIC, - signedness_to_force=True, - scale_shape=(1,), - narrow_range=False, - half_range=False, - logarithm_scale=False, - ) - - for aq_info in activation_quantizer_infos.values(): - compare_qspecs(ref_activation_qspec, aq_info.quantizer_module_ref) - - -def compare_weights_activation_quantizers_pairs( - actual_pairs: list[tuple[list[WeightQuantizerId], NonWeightQuantizerId]], algo, ref_pair_names, model_name -): - def get_wq_name(name): - return "/".join([model_name, name]) - - def get_aq_name(name): - if name == "/nncf_model_input_0": - return name + "|OUTPUT" - return "/".join([model_name, name]) + "|OUTPUT" - - all_quantizations = {str(key): quantizer for key, quantizer in algo.all_quantizations.items()} - assert len(actual_pairs) == len(ref_pair_names) - for (wq_ids, aq_id), (wqs_names, aq_name) in zip(actual_pairs, ref_pair_names): - wqs = [algo.all_quantizations[wq_id] for wq_id in wq_ids] - aq = algo.all_quantizations[aq_id] - assert not aq.narrow_range - assert aq == all_quantizations[get_aq_name(aq_name)] - ref_weight_quantizers = [all_quantizations[get_wq_name(name)] for name in wqs_names] - for weight_quantizer in wqs: - assert weight_quantizer.narrow_range - assert weight_quantizer in ref_weight_quantizers - - -def test_can_load_quant_algo__with_defaults(): - model = BasicConvTestModel() - config = get_quantization_config_without_range_init() - register_bn_adaptation_init_args(config) - builder = create_compression_algorithm_builder(config) - assert isinstance(builder, QuantizationBuilder) - - quant_model, _ = create_compressed_model_and_algo_for_test(deepcopy(model), config) - - model_conv = get_all_modules_by_type(model, "Conv2d") - quant_model_conv = get_all_modules_by_type(quant_model, "NNCFConv2d") - assert len(model_conv) == len(quant_model_conv) - - for module_scope in model_conv: - quant_scope: Scope 
= deepcopy(module_scope) - quant_scope.pop() - quant_scope.push(ScopeElement("NNCFConv2d", "conv")) - assert quant_scope in quant_model_conv - - store = [] - for op in quant_model_conv[quant_scope].pre_ops.values(): - if isinstance(op, (UpdateInputs, UpdateWeight)) and isinstance(op.operand, SymmetricQuantizer): - assert op.__class__.__name__ not in store - store.append(op.__class__.__name__) - assert UpdateWeight.__name__ in store - - -def test_can_create_quant_loss_and_scheduler(): - config = get_quantization_config_without_range_init() - register_bn_adaptation_init_args(config) - _, compression_ctrl = create_compressed_model_and_algo_for_test(BasicConvTestModel(), config) - - loss = compression_ctrl.loss - assert isinstance(loss, PTCompressionLoss) - - scheduler = compression_ctrl.scheduler - assert isinstance(scheduler, CompressionScheduler) - - -def get_path_to_keys(tmp_path, rank): - return f"{tmp_path}_{str(rank)}" - - -def activation_quantizers_dumping_worker(current_gpu, config, tmp_path): - model = resnet50(pretrained=False) - _, qctrl = create_compressed_model_and_algo_for_test(model, config) - path = get_path_to_keys(tmp_path, current_gpu) - print(path) - with open(path, "w", encoding="utf8") as f: - for aq_id in qctrl.non_weight_quantizers: - f.writelines(f"{str(aq_id)}\n") - - -@pytest.mark.cuda -def test_activation_quantizers_order_is_the_same__for_resnet50(tmp_path, runs_subprocess_in_precommit): - if not torch.cuda.is_available(): - pytest.skip("Skipping CUDA test cases for CPU only setups") - config = get_empty_config(input_sample_sizes=[1, 3, 224, 224]) - config["compression"] = {"algorithm": "quantization", "initializer": {"range": {"num_init_samples": 0}}} - register_bn_adaptation_init_args(config) - ngpus_per_node = torch.cuda.device_count() - - torch.multiprocessing.spawn( - activation_quantizers_dumping_worker, nprocs=ngpus_per_node, args=(config, tmp_path), join=True - ) - - with open(get_path_to_keys(tmp_path, 0), encoding="utf8") as f: - ref_list = f.readlines() - for i in range(1, ngpus_per_node): - with open(get_path_to_keys(tmp_path, i), encoding="utf8") as f: - curr_list = f.readlines() - assert curr_list == ref_list - - -def test_load_state_sets_initialized_flag(): - config = get_quantization_config_without_range_init() - register_bn_adaptation_init_args(config) - - model = TwoConvTestModel() - quant_model, qctrl = create_compressed_model_and_algo_for_test(model, config) - - load_state( - quant_model, - { - "module.features.0.0.pre_ops.0.op.signed_tensor": torch.tensor([1.0]), # quantizer of 1st conv's weights - "module.features.1.0.pre_ops.0.op.scale": torch.ones(1, 1, 1, 1), # quantizer of 2nd conv's weights - }, - ) - - for wq_info in qctrl.weight_quantizers.values(): - assert wq_info.quantizer_module_ref.initialized - - for aq_info in qctrl.non_weight_quantizers.values(): - assert not aq_info.quantizer_module_ref.initialized - - -def test_quantizers_have_proper_narrow_range_set(): - class Model(nn.Module): - def __init__(self, size=1): - super().__init__() - self.size = size - self.conv = nn.Conv2d(size, size, size) - - def forward(self, x): - return self.conv(x) - - model = Model() - config = get_quantization_config_without_range_init(model_size=2) - config["compression"]["overflow_fix"] = "disable" - register_bn_adaptation_init_args(config) - quant_model, _ = create_compressed_model_and_algo_for_test(model, config) - - for module in quant_model.modules(): - if isinstance(module, NNCFConv2d): - for op in module.pre_ops.values(): - assert isinstance(op, 
(UpdateWeight, UpdateInputs)) - assert op.operand.narrow_range == isinstance(op, UpdateWeight) - for aq in quant_model.nncf.get_compression_modules_by_type(ExtraCompressionModuleType.EXTERNAL_QUANTIZER).values(): - assert aq.narrow_range is False - - -@pytest.fixture(name="hw_config_type", params=HWConfigType) -def hw_config_type_(request): - return request.param - - -def test_hw_config_quantization_can_quantize_squeezenet(hw_config_type): - config = get_squeezenet_quantization_config() - config["target_device"] = hw_config_type.value - register_bn_adaptation_init_args(config) - model = squeezenet1_1() - create_compressed_model_and_algo_for_test(model, config) - - -class QuantizeInputsTestModel(nn.Module): - def __init__(self): - super().__init__() - self.conv1 = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3) - self.conv2 = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3) - self.conv3 = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3) - self.conv4 = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3) - self.conv5 = nn.Conv2d(in_channels=1, out_channels=3, kernel_size=1) - self.conv6 = nn.Conv2d(in_channels=6, out_channels=3, kernel_size=2) - self.linear = nn.Linear(in_features=8, out_features=8) - - # (1) (2) (3) (4) (5) - # | | | | |-----\ - # (conv1) (MP) (MP) (MP) (MP) | - # | | | | | | - # | | (+) | | | - # | |--\ | | | | - # | | \ | | | | - # | (conv2) | (conv3) | | | - # | | | | \ / | - # | (AvP) \ | (cat) | - # | | \ | | | - # (conv4) (linear) \ | (conv6) | - # | | (cat) | | - # | | | (+)------/ - # | | (conv5) | - # (AvP) | | | - # | | (AvP) | - # \ | / | - # \---(cat)---------------/ - - def forward(self, input_1, input_2, input_3, input_4, input_5): - x_1 = self.conv1(input_1) - x_1 = self.conv4(x_1) - x_1 = F.adaptive_avg_pool2d(x_1, output_size=1) - x_1 = x_1.flatten(start_dim=1) - - x_2_br = F.max_pool2d(input_2, kernel_size=2) - x_2 = self.conv2(x_2_br) - x_2 = F.adaptive_avg_pool2d(x_2, output_size=1) - x_2 = x_2.flatten(start_dim=1) - x_2 = self.linear(x_2) - - x_3 = F.max_pool2d(input_3, kernel_size=2) - x_3 = x_3 + torch.ones_like(x_3) - x_3 = self.conv3(x_3) - x_3 = x_3.flatten(start_dim=1) - x_2_br = x_2_br.flatten(start_dim=1) - x_3 = torch.cat([x_2_br, x_3], dim=-1) - x_3 = self.conv5(x_3.unsqueeze(2).unsqueeze(3).transpose(1, 2)) - x_3 = F.adaptive_avg_pool2d(x_3, output_size=1) - x_3 = x_3.flatten(start_dim=1) - - x_4 = F.max_pool2d(input_4, kernel_size=2) - x_5 = F.max_pool2d(input_5, kernel_size=2) - x_45 = torch.cat([x_4, x_5], dim=1) - x_45 = self.conv6(x_45) - x_45 = x_45.flatten(start_dim=1) - in_5_flat = input_5.flatten(start_dim=1) - x_45 += F.pad(input_5.flatten(start_dim=1), [0, x_45.shape[1] - in_5_flat.shape[1]]) - - return torch.cat([x_1, x_2, x_3, x_45], dim=-1) - - -def test_quantize_inputs(): - model = QuantizeInputsTestModel() - config = get_quantization_config_without_range_init() - config["input_info"] = [ - { - "sample_size": [2, 3, 32, 32], - }, - { - "sample_size": [2, 3, 32, 32], - }, - { - "sample_size": [2, 3, 32, 32], - }, - { - "sample_size": [2, 3, 32, 32], - }, - { - "sample_size": [2, 3, 32, 32], - }, - ] - register_bn_adaptation_init_args(config) - - model, qctrl = create_compressed_model_and_algo_for_test(model, config) - REF_QUANTIZED_INPUT_MODULE_SCOPES = [ - "/nncf_model_input_0|OUTPUT", - "/nncf_model_input_1|OUTPUT", - "/nncf_model_input_2|OUTPUT", - "/nncf_model_input_3|OUTPUT", - "/nncf_model_input_4|OUTPUT", - ] - - actual_input_quantizer_str_scopes = [] - for aq_id, aq_info in 
qctrl.non_weight_quantizers.items(): - for target_point in aq_info.affected_insertions: - quantizer_target_node_name = str(target_point.target_node_name) - if "nncf_model_input" in quantizer_target_node_name: - actual_input_quantizer_str_scopes.append(quantizer_target_node_name) - - assert len(REF_QUANTIZED_INPUT_MODULE_SCOPES) == len(actual_input_quantizer_str_scopes) - for qinput_scope_str in actual_input_quantizer_str_scopes: - matches = set() - for aq_id, aq_info in qctrl.non_weight_quantizers.items(): - for target_point in aq_info.affected_insertions: - if qinput_scope_str in str(target_point.target_node_name): - matches.add(aq_id) - assert len(matches) == 1 - input_aq_id = next(iter(matches)) - quantizer = qctrl.non_weight_quantizers[input_aq_id].quantizer_module_ref - assert isinstance(quantizer, SymmetricQuantizer) - - -class QuantizeOutputsTestModel(nn.Module): - def __init__(self): - super().__init__() - self.conv1 = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3) - self.conv2 = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3) - self.conv3 = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3) - self.conv4 = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3) - self.conv5 = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3) - - def forward(self, x): - self.conv5(x) - return self.conv1(x), self.conv2(x), self.conv3(x), self.conv4(x) - - -def test_quantize_outputs(): - config = get_quantization_config_without_range_init() - config["input_info"] = [ - { - "sample_size": [2, 3, 32, 32], - } - ] - model = QuantizeOutputsTestModel() - config["compression"]["quantize_outputs"] = True - register_bn_adaptation_init_args(config) - model, qctrl = create_compressed_model_and_algo_for_test(model, config) - # The quantizers below will not have been set up due to quantizer propagation, - # and no configuration can be determined for them from the HW config. The - # configuration is also missing in this case in the NNCFConfig, so will - # set up a quantizer with default config. 
- REF_QUANTIZED_OUTPUT_MODULE_SCOPES = [ - "QuantizeOutputsTestModel/NNCFConv2d[conv1]/conv2d_0|OUTPUT", - "QuantizeOutputsTestModel/NNCFConv2d[conv2]/conv2d_0|OUTPUT", - "QuantizeOutputsTestModel/NNCFConv2d[conv3]/conv2d_0|OUTPUT", - "QuantizeOutputsTestModel/NNCFConv2d[conv4]/conv2d_0|OUTPUT", - ] - actual_output_quantizer_str_scopes = [ - str(aq_id) for aq_id in qctrl.non_weight_quantizers if "nncf_model_input" not in str(aq_id) - ] - assert len(REF_QUANTIZED_OUTPUT_MODULE_SCOPES) == len(actual_output_quantizer_str_scopes) - - for ref_qinput_scope_str in REF_QUANTIZED_OUTPUT_MODULE_SCOPES: - matches = [] - for aq_id in qctrl.non_weight_quantizers: - if str(aq_id) == ref_qinput_scope_str: - matches.append(aq_id) - assert len(matches) == 1 - quantizer = qctrl.non_weight_quantizers[matches[0]].quantizer_module_ref - assert isinstance(quantizer, SymmetricQuantizer) - - -def test_quantize_outputs_with_scope_overrides(): - config = get_quantization_config_without_range_init() - config["input_info"] = [ - { - "sample_size": [2, 3, 32, 32], - } - ] - model = QuantizeOutputsTestModel() - config["compression"]["quantize_outputs"] = True - config["target_device"] = "TRIAL" - config["compression"]["scope_overrides"] = { - "activations": { - "/nncf_model_output_0": { - "bits": 4, - "mode": "asymmetric", - } - } - } - register_bn_adaptation_init_args(config) - model, ctrl = create_compressed_model_and_algo_for_test(model, config) - output_quantizers = [q for qid, q in ctrl.all_quantizations.items() if isinstance(qid, NonWeightQuantizerId)] - for q in output_quantizers[1:]: - assert q.num_bits == 8 - assert isinstance(q, SymmetricQuantizer) - - assert output_quantizers[0].num_bits == 4 - assert isinstance(output_quantizers[0], AsymmetricQuantizer) - - -class IntermediateOutputModel(nn.Module): - """ - When quantized with "quantize_outputs": False (which is the default behaviour), - the activation quantizer of `conv2` shall not propagate to the output of `conv1`, - but shall stay as a pre-hook to the `conv2`, so as not to impact the - return value of `conv1` which is also an intermediate output of the model. - """ - - def __init__(self): - super().__init__() - self.conv1 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=1) - self.conv2 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=1) - - def forward(self, x): - x1 = self.conv1(x) - return x1, self.conv2(x1) - - -def test_intermediate_output_model(): - config = get_quantization_config_without_range_init() - config["input_info"] = [ - { - "sample_size": [2, 3, 32, 32], - } - ] - model = IntermediateOutputModel() - config["compression"]["quantize_outputs"] = False - register_bn_adaptation_init_args(config) - model, qctrl = create_compressed_model_and_algo_for_test(model, config) - activation_quantizer_scopes = [str(aq_id) for aq_id in qctrl.non_weight_quantizers] - assert Counter(activation_quantizer_scopes) == Counter( - [ - "/nncf_model_input_0|OUTPUT", # activation quantizer of conv1 - "IntermediateOutputModel/NNCFConv2d[conv2]/conv2d_0|INPUT0", - ] - ) # act. quant. 
of conv2 - - -def test_debug_mode(): - config = get_quantization_config_without_range_init() - register_bn_adaptation_init_args(config) - model = BasicConvTestModel() - with nncf_debug(): - model, _ = create_compressed_model_and_algo_for_test(model, config) - model.forward(torch.zeros(BasicConvTestModel.INPUT_SIZE, device=get_model_device(model))) - - -class SharedLayersModel(torch.nn.Module): - def __init__(self): - super().__init__() - self.shared_conv = torch.nn.Conv2d(1, 1, 1) - - def forward(self, x): - x = self.shared_conv(x) - x = x + x - x = self.shared_conv(x) - x = x * x - return x - - -def test_shared_layers_are_weight_quantized_only_once(): - model = SharedLayersModel() - config = get_quantization_config_without_range_init(model_size=1) - register_bn_adaptation_init_args(config) - model, qctrl = create_compressed_model_and_algo_for_test(model, config) - assert len(qctrl.weight_quantizers) == 1 - - -TEST_QUANTIZATION_PRESET_STRUCT = [ - { - "preset": "performance", - "target_device": "CPU", - "overrided_param": {}, - "expected_weights_q": SymmetricQuantizer, - "expected_activations_q": SymmetricQuantizer, - }, - { - "preset": "mixed", - "target_device": "CPU", - "overrided_param": {}, - "expected_weights_q": SymmetricQuantizer, - "expected_activations_q": AsymmetricQuantizer, - }, - { - "preset": "performance", - "target_device": "GPU", - "overrided_param": {}, - "expected_weights_q": SymmetricQuantizer, - "expected_activations_q": SymmetricQuantizer, - }, - { - "preset": "mixed", - "target_device": "GPU", - "overrided_param": {}, - "expected_weights_q": SymmetricQuantizer, - "expected_activations_q": AsymmetricQuantizer, - }, - { - "preset": "performance", - "target_device": "CPU", - "overrided_param": {"weights": {"mode": "asymmetric"}}, - "expected_weights_q": AsymmetricQuantizer, - "expected_activations_q": SymmetricQuantizer, - }, -] - - -@pytest.mark.parametrize("data", TEST_QUANTIZATION_PRESET_STRUCT) -def test_quantization_preset(data): - model = BasicConvTestModel() - config = get_empty_config(input_sample_sizes=[1, 1, 4, 4]) - config["target_device"] = data["target_device"] - config["compression"] = {"algorithm": "quantization", "preset": data["preset"]} - config["compression"].update(data["overrided_param"]) - register_bn_adaptation_init_args(config) - _, compression_ctrl = create_compressed_model_and_algo_for_test(model, config) - - for wq_info in compression_ctrl.weight_quantizers.values(): - assert isinstance(wq_info.quantizer_module_ref, data["expected_weights_q"]) - - for aq_info in compression_ctrl.non_weight_quantizers.values(): - assert isinstance(aq_info.quantizer_module_ref, data["expected_activations_q"]) - - -def test_quantization_preset_with_scope_overrides(): - model = QuantizeOutputsTestModel() - config = get_empty_config(input_sample_sizes=[2, 3, 32, 32]) - config["target_device"] = "TRIAL" - config["compression"] = { - "algorithm": "quantization", - "preset": "mixed", - "scope_overrides": { - "weights": { - "QuantizeOutputsTestModel/NNCFConv2d[conv5]/conv2d_0": { - "mode": "asymmetric", - } - } - }, - } - register_bn_adaptation_init_args(config) - _, compression_ctrl = create_compressed_model_and_algo_for_test(model, config) - - for wq_info in compression_ctrl.weight_quantizers.values(): - if wq_info.affected_insertions[0].target_node_name != "QuantizeOutputsTestModel/NNCFConv2d[conv5]/conv2d_0": - assert isinstance(wq_info.quantizer_module_ref, SymmetricQuantizer) - else: - assert isinstance(wq_info.quantizer_module_ref, AsymmetricQuantizer) - - for 
aq_info in compression_ctrl.non_weight_quantizers.values(): - assert isinstance(aq_info.quantizer_module_ref, AsymmetricQuantizer) - - -def test_quantization_can_be_run_with_no_data_loaders_if_zero_init_samples(): - model = BasicConvTestModel() - # Should complete successfully even though no loaders have been registered into the config. - _, _ = create_compressed_model_and_algo_for_test( - model, - NNCFConfig.from_dict( - { - "input_info": {"sample_size": [1, 1, 4, 4]}, - "compression": { - "algorithm": "quantization", - "initializer": { - "range": {"num_init_samples": 0}, - "batchnorm_adaptation": {"num_bn_adaptation_samples": 0}, - }, - }, - } - ), - ) - - -class TestHalfPrecisionModels: - class RegularModel(torch.nn.Module): - def __init__(self): - super().__init__() - self.conv_first = torch.nn.Conv2d(1, 1, 1) - self.conv_second = torch.nn.Conv2d(1, 1, 1) - - def forward(self, x): - y = self.conv_first(x) - y = self.conv_second(y) - return y - - class ModelWithInternalAutocast(torch.nn.Module): - def __init__(self): - super().__init__() - self.model = TestHalfPrecisionModels.RegularModel() - - def forward(self, x): - with autocast(device_type="cuda" if x.is_cuda else "cpu"): - y = self.model(x) - return y - - class ModelWithManualPartialHalfPrecision(torch.nn.Module): - def __init__(self): - super().__init__() - self.conv_first = torch.nn.Conv2d(1, 1, 1) - self.conv_second = torch.nn.Conv2d(1, 1, 1).half() - self.conv_third = torch.nn.Conv2d(1, 1, 1) - - def forward(self, x): - y = self.conv_first(x) - y = y.half() - y = self.conv_second(y) - y = y.to(torch.float32) - y = self.conv_third(y) - return y - - @pytest.fixture() - def initializing_config(self): - config = get_quantization_config_without_range_init(model_size=1) - - # Make sure that both symmetric and asymmetric quantizers appear in the model - config["compression"]["scope_overrides"] = { - "activations": {"{re}.*conv_first.*": {"mode": "asymmetric"}, "{re}.*conv_second.*": {"mode": "symmetric"}}, - "weights": {"{re}.*conv_first.*": {"mode": "asymmetric"}, "{re}.*conv_second.*": {"mode": "symmetric"}}, - } - config["compression"]["initializer"] = { - "range": {"num_init_samples": 2}, - "batchnorm_adaptation": {"num_bn_adaptation_samples": 1}, - } - data_loader = create_ones_mock_dataloader(config) - config = register_default_init_args(config, data_loader) - return config - - def test_internal_autocast_model(self, initializing_config: NNCFConfig): - model = TestHalfPrecisionModels.ModelWithInternalAutocast() - inputs = torch.ones([1, 1, 1, 1]) - if torch.cuda.is_available(): - inputs = inputs.cuda() - model = model.cuda() - - compressed_model, _ = create_compressed_model_and_algo_for_test(model, initializing_config) - - # Should complete successfully, including init. - compressed_model(inputs) - - @pytest.mark.parametrize( - "device", - [pytest.param("cuda", marks=pytest.mark.cuda), pytest.param("cpu", marks=pytest.mark.skip(reason="CVS-86697"))], - ) - def test_manual_partial_half_precision_model(self, initializing_config: NNCFConfig, device: str): - model = TestHalfPrecisionModels.ModelWithManualPartialHalfPrecision() - inputs = torch.ones([1, 1, 1, 1]) - - if device == "cuda": - if torch.cuda.is_available(): - inputs = inputs.cuda() - model = model.cuda() - else: - pytest.skip("CUDA is not available.") - - compressed_model, _ = create_compressed_model_and_algo_for_test(model, initializing_config) - - # Should complete successfully, including init. 
- compressed_model(inputs) - - def test_external_autocast(self, initializing_config: NNCFConfig, use_cuda): - model = TestHalfPrecisionModels.RegularModel() - inputs = torch.ones([1, 1, 1, 1]) - if use_cuda: - if not torch.cuda.is_available(): - pytest.skip("CUDA not available") - inputs = inputs.cuda() - model = model.cuda() - - compressed_model, _ = create_compressed_model_and_algo_for_test(model, initializing_config) - - with autocast(device_type="cuda" if inputs.is_cuda else "cpu"): - # Should complete successfully. - result = compressed_model(inputs) - if torch.is_autocast_enabled(): # For torch <= 1.9.1 and CPU the autocast context won't have effect - assert result.dtype == torch.float16 - - -@pytest.mark.parametrize( - "update_config_info, should_ignore_quantizers", - [ - ({}, []), - ( - {"ignored_scopes": ["LeNet/relu_1"]}, - [], # ignoring second op in the pattern doesn't lead to exclusion of quantization first op - ), - ( - {"activations": {"ignored_scopes": ["LeNet/relu_1"]}}, - [], # ignoring second op in the pattern doesn't lead to exclusion of quantization first op - ), - ( - {"ignored_scopes": ["LeNet/NNCFConv2d[conv2]/conv2d_0"]}, - ["LeNet/relu_0", "LeNet/NNCFConv2d[conv2]/conv2d_0"], - ), - ({"activations": {"ignored_scopes": ["LeNet/NNCFConv2d[conv2]/conv2d_0"]}}, ["LeNet/relu_0"]), - ], -) -def test_activation_ignored_scope(update_config_info, should_ignore_quantizers): - model = LeNet() - all_quantization_names = [ - "LeNet/NNCFConv2d[conv1]/conv2d_0", - "LeNet/NNCFConv2d[conv2]/conv2d_0", - "LeNet/NNCFLinear[fc1]/linear_0", - "LeNet/NNCFLinear[fc2]/linear_0", - "LeNet/NNCFLinear[fc3]/linear_0", - "/nncf_model_input_0", - "LeNet/relu_0", - "LeNet/relu_1", - "LeNet/relu_2", - "LeNet/relu_3", - ] - ref_quantization_names = list(filter(lambda x: x not in should_ignore_quantizers, all_quantization_names)) - config = get_quantization_config_without_range_init(LeNet.INPUT_SIZE[-1]) - config["compression"].update(update_config_info) - train_loader = create_random_mock_dataloader(config, num_samples=10) - config = register_default_init_args(config, train_loader) - ctrl, _ = create_compressed_model(model, config) - assert Counter([item.target_node_name for item in ctrl.all_quantizations]) == Counter(ref_quantization_names) - - -def test_sync_of_level_ranges_and_signed_parameter(): - qspec = PTQuantizerSpec( - num_bits=4, - mode=QuantizationMode.SYMMETRIC, - signedness_to_force=None, - scale_shape=(1,), - narrow_range=False, - half_range=False, - logarithm_scale=False, - ) - - sq = SymmetricQuantizer(qspec) - # Check if the default values are different from the values to be loaded. 
- assert sq.signed is False - assert sq.level_low == 0 - - sq.signed = True - assert sq.signed is True - assert sq.level_low == -8 - - loaded_sq = SymmetricQuantizer(qspec) - loaded_sq.load_state_dict(sq.state_dict()) - assert loaded_sq.signed is True - assert loaded_sq.level_low == -8 - - -@register_module() -class UserModuleWithAddmm(torch.nn.Module): - def __init__(self): - super().__init__() - self.weight = torch.nn.Parameter(torch.ones([1, 1])) - self.bias = torch.nn.Parameter(torch.ones([1, 1])) - - def forward(self, x): - return torch.addmm(self.bias, x, self.weight) - - -class ModelWithUserModule(torch.nn.Module): - def __init__(self): - super().__init__() - self.user_module = UserModuleWithAddmm() - - def forward(self, x): - x = self.user_module(x) - return x - - -def test_can_quantize_user_module_with_addmm(): - nncf_config = NNCFConfig.from_dict( - {"input_info": {"sample_size": [1, 1]}, "compression": {"algorithm": "quantization"}} - ) - - train_loader = create_random_mock_dataloader(nncf_config, num_samples=10) - nncf_config = register_default_init_args(nncf_config, train_loader) - - # Should complete successfully without exceptions: - create_compressed_model_and_algo_for_test(ModelWithUserModule(), nncf_config) - - -@pytest.mark.nightly -@pytest.mark.cuda -def test_works_when_wrapped_with_dataparallel(): - if not torch.cuda.is_available() and torch.cuda.device_count() > 1: - pytest.xfail("The executing host must have > 1 CUDA GPU in order for this test to be relevant.") - - model = SharedLayersModel().cuda() - config = get_quantization_config_without_range_init(model_size=1) - register_bn_adaptation_init_args(config) - model, _ = create_compressed_model_and_algo_for_test(model, config) - model = torch.nn.DataParallel(model) - model(torch.ones([10, 1, 1, 1], device="cuda")) diff --git a/tests/torch/quantization/test_hawq_precision_init.py b/tests/torch/quantization/test_hawq_precision_init.py deleted file mode 100644 index 2648a086c1f..00000000000 --- a/tests/torch/quantization/test_hawq_precision_init.py +++ /dev/null @@ -1,824 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-import itertools -import json -import math -import os -from collections import OrderedDict -from collections import namedtuple -from functools import partial -from pathlib import Path -from typing import Callable, NamedTuple - -import pytest -import torch -import torch.utils.data -from numpy.random import random_sample -from torch import nn -from torchvision.datasets import CIFAR10 -from torchvision.models import mobilenet_v2 -from torchvision.models import resnet50 -from torchvision.transforms import transforms - -import nncf -from nncf import NNCFConfig -from nncf.common.graph import NNCFNodeName -from nncf.common.hardware.config import HWConfigType -from nncf.common.quantization.quantizer_setup import SingleConfigQuantizerSetup -from nncf.common.quantization.structs import QuantizerGroup -from nncf.common.utils.debug import set_debug_log_dir -from nncf.torch import register_default_init_args -from nncf.torch.checkpoint_loading import load_state -from nncf.torch.dynamic_graph.io_handling import FillerInputInfo -from nncf.torch.initialization import default_criterion_fn -from nncf.torch.quantization.adjust_padding import add_adjust_padding_nodes -from nncf.torch.quantization.hessian_trace import HessianTraceEstimator -from nncf.torch.quantization.layers import QUANTIZATION_MODULES -from nncf.torch.quantization.layers import QuantizerConfig -from nncf.torch.quantization.layers import QuantizersSwitcher -from nncf.torch.quantization.precision_init.bitwidth_graph import BitwidthGraph -from nncf.torch.quantization.precision_init.compression_ratio import CompressionRatioCalculator -from nncf.torch.quantization.precision_init.hawq_debug import HAWQDebugger -from nncf.torch.quantization.precision_init.hawq_init import BitwidthAssignmentMode -from nncf.torch.quantization.precision_init.hawq_init import HAWQPrecisionInitializer -from nncf.torch.quantization.precision_init.hawq_init import TraceOrderBitwidthMatcher -from nncf.torch.quantization.precision_init.perturbations import PerturbationObserver -from nncf.torch.quantization.precision_init.perturbations import Perturbations -from nncf.torch.quantization.precision_init.traces_order import TracesOrder -from nncf.torch.quantization.precision_init.traces_order import TracesPerLayer -from nncf.torch.structures import QuantizationPrecisionInitArgs -from nncf.torch.utils import get_all_modules_by_type -from nncf.torch.utils import get_model_device -from nncf.torch.utils import safe_thread_call -from tests.cross_fw.shared.nx_graph import compare_nx_graph_with_reference -from tests.cross_fw.shared.paths import TEST_ROOT -from tests.torch.helpers import BasicConvTestModel -from tests.torch.helpers import create_compressed_model_and_algo_for_test -from tests.torch.helpers import create_conv -from tests.torch.helpers import register_bn_adaptation_init_args -from tests.torch.quantization.quantization_helpers import compare_multi_gpu_dump -from tests.torch.quantization.quantization_helpers import create_rank_dataloader -from tests.torch.quantization.quantization_helpers import distributed_init_test_default -from tests.torch.quantization.quantization_helpers import get_quantization_config_without_range_init -from tests.torch.quantization.quantization_helpers import get_squeezenet_quantization_config -from tests.torch.quantization.quantization_helpers import post_compression_test_distr_init -from tests.torch.test_compressed_graph import get_full_path_to_the_graph -from tests.torch.test_models import inception_v3 -from tests.torch.test_models import 
squeezenet1_1 - -pytestmark = pytest.mark.legacy - - -def create_cifar(config, dataset_config, is_train, transform): - create_cifar_fn = None - if dataset_config == "cifar10": - create_cifar_fn = partial(CIFAR10, config.dataset_dir, train=is_train, transform=transform) - if create_cifar_fn: - return safe_thread_call(partial(create_cifar_fn, download=True), partial(create_cifar_fn, download=False)) - return None - - -def create_test_dataloaders(config: NNCFConfig, dataset_dir): - input_info = FillerInputInfo.from_nncf_config(config).elements[0] - image_size = input_info.shape[-1] - batch_size = input_info.shape[0] - normalize = transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)) - - train_transforms = transforms.Compose( - [ - transforms.CenterCrop(image_size), - transforms.ToTensor(), - normalize, - ] - ) - - dummy_config = type("dummy", (object,), {"dataset_dir": dataset_dir})() - train_dataset = create_cifar(dummy_config, dataset_config="cifar10", is_train=True, transform=train_transforms) - - # Do not set num_workers > 0 here - random hangs occur during pytest runs of this files - train_loader = torch.utils.data.DataLoader( - train_dataset, batch_size=batch_size, shuffle=False, pin_memory=True, drop_last=True - ) - return train_loader, train_dataset - - -def get_bitwidth_per_scope(model, all_quantizations=None): - if not all_quantizations: - all_quantizations = HAWQDebugger.get_all_quantizers_per_full_scope(model) - full_bitwidth_per_scope = [] - for scope, quantizer in all_quantizations.items(): - full_bitwidth_per_scope.append([quantizer.num_bits, str(scope)]) - return full_bitwidth_per_scope - - -def compare_with_ref_if_exists(actual_state, path_to_ref): - if os.path.exists(path_to_ref): - with open(path_to_ref, encoding="utf8") as f: - assert json.load(f) == actual_state - else: - with open(path_to_ref, "w", encoding="utf8") as f: - json.dump(actual_state, f) - - -class BaseConfigBuilder: - def __init__(self, config_creator_fn: Callable = None): - if config_creator_fn: - self._config = config_creator_fn() - self._options: dict[str, str] = OrderedDict() - self._extra_params: str = "" - - def with_ratio(self, ratio: float): - self._config["compression"]["initializer"]["precision"]["compression_ratio"] = ratio - self._options["ratio"] = str(ratio) - return self - - def with_sample_size(self, sample_size: list[int]): - self._config["input_info"]["sample_size"] = sample_size - return self - - def staged(self): - self._config["compression"]["params"] = {"activations_quant_start_epoch": 0, "weights_quant_start_epoch": 1} - self._extra_params += "staged" - return self - - def _set_target_device(self, config_type: str): - self._config["target_device"] = config_type - self._options["device"] = config_type - return self - - def for_npu(self): - return self._set_target_device(HWConfigType.NPU.value) - - def for_cpu(self): - return self._set_target_device(HWConfigType.CPU.value) - - def for_trial(self): - return self._set_target_device("TRIAL") - - def build(self): - return self._config - - def with_ignored_scope(self, ignored_scopes=list[str], target_group: QuantizerGroup = None): - if target_group is None: - self._config["compression"]["ignored_scopes"] = ignored_scopes - else: - if target_group.value not in self._config["compression"]: - self._config["compression"][target_group.value] = {} - self._config["compression"][target_group.value]["ignored_scopes"] = ignored_scopes - self._options["with"] = "ignored_scopes" - return self - - def with_target_scope(self, 
target_scopes=list[str]): - self._config["target_scopes"] = target_scopes - self._config["compression"]["target_scopes"] = target_scopes - self._options["with"] = "target_scopes" - return self - - def __str__(self): - if self._extra_params: - return "_".join([self.filename_suffix(), self._extra_params]) - return self.filename_suffix() - - def filename_suffix(self) -> str: - ordered_options = OrderedDict(sorted(self._options.items())) - return "__".join(["_".join([k, v]) for k, v in ordered_options.items()]) - - -class HAWQConfigBuilder(BaseConfigBuilder): - def __init__(self, config_creator_fn: Callable = None, batch_size=10, num_data_points=100, image_size=10): - super().__init__(config_creator_fn) - if not config_creator_fn: - self._config = self.create_hawq_test_config(batch_size, num_data_points, image_size) - self.num_data_points = num_data_points - self.compression_ratio = 0 - self.should_add_flops = False - - def _set_bitwidth_assignment_mode(self, mode: BitwidthAssignmentMode): - self._config["compression"]["initializer"]["precision"]["bitwidth_assignment_mode"] = mode.value - self._options["mode"] = str(mode.value) - return self - - def strict_mode(self): - return self._set_bitwidth_assignment_mode(BitwidthAssignmentMode.STRICT) - - def liberal_mode(self): - return self._set_bitwidth_assignment_mode(BitwidthAssignmentMode.LIBERAL) - - def build(self): - return self._config - - def for_npu(self): - super().for_npu() - return self.strict_mode() - - def check_compression_ratio(self, compression_ratio=1.5): - self.compression_ratio = compression_ratio - return self - - def add_flops(self): - self.should_add_flops = True - return self - - @staticmethod - def create_hawq_test_config(batch_size=10, num_data_points=100, image_size=10): - config = get_quantization_config_without_range_init() - config["input_info"] = { - "sample_size": [batch_size, 3, image_size, image_size], - } - config["batch_size"] = batch_size - config["compression"].update( - { - "initializer": { - "precision": { - "type": "hawq", - "bits": [4, 8, 6], - "num_data_points": num_data_points, - "iter_number": 1, - "tolerance": 1e-2, - }, - "range": {"num_init_samples": 1}, - "batchnorm_adaptation": {"num_bn_adaptation_samples": 0}, - } - } - ) - return config - - -def get_avg_traces(model, init_device: str): - num_layers = len(get_all_modules_by_type(model, ["Conv2d", "Linear"])) - return torch.randperm(num_layers).to(init_device) + 1 - - -def check_bitwidth_graph(algo_ctrl, model, path_to_dot, graph_dir, add_flops=False): - if torch.cuda.is_available(): - model = model.cuda() - all_quantizers_per_full_scope = HAWQDebugger.get_all_quantizers_per_full_scope(model) - quantizer_switcher = QuantizersSwitcher(list(all_quantizers_per_full_scope.values())) - # graph may not contain some quantizers (e.g. 
in staged scenario) - quantizer_switcher.enable_quantizers() - model.nncf.rebuild_graph() - groups_of_adjacent_quantizers = algo_ctrl.groups_of_adjacent_quantizers - graph = BitwidthGraph(algo_ctrl, model, groups_of_adjacent_quantizers, add_flops).get() - nx_graph = add_adjust_padding_nodes(graph, model) - path_to_dot = get_full_path_to_the_graph(path_to_dot, graph_dir) - compare_nx_graph_with_reference(nx_graph, path_to_dot) - - -class HAWQTestStruct(NamedTuple): - model_creator: Callable[[], nn.Module] = mobilenet_v2 - config_builder: HAWQConfigBuilder = HAWQConfigBuilder().for_npu() - filename_suffix: str = "hw_config_npu" - avg_traces_creator: Callable[[nn.Module, str], torch.Tensor] = get_avg_traces - - def __str__(self): - return "_".join([self.model_creator.__name__, str(self.config_builder)]) - - -INCV3_FLOPS_PER_MODULE = [83886080, 100663296, 117440512, 56623104, 56623104, 198180864, 50331648, 56623104, 56623104] - -# WARNING: BITWIDTH_PER_MODULE should be set as max(weight_bits, act_bits) since this is how compression -# ratio is calculated inside HAWQ - -# Currently the HAWQ sets up 4-bit weights but 8-bit activations for 117440512 module, -# so effective flops would be computed as if the module had 8-bit weights, therefor "[8, 8, 8" instead of "[8, 8, 4" -INCV3_BITWIDTH_PER_MODULE = [8, 8, 8, 8, 4, 4, 4, 4, 8] -INCV3_BITS_COMPLEXITY = map(lambda x, y: x * y, INCV3_FLOPS_PER_MODULE, INCV3_BITWIDTH_PER_MODULE) -INCV3_COMPRESSION_RATIO = sum(INCV3_FLOPS_PER_MODULE) * 8 / sum(INCV3_BITS_COMPLEXITY) - -HAWQ_TEST_PARAMS = ( - HAWQTestStruct(config_builder=HAWQConfigBuilder().staged()), - HAWQTestStruct(config_builder=HAWQConfigBuilder().for_trial()), - HAWQTestStruct(config_builder=HAWQConfigBuilder().for_cpu()), - HAWQTestStruct(config_builder=HAWQConfigBuilder().for_npu().liberal_mode().with_ratio(1.5)), - HAWQTestStruct(config_builder=HAWQConfigBuilder().with_ratio(1.02).for_npu()), - HAWQTestStruct( - model_creator=squeezenet1_1, config_builder=HAWQConfigBuilder().with_sample_size([1, 3, 224, 224]).for_npu() - ), - HAWQTestStruct(model_creator=resnet50, config_builder=HAWQConfigBuilder().with_ratio(1.11).for_npu()), - HAWQTestStruct(model_creator=resnet50, config_builder=HAWQConfigBuilder().for_npu().liberal_mode().with_ratio(1.5)), - HAWQTestStruct( - model_creator=inception_v3, - avg_traces_creator=lambda x, y: get_avg_traces(x, y)[:95], - config_builder=HAWQConfigBuilder().with_sample_size([2, 3, 299, 299]).for_npu().with_ratio(1), - ), - HAWQTestStruct( - model_creator=inception_v3, - avg_traces_creator=lambda x, y: get_avg_traces(x, y)[:94], - config_builder=HAWQConfigBuilder() - .with_sample_size([2, 3, 299, 299]) - .for_npu() - .liberal_mode() - .with_ignored_scope( - ["Inception3/BasicConv2d[Conv2d_2a_3x3]/NNCFConv2d[conv]/conv2d_0"], target_group=QuantizerGroup.WEIGHTS - ) - .with_ratio(1.5), - ), - HAWQTestStruct( - model_creator=inception_v3, - avg_traces_creator=lambda x, y: get_avg_traces(x, y)[:9], - config_builder=HAWQConfigBuilder() - .with_sample_size([2, 3, 299, 299]) - .for_npu() - .liberal_mode() - .with_target_scope([r"{re}.*InceptionE\[Mixed_7c\].*"]) - .with_ratio(1.3) - .check_compression_ratio(INCV3_COMPRESSION_RATIO) - .add_flops(), - ), - HAWQTestStruct( - model_creator=inception_v3, - avg_traces_creator=lambda x, y: get_avg_traces(x, y)[:95], - config_builder=HAWQConfigBuilder().with_sample_size([2, 3, 299, 299]).for_npu().liberal_mode().with_ratio(1.5), - ), -) - - -@pytest.mark.parametrize("params", HAWQ_TEST_PARAMS, ids=[str(p) for p in 
HAWQ_TEST_PARAMS]) -def test_hawq_precision_init(_seed, dataset_dir, tmp_path, mocker, params): - if str(params) in [ - "mobilenet_v2_device_CPU", - "mobilenet_v2__staged", - "mobilenet_v2_device_NPU__mode_strict__ratio_1.02", - ]: - pytest.xfail("Ticket: 175018") - config_builder = params.config_builder - config = config_builder.build() - - model = params.model_creator() - if torch.cuda.is_available(): - model = model.cuda() - pregen_device = "cuda" - else: - pregen_device = "cpu" - - pregen_traces_for_all_layers = params.avg_traces_creator(model, pregen_device) - criterion = nn.CrossEntropyLoss().cuda() - if not dataset_dir: - dataset_dir = str(tmp_path) - train_loader, _ = create_test_dataloaders(config, dataset_dir) - config = register_default_init_args(config, train_loader, criterion=criterion) - - mocked_trace = mocker.patch( - "nncf.torch.quantization.hessian_trace.HessianTraceEstimator.get_average_traces", autospec=True - ) - ratio_list_spy = mocker.spy(HAWQPrecisionInitializer, "get_compression_ratio_per_qconfig_sequence") - chosen_index_spy = mocker.spy(HAWQPrecisionInitializer, "choose_qconfig_sequence") - - # There may be less traces required to be calculated during HAWQ than there are weightable layers. - def side_effect_fn(self, max_iter=500, tolerance=1e-5): - return pregen_traces_for_all_layers[: len(self._parameter_handler.parameters)] - - mocked_trace.side_effect = side_effect_fn - model, ctrl = create_compressed_model_and_algo_for_test(model, config) - - path_to_dot = f"{params.model_creator.__name__}_{config_builder.filename_suffix()}.dot" - graph_dir = os.path.join("quantized", "hawq") - check_bitwidth_graph(ctrl, model, path_to_dot, graph_dir, add_flops=config_builder.should_add_flops) - if config_builder.compression_ratio: - ratio_list = ratio_list_spy.spy_return - index = chosen_index_spy.spy_return - assert config_builder.compression_ratio == ratio_list[index] - - -class RefRatios(NamedTuple): - target_ratio: int - expected_ratio: int - - def __str__(self): - return f"target_ratio:{str(self.target_ratio)}__expected_ratio:{str(self.expected_ratio)}" - - -TEST_REF_RATIOS = [RefRatios(1, 2), RefRatios(2, 2), RefRatios(3, 4), RefRatios(4, 4), RefRatios(5, 6), RefRatios(6, 6)] - - -@pytest.mark.parametrize("ratios", TEST_REF_RATIOS, ids=map(str, TEST_REF_RATIOS)) -def test_can_choose_pareto_optimal_sequence(ratios): - # (metric) - # 6| * - # 5| * - # 4| * - # 3| * - # 2| * - # 1| * - # _ _ _ _ _ _ - # 1 2 3 4 5 6 (ratio) - compression_ratio_per_qconfig = [1, 2, 2, 3, 4, 6] - metric_per_qconfig_sequences = [5, 1, 6, 3, 2, 4] - target_ratio, expected_ratio = ratios - metric_per_qconfig_sequences = list(map(lambda x: torch.Tensor([x]), metric_per_qconfig_sequences)) - - qconfig_sequence_index = HAWQPrecisionInitializer.choose_qconfig_sequence( - metric_per_qconfig_sequences, compression_ratio_per_qconfig, target_ratio - ) - - assert compression_ratio_per_qconfig[qconfig_sequence_index] == expected_ratio - - -def test_hawq_hw_npu_config_e2e(_seed, dataset_dir, tmp_path): - config = HAWQConfigBuilder().for_npu().liberal_mode().with_ratio(1.5).build() - model = mobilenet_v2(num_classes=10) - criterion = nn.CrossEntropyLoss() - if not dataset_dir: - dataset_dir = str(tmp_path) - train_loader, _ = create_test_dataloaders(config, dataset_dir) - config = register_default_init_args(config, train_loader, criterion=criterion) - - create_compressed_model_and_algo_for_test(model, config) - - -HAWQTestParams = namedtuple( - "HAWQTestParams", ("iter_number", "batch_size", 
"num_data_points", "cuda_ref_trace", "cpu_ref_trace") -) - - -@pytest.mark.parametrize( - "params", - ( - HAWQTestParams(200, 13, 100, 1.2741253547860323, 1.274125503581261), - HAWQTestParams(2, 13, 100, 1.2646427814393832, 1.2646428162034615), - HAWQTestParams(2, 10, 10, 1.830527384351921, 1.8305243724338203), - HAWQTestParams(2, 10, 5, 1.830527384351921, 1.8305243724338203), - ), - ids=("until_threshold", "until_num_iter", "batch_eq_num_data", "batch_larger_num_data"), -) -def test_hawq_on_single_conv_without_quantizers(_seed, dataset_dir, tmp_path, params: HAWQTestParams, mocker): - config = get_squeezenet_quantization_config(batch_size=params.batch_size) - iter_number = params.iter_number - tolerance = 4e-4 - - model = squeezenet1_1(num_classes=10, dropout=0) - - from torchvision.models import SqueezeNet1_1_Weights - - load_state(model, SqueezeNet1_1_Weights.IMAGENET1K_V1.get_state_dict(progress=False)) - criterion = nn.CrossEntropyLoss() - ref_trace = params.cpu_ref_trace - rtol = 1e-5 - if torch.cuda.is_available(): - model = model.cuda() - criterion = criterion.cuda() - ref_trace = params.cuda_ref_trace - - if not dataset_dir: - dataset_dir = str(tmp_path) - data_loader, _ = create_test_dataloaders(config, dataset_dir) - device = get_model_device(model) - - for _, param in model.named_parameters(): - param.requires_grad = False - first_conv = next(iter(get_all_modules_by_type(model, "Conv2d").values())) - first_conv.weight.requires_grad = True - ph_import = "nncf.torch.quantization.hessian_trace.ParameterHandler" - sample_rademacher_patch = mocker.patch(f"{ph_import}.sample_rademacher_like_params", autospec=True) - sample_normal_patch = mocker.patch(f"{ph_import}.sample_normal_like_params", autospec=True) - - def mock_sampling_fn(self): - return list(map(lambda x: torch.from_numpy(random_sample(x.shape)).to(device=self._device), self.parameters)) - - sample_rademacher_patch.side_effect = mock_sampling_fn - sample_normal_patch.side_effect = mock_sampling_fn - - trace_estimator = HessianTraceEstimator( - model, default_criterion_fn, criterion, device, data_loader, params.num_data_points - ) - actual_state = trace_estimator.get_average_traces(max_iter=iter_number, tolerance=tolerance) - assert math.isclose(actual_state.item(), ref_trace, rel_tol=rtol) - - -def get_size_of_search_space(m, L): - def nCr(n, r): - f = math.factorial - return f(n) // f(r) // f(n - r) - - ref_num = 0 - for j in range(1, m + 1): - ref_num += nCr(m, j) * nCr(L - 1, j - 1) - return ref_num - - -def test_get_non_decreasing_bit_sequences(): - bits = [4, 2, 8] - L = 4 - m = len(bits) - all_configs = list(itertools.product(bits, repeat=L)) - - ref_configs = [] - for bit_config in all_configs: - is_ok = True - for i in range(L - 1): - if bit_config[i + 1] < bit_config[i]: - is_ok = False - break - if is_ok: - ref_configs.append(list(bit_config)) - - order = TracesOrder(list(range(L))) - matcher = TraceOrderBitwidthMatcher(bits, order) - actual_config = matcher.get_all_non_decreasing_bitwidth_sequences() - ref_num = get_size_of_search_space(m, L) - assert len(ref_configs) == ref_num - assert len(actual_config) == ref_num - assert sorted(actual_config) == sorted(ref_configs) - - -def get_requires_grad_per_param(model): - not_sorted = OrderedDict({param_name: param.requires_grad for param_name, param in model.named_parameters()}) - return OrderedDict(sorted(not_sorted.items())) - - -def get_skipped_quantized_weight_node_names() -> list[NNCFNodeName]: - scopes_list = [ - 
"MobileNetV2/Sequential[features]/Conv2dNormActivation[18]/NNCFConv2d[0]/conv2d_0", - "MobileNetV2/Sequential[features]/InvertedResidual[17]/Sequential[conv]/NNCFConv2d[2]/conv2d_0", - "MobileNetV2/Sequential[features]/InvertedResidual[16]/Sequential[conv]/NNCFConv2d[2]/conv2d_0", - ] - return scopes_list - - -def test_disable_quantizer_gradients(): - _, parameters_to_restore, model, *_ = disable_quantizer_gradients() - assert len(parameters_to_restore.originally_disabled_gradients) == 354 - assert len(parameters_to_restore.skipped_gradients_to_enable) == 3 - actual_requires_grad_per_param = get_requires_grad_per_param(model) - path_to_ref = str(TEST_ROOT / "torch/data/hawq_reference/mobilenet_v2_requires_grad_per_param.json") - compare_with_ref_if_exists(actual_requires_grad_per_param, path_to_ref) - - -def test_enable_quantizer_gradients(): - switcher, params_to_restore, model, ctrl, origi_requires_grad_per_param = disable_quantizer_gradients() - quantized_modules = ctrl.weight_quantizers - HAWQPrecisionInitializer.restore_disabled_gradients(switcher, model, quantized_modules, params_to_restore) - actual_requires_grad_per_param = get_requires_grad_per_param(model) - assert origi_requires_grad_per_param == actual_requires_grad_per_param - - -def disable_quantizer_gradients(): - config = get_quantization_config_without_range_init() - config["input_info"] = { - "sample_size": [2, 3, 10, 10], - } - register_bn_adaptation_init_args(config) - model = mobilenet_v2() - model, compression_ctrl = create_compressed_model_and_algo_for_test(model, config) - original_requires_grad_per_param = get_requires_grad_per_param(model) - quantization_types = [class_type.__name__ for class_type in QUANTIZATION_MODULES.registry_dict.values()] - all_quantizations = get_all_modules_by_type(model, quantization_types) - quantizers_switcher = QuantizersSwitcher(list(all_quantizations.values())) - params_to_restore = HAWQPrecisionInitializer.disable_all_gradients_except_weights_of_quantized_modules( - quantizers_switcher, compression_ctrl.weight_quantizers, model, get_skipped_quantized_weight_node_names() - ) - return quantizers_switcher, params_to_restore, model, compression_ctrl, original_requires_grad_per_param - - -def get_path_to_bitwidth_dump(tmp_path, rank): - out_file_path = tmp_path / f"bitwidth_per_scope_gpu{rank}.pt" - return out_file_path - - -def precision_init_dumping_worker(gpu, ngpus_per_node, config, tmp_path): - distributed_init_test_default(gpu, ngpus_per_node, config) - data_loader = create_rank_dataloader(config, gpu) - model = safe_thread_call(partial(mobilenet_v2, pretrained=True)) - model.eval() - criterion = torch.nn.MSELoss().cuda(config.gpu) - config = register_default_init_args( - config, data_loader, criterion=criterion, autoq_eval_fn=lambda *x: 0, val_loader=data_loader - ) - quant_model, compression_ctrl = create_compressed_model_and_algo_for_test(model, config) - - quant_model = post_compression_test_distr_init(compression_ctrl, config, ngpus_per_node, quant_model) - - # just to reproduce the same scale values without Dropout - quant_model.eval() - - act_bitwidth_per_scope = get_bitwidth_per_scope(quant_model.module) - out_file_path = get_path_to_bitwidth_dump(tmp_path, config.rank) - torch.save(act_bitwidth_per_scope, str(out_file_path)) - - -@pytest.mark.cuda -def test_can_broadcast_initialized_precisions_in_distributed_mode(tmp_path, runs_subprocess_in_precommit): - if not torch.cuda.is_available(): - pytest.skip("Skipping CUDA test cases for CPU only setups") - config_builder = 
HAWQConfigBuilder(batch_size=2, num_data_points=10).for_trial() - config = config_builder.build() - ngpus_per_node = torch.cuda.device_count() - config.world_size = ngpus_per_node - torch.multiprocessing.spawn( - precision_init_dumping_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, config, tmp_path), join=True - ) - - assert not compare_multi_gpu_dump(config, tmp_path, get_path_to_bitwidth_dump) - - -@pytest.mark.parametrize(("method_name", "expected_behavior"), [("_calc_traces", pytest.raises(nncf.InternalError))]) -def test_hawq_behaviour__if_method_returns_none(mocker, method_name, expected_behavior): - config = HAWQConfigBuilder().with_sample_size([1, 1, 4, 4]).for_trial().build() - config["compression"]["initializer"]["range"]["num_init_samples"] = 0 - model = BasicConvTestModel() - mock_train_loader = mocker.stub() - mock_train_loader.batch_size = 1 - device = "cuda" if torch.cuda.is_available() else "cpu" - config.register_extra_structs( - [ - QuantizationPrecisionInitArgs( - criterion_fn=mocker.stub(), criterion=mocker.stub(), data_loader=mock_train_loader, device=device - ) - ] - ) - mocker.patch("nncf.common.initialization.batchnorm_adaptation.BatchnormAdaptationAlgorithm.run") - mocked_calc_traces = mocker.patch( - "nncf.torch.quantization.precision_init.hawq_init.HAWQPrecisionInitializer._calc_traces" - ) - stub = mocker.stub() - stub.traces_order = TracesOrder([0]) - mocked_calc_traces.return_value = stub - - mocked_method = mocker.patch( - "nncf.torch.quantization.precision_init.hawq_init.HAWQPrecisionInitializer." + method_name - ) - mocked_method.return_value = None - - with expected_behavior: - create_compressed_model_and_algo_for_test(model, config) - - -def test_check_hawq_dump(mocker, tmp_path): - tensor1 = torch.Tensor([1]) - tensor2 = torch.Tensor([2]) - qconf1 = QuantizerConfig(num_bits=2) - qconf2 = QuantizerConfig(num_bits=4) - id_ = 0 - quantizer_configurations = [[qconf1, qconf1], [qconf2, qconf2]] - flops_per_config = [tensor1.item(), tensor2.item()] - choosen_config_index = id_ - metric_per_qconfig_sequence = [tensor1, tensor2] - perturbations = Perturbations() - perturbations.add(id_, qconf1, tensor1) - perturbations.add(id_, qconf2, tensor2) - perturbations.add(id_ + 1, qconf1, tensor2) - perturbations.add(id_ + 1, qconf2, tensor1) - - observer1 = PerturbationObserver(mocker.stub()) - observer1.perturbation = tensor1 - observer1.numels = id_ - observer1.input_norm = id_ - - observer2 = PerturbationObserver(mocker.stub()) - observer2.perturbation = tensor2 - observer2.numels = id_ - observer2.input_norm = id_ - weight_observers = [observer1, observer2] - traces_per_layer = TracesPerLayer(torch.cat((tensor1, tensor2))) - - set_debug_log_dir(str(tmp_path)) - hawq_debugger = HAWQDebugger( - quantizer_configurations, - perturbations, - [weight_observers, weight_observers], - traces_per_layer, - [qconf1.num_bits, qconf2.num_bits], - ) - - hawq_debugger.dump_metric_MB(metric_per_qconfig_sequence) - hawq_debugger.dump_metric_flops(metric_per_qconfig_sequence, flops_per_config, choosen_config_index) - hawq_debugger.dump_avg_traces() - hawq_debugger.dump_density_of_quantization_noise() - hawq_debugger.dump_perturbations_ratio() - test_dir = tmp_path / Path("hawq_dumps") - num_dump_files = len([name for name in os.listdir(test_dir) if os.path.isfile(os.path.join(test_dir, name))]) - assert num_dump_files == 6 - - -def get_quantization_config_with_ignored_scope(): - config = get_quantization_config_without_range_init() - config["compression"]["ignored_scopes"] = 
"ConvLinear/NNCFLinear[fc]" - return config - - -class RatioCalculatorTestDesc: - NAMES_OF_INSERTION_POINTS = [ - "/nncf_model_input_0|OUTPUT", - "ConvLinear/NNCFConv2d[conv1]/conv2d_0|WEIGHT", - "ConvLinear/NNCFConv2d[conv1]/conv2d_0|OUTPUT", - "ConvLinear/NNCFLinear[fc]/linear_0|WEIGHT", - ] - - def __init__(self, ref_ratio: float = 1): - self._bitwidth_sequence = [8] * len(self.NAMES_OF_INSERTION_POINTS) - self._config_factory = get_quantization_config_without_range_init - self._ignored_scopes = [] - self.ref_ratio = ref_ratio - - def bitwidths(self, bitwidth_sequence=list[int]): - self._bitwidth_sequence = bitwidth_sequence - return self - - def ignore_fc(self): - self._ignored_scopes = ["ConvLinear/NNCFLinear[fc]/linear_0"] - return self - - def create_config(self): - config = self._config_factory() - if self._ignored_scopes: - config["compression"]["ignored_scopes"] = self._ignored_scopes - return config - - def apply_to_quantizer_setup(self, quantizer_setup: SingleConfigQuantizerSetup) -> SingleConfigQuantizerSetup: - for i, bitwidth in enumerate(self._bitwidth_sequence): - ip_name = self.NAMES_OF_INSERTION_POINTS[i] - quantization_points = quantizer_setup.quantization_points.values() - found_qp = list(filter(lambda qp: str(qp.insertion_point) == ip_name, quantization_points)) - assert len(found_qp) == 1 - found_qp[0].qconfig.num_bits = bitwidth - return quantizer_setup - - def __str__(self): - is_ignored = "with_FC_ignored" if self._ignored_scopes else "all" - return "_".join([is_ignored, *map(str, self._bitwidth_sequence)]) - - -class ConvLinear(nn.Module): - CONV_FLOPS = 72 - LINEAR_FLOPS = 108 - - def __init__(self): - super().__init__() - self.conv1 = create_conv(1, 1, 2, -1, -2) - self.fc = nn.Linear(3, 6) - - def forward(self, x): - return self.fc(self.conv1(x)) - - -CONV_FLOPS = ConvLinear.CONV_FLOPS -LINEAR_FLOPS = ConvLinear.LINEAR_FLOPS -MAX_BITS_COMPLEXITY = (CONV_FLOPS + LINEAR_FLOPS) * 8 -R48 = MAX_BITS_COMPLEXITY / (CONV_FLOPS * 4 + LINEAR_FLOPS * 8) -R84 = MAX_BITS_COMPLEXITY / (CONV_FLOPS * 8 + LINEAR_FLOPS * 4) - -RATIO_CALCULATOR_TEST_DESCS = [ - RatioCalculatorTestDesc(ref_ratio=2.0).bitwidths([4, 4, 4, 4]), - RatioCalculatorTestDesc(ref_ratio=R48).bitwidths([4, 4, 4, 8]), - RatioCalculatorTestDesc(ref_ratio=R48).bitwidths([4, 4, 8, 4]), - RatioCalculatorTestDesc(ref_ratio=R48).bitwidths([4, 4, 8, 8]), - RatioCalculatorTestDesc(ref_ratio=R84).bitwidths([4, 8, 4, 4]), - RatioCalculatorTestDesc(ref_ratio=1.0).bitwidths([4, 8, 4, 8]), - RatioCalculatorTestDesc(ref_ratio=1.0).bitwidths([4, 8, 8, 4]), - RatioCalculatorTestDesc(ref_ratio=1.0).bitwidths([4, 8, 8, 8]), - RatioCalculatorTestDesc(ref_ratio=R84).bitwidths([8, 4, 4, 4]), - RatioCalculatorTestDesc(ref_ratio=1.0).bitwidths([8, 4, 4, 8]), - RatioCalculatorTestDesc(ref_ratio=1.0).bitwidths([8, 4, 8, 4]), - RatioCalculatorTestDesc(ref_ratio=1.0).bitwidths([8, 4, 8, 8]), - RatioCalculatorTestDesc(ref_ratio=R84).bitwidths([8, 8, 4, 4]), - RatioCalculatorTestDesc(ref_ratio=1.0).bitwidths([8, 8, 4, 8]), - RatioCalculatorTestDesc(ref_ratio=1.0).bitwidths([8, 8, 8, 4]), - RatioCalculatorTestDesc(ref_ratio=1.0).bitwidths([8, 8, 8, 8]), - RatioCalculatorTestDesc(ref_ratio=2.0).bitwidths([4, 4]).ignore_fc(), - RatioCalculatorTestDesc(ref_ratio=1.0).bitwidths([4, 8]).ignore_fc(), - RatioCalculatorTestDesc(ref_ratio=1.0).bitwidths([8, 4]).ignore_fc(), - RatioCalculatorTestDesc(ref_ratio=1.0).bitwidths([8, 8]).ignore_fc(), -] - - -@pytest.mark.parametrize("desc", RATIO_CALCULATOR_TEST_DESCS, ids=map(str, 
RATIO_CALCULATOR_TEST_DESCS)) -def test_compression_ratio(desc, mocker): - config = desc.create_config() - register_bn_adaptation_init_args(config) - from nncf.torch.quantization.algo import QuantizationBuilder - - get_single_config_quantizer_setup_spy = mocker.spy(QuantizationBuilder, "_get_single_config_quantizer_setup") - model, ctrl = create_compressed_model_and_algo_for_test(ConvLinear(), config) - - quantizer_setup = get_single_config_quantizer_setup_spy.spy_return - weight_qp_id_per_activation_qp_id = ctrl.groups_of_adjacent_quantizers.weight_qp_id_per_activation_qp_id - flops_per_module = model.nncf.get_flops_per_module() - ratio_calculator = CompressionRatioCalculator(flops_per_module, quantizer_setup, weight_qp_id_per_activation_qp_id) - - quantizer_setup = desc.apply_to_quantizer_setup(quantizer_setup) - assert ratio_calculator.run_for_quantizer_setup(quantizer_setup) == desc.ref_ratio - - -def test_staged_quantization_saves_enabled_quantizers_in_state_dict(tmp_path): - config = get_quantization_config_without_range_init() - config["compression"]["params"] = {"activations_quant_start_epoch": 2, "weights_quant_start_epoch": 1} - register_bn_adaptation_init_args(config) - _, ctrl_save = create_compressed_model_and_algo_for_test(BasicConvTestModel(), config) - ctrl_save.scheduler.epoch_step() - ctrl_save.scheduler.epoch_step() - compression_state = ctrl_save.get_compression_state() - _, ctrl_load = create_compressed_model_and_algo_for_test( - BasicConvTestModel(), config, compression_state=compression_state - ) - for quantizer_info in ctrl_load.non_weight_quantizers.values(): - assert not quantizer_info.quantizer_module_ref.is_enabled_quantization() - for quantizer_info in ctrl_load.weight_quantizers.values(): - assert quantizer_info.quantizer_module_ref.is_enabled_quantization() diff --git a/tests/torch/quantization/test_hw_config.py b/tests/torch/quantization/test_hw_config.py deleted file mode 100644 index 4488e9bf02b..00000000000 --- a/tests/torch/quantization/test_hw_config.py +++ /dev/null @@ -1,218 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import torch - -from nncf.common.quantization.quantizer_setup import DEFAULT_QUANTIZER_CONFIG -from nncf.common.quantization.structs import QuantizationScheme as QuantizationMode -from nncf.torch.dynamic_graph.io_handling import FillerInputElement -from nncf.torch.dynamic_graph.io_handling import FillerInputInfo -from nncf.torch.hardware.config import PTHWConfig -from nncf.torch.nncf_network import NNCFNetwork -from nncf.torch.quantization.algo import QuantizationBuilder -from nncf.torch.quantization.algo import QuantizationController -from nncf.torch.quantization.layers import AsymmetricQuantizer -from nncf.torch.quantization.layers import BaseQuantizer -from nncf.torch.quantization.layers import SymmetricQuantizer -from tests.torch.quantization.quantization_helpers import get_quantization_config_without_range_init - - -class ModelForHWConfigTest(torch.nn.Module): - CONV2D_OP_NODE_NAME = "ModelForHWConfigTest/NNCFConv2d[conv2d]/conv2d_0" - - def __init__(self, with_hardswish=False): - super().__init__() - self.with_hardswish = with_hardswish - self.conv2d = torch.nn.Conv2d(2, 1, 1) - - def forward(self, x_): - if self.with_hardswish: - x_ = torch.nn.functional.hardswish(x_) - x_ = self.conv2d(x_) - x_ = x_.matmul(x_) - return x_ - - -class TestHWConfigRules: - @staticmethod - def get_model_and_ctrl_with_applied_hw_config_quantization( - model: torch.nn.Module, hw_config_dict: dict, should_be_quantize_inputs: bool = True - ): - nncf_config = get_quantization_config_without_range_init(model_size=1) - nncf_config["compression"].update({"quantize_inputs": should_be_quantize_inputs}) - nncf_config["target_device"] = "ANY" # for compatibility - - net = NNCFNetwork(model, input_info=FillerInputInfo([FillerInputElement([1, 2, 1, 1])])) - hw_config = PTHWConfig.from_dict(hw_config_dict) - qbuilder = QuantizationBuilder(nncf_config, should_init=False) - qbuilder.hw_config = hw_config - net = qbuilder.apply_to(net) - ctrl = qbuilder.build_controller(net) - return net, ctrl - - @staticmethod - def quantizer_has_default_config(quantizer: BaseQuantizer) -> bool: - default_qconfig = DEFAULT_QUANTIZER_CONFIG - is_ok = True - is_ok &= quantizer.num_bits == default_qconfig.num_bits - is_ok &= quantizer.per_channel == default_qconfig.per_channel - if default_qconfig.signedness_to_force is not None: - is_ok &= quantizer.signed == default_qconfig.signedness_to_force - is_ok &= isinstance( - quantizer, SymmetricQuantizer if default_qconfig.mode == QuantizationMode.SYMMETRIC else AsymmetricQuantizer - ) - return is_ok - - @staticmethod - def get_quantizer_module_after_op_name(op_name: str, ctrl: QuantizationController) -> BaseQuantizer: - input_matches = list( - filter( - lambda x: op_name in x.target_node_name and x.input_port_id is None, ctrl.non_weight_quantizers.keys() - ) - ) - assert len(input_matches) == 1 - act_quant_key = input_matches[0] - act_quantizer_ref = ctrl.non_weight_quantizers[act_quant_key].quantizer_module_ref - return act_quantizer_ref - - def test_missing_ir_op_results_in_fp32(self): - hw_config_dict = { - "target_device": "test", - "config": { - "quantization": { - "q8_a": { - "bits": 8, - "mode": ["symmetric", "asymmetric"], - "granularity": "pertensor", - "narrow_range": False, - }, - } - }, - "operations": [ - {"type": "MatMul", "quantization": {"activations": "q8_a", "weights": "q8_a"}}, - ], - } - - _, ctrl = self.get_model_and_ctrl_with_applied_hw_config_quantization( - ModelForHWConfigTest(with_hardswish=False), hw_config_dict, False - ) - assert len(ctrl.weight_quantizers) == 
0 # Conv2d weights remain unquantized - assert len(ctrl.non_weight_quantizers) == 1 # Only the matmul input is quantized - - key = next(iter(ctrl.non_weight_quantizers.keys())) - # Corresponds to a quantizer AFTER conv2d, i.e. matmul input quantizer - assert key.target_node_name == ModelForHWConfigTest.CONV2D_OP_NODE_NAME - - def test_missing_non_ir_op_results_in_default_qconf_list(self): - hw_config_dict = { - "target_device": "test", - "config": { - "quantization": { - "q4_a": { - "bits": 4, - "mode": ["symmetric", "asymmetric"], - "granularity": "pertensor", - "narrow_range": False, - }, - } - }, - "operations": [ - { - "type": "MatMul", - "quantization": {"activations": "q4_a", "weights": "q4_a"}, - }, - {"type": "Convolution", "quantization": {"activations": "q4_a", "weights": "q4_a"}}, - ], - } - - _, ctrl = self.get_model_and_ctrl_with_applied_hw_config_quantization( - ModelForHWConfigTest(with_hardswish=True), hw_config_dict - ) - assert len(ctrl.weight_quantizers) == 1 # Conv2d weights quantized - assert len(ctrl.non_weight_quantizers) == 2 # conv2d input, matmul input (single in this case) - - w_key = next(iter(ctrl.weight_quantizers.keys())) - assert str(w_key.target_node_name) == ModelForHWConfigTest.CONV2D_OP_NODE_NAME - - def test_unspecified_quantization_for_fundamentally_quantizable_op_results_in_default_qconfig(self): - hw_config_dict = { # Only the MatMul will receive a default config here (8 bit symmetric per-tensor) - "target_device": "test", - "config": { - "quantization": { - "q4_a": { - "bits": 4, - "mode": ["symmetric", "asymmetric"], - "granularity": "pertensor", - "narrow_range": False, - }, - } - }, - "operations": [ - {"type": "MatMul"}, - {"type": "Convolution", "quantization": {"activations": "q4_a", "weights": "q4_a"}}, - ], - } - - _, ctrl = self.get_model_and_ctrl_with_applied_hw_config_quantization( - ModelForHWConfigTest(with_hardswish=False), hw_config_dict, False - ) - assert len(ctrl.weight_quantizers) == 1 # Conv2d weights quantized - conv2d_weight_quantizer_ref = list(ctrl.weight_quantizers.values())[0].quantizer_module_ref - assert not self.quantizer_has_default_config(conv2d_weight_quantizer_ref) - - assert len(ctrl.non_weight_quantizers) == 1 # Matmul input - matmul_input_matches = list( - filter( - lambda x: x.target_node_name == ModelForHWConfigTest.CONV2D_OP_NODE_NAME, - ctrl.non_weight_quantizers.keys(), - ) - ) - - assert len(matmul_input_matches) == 1 - matmul_quantizer_ref = ctrl.non_weight_quantizers[matmul_input_matches[0]].quantizer_module_ref - assert self.quantizer_has_default_config(matmul_quantizer_ref) - - non_matmul_input_matches = list( - filter( - lambda x: x.target_node_name != ModelForHWConfigTest.CONV2D_OP_NODE_NAME, - ctrl.non_weight_quantizers.keys(), - ) - ) - for quantizer_id in non_matmul_input_matches: - quantizer_ref = ctrl.non_weight_quantizers[quantizer_id].quantizer_module_ref - assert not self.quantizer_has_default_config(quantizer_ref) - - def test_unspecified_quantization_for_weighted_op_results_in_default_qconf_list_for_weights(self): - hw_config_dict = { - "target_device": "test", - "config": { - "quantization": { - "q4_a": { - "bits": 4, - "mode": ["symmetric", "asymmetric"], - "granularity": "pertensor", - "narrow_range": False, - }, - } - }, - "operations": [ - {"type": "MatMul"}, - {"type": "Convolution"}, - ], - } - - _, ctrl = self.get_model_and_ctrl_with_applied_hw_config_quantization( - ModelForHWConfigTest(with_hardswish=False), hw_config_dict - ) - assert len(ctrl.weight_quantizers) == 1 # Conv2d 
weights quantized with default config - assert len(ctrl.non_weight_quantizers) == 2 # All inputs are quantized. - for quantizer_ref in ctrl.all_quantizations.values(): - assert self.quantizer_has_default_config(quantizer_ref) diff --git a/tests/torch/quantization/test_logarithm_scale.py b/tests/torch/quantization/test_logarithm_scale.py deleted file mode 100644 index 5ce396d9a23..00000000000 --- a/tests/torch/quantization/test_logarithm_scale.py +++ /dev/null @@ -1,103 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -import itertools - -import pytest -import torch - -import nncf -from nncf import NNCFConfig -from nncf.torch.initialization import PTInitializingDataLoader -from tests.torch.helpers import TwoConvTestModel -from tests.torch.helpers import create_compressed_model_and_algo_for_test -from tests.torch.helpers import register_bn_adaptation_init_args - -SAMPLE_SIZE = [1, 1, 4, 4] - - -def get_config_for_logarithm_scale(logarithm_scale: bool, quantization_type: str) -> NNCFConfig: - nncf_config = NNCFConfig() - nncf_config.update( - { - "input_info": {"sample_size": SAMPLE_SIZE}, - "target_device": "TRIAL", - "compression": { - "algorithm": "quantization", - "initializer": { - "range": { - "num_init_samples": 4, - "type": "percentile", - "params": {"min_percentile": 0.001, "max_percentile": 99.999}, - } - }, - "activations": {"mode": quantization_type, "logarithm_scale": logarithm_scale}, - "weights": {"mode": quantization_type, "signed": True, "logarithm_scale": logarithm_scale}, - }, - } - ) - - class RandDatasetMock: - def __getitem__(self, index): - return torch.rand(*SAMPLE_SIZE) - - def __len__(self): - return 4 - - data_loader = torch.utils.data.DataLoader(RandDatasetMock(), batch_size=1, shuffle=False, drop_last=True) - - class SquadInitializingDataloader(PTInitializingDataLoader): - def get_inputs(self, dataloader_output): - return dataloader_output, {} - - def get_target(self, dataloader_output): - return None - - initializing_data_loader = SquadInitializingDataloader(data_loader) - init_range = nncf.config.structures.QuantizationRangeInitArgs(initializing_data_loader) - nncf_config.register_extra_structs([init_range]) - register_bn_adaptation_init_args(nncf_config) - - return nncf_config - - -@pytest.mark.parametrize( - ["logarithm_scale_setting_1", "logarithm_scale_setting_2", "quantization_type"], - list(itertools.product((True, False), (True, False), ("symmetric", "asymmetric"))), -) -def test_logarithm_scale_parameter(logarithm_scale_setting_1, logarithm_scale_setting_2, quantization_type): - for logarithm_scales in [[False, True], [True, False]]: - for symmetric in [False, True]: - model0, _ = create_compressed_model_and_algo_for_test( - TwoConvTestModel(), - get_config_for_logarithm_scale( - logarithm_scale=logarithm_scale_setting_1, quantization_type=quantization_type - ), - ) - - model1, _ = create_compressed_model_and_algo_for_test( - TwoConvTestModel(), - get_config_for_logarithm_scale( - 
logarithm_scale=logarithm_scale_setting_2, quantization_type=quantization_type - ), - ) - - sd0 = model0.state_dict() - model1.load_state_dict(sd0) - sd1 = model1.state_dict() - - for k, v0 in sd0.items(): - v1 = sd1[k] - diff = (v1 - v0).abs().sum().item() / v1.numel() - err_msg = ( - f"symmetric {symmetric} logarithm_scales {logarithm_scales} param {k}" - f" is corrupted mean({v0}-{v1})={diff}" - ) - assert diff < 1e-6, err_msg diff --git a/tests/torch/quantization/test_onnx_export.py b/tests/torch/quantization/test_onnx_export.py deleted file mode 100644 index d7f5e2413a4..00000000000 --- a/tests/torch/quantization/test_onnx_export.py +++ /dev/null @@ -1,360 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import numpy as np -import onnx -import pytest -import torch -from torch import nn - -from nncf import NNCFConfig -from nncf.common.quantization.structs import QuantizationScheme as QuantizationMode -from nncf.torch.quantization.layers import QUANTIZATION_MODULES -from nncf.torch.quantization.layers import AsymmetricQuantizer -from nncf.torch.quantization.layers import BaseQuantizer -from nncf.torch.quantization.layers import PTQuantizerSpec -from nncf.torch.quantization.layers import QuantizerExportMode -from nncf.torch.quantization.layers import SymmetricQuantizer -from tests.torch.helpers import TwoConvTestModel -from tests.torch.helpers import create_compressed_model_and_algo_for_test -from tests.torch.helpers import get_all_inputs_for_graph_node -from tests.torch.helpers import get_nodes_by_type -from tests.torch.helpers import load_exported_onnx_version -from tests.torch.helpers import register_bn_adaptation_init_args -from tests.torch.helpers import resolve_constant_node_inputs_to_values - - -def get_config_for_export_mode(should_be_onnx_standard: bool) -> NNCFConfig: - nncf_config = NNCFConfig() - nncf_config.update( - { - "input_info": {"sample_size": [1, 1, 4, 4]}, - "compression": {"algorithm": "quantization", "export_to_onnx_standard_ops": should_be_onnx_standard}, - } - ) - register_bn_adaptation_init_args(nncf_config) - return nncf_config - - -def test_onnx_export_to_fake_quantize(tmp_path): - model = TwoConvTestModel() - nncf_config = get_config_for_export_mode(should_be_onnx_standard=False) - onnx_model_proto = load_exported_onnx_version(nncf_config, model, path_to_storage_dir=tmp_path) - num_fq = 0 - num_model_nodes = 0 - num_other_nodes = 0 - - for node in onnx_model_proto.graph.node: - op_type = node.op_type - if op_type == "FakeQuantize": - num_fq += 1 - elif op_type in ["Conv", "Constant"]: - num_model_nodes += 1 - else: - num_other_nodes += 1 - assert num_fq == 4 - assert num_other_nodes == 0 - - -def test_onnx_export_to_quantize_dequantize(tmp_path): - # It doesn't work with CPU target_device because - # per-channel quantization is not supported in onnxruntime. 
- model = TwoConvTestModel() - nncf_config = get_config_for_export_mode(should_be_onnx_standard=True) - nncf_config["target_device"] = "TRIAL" - onnx_model_proto = load_exported_onnx_version( - nncf_config, model, path_to_storage_dir=tmp_path, save_format="onnx_13" - ) - num_q = 0 - num_dq = 0 - num_model_nodes = 0 - num_other_nodes = 0 - - for node in onnx_model_proto.graph.node: - op_type = node.op_type - if op_type == "QuantizeLinear": - num_q += 1 - elif op_type == "DequantizeLinear": - num_dq += 1 - elif op_type in ["Conv", "Constant"]: - num_model_nodes += 1 - else: - num_other_nodes += 1 - assert num_q == 4 - assert num_q == num_dq - assert num_other_nodes == 0 - - -INPUT_TENSOR_SHAPE = (2, 64, 15, 10) -PER_CHANNEL_AQ_SCALE_SHAPE = (1, INPUT_TENSOR_SHAPE[1], 1, 1) - - -@pytest.mark.parametrize( - "export_mode", (QuantizerExportMode.FAKE_QUANTIZE, QuantizerExportMode.ONNX_QUANTIZE_DEQUANTIZE_PAIRS) -) -def test_onnx_export_to_quantize_dequantize_per_channel( - is_per_channel: bool, quantization_mode: QuantizationMode, export_mode: QuantizerExportMode -): - scale_shape = PER_CHANNEL_AQ_SCALE_SHAPE if is_per_channel else (1,) - qspec = PTQuantizerSpec( - scale_shape=scale_shape, - num_bits=8, - mode=quantization_mode, - signedness_to_force=None, - logarithm_scale=False, - narrow_range=False, - half_range=False, - is_quantized_on_export=False, - ) - - q_cls = QUANTIZATION_MODULES.get(quantization_mode) - quantizer = q_cls(qspec) - if quantization_mode is QuantizationMode.SYMMETRIC: - quantizer.scale = torch.nn.Parameter(torch.rand_like(quantizer.scale)) - else: - quantizer.input_low = torch.nn.Parameter(torch.rand_like(quantizer.input_low)) - quantizer.input_range = torch.nn.Parameter(torch.rand_like(quantizer.input_range)) - - quantizer._export_mode = export_mode - - x = torch.rand(INPUT_TENSOR_SHAPE) - quantizer.run_export_quantization(x) - - -class TargetCompressionIdxTestModel(torch.nn.Module): - CONV2D_TARGET_CHANNEL_COUNT = 5 - CONV2D_TRANSPOSE_TARGET_CHANNEL_COUNT = 10 - - def __init__(self): - super().__init__() - self.conv = torch.nn.Conv2d(in_channels=1, out_channels=self.CONV2D_TARGET_CHANNEL_COUNT, kernel_size=(1, 1)) - self.conv_t = torch.nn.ConvTranspose2d( - in_channels=self.CONV2D_TARGET_CHANNEL_COUNT, - out_channels=self.CONV2D_TRANSPOSE_TARGET_CHANNEL_COUNT, - kernel_size=(1, 1), - ) - - def forward(self, x): - x = self.conv(x) - x = self.conv_t(x) - return x - - -def get_weight_fq_for_conv_node(node: onnx.NodeProto, graph: onnx.GraphProto): - weight_input_tensor_id = node.input[1] - matches = [x for x in graph.node if weight_input_tensor_id in x.output] - assert len(matches) == 1 - match = next(iter(matches)) - assert match.op_type == "FakeQuantize" - return match - - -def get_input_low_input_high_for_wfq_node( - wfq_node: onnx.NodeProto, graph: onnx.GraphProto -) -> tuple[onnx.AttributeProto, onnx.AttributeProto]: - assert wfq_node.op_type == "FakeQuantize" - conv_wfq_inputs = list(resolve_constant_node_inputs_to_values(wfq_node, graph).values()) - return conv_wfq_inputs[1], conv_wfq_inputs[2] - - -def test_target_compression_idx(tmp_path): - model = TargetCompressionIdxTestModel() - nncf_config = get_config_for_export_mode(should_be_onnx_standard=False) - onnx_model_proto = load_exported_onnx_version(nncf_config, model, path_to_storage_dir=tmp_path) - onnx_graph = onnx_model_proto.graph - conv_nodes = get_nodes_by_type(onnx_model_proto, "Conv") - assert len(conv_nodes) == 1 - conv_node = next(iter(conv_nodes)) - conv_wfq_node = get_weight_fq_for_conv_node(conv_node, 
onnx_graph) - input_low_attr, input_high_attr = get_input_low_input_high_for_wfq_node(conv_wfq_node, onnx_graph) - assert input_low_attr.shape == (TargetCompressionIdxTestModel.CONV2D_TARGET_CHANNEL_COUNT, 1, 1, 1) - assert input_low_attr.shape == input_high_attr.shape - - conv_t_nodes = get_nodes_by_type(onnx_model_proto, "ConvTranspose") - assert len(conv_t_nodes) == 1 - conv_t_node = next(iter(conv_t_nodes)) - conv_t_wfq_node = get_weight_fq_for_conv_node(conv_t_node, onnx_graph) - input_low_t_attr, input_high_t_attr = get_input_low_input_high_for_wfq_node(conv_t_wfq_node, onnx_graph) - assert input_low_t_attr.shape == (1, TargetCompressionIdxTestModel.CONV2D_TRANSPOSE_TARGET_CHANNEL_COUNT, 1, 1) - assert input_low_t_attr.shape == input_high_t_attr.shape - - -class ModelWithBranches(torch.nn.Module): - def __init__(self): - super().__init__() - self.conv_1 = torch.nn.Conv2d(2, 2, (1, 1)) - self.conv_2 = torch.nn.Conv2d(2, 2, (1, 1), groups=2) - self.conv_3 = torch.nn.Conv2d(2, 2, (1, 1), groups=2) - - def forward(self, x): - x1 = self.conv_1(x) - x2 = self.conv_2(x) - x3 = self.conv_3(x) - x4 = x + x - return x1, x2, x3, x4 - - -def get_successors(node: onnx.NodeProto, graph: onnx.GraphProto) -> list[onnx.NodeProto]: - retval = [] - for output_name in node.output: - for target_node in graph.node: - if output_name in target_node.input: - retval.append(target_node) - return retval - - -@pytest.mark.parametrize( - "export_mode", [QuantizerExportMode.FAKE_QUANTIZE, QuantizerExportMode.ONNX_QUANTIZE_DEQUANTIZE_PAIRS] -) -def test_branching_fqs_are_not_chained(tmp_path, export_mode): - nncf_config = NNCFConfig.from_dict( - { - "input_info": {"sample_size": [1, 2, 2, 2]}, - "compression": { - "algorithm": "quantization", - "preset": "mixed", - "ignored_scopes": ["/nncf_model_input_0", "{re}.*__add__.*"], - "initializer": { - "range": {"num_init_samples": 0}, - "batchnorm_adaptation": {"num_bn_adaptation_samples": 0}, - }, - }, - } - ) - onnx_model_proto = load_exported_onnx_version(nncf_config, ModelWithBranches(), path_to_storage_dir=tmp_path) - target_node_type = "FakeQuantize" if export_mode is QuantizerExportMode.FAKE_QUANTIZE else "DequantizeLinear" - quantizer_nodes = get_nodes_by_type(onnx_model_proto, target_node_type) - # Quantizer nodes should, for this model, immediately be followed by the quantized operation. Chained quantizers - # mean that the ONNX export was incorrect. 
- - follower_node_lists = [get_successors(x, onnx_model_proto.graph) for x in quantizer_nodes] - follower_nodes = [] - for lst in follower_node_lists: - follower_nodes += lst - follower_node_types = [x.op_type for x in follower_nodes] - assert not any(x == target_node_type for x in follower_node_types) - - -def set_parameters_to_quantizer_and_get_attrs( - quantizer: BaseQuantizer, paramaters_to_set: dict -) -> tuple[np.ndarray, np.ndarray, int]: - if isinstance(quantizer, SymmetricQuantizer): - return set_scale_to_sym_quantizer_and_get_attrs(quantizer, **paramaters_to_set) - return set_input_low_and_input_range_to_asym_quantizer_and_get_attrs(quantizer, **paramaters_to_set) - - -def set_scale_to_sym_quantizer_and_get_attrs( - quantizer: SymmetricQuantizer, scale: float -) -> tuple[np.ndarray, np.ndarray, int]: - scale = np.full(quantizer.scale.size(), scale) - levels = quantizer.levels - level_low = quantizer.level_low - level_high = quantizer.level_high - input_low = scale * (level_low / level_high) - input_range = scale - input_low - quant_len = input_range / (levels - 1) - quantizer.scale = nn.Parameter(torch.from_numpy(scale.astype(np.single))) - return input_low, quant_len, levels - - -def set_input_low_and_input_range_to_asym_quantizer_and_get_attrs( - quantizer: AsymmetricQuantizer, input_low: float, input_range: float -) -> tuple[np.ndarray, np.ndarray, int]: - input_low = np.full(quantizer.input_low.size(), input_low) - input_range = np.full(quantizer.input_low.size(), input_range) - levels = quantizer.levels - quant_len = input_range / (levels - 1) - quantizer.input_low = nn.Parameter(torch.from_numpy(input_low.astype(np.single))) - quantizer.input_range = nn.Parameter(torch.from_numpy(input_range.astype(np.single))) - return input_low, quant_len, levels - - -def generate_middle_quants( - size: list[int], input_low: np.ndarray, quant_len: np.ndarray, levels: np.ndarray -) -> torch.Tensor: - ref_weights = [input_low + (i + 0.5) * quant_len for i in range(levels)] - elems = np.prod(size) - ref_weights = ref_weights * int(np.round(0.5 + elems / levels)) - ref_weights = np.reshape(np.array(ref_weights).flatten()[:elems], size, "F") - return torch.from_numpy(ref_weights.astype(np.single)) - - -@pytest.mark.parametrize( - "quantization_mode, parameters_to_set", - [("symmetric", {"scale": 1.0}), ("asymmetric", {"input_low": -1.0, "input_range": 3.0})], -) -def test_export_quantized_weights_with_middle_quants(tmp_path, is_half_range, quantization_mode, parameters_to_set): - model = TwoConvTestModel() - sample_size = [1, 1, 20, 20] - config = get_config_for_export_mode(False) - config["compression"]["weights"] = {"mode": quantization_mode} - if not is_half_range: - config["compression"]["overflow_fix"] = "disable" - config["input_info"]["sample_size"] = sample_size - - _, compression_ctrl = create_compressed_model_and_algo_for_test(model, config) - - quantizers = compression_ctrl.weight_quantizers.values() - for quantizer in quantizers: - input_low, quant_len, levels = set_parameters_to_quantizer_and_get_attrs( - quantizer.quantizer_module_ref, parameters_to_set - ) - ref_weights = generate_middle_quants( - list(quantizer.quantized_module.weight.size()), input_low, quant_len, levels - ) - quantizer.quantized_module.weight = nn.Parameter(ref_weights) - - onnx_checkpoint_path = str(tmp_path / "two_conv_model_int8.onnx") - compression_ctrl.export_model(onnx_checkpoint_path) - model_onnx = onnx.load(onnx_checkpoint_path) - - fq_nodes = get_nodes_by_type(model_onnx, "FakeQuantize") - - inputs 
= [get_all_inputs_for_graph_node(fq_node, model_onnx.graph) for fq_node in fq_nodes] - - for quantizer, fq_parametres in zip(quantizers, inputs[1::2]): - tensor_weight, _, __ = list(fq_parametres.values()) - # Quantize weights as they are exported quantized - quantized_weights = quantizer.quantizer_module_ref(quantizer.quantized_module.weight).detach() - - diff = (quantized_weights.detach() - tensor_weight).abs() - if (diff > 1e-6).any(): - assert ((diff[diff > 1e-6] - quant_len).abs() < 1e-6).all(), "quants completely different!" - assert False, f"quant moved at flatten positions {torch.where(diff.flatten() > 1e-6)}" - - -def test_torch_onnx_export(tmp_path): - model = TwoConvTestModel() - nncf_config = get_config_for_export_mode(should_be_onnx_standard=False) - - compression_model, compression_ctrl = create_compressed_model_and_algo_for_test(model, nncf_config) - - onnx_checkpoint_path = tmp_path / "model.onnx" - compression_ctrl.prepare_for_export() - - dummy_input = torch.randn(1, 1, 4, 4) - torch.onnx.export(compression_model, dummy_input, onnx_checkpoint_path, verbose=False, dynamo=False) - onnx_model_proto = onnx.load_model(onnx_checkpoint_path) - - num_fq = 0 - num_model_nodes = 0 - num_other_nodes = 0 - - for node in onnx_model_proto.graph.node: - op_type = node.op_type - if op_type == "FakeQuantize": - num_fq += 1 - elif op_type in ["Conv", "Constant"]: - num_model_nodes += 1 - else: - num_other_nodes += 1 - assert num_fq == 4 - assert num_other_nodes == 0 diff --git a/tests/torch/quantization/test_overflow_issue_export.py b/tests/torch/quantization/test_overflow_issue_export.py deleted file mode 100644 index a13b36b9834..00000000000 --- a/tests/torch/quantization/test_overflow_issue_export.py +++ /dev/null @@ -1,418 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import numpy as np -import onnx -import onnxruntime as rt -import pytest -import torch -from torch import nn - -from nncf.common.quantization.structs import QuantizationScheme as QuantizationMode -from nncf.torch.checkpoint_loading import load_state -from nncf.torch.quantization.layers import AsymmetricQuantizer -from nncf.torch.quantization.layers import PTQuantizerSpec -from nncf.torch.quantization.layers import SymmetricQuantizer -from tests.torch.helpers import TwoConvTestModel -from tests.torch.helpers import create_compressed_model_and_algo_for_test -from tests.torch.helpers import create_conv -from tests.torch.helpers import get_all_inputs_for_graph_node -from tests.torch.helpers import get_nodes_by_type -from tests.torch.quantization.test_onnx_export import get_config_for_export_mode - - -@pytest.mark.parametrize( - "num_bits, mode, scale_shape, half_range, assert_vals", - [ - (8, QuantizationMode.SYMMETRIC, (1, 2, 3, 4), True, (128, -64, 63)), - (8, QuantizationMode.ASYMMETRIC, (1, 2, 3, 4), True, (128, 0, 127)), - (7, QuantizationMode.SYMMETRIC, (1, 2, 3, 4), True, (64, -32, 31)), - (4, QuantizationMode.SYMMETRIC, (1, 1, 1, 1), True, (8, -4, 3)), - (8, QuantizationMode.SYMMETRIC, (1, 1, 1, 1), True, (128, -64, 63)), - (8, QuantizationMode.SYMMETRIC, (1, 2, 3, 8), False, (256, -128, 127)), - ], -) -def test_is_correct_overflow_issue_levels(num_bits, mode, scale_shape, half_range, assert_vals): - qspec = PTQuantizerSpec( - num_bits=num_bits, - mode=mode, - signedness_to_force=True, - narrow_range=False, - scale_shape=scale_shape, - logarithm_scale=False, - half_range=half_range, - is_quantized_on_export=True, - ) - - quantizer = SymmetricQuantizer(qspec) if mode == QuantizationMode.SYMMETRIC else AsymmetricQuantizer(qspec) - - assert quantizer._half_range == half_range - assert quantizer.levels == assert_vals[0] - assert quantizer.level_low == assert_vals[1] - assert quantizer.level_high == assert_vals[2] - - -def helper_to_test_if_overflow_fix_was_applied(nncf_config, target_device): - model = TwoConvTestModel() - nncf_config.update({"target_device": target_device}) - - _, compression_ctrl = create_compressed_model_and_algo_for_test(model, nncf_config) - - for quantizer in compression_ctrl.weight_quantizers.values(): - assert quantizer.quantizer_module_ref._half_range - assert quantizer.quantizer_module_ref.levels == 128 - assert quantizer.quantizer_module_ref.level_low == -64 - assert quantizer.quantizer_module_ref.level_high == 63 - - for quantizer in compression_ctrl.non_weight_quantizers.values(): - assert not quantizer.quantizer_module_ref._half_range - - -def helper_to_test_if_overflow_fix_was_applied_only_to_first_conv_later(nncf_config, target_device): - model = TwoConvTestModel() - nncf_config.update({"target_device": target_device}) - - _, compression_ctrl = create_compressed_model_and_algo_for_test(model, nncf_config) - - for idx, quantizer in enumerate(compression_ctrl.weight_quantizers.values()): - if idx == 0: - assert quantizer.quantizer_module_ref._half_range - else: - assert not quantizer.quantizer_module_ref._half_range - for quantizer in compression_ctrl.non_weight_quantizers.values(): - assert not quantizer.quantizer_module_ref._half_range - - -def helper_to_test_if_overflow_fix_wasnt_applied(nncf_config, target_device): - model = TwoConvTestModel() - nncf_config.update({"target_device": target_device}) - - _, compression_ctrl = create_compressed_model_and_algo_for_test(model, nncf_config) - - for quantizer in compression_ctrl.weight_quantizers.values(): - 
assert not quantizer.quantizer_module_ref._half_range - for quantizer in compression_ctrl.non_weight_quantizers.values(): - assert not quantizer.quantizer_module_ref._half_range - - -def test_config_option_disable_overflow_fix(): - nncf_config = get_config_for_export_mode(True) - nncf_config.update({"compression": {"algorithm": "quantization", "overflow_fix": "disable"}}) - - for device in ["CPU", "ANY", "NPU", "GPU", "TRIAL"]: - helper_to_test_if_overflow_fix_wasnt_applied(nncf_config, device) - - nncf_config.update({"compression": {"algorithm": "quantization", "overflow_fix": "enable"}}) - - for device in ["CPU", "ANY"]: - helper_to_test_if_overflow_fix_was_applied(nncf_config, device) - - for device in ["NPU", "GPU", "TRIAL"]: - helper_to_test_if_overflow_fix_wasnt_applied(nncf_config, device) - - nncf_config.update({"compression": {"algorithm": "quantization", "overflow_fix": "first_layer_only"}}) - - for device in ["CPU", "ANY"]: - helper_to_test_if_overflow_fix_was_applied_only_to_first_conv_later(nncf_config, device) - - for device in ["NPU", "GPU", "TRIAL"]: - helper_to_test_if_overflow_fix_wasnt_applied(nncf_config, device) - - -def test_hw_config_overflow_fix_applied(): - nncf_config = get_config_for_export_mode(True) - - for device in ["CPU", "ANY"]: - helper_to_test_if_overflow_fix_was_applied(nncf_config, device) - - for device in ["NPU", "GPU", "TRIAL"]: - helper_to_test_if_overflow_fix_wasnt_applied(nncf_config, device) - - -class EightConvTestModel(nn.Module): - def __init__(self, in_out_ch=((1, 3), (3, 5), (5, 7), (7, 10))): - super().__init__() - self.features = [] - self.features.append(create_conv(*in_out_ch[0], 2, -1, -2)) - self.features.append(nn.BatchNorm2d(in_out_ch[0][1])) - self.features.append(nn.ReLU()) - self.features.append(create_conv(*in_out_ch[1], 5, 1, 1)) - self.features.append(nn.BatchNorm2d(in_out_ch[1][1])) - self.features.append(nn.ReLU()) - self.features.append(create_conv(*in_out_ch[2], 1, 2, 2)) - self.features.append(nn.BatchNorm2d(in_out_ch[2][1])) - self.features.append(nn.ReLU()) - self.features.append(create_conv(*in_out_ch[3], 9, -1, 0)) - self.features.append(nn.BatchNorm2d(in_out_ch[3][1])) - self.features.append(nn.ReLU()) - self.features.append(create_conv(*reversed(in_out_ch[3]), 3, 0, 1)) - self.features.append(nn.BatchNorm2d(in_out_ch[3][0])) - self.features.append(nn.ReLU()) - self.features.append(create_conv(*reversed(in_out_ch[2]), 1, -1, 9)) - self.features.append(nn.BatchNorm2d(in_out_ch[2][0])) - self.features.append(nn.ReLU()) - self.features.append(create_conv(*reversed(in_out_ch[1]), 2, 10, 1)) - self.features.append(nn.BatchNorm2d(in_out_ch[1][0])) - self.features.append(nn.ReLU()) - self.features.append(create_conv(*reversed(in_out_ch[0]), 1, 1, 1)) - self.features.append(nn.BatchNorm2d(in_out_ch[0][0])) - self.features.append(nn.ReLU()) - self.features = nn.Sequential(*self.features) - - def forward(self, x): - return self.features(x) - - -class DepthWiseConvTestModel(nn.Module): - def __init__(self): - super().__init__() - self.features = [] - self.features.append(nn.Conv2d(1, 3, 3, groups=1)) - self.features.append(nn.Conv2d(3, 30, 3, groups=3)) - self.features.append(nn.Conv2d(30, 1, 3)) - self.features = nn.Sequential(*self.features) - - def forward(self, x): - return self.features(x) - - -def are_symmetric_fq_nodes_are_exported_correct_with_overflow_fix(tmp_path, compression_ctrl): - level_high = 63 - level_low = -64 - levels = 128 - # Update scale tensors in the model - quantizers = 
compression_ctrl.weight_quantizers.values() - with torch.no_grad(): - for quantizer in quantizers: - assert quantizer.quantizer_module_ref.levels == levels - assert quantizer.quantizer_module_ref._half_range - assert quantizer.quantizer_module_ref.level_low == level_low - assert quantizer.quantizer_module_ref.level_high == level_high - quantizer.quantizer_module_ref.scale = torch.nn.Parameter( - 5 * torch.rand_like(quantizer.quantizer_module_ref.scale, dtype=torch.float32, requires_grad=True) - ) - - onnx_checkpoint_path = str(tmp_path / "model.onnx") - compression_ctrl.export_model(onnx_checkpoint_path, input_names=["input"]) - - onnx_model = onnx.load(onnx_checkpoint_path) - - # Find weight tensors in ONNX model - fq_nodes = get_nodes_by_type(onnx_model, "FakeQuantize") - - inputs = [get_all_inputs_for_graph_node(fq_node, onnx_model.graph) for fq_node in fq_nodes] - - level_high_ratio = (2 * level_high + 1) / level_high - level_positive_negative_ratio = abs(level_low / level_high) - - for quantizer, fq_parametres in zip(quantizers, inputs[1::2]): - tensor_weight, input_output_low, input_output_high = list(fq_parametres.values()) - quantizer_scale = quantizer.quantizer_module_ref.scale - - # Quantize weights as they are exported quantized - quantized_weights = quantizer.quantizer_module_ref(quantizer.quantized_module.weight).detach() - - assert np.allclose(tensor_weight, np.array(quantized_weights)) - assert np.allclose(level_high_ratio * quantizer_scale.detach().numpy(), input_output_high) - assert np.allclose(-2.0 * level_positive_negative_ratio * quantizer_scale.detach().numpy(), input_output_low) - - -def are_asymmetric_fq_nodes_are_exported_correct_with_overflow_fix(tmp_path, compression_ctrl): - level_high = 127 - level_low = 0 - levels = 128 - # Update scale tensors in the model - quantizers = compression_ctrl.weight_quantizers.values() - with torch.no_grad(): - for quantizer in quantizers: - assert quantizer.quantizer_module_ref.levels == levels - assert quantizer.quantizer_module_ref._half_range - assert quantizer.quantizer_module_ref.level_low == level_low - assert quantizer.quantizer_module_ref.level_high == level_high - quantizer.quantizer_module_ref.input_range = torch.nn.Parameter( - 5 * torch.rand_like(quantizer.quantizer_module_ref.input_range, dtype=torch.float32, requires_grad=True) - ) - - onnx_checkpoint_path = str(tmp_path / "model.onnx") - compression_ctrl.export_model(onnx_checkpoint_path, input_names=["input"]) - - onnx_model = onnx.load(onnx_checkpoint_path) - - # Find weight tensors in ONNX model - fq_nodes = get_nodes_by_type(onnx_model, "FakeQuantize") - - inputs = [get_all_inputs_for_graph_node(fq_node, onnx_model.graph) for fq_node in fq_nodes] - - level_high_ratio = (2 * level_high + 1) / level_high - for quantizer, fq_parametres in zip(quantizers, inputs[1::2]): - tensor_weight, input_output_low, input_output_high = list(fq_parametres.values()) - quantizer_input_range = quantizer.quantizer_module_ref.input_range - quantizer_input_low = quantizer.quantizer_module_ref.input_low - - # Quantize weights as they are exported quantized - quantized_weights = quantizer.quantizer_module_ref(quantizer.quantized_module.weight).detach() - - assert np.allclose(tensor_weight, np.array(quantized_weights)) - assert np.allclose(level_high_ratio * quantizer_input_range.detach().numpy(), input_output_high) - assert np.allclose(quantizer_input_low.detach().numpy(), input_output_low) - - -def test_are_symmetric_fq_exported_depthwise_per_channel_weights_tensors_clipped(tmp_path): 
- model = DepthWiseConvTestModel() - nncf_config = get_config_for_export_mode(False) - nncf_config.update({"input_info": {"sample_size": [1, 1, 20, 20]}}) - _, compression_ctrl = create_compressed_model_and_algo_for_test(model, nncf_config) - are_symmetric_fq_nodes_are_exported_correct_with_overflow_fix(tmp_path, compression_ctrl) - - -def test_are_asymmetric_fq_exported_depthwise_per_channel_weights_tensors_clipped(tmp_path): - model = DepthWiseConvTestModel() - nncf_config = get_config_for_export_mode(False) - nncf_config.update({"input_info": {"sample_size": [1, 1, 20, 20]}}) - nncf_config.update( - { - "compression": { - "algorithm": "quantization", - "export_to_onnx_standard_ops": False, - "weights": {"mode": "asymmetric"}, - } - } - ) - _, compression_ctrl = create_compressed_model_and_algo_for_test(model, nncf_config) - are_asymmetric_fq_nodes_are_exported_correct_with_overflow_fix(tmp_path, compression_ctrl) - - -def test_are_symmetric_fq_exported_per_channel_weights_tensors_clipped(tmp_path): - in_out_ch = [[1, 3], [3, 5], [5, 7], [7, 10]] - model = EightConvTestModel(in_out_ch) - nncf_config = get_config_for_export_mode(False) - nncf_config.update({"input_info": {"sample_size": [1, 1, 20, 20]}}) - _, compression_ctrl = create_compressed_model_and_algo_for_test(model, nncf_config) - are_symmetric_fq_nodes_are_exported_correct_with_overflow_fix(tmp_path, compression_ctrl) - - -def test_are_assymetric_fq_exported_per_channel_weights_tensors_clipped(tmp_path): - in_out_ch = [[1, 3], [3, 5], [5, 7], [7, 10]] - model = EightConvTestModel(in_out_ch) - nncf_config = get_config_for_export_mode(False) - nncf_config.update({"input_info": {"sample_size": [1, 1, 20, 20]}}) - nncf_config.update( - { - "compression": { - "algorithm": "quantization", - "export_to_onnx_standard_ops": False, - "weights": {"mode": "asymmetric"}, - } - } - ) - _, compression_ctrl = create_compressed_model_and_algo_for_test(model, nncf_config) - are_asymmetric_fq_nodes_are_exported_correct_with_overflow_fix(tmp_path, compression_ctrl) - - -def test_are_qdq_exported_per_tensor_weights_tensors_clipped(tmp_path): - model = TwoConvTestModel() - nncf_config = get_config_for_export_mode(True) - _, compression_ctrl = create_compressed_model_and_algo_for_test(model, nncf_config) - # Set scale tensors - first_scale_tensor = torch.tensor((0.1, 0.1), dtype=torch.float32, requires_grad=True) - second_scale_tensor = torch.tensor(3000, dtype=torch.float32, requires_grad=True) - - # Update scale tensors in the model - first_quantizer, second_quantizer = compression_ctrl.weight_quantizers.values() - - first_quantizer.quantizer_module_ref.scale = torch.nn.Parameter(first_scale_tensor) - second_quantizer.quantizer_module_ref.scale = torch.nn.Parameter(second_scale_tensor) - - for quantizer in [first_quantizer, second_quantizer]: - assert quantizer.quantizer_module_ref.levels == 128 - assert quantizer.quantizer_module_ref.level_low == -64 - assert quantizer.quantizer_module_ref.level_high == 63 - assert quantizer.quantizer_module_ref._half_range - - onnx_checkpoint_path = str(tmp_path / "model.onnx") - compression_ctrl.export_model(onnx_checkpoint_path, input_names=["input"], save_format="onnx_13") - - onnx_model = onnx.load(onnx_checkpoint_path) - - # Find weight tensors in ONNX model - quantize_nodes = get_nodes_by_type(onnx_model, "QuantizeLinear") - - inputs = [get_all_inputs_for_graph_node(fq_node, onnx_model.graph) for fq_node in quantize_nodes] - - level_high_ratio = 127.0 / 63.0 - level_positive_negative_ratio = 64.0 / 63.0 - - 
for quantizer, onnx_q_parametres in zip([first_quantizer, second_quantizer], inputs[1::2]): - onnx_tensor_weight, onnx_q_scale, onnx_zero_level = list(onnx_q_parametres.values()) - quantizer_scale = quantizer.quantizer_module_ref.scale.detach().numpy() - - onnx_input_output_low = -128 * onnx_q_scale + onnx_zero_level - onnx_input_output_high = 127 * onnx_q_scale + onnx_zero_level - - if quantizer_scale.shape: - quantizer_scale = quantizer_scale[0] - - # Quantize weights as they are exported quantized - quantized_weights = quantizer.quantizer_module_ref(quantizer.quantized_module.weight).detach() - - assert np.allclose(onnx_tensor_weight, np.array(quantized_weights)) - assert np.allclose(level_high_ratio * quantizer_scale, onnx_input_output_high) - assert np.allclose(-2.0 * level_positive_negative_ratio * quantizer_scale, onnx_input_output_low) - - -@pytest.mark.parametrize("model", [TwoConvTestModel(), EightConvTestModel(), DepthWiseConvTestModel()]) -def test_is_pytorch_output_the_same_as_onnx_qdq_overflow_fix_applied(tmp_path, model): - nncf_config = get_config_for_export_mode(True) - nncf_config.update({"input_info": {"sample_size": [1, 1, 20, 20]}}) - - compressed_model, compression_ctrl = create_compressed_model_and_algo_for_test(model, nncf_config) - - onnx_checkpoint_path = str(tmp_path / "model.onnx") - compression_ctrl.export_model(onnx_checkpoint_path, save_format="onnx_13") - input_tensors = [ - np.random.normal(size=[1, 1, 20, 20]), - np.random.uniform(size=[1, 1, 20, 20]), - 100 * np.random.normal(size=[1, 1, 20, 20]), - 100 * np.random.uniform(size=[1, 1, 20, 20]), - ] - - # TODO(andrey-churkin): Remove after the issue https://github.com/microsoft/onnxruntime/issues/24518 is fixed. - sess_options = None - if isinstance(model, DepthWiseConvTestModel): - sess_options = rt.SessionOptions() - sess_options.graph_optimization_level = rt.GraphOptimizationLevel.ORT_DISABLE_ALL - - sess = rt.InferenceSession(onnx_checkpoint_path, sess_options) - for input_tensor in input_tensors: - torch_input = torch.tensor(input_tensor, dtype=torch.float32) - - with torch.no_grad(): - torch_out = compressed_model(torch_input) - - input_name = sess.get_inputs()[0].name - onnx_out = sess.run(None, {input_name: input_tensor.astype(np.float32)})[0] - - assert np.allclose(torch_out.numpy(), onnx_out, rtol=1e-5, atol=1e-3) - - -def test_is_overflow_fix_applied_model_resumed_correctly(tmp_path): - model = TwoConvTestModel() - nncf_config = get_config_for_export_mode(False) - compressed_model, compression_ctrl = create_compressed_model_and_algo_for_test(model, nncf_config) - compression_state = compression_ctrl.get_compression_state() - model_state_dict = compressed_model.state_dict() - # Must create new model as the previous one was somehow changed during create_compressed_model_and_algo_for_test() - model = TwoConvTestModel() - compressed_model, compression_ctrl = create_compressed_model_and_algo_for_test( - model, nncf_config, compression_state=compression_state - ) - load_state(compressed_model, model_state_dict, is_resume=True) - are_symmetric_fq_nodes_are_exported_correct_with_overflow_fix(tmp_path, compression_ctrl) diff --git a/tests/torch/quantization/test_quantization_metric.py b/tests/torch/quantization/test_quantization_metric.py deleted file mode 100644 index ebb029e5a75..00000000000 --- a/tests/torch/quantization/test_quantization_metric.py +++ /dev/null @@ -1,262 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not 
use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from collections import namedtuple - -import pytest - -from nncf import NNCFConfig -from nncf.common.graph.patterns.manager import TargetDevice -from nncf.torch import create_compressed_model -from nncf.torch.quantization.metrics import MemoryConsumptionStatisticsCollector -from nncf.torch.quantization.metrics import ShareEdgesQuantizedDataPathStatisticsCollector -from tests.torch import test_models -from tests.torch.helpers import register_bn_adaptation_init_args - -pytestmark = pytest.mark.legacy - - -def get_basic_quantization_config(): - config = NNCFConfig() - config.update( - { - "model": "AlexNet", - "input_info": { - "sample_size": [1, 3, 32, 32], - }, - "compression": { - "algorithm": "quantization", - "quantize_inputs": True, - "initializer": {"range": {"num_init_samples": 0}}, - }, - } - ) - register_bn_adaptation_init_args(config) - - return config - - -def as_dict(obj): - if isinstance(obj, list): - return [as_dict(value) for value in obj] - if isinstance(obj, dict): - return {key: as_dict(value) for key, value in obj.items()} - if hasattr(obj, "__dict__"): - return {key: as_dict(value) for key, value in obj.__dict__.items() if not key.startswith("_")} - return obj - - -CaseStruct = namedtuple( - "CaseStruct", ("initializers", "activations", "weights", "ignored_scopes", "target_device", "expected") -) - - -QUANTIZATION_SHARE_AND_BITWIDTH_DISTR_STATS_TEST_CASES = [ - CaseStruct( - initializers={}, - activations={}, - weights={}, - ignored_scopes=[], - target_device="TRIAL", - expected={ - "wq_counter": { - "num_symmetric": 8, - "num_asymmetric": 0, - "num_signed": 0, - "num_unsigned": 8, - "num_per_tensor": 8, - "num_per_channel": 0, - "total_count": 8, - "potential_count": 8, - }, - "aq_counter": { - "num_symmetric": 8, - "num_asymmetric": 0, - "num_signed": 0, - "num_unsigned": 8, - "num_per_tensor": 8, - "num_per_channel": 0, - "total_count": 8, - "potential_count": None, - }, - "num_wq_per_bitwidth": {8: 8}, - "num_aq_per_bitwidth": {8: 8}, - }, - ), - CaseStruct( - initializers={}, - activations={}, - weights={}, - ignored_scopes=[], - target_device="CPU", - expected={ - "wq_counter": { - "num_symmetric": 8, - "num_asymmetric": 0, - "num_signed": 8, - "num_unsigned": 0, - "num_per_tensor": 0, - "num_per_channel": 8, - "total_count": 8, - "potential_count": 8, - }, - "aq_counter": { - "num_symmetric": 8, - "num_asymmetric": 0, - "num_signed": 0, - "num_unsigned": 8, - "num_per_tensor": 8, - "num_per_channel": 0, - "total_count": 8, - "potential_count": None, - }, - "num_wq_per_bitwidth": {8: 8}, - "num_aq_per_bitwidth": {8: 8}, - }, - ), -] - - -@pytest.mark.parametrize("data", QUANTIZATION_SHARE_AND_BITWIDTH_DISTR_STATS_TEST_CASES) -def test_quantization_share_and_bitwidth_distribution_stats(data): - config = get_basic_quantization_config() - config["compression"]["initializer"].update(data.initializers) - config["compression"]["activations"] = data.activations - config["compression"]["weights"] = data.weights - config["compression"]["ignored_scopes"] = data.ignored_scopes - config["target_device"] = 
data.target_device - - ctrl, _ = create_compressed_model(test_models.AlexNet(), config) - nncf_stats = ctrl.statistics() - quantization_stats = nncf_stats.quantization - - for attr_name, expected_value in data.expected.items(): - actual_value = as_dict(getattr(quantization_stats, attr_name)) - assert expected_value == actual_value - - -MEMORY_CONSUMPTION_STATS_TEST_CASES = [ - CaseStruct( - initializers={}, - activations={}, - weights={}, - ignored_scopes=[], - target_device="TRIAL", - expected={ - "fp32_weight_size": 88.74, - "quantized_weight_size": 22.18, - "max_fp32_activation_size": 0.0625, - "max_compressed_activation_size": 0.015625, - "weight_memory_consumption_decrease": 4.0, - }, - ), - CaseStruct( - initializers={ - "precision": { - "bitwidth_per_scope": [ - [2, "AlexNet/Sequential[features]/NNCFConv2d[0]/conv2d_0|WEIGHT"], - [4, "AlexNet/Sequential[features]/NNCFConv2d[6]/conv2d_0|WEIGHT"], - ] - } - }, - activations={}, - weights={"bits": 8}, - ignored_scopes=[], - target_device="TRIAL", - expected={ - "fp32_weight_size": 88.74, - "quantized_weight_size": 21.86, - "max_fp32_activation_size": 0.0625, - "max_compressed_activation_size": 0.015625, - "weight_memory_consumption_decrease": 4.05, - }, - ), - CaseStruct( - initializers={}, - activations={}, - weights={}, - ignored_scopes=["AlexNet/Sequential[features]/NNCFConv2d[0]/conv2d_0"], - target_device="TRIAL", - expected={ - "fp32_weight_size": 88.74, - "quantized_weight_size": 22.19, - "max_fp32_activation_size": 0.0625, - "max_compressed_activation_size": 0.0625, - "weight_memory_consumption_decrease": 3.99, - }, - ), -] - - -@pytest.mark.parametrize("data", MEMORY_CONSUMPTION_STATS_TEST_CASES) -def test_memory_consumption_stats(data): - config = get_basic_quantization_config() - config["compression"]["initializer"].update(data.initializers) - config["compression"]["weights"] = data.weights - config["compression"]["ignored_scopes"] = data.ignored_scopes - config["target_device"] = data.target_device - - ctrl, _ = create_compressed_model(test_models.AlexNet(), config) - stats = MemoryConsumptionStatisticsCollector( - ctrl.model, ctrl.weight_quantizers, ctrl.non_weight_quantizers - ).collect() - - for attr_name, expected_value in data.expected.items(): - actual_value = getattr(stats, attr_name) - assert expected_value == pytest.approx(actual_value, rel=1e-2) - - -QUANTIZATION_CONFIGURATION_STATS_TEST_CASES = [ - CaseStruct( - initializers={}, - activations={}, - weights={}, - ignored_scopes=[], - target_device="TRIAL", - expected={"quantized_edges_in_cfg": 173, "total_edges_in_cfg": 177}, - ), - CaseStruct( - initializers={}, - activations={}, - weights={}, - ignored_scopes=[ - "Inception3/__add___0", - "Inception3/__add___1", - "Inception3/__add___2", - "Inception3/__mul___0", - "Inception3/__mul___1", - "Inception3/__mul___2", - ], - target_device="TRIAL", - expected={"quantized_edges_in_cfg": 173, "total_edges_in_cfg": 177}, - ), -] - - -@pytest.mark.parametrize("data", QUANTIZATION_CONFIGURATION_STATS_TEST_CASES) -def test_quantization_configuration_stats(data): - config = get_basic_quantization_config() - config["compression"]["ignored_scopes"] = data.ignored_scopes - config["input_info"]["sample_size"] = [2, 3, 299, 299] - - ctrl, _ = create_compressed_model(test_models.Inception3(aux_logits=True, transform_input=True), config) - stats = ShareEdgesQuantizedDataPathStatisticsCollector(ctrl.model, ctrl, TargetDevice.ANY).collect() - - for attr_name, expected_value in data.expected.items(): - actual_value = 
as_dict(getattr(stats, attr_name)) - assert expected_value == actual_value - - -def test_full_ignored_scope(): - config = get_basic_quantization_config() - config["compression"]["ignored_scopes"] = ["{re}.*"] - ctrl, _ = create_compressed_model(test_models.AlexNet(), config) - ctrl.statistics() diff --git a/tests/torch/quantization/test_range_init.py b/tests/torch/quantization/test_range_init.py deleted file mode 100644 index 64cd16be544..00000000000 --- a/tests/torch/quantization/test_range_init.py +++ /dev/null @@ -1,1066 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -import itertools -import re -from collections import namedtuple -from dataclasses import dataclass -from functools import partial -from typing import Union - -import pytest -import torch -import torch.utils.data -from pytest import approx -from torch import nn -from torch.utils.data import DataLoader -from torchvision.models import squeezenet1_1 - -import nncf -import nncf.torch.tensor_statistics.collectors as pt_collectors -from nncf.common.graph import NNCFNodeName -from nncf.common.quantization.initialization.range import PerLayerRangeInitConfig -from nncf.common.quantization.initialization.range import RangeInitConfig -from nncf.common.quantization.quantizer_setup import ActivationQuantizationInsertionPoint -from nncf.common.quantization.quantizer_setup import SingleConfigQuantizationPoint -from nncf.common.quantization.quantizer_setup import WeightQuantizationInsertionPoint -from nncf.common.quantization.structs import QuantizationScheme as QuantizationMode -from nncf.common.quantization.structs import QuantizerConfig -from nncf.common.quantization.structs import QuantizerGroup -from nncf.config import NNCFConfig -from nncf.config.structures import QuantizationRangeInitArgs -from nncf.tensor import Tensor -from nncf.torch import utils -from nncf.torch.checkpoint_loading import load_state -from nncf.torch.initialization import DefaultInitializingDataLoader -from nncf.torch.initialization import wrap_dataloader_for_init -from nncf.torch.quantization.external_quantizer import EXTERNAL_QUANTIZERS_STORAGE_NAME -from nncf.torch.quantization.init_range import PTRangeInitCollectorParams -from nncf.torch.quantization.init_range import PTRangeInitParams -from nncf.torch.quantization.init_range import StatCollectorGenerator -from nncf.torch.quantization.layers import QUANTIZATION_MODULES -from nncf.torch.quantization.layers import AsymmetricQuantizer -from nncf.torch.quantization.layers import BaseQuantizer -from nncf.torch.quantization.layers import PTQuantizerSpec -from nncf.torch.quantization.layers import SymmetricQuantizer -from nncf.torch.tensor_statistics.statistics import pt_convert_stat_to_min_max_tensor_stat -from nncf.torch.utils import get_all_modules_by_type -from nncf.torch.utils import safe_thread_call -from tests.torch.helpers import TwoConvTestModel -from tests.torch.helpers import create_compressed_model_and_algo_for_test -from tests.torch.helpers import 
create_ones_mock_dataloader -from tests.torch.helpers import get_empty_config -from tests.torch.helpers import register_bn_adaptation_init_args -from tests.torch.quantization.quantization_helpers import compare_multi_gpu_dump -from tests.torch.quantization.quantization_helpers import create_rank_dataloader -from tests.torch.quantization.quantization_helpers import distributed_init_test_default -from tests.torch.quantization.quantization_helpers import get_squeezenet_quantization_config -from tests.torch.quantization.quantization_helpers import post_compression_test_distr_init - - -def scale_signed_dumping_worker(gpu, ngpus_per_node, config, tmp_path): - distributed_init_test_default(gpu, ngpus_per_node, config) - data_loader = create_rank_dataloader(config, gpu) - model = safe_thread_call(partial(squeezenet1_1, pretrained=True)) - - config.register_extra_structs([QuantizationRangeInitArgs(wrap_dataloader_for_init(data_loader))]) - quant_model, compression_ctrl = create_compressed_model_and_algo_for_test(model, config) - compression_scheduler = compression_ctrl.scheduler - - quant_model = post_compression_test_distr_init(compression_ctrl, config, ngpus_per_node, quant_model) - - criterion = torch.nn.MSELoss().cuda(config.gpu) - optimizer = torch.optim.Adam(quant_model.parameters(), lr=0.01) - - torch.backends.cudnn.benchmark = True - - # just to reproduce the same scale values without Dropout - quant_model.eval() - - act_sum = 0 - for layer in get_all_modules_by_type(quant_model, "SymmetricQuantizer").values(): - act_sum += layer.scale.sum() - ref_sum = 3720.864 - assert act_sum.item() == approx(ref_sum, 0.01), ( - f"sum of scales is not expected {act_sum.item()} vs {ref_sum} rank {config.rank}" - ) - - out_file_path = get_path_after_broadcast(tmp_path, config.rank) - save_params(quant_model, out_file_path) - compression_scheduler.step() - for i, (input_, _) in enumerate(data_loader): - if i > 5: - break - output = quant_model(input_) - optimizer.zero_grad() - dummy_target = torch.randn(1000).cuda(config.gpu, non_blocking=True) - loss = criterion(output, dummy_target) - compression_scheduler.step() - loss.backward() - optimizer.step() - compression_scheduler.step() - - out_file_path = get_path_path_after_train_iters(tmp_path, config.rank) - save_params(quant_model, out_file_path) - - -def get_path_path_after_train_iters(tmp_path, rank): - out_file_path = tmp_path / f"scale_signed_after_1_train_iter_gpu{rank}.pt" - return out_file_path - - -def get_path_after_broadcast(tmp_path, rank): - out_file_path = tmp_path / f"scale_signed_after_broadcast_gpu{rank}.pt" - return out_file_path - - -def save_params(model, out_file_path): - gpu_scale_signed_params = [] - for layer in utils.get_all_modules_by_type(model, "SymmetricQuantizer").values(): - gpu_scale_signed_params.append( - (layer.scale.to(torch.device("cpu")), layer.signed_tensor.to(torch.device("cpu"))) - ) - with out_file_path.open("wb") as out_file: - torch.save(gpu_scale_signed_params, out_file) - - -@pytest.mark.cuda -def test_multiprocessing_distributed_shares_init_scales_signedness_across_gpus(tmp_path, runs_subprocess_in_precommit): - if not torch.cuda.is_available(): - pytest.skip("Skipping CUDA test cases for CPU only setups") - num_init_samples = 10 - - config = get_squeezenet_quantization_config() - config["compression"]["initializer"] = {"range": {"num_init_samples": num_init_samples}} - - ngpus_per_node = torch.cuda.device_count() - config.world_size = ngpus_per_node - register_bn_adaptation_init_args(config) - 
torch.multiprocessing.spawn( - scale_signed_dumping_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, config, tmp_path), join=True - ) - - assert not compare_multi_gpu_dump(config, tmp_path, get_path_after_broadcast) - assert not compare_multi_gpu_dump(config, tmp_path, get_path_path_after_train_iters) - - -def create_empty_config_without_init_section(): - config = get_empty_config() - config["compression"] = {"algorithm": "quantization"} - register_bn_adaptation_init_args(config) - return config - - -def create_config(): - config = get_empty_config() - config["compression"] = {"algorithm": "quantization", "initializer": {"range": {"num_init_samples": 1}}} - register_bn_adaptation_init_args(config) - return config - - -def generate_qp( - node_name: NNCFNodeName, target: QuantizerGroup, input_port_id: int = None -) -> SingleConfigQuantizationPoint: - if target is QuantizerGroup.WEIGHTS: - qip = WeightQuantizationInsertionPoint(target_node_name=node_name) - elif target is QuantizerGroup.ACTIVATIONS: - qip = ActivationQuantizationInsertionPoint(target_node_name=node_name, input_port_id=input_port_id) - else: - msg = ( - f"Invalid quantizer group: {target}. " - f"Supported groups are {QuantizerGroup.WEIGHTS}" - f"and {QuantizerGroup.ACTIVATIONS}." - ) - raise nncf.InvalidQuantizerGroupError(msg) - return SingleConfigQuantizationPoint(qip, QuantizerConfig(), [node_name]) - - -@pytest.mark.parametrize("wrap_dataloader", [True], ids=["wrapped_dataloader"]) -class TestRangeInit: - @staticmethod - def create_algo_and_compressed_model(config): - model = TwoConvTestModel() - compressed_model, algo = create_compressed_model_and_algo_for_test(model, config) - return algo, compressed_model - - @staticmethod - def create_dataloader(wrap_dataloader, config, num_samples=1) -> DataLoader: - data_loader = create_ones_mock_dataloader(config, num_samples) - if wrap_dataloader: - data_loader = DefaultInitializingDataLoader(data_loader) - return data_loader - - @staticmethod - def check_sign_and_scale(model, ref_table): - model_conv = get_all_modules_by_type(model, "SymmetricQuantizer") - for scope, module in model_conv.items(): - for pattern, ref_values in ref_table.items(): - match = re.search(pattern, str(scope)) - if match: - assert isinstance(module, SymmetricQuantizer) - assert module.signed == ref_values[0], f"sign is not matched for {str(scope)}" - assert all(module.scale == ref_values[1]), f"scale is not matched for {str(scope)}" - - @pytest.mark.parametrize("config_creator", (create_config, create_empty_config_without_init_section)) - def test_scale_and_sign_init_for_quant_algo__without_init_section(self, wrap_dataloader, config_creator): - config = config_creator() - data_loader = self.create_dataloader(wrap_dataloader, config) - config.register_extra_structs([QuantizationRangeInitArgs(data_loader)]) - _, compressed_model = self.create_algo_and_compressed_model(config) - - self.check_sign_and_scale( - compressed_model, - { - ".*Sequential\\[0\\].*UpdateWeight.*": (True, torch.ones(2, 1, 1, 1)), - ".*Sequential\\[1\\].*UpdateWeight. 
*": (True, 1), - ".*activation_quantizers.*Sequential\\[0\\].*": (True, 4), - ".*activation_quantizers.*nncf_model_input*": (False, 1), - }, - ) - - def test_scale_and_sign_init_for_quant_algo__with_zero_init_steps(self, wrap_dataloader): - config = create_config() - config["compression"]["initializer"]["range"]["num_init_samples"] = 0 - - data_loader = self.create_dataloader(wrap_dataloader, config) - config.register_extra_structs([QuantizationRangeInitArgs(data_loader)]) - _, compressed_model = self.create_algo_and_compressed_model(config) - - self.check_sign_and_scale( - compressed_model, - { - ".*Sequential\\[0\\].*UpdateWeight.*": (True, torch.ones(2, 1, 1, 1)), - ".*Sequential\\[1\\].*UpdateWeight. *": (True, 1), - ".*activation_quantizers.*Sequential\\[0\\].*": (False, 1), - ".*activation_quantizers.*nncf_model_input*": (False, 1), - }, - ) - - def test_scale_and_sign_init_for_quant_algo__after_load_state(self, wrap_dataloader): - config = create_config() - data_loader = self.create_dataloader(wrap_dataloader, config) - config.register_extra_structs([QuantizationRangeInitArgs(data_loader)]) - _, compressed_model = self.create_algo_and_compressed_model(config) - ref_loaded_scale_val = torch.ones((1, 1, 1, 1)) * 100 - load_state( - compressed_model, - { - "module.features.0.0.pre_ops.0.op.signed_tensor": torch.tensor( - [0.0] - ), # quantizer of 1st conv's weights - "module.features.1.0.pre_ops.0.op.scale": ref_loaded_scale_val, # quantizer of 2nd conv's weights - }, - ) - - self.check_sign_and_scale( - compressed_model, - { - ".*Sequential\\[0\\].*UpdateWeight.*": (False, torch.ones(2, 1, 1, 1)), - ".*Sequential\\[1\\].*UpdateWeight. *": (True, ref_loaded_scale_val), - ".*activation_quantizers.*Sequential\\[0\\].*": (True, 4), - ".*activation_quantizers.*nncf_model_input*": (False, 1), - }, - ) - - def test_scope_overrides(self, wrap_dataloader): - config = create_config() - config["target_device"] = "TRIAL" - config["compression"]["scope_overrides"] = { - "weights": { - r"{re}NNCFConv2d\[[0-9]*\]/conv2d_0": { - "bits": 7, - "mode": "asymmetric", - }, - }, - "activations": { - r"{re}NNCFConv2d\[[0-9]*\]/conv2d_0": { - "bits": 7, - "signed": False, - } - }, - } - data_loader = self.create_dataloader(wrap_dataloader, config) - config.register_extra_structs([QuantizationRangeInitArgs(data_loader)]) - _, compressed_model = self.create_algo_and_compressed_model(config) - - quantizers = get_all_modules_by_type(compressed_model, ["SymmetricQuantizer", "AsymmetricQuantizer"]) - quantizer_str_dict = {str(k): v for k, v in quantizers.items()} - group_1 = [ - quantizer_str_dict[ - "TwoConvTestModel/Sequential[features]/" - "Sequential[0]/NNCFConv2d[0]/ModuleDict[pre_ops]/UpdateWeight[0]/" - "AsymmetricQuantizer[op]" - ], - quantizer_str_dict[ - "TwoConvTestModel/Sequential[features]/" - "Sequential[1]/NNCFConv2d[0]/ModuleDict[pre_ops]/UpdateWeight[0]/" - "AsymmetricQuantizer[op]" - ], - ] - group_2 = [ - quantizer_str_dict[ - f"TwoConvTestModel/NNCFNetworkInterface[_nncf]/" - f"ModuleDict[{EXTERNAL_QUANTIZERS_STORAGE_NAME}]/" - "SymmetricQuantizer[TwoConvTestModel/Sequential[features]" - "/Sequential[0]/NNCFConv2d[0]/conv2d_0|OUTPUT]" - ], - quantizer_str_dict[ - f"TwoConvTestModel/NNCFNetworkInterface[_nncf]/" - f"ModuleDict[{EXTERNAL_QUANTIZERS_STORAGE_NAME}]/SymmetricQuantizer" - "[/nncf_model_input_0|OUTPUT]" - ], - ] - - for quantizer in group_1: - assert isinstance(quantizer, AsymmetricQuantizer) - assert quantizer.levels == 2**7 - for quantizer in group_2: - assert isinstance(quantizer, 
SymmetricQuantizer) - assert not quantizer.signed - - PerLayerRangeInitTestStruct = namedtuple( - "PerLayerRangeInitTestStruct", ("range_init_config", "qps_vs_expected_init_config") - ) - - PER_LAYER_RANGE_INIT_TEST_CASES = [ - PerLayerRangeInitTestStruct( - range_init_config=[{"type": "min_max", "num_init_samples": 1, "target_scopes": ["{re}.*"]}], - qps_vs_expected_init_config=[ - ( - generate_qp( - "/nncf_model_input_0", - QuantizerGroup.ACTIVATIONS, - ), - RangeInitConfig(init_type="min_max", num_init_samples=1), - ), - ( - generate_qp( - "TwoConvTestModel/Sequential[features]/Sequential[0]/NNCFConv2d[0]/conv2d_0", - QuantizerGroup.ACTIVATIONS, - ), - RangeInitConfig(init_type="min_max", num_init_samples=1), - ), - ( - generate_qp( - "TwoConvTestModel/Sequential[features]/Sequential[1]/NNCFConv2d[0]/conv2d_0", - QuantizerGroup.WEIGHTS, - ), - RangeInitConfig(init_type="min_max", num_init_samples=1), - ), - ], - ), - PerLayerRangeInitTestStruct( - range_init_config=[ - { - "type": "min_max", - "num_init_samples": 1, - "target_scopes": ["{re}TwoConvTestModel/Sequential\\[features\\]/.*"], - }, - { - "type": "mean_min_max", - "num_init_samples": 2, - "ignored_scopes": ["{re}TwoConvTestModel/Sequential\\[features\\]/.*"], - }, - ], - qps_vs_expected_init_config=[ - ( - generate_qp("/nncf_model_input_0", QuantizerGroup.ACTIVATIONS), - RangeInitConfig(init_type="mean_min_max", num_init_samples=2), - ), - ( - generate_qp( - "TwoConvTestModel/Sequential[features]/Sequential[0]/NNCFConv2d[0]/conv2d_0", - QuantizerGroup.ACTIVATIONS, - ), - RangeInitConfig(init_type="min_max", num_init_samples=1), - ), - ( - generate_qp( - "TwoConvTestModel/Sequential[features]/Sequential[0]/NNCFConv2d[0]/conv2d_0", - QuantizerGroup.WEIGHTS, - ), - RangeInitConfig(init_type="min_max", num_init_samples=1), - ), - ( - generate_qp( - "TwoConvTestModel/Sequential[features]/Sequential[1]/NNCFConv2d[0]/conv2d_0", - QuantizerGroup.ACTIVATIONS, - ), - RangeInitConfig(init_type="min_max", num_init_samples=1), - ), - ], - ), - PerLayerRangeInitTestStruct( - range_init_config=[ - { - "type": "min_max", - "num_init_samples": 1, - "target_quantizer_group": "weights", - "target_scopes": ["{re}TwoConvTestModel/Sequential\\[features\\]/.*"], - }, - { - "type": "mean_min_max", - "num_init_samples": 2, - "ignored_scopes": ["{re}TwoConvTestModel/Sequential\\[features\\]/.*", "{re}/nncf_model_input_0"], - }, - { - "type": "threesigma", - "num_init_samples": 1, - "target_quantizer_group": "activations", - "target_scopes": ["{re}/nncf_model_input_0"], - }, - { - "type": "percentile", - "num_init_samples": 10, - "params": {"min_percentile": "0.1", "max_percentile": "99.9"}, - "target_quantizer_group": "activations", - "target_scopes": [ - "TwoConvTestModel/Sequential[features]/Sequential[1]/NNCFConv2d[0]/conv2d_0|OUTPUT" - ], - }, - ], - qps_vs_expected_init_config=[ - ( - generate_qp("/nncf_model_input_0", QuantizerGroup.ACTIVATIONS), - RangeInitConfig(init_type="threesigma", num_init_samples=1), - ), - ( - generate_qp( - "TwoConvTestModel/Sequential[features]/Sequential[0]/NNCFConv2d[0]/conv2d_0", - QuantizerGroup.WEIGHTS, - ), - RangeInitConfig(init_type="min_max", num_init_samples=1), - ), - ( - generate_qp( - "TwoConvTestModel/Sequential[features]/Sequential[1]/NNCFConv2d[0]/conv2d_0", - QuantizerGroup.ACTIVATIONS, - ), - RangeInitConfig( - init_type="percentile", - num_init_samples=10, - init_type_specific_params={"min_percentile": "0.1", "max_percentile": "99.9"}, - ), - ), - ], - ), - ] - - @staticmethod - 
@pytest.fixture(params=PER_LAYER_RANGE_INIT_TEST_CASES) - def per_layer_range_init_test_struct(request): - return request.param - - def test_get_init_config_for_quantization_point(self, wrap_dataloader, per_layer_range_init_test_struct): - per_layer_configs = [] - for sub_init_range_config_dict in per_layer_range_init_test_struct.range_init_config: - per_layer_configs.append(PerLayerRangeInitConfig.from_dict(sub_init_range_config_dict)) - - params = PTRangeInitParams( - wrap_dataloader, "", global_init_config=None, per_layer_range_init_configs=per_layer_configs - ) - - for qp, ref_range_init_config in per_layer_range_init_test_struct.qps_vs_expected_init_config: - assert params.get_init_config_for_quantization_point(qp) == ref_range_init_config - - @pytest.mark.parametrize("quant_type", ("symmetric", "asymmetric")) - def test_ad_hoc_range_init_does_not_replace_parameter_tensors(self, wrap_dataloader, quant_type): - config = create_config() - config["compression"].update({"activations": {"mode": quant_type}, "weights": {"mode": quant_type}}) - - data_loader = self.create_dataloader(wrap_dataloader, config) - config.register_extra_structs([QuantizationRangeInitArgs(data_loader)]) - - model = TwoConvTestModel() - quant_model, quant_ctrl = create_compressed_model_and_algo_for_test(model, config) - param_name_vs_id = {name: id(tnsr) for name, tnsr in quant_model.named_parameters()} - - quant_ctrl.init_range() - - for name, param in quant_model.named_parameters(): - assert param_name_vs_id[name] == id(param) - - -class SingleConv2dIdentityModel(torch.nn.Module): - def __init__(self): - super().__init__() - self.conv2d = nn.Conv2d(3, 3, 1) - self.conv2d.weight = torch.nn.Parameter(torch.ones_like(self.conv2d.weight)) - - def forward(self, input_): - return self.conv2d(input_) - - -def _get_init_tensor_for_range_init_test() -> torch.Tensor: - test_input_sample = torch.ones([3, 100, 100]) - test_input_sample[0] = torch.range(1, 10_000).view((100, 100)) - test_input_sample[1] = test_input_sample[0] * -2 - test_input_sample[2] = test_input_sample[0] * 3 - return test_input_sample - - -class SingleConv2dSyntheticWeightModel(torch.nn.Module): - def __init__(self): - super().__init__() - self.conv2d = nn.Conv2d(3, 3, 100) - - with torch.no_grad(): - value = _get_init_tensor_for_range_init_test() - for i in range(0, 3): - self.conv2d.weight[:, i] = value - - def forward(self, input_): - return self.conv2d(input_) - - -def init_idfn(val): - if isinstance(val, tuple): - return val[0] - return val - - -@dataclass -class SymQuantizerScaleRef: - scale: tuple[float, ...] - - -@dataclass -class AsymQuantizerScaleRef: - input_low: tuple[float, ...] - input_range: tuple[float, ...] 
- - -@dataclass -class GranularityQuantizerRefs: - per_channel: Union[SymQuantizerScaleRef, AsymQuantizerScaleRef] - per_tensor: Union[SymQuantizerScaleRef, AsymQuantizerScaleRef] - - -@dataclass -class RangeInitTestCase: - collector_name: str - weights_refs_symmetric: GranularityQuantizerRefs - weights_refs_assymetric: GranularityQuantizerRefs - activations_refs_symmetric: GranularityQuantizerRefs - activations_refs_assymetric: GranularityQuantizerRefs - - -@pytest.mark.parametrize( - "range_init_test_case", - ( - [ - RangeInitTestCase( - collector_name="min_max", - weights_refs_symmetric=GranularityQuantizerRefs( - per_channel=SymQuantizerScaleRef( - scale=torch.tensor((10000.0, 20000.0, 30000.0)).view((3, 1, 1, 1)) - ), - per_tensor=SymQuantizerScaleRef(scale=30000.0), - ), - weights_refs_assymetric=GranularityQuantizerRefs( - per_channel=AsymQuantizerScaleRef( - input_low=torch.tensor((1.0, -20000.0, 3.0)).view((3, 1, 1, 1)), - input_range=torch.tensor((9999.0, 19998.0, 29997.0)).view((3, 1, 1, 1)), - ), - per_tensor=AsymQuantizerScaleRef(input_low=-20000.0, input_range=50000.0), - ), - activations_refs_symmetric=GranularityQuantizerRefs( - per_channel=SymQuantizerScaleRef( - scale=torch.tensor((20000.0, 40000.0, 60000.0)).view((1, 3, 1, 1)) - ), - per_tensor=SymQuantizerScaleRef(scale=60000.0), - ), - activations_refs_assymetric=GranularityQuantizerRefs( - per_channel=AsymQuantizerScaleRef( - input_low=torch.tensor((1.0, -40000.0, 3.0)).view((1, 3, 1, 1)), - input_range=torch.tensor((19999.0, 39998.0, 59997.0)).view((1, 3, 1, 1)), - ), - per_tensor=AsymQuantizerScaleRef(input_low=-40000.0, input_range=100000.0), - ), - ), - RangeInitTestCase( - collector_name="mixed_min_max", - weights_refs_symmetric=GranularityQuantizerRefs( - per_channel=SymQuantizerScaleRef( - scale=torch.tensor((10000.0, 20000.0, 30000.0)).view((3, 1, 1, 1)) - ), - per_tensor=SymQuantizerScaleRef(scale=30000.0), - ), - weights_refs_assymetric=GranularityQuantizerRefs( - per_channel=AsymQuantizerScaleRef( - input_low=torch.tensor((1.0, -20000.0, 3.0)).view((3, 1, 1, 1)), - input_range=torch.tensor((9999.0, 19998.0, 29997.0)).view((3, 1, 1, 1)), - ), - per_tensor=AsymQuantizerScaleRef(input_low=-20000.0, input_range=50000.0), - ), - activations_refs_symmetric=GranularityQuantizerRefs( - per_channel=SymQuantizerScaleRef( - scale=torch.tensor((20000.0, 40000.0, 60000.0)).view((1, 3, 1, 1)) - ), - per_tensor=SymQuantizerScaleRef(scale=45000.0), - ), - activations_refs_assymetric=GranularityQuantizerRefs( - per_channel=AsymQuantizerScaleRef( - input_low=torch.tensor((1.0, -40000.0, 3.0)).view((1, 3, 1, 1)), - input_range=torch.tensor((19999.0, 39998.0, 59997.0)).view((1, 3, 1, 1)), - ), - per_tensor=AsymQuantizerScaleRef(input_low=-30000.0, input_range=75000.0), - ), - ), - RangeInitTestCase( - collector_name="mean_min_max", - weights_refs_symmetric=GranularityQuantizerRefs( - per_channel=SymQuantizerScaleRef( - scale=torch.tensor((10000.0, 20000.0, 30000.0)).view((3, 1, 1, 1)) - ), - per_tensor=SymQuantizerScaleRef(scale=30000.0), - ), - weights_refs_assymetric=GranularityQuantizerRefs( - per_channel=AsymQuantizerScaleRef( - input_low=torch.tensor((1.0, -20000.0, 3.0)).view((3, 1, 1, 1)), - input_range=torch.tensor((9999.0, 19998.0, 29997.0)).view((3, 1, 1, 1)), - ), - per_tensor=AsymQuantizerScaleRef(input_low=-20000.0, input_range=50000.0), - ), - activations_refs_symmetric=GranularityQuantizerRefs( - per_channel=SymQuantizerScaleRef( - scale=torch.tensor((15000.0, 30000.0, 45000.0)).view((1, 3, 1, 1)) - ), - 
per_tensor=SymQuantizerScaleRef(scale=45000.0), - ), - activations_refs_assymetric=GranularityQuantizerRefs( - per_channel=AsymQuantizerScaleRef( - input_low=torch.tensor((1.5, -30000.0, 4.5)).view((1, 3, 1, 1)), - input_range=torch.tensor((14998.5000, 29997.0000, 44995.5000)).view((1, 3, 1, 1)), - ), - per_tensor=AsymQuantizerScaleRef(input_low=-30000.0, input_range=75000.0), - ), - ), - RangeInitTestCase( - collector_name="threesigma", - weights_refs_symmetric=GranularityQuantizerRefs( - per_channel=SymQuantizerScaleRef( - scale=torch.tensor((16120.1719, 32240.3438, 48360.5156)).view((3, 1, 1, 1)) - ), - per_tensor=SymQuantizerScaleRef(scale=33780.2891), - ), - weights_refs_assymetric=GranularityQuantizerRefs( - per_channel=AsymQuantizerScaleRef( - input_low=torch.tensor((-6119.1719, -32240.3438, -18357.5156)).view((3, 1, 1, 1)), - input_range=torch.tensor((22239.3438, 44478.6875, 66718.0312)).view((3, 1, 1, 1)), - ), - per_tensor=AsymQuantizerScaleRef(input_low=-26279.2871, input_range=60059.5781), - ), - activations_refs_symmetric=GranularityQuantizerRefs( - per_channel=SymQuantizerScaleRef( - scale=torch.tensor((21494.4707, 42988.9414, 64483.4141)).view((1, 3, 1, 1)) - ), - per_tensor=SymQuantizerScaleRef(scale=52662.1367), - ), - activations_refs_assymetric=GranularityQuantizerRefs( - per_channel=AsymQuantizerScaleRef( - input_low=torch.tensor((-8159.4707, -42988.9414, -24478.4141)).view((1, 3, 1, 1)), - input_range=torch.tensor((29653.9414, 59307.8828, 88961.8281)).view((1, 3, 1, 1)), - ), - per_tensor=AsymQuantizerScaleRef(input_low=-42660.1367, input_range=95322.2734), - ), - ), - RangeInitTestCase( - collector_name="percentile", - weights_refs_symmetric=GranularityQuantizerRefs( - per_channel=SymQuantizerScaleRef( - scale=torch.tensor((6789.3213, 13580.6416, 20367.9629)).view((3, 1, 1, 1)) - ), - per_tensor=SymQuantizerScaleRef(scale=7776.0), - ), - weights_refs_assymetric=GranularityQuantizerRefs( - per_channel=AsymQuantizerScaleRef( - input_low=torch.tensor((3210.6790, -13580.6416, 9632.0371)).view((3, 1, 1, 1)), - input_range=torch.tensor((3578.6423, 7157.2837, 10735.9258)).view((3, 1, 1, 1)), - ), - per_tensor=AsymQuantizerScaleRef(input_low=-740.6420, input_range=8516.6416), - ), - activations_refs_symmetric=GranularityQuantizerRefs( - per_channel=SymQuantizerScaleRef( - scale=torch.tensor((9052.3213, 18108.0000, 27156.9629)).view((1, 3, 1, 1)) - ), - per_tensor=SymQuantizerScaleRef(scale=10734.6426), - ), - activations_refs_assymetric=GranularityQuantizerRefs( - per_channel=AsymQuantizerScaleRef( - input_low=torch.tensor((4280.6792, -18108.0000, 12842.0371)).view((1, 3, 1, 1)), - input_range=torch.tensor((4771.6421, 9544.0000, 14314.9258)).view((1, 3, 1, 1)), - ), - per_tensor=AsymQuantizerScaleRef(input_low=-988.0, input_range=11722.6426), - ), - ), - RangeInitTestCase( - collector_name="mean_percentile", - weights_refs_symmetric=GranularityQuantizerRefs( - per_channel=SymQuantizerScaleRef( - scale=torch.tensor((9990.0010, 19980.0020, 29970.0039)).view((3, 1, 1, 1)) - ), - per_tensor=SymQuantizerScaleRef(scale=29910.0039), - ), - weights_refs_assymetric=GranularityQuantizerRefs( - per_channel=AsymQuantizerScaleRef( - input_low=torch.tensor((10.999, -19980.0, 32.997)).view((3, 1, 1, 1)), - input_range=torch.tensor((9979.0020, 19958.0039, 29937.0078)).view((3, 1, 1, 1)), - ), - per_tensor=AsymQuantizerScaleRef(input_low=-19940.0020, input_range=49850.0078), - ), - activations_refs_symmetric=GranularityQuantizerRefs( - per_channel=SymQuantizerScaleRef( - 
scale=torch.tensor((14985.0020, 29970.0039, 44955.0078)).view((1, 3, 1, 1)) - ), - per_tensor=SymQuantizerScaleRef(scale=44865.0078), - ), - activations_refs_assymetric=GranularityQuantizerRefs( - per_channel=AsymQuantizerScaleRef( - input_low=torch.tensor((16.498, -2.9970e04, 49.496)).view((1, 3, 1, 1)), - input_range=torch.tensor((14968.5039, 29937.0078, 44905.5117)).view((1, 3, 1, 1)), - ), - per_tensor=AsymQuantizerScaleRef(input_low=-29910.0039, input_range=74775.0156), - ), - ), - ] - ), - ids=init_idfn, -) -def test_init_ranges_are_set( - quantization_mode: str, - is_per_channel: bool, - range_init_test_case: RangeInitTestCase, -): - class SyntheticDataset(torch.utils.data.Dataset): - def __init__(self): - super().__init__() - self._length = 2 - - def __getitem__(self, idx): - if idx >= self._length: - raise StopIteration - test_input_sample = _get_init_tensor_for_range_init_test() * (idx + 1) - return test_input_sample, test_input_sample - - def __len__(self): - return self._length - - data_loader = torch.utils.data.DataLoader(SyntheticDataset(), batch_size=1, drop_last=True) - - range_init_type = range_init_test_case.collector_name - config_with_init = NNCFConfig() - config_with_init.update( - { - "input_info": {"sample_size": [1, 3, 100, 100]}, - "target_device": "TRIAL", - "compression": { - "algorithm": "quantization", - "activations": {"mode": quantization_mode, "per_channel": is_per_channel}, - "weights": {"mode": quantization_mode, "per_channel": is_per_channel}, - "initializer": {"range": {"num_init_samples": 2, "type": range_init_type}}, - }, - } - ) - - if range_init_type == "percentile": - config_with_init["compression"]["initializer"]["range"]["params"] = { - "min_percentile": 32.10, - "max_percentile": 67.89, - } - - # Activations init check - id_model = SingleConv2dIdentityModel() - config_with_init.register_extra_structs([QuantizationRangeInitArgs(wrap_dataloader_for_init(data_loader))]) - register_bn_adaptation_init_args(config_with_init) - _, compression_ctrl = create_compressed_model_and_algo_for_test(id_model, config_with_init) - - act_quantizer_info = next(iter(compression_ctrl.non_weight_quantizers.values())) - - if is_per_channel: - ref_scale = range_init_test_case.activations_refs_symmetric.per_channel.scale - ref_input_low = range_init_test_case.activations_refs_assymetric.per_channel.input_low - ref_input_range = range_init_test_case.activations_refs_assymetric.per_channel.input_range - else: - ref_scale = range_init_test_case.activations_refs_symmetric.per_tensor.scale - ref_input_low = range_init_test_case.activations_refs_assymetric.per_tensor.input_low - ref_input_range = range_init_test_case.activations_refs_assymetric.per_tensor.input_range - - def check_scales(quantizer: BaseQuantizer, per_channel: bool): - # Absolute tolerance is 1.0 due to percentile value interpolation - if quantization_mode == "symmetric": - assert torch.allclose(quantizer.scale, torch.tensor(ref_scale), atol=1.0) - if per_channel: - assert quantizer.scale.numel() == 3 - else: - assert quantizer.scale.numel() == 1 - else: - assert torch.allclose(quantizer.input_low, torch.tensor(ref_input_low), atol=1.0) - - assert torch.allclose( - quantizer.input_range, - torch.tensor(ref_input_range), - atol=1.0, - ) - if per_channel: - assert quantizer.input_low.numel() == 3 - assert quantizer.input_range.numel() == 3 - else: - assert quantizer.input_low.numel() == 1 - assert quantizer.input_range.numel() == 1 - - check_scales(act_quantizer_info.quantizer_module_ref, is_per_channel) - # 
Weight init check - synth_weight_model = SingleConv2dSyntheticWeightModel() - config_with_init["compression"]["initializer"]["range"]["num_init_samples"] = 1 - _, compression_ctrl = create_compressed_model_and_algo_for_test(synth_weight_model, config_with_init) - - weight_quantizer_info = next(iter(compression_ctrl.weight_quantizers.values())) - if is_per_channel: - ref_scale = range_init_test_case.weights_refs_symmetric.per_channel.scale - ref_input_low = range_init_test_case.weights_refs_assymetric.per_channel.input_low - ref_input_range = range_init_test_case.weights_refs_assymetric.per_channel.input_range - else: - ref_scale = range_init_test_case.weights_refs_symmetric.per_tensor.scale - ref_input_low = range_init_test_case.weights_refs_assymetric.per_tensor.input_low - ref_input_range = range_init_test_case.weights_refs_assymetric.per_tensor.input_range - - check_scales(weight_quantizer_info.quantizer_module_ref, is_per_channel) - - -RangeInitCallCountTestStruct = namedtuple( - "RangeInitCallCountTestStruct", - ( - "range_init_config", - "expected_call_count_initializer_create", - "expected_call_count_register_input", - ), -) -RANGE_INIT_CALL_COUNT_TEST_CASES = [ - RangeInitCallCountTestStruct( - range_init_config={"type": "min_max", "num_init_samples": 5}, - expected_call_count_initializer_create={"min_max": 4, "mean_min_max": 0, "three_sigma": 0}, - expected_call_count_register_input={ - "min_max": 12, # 2 activation statistics for 5x inputs, 2 weight statistics for 1 input each - "mean_min_max": 0, - "three_sigma": 0, - }, - ), - RangeInitCallCountTestStruct( - range_init_config=[ - { - "type": "min_max", - "num_init_samples": 5, - "target_quantizer_group": "weights", - "target_scopes": ["{re}TwoConvTestModel/Sequential\\[features\\]/.*"], - }, - { - "type": "mean_min_max", - "num_init_samples": 2, - "ignored_scopes": ["{re}TwoConvTestModel/Sequential\\[features\\]/.*"], - }, - { - "type": "threesigma", - "num_init_samples": 3, - "target_quantizer_group": "activations", - "target_scopes": ["{re}TwoConvTestModel/Sequential\\[features\\]/.*"], - }, - ], - expected_call_count_initializer_create={"min_max": 2, "mean_min_max": 1, "three_sigma": 1}, - expected_call_count_register_input={ - "min_max": 2, # Weights only require single input registration - "mean_min_max": 2, - "three_sigma": 3, - }, - ), -] - - -@pytest.fixture(params=RANGE_INIT_CALL_COUNT_TEST_CASES) -def range_init_call_count_test_struct(request): - return request.param - - -class CustomSpy: - def __init__(self, fn) -> None: - self._fn = fn - self.call_count = 0 - self.return_values_list = [] - - def __call__(self, *args, **kwargs): - self.call_count += 1 - retval = self._fn(*args, **kwargs) - self.return_values_list.append(retval) - return retval - - -def test_per_layer_range_init_collectors_are_called_the_required_number_of_times( - range_init_call_count_test_struct, mocker -): - range_minmax_init_create_spy = CustomSpy(pt_collectors.get_min_max_statistic_collector) - mocker.patch("nncf.torch.quantization.init_range.get_min_max_statistic_collector", new=range_minmax_init_create_spy) - range_meanminmax_init_create_spy = CustomSpy(pt_collectors.get_mixed_min_max_statistic_collector) - mocker.patch( - "nncf.torch.quantization.init_range.get_mixed_min_max_statistic_collector", new=range_meanminmax_init_create_spy - ) - range_threesigma_init_create_spy = CustomSpy(pt_collectors.get_median_mad_statistic_collector) - mocker.patch( - "nncf.torch.quantization.init_range.get_median_mad_statistic_collector", 
new=range_threesigma_init_create_spy - ) - - config = create_config() - config["compression"]["initializer"]["range"] = range_init_call_count_test_struct.range_init_config - data_loader = TestRangeInit.create_dataloader(True, config, 10) - config.register_extra_structs([QuantizationRangeInitArgs(data_loader)]) - - TestRangeInit.create_algo_and_compressed_model(config) - - for stat_type, spy in [ - ("min_max", range_minmax_init_create_spy), - ("mean_min_max", range_meanminmax_init_create_spy), - ("three_sigma", range_threesigma_init_create_spy), - ]: - assert spy.call_count == range_init_call_count_test_struct.expected_call_count_initializer_create[stat_type] - collected_samples = 0 - for tensor_collector in spy.return_values_list: - cur_values = set() - for aggr in tensor_collector.aggregators.values(): - cur_values.add(aggr._collected_samples) - assert len(cur_values) == 1 - collected_samples += cur_values.pop() - - assert collected_samples == range_init_call_count_test_struct.expected_call_count_register_input[stat_type] - - -QUANTIZER_RANGE_INITIALIZERS = [ - "min_max", - "threesigma", - "mean_min_max", - "percentile", - "mixed_min_max", - "mean_percentile", -] - - -class QuantizeRangeInitScaleShapeTestStruct: - def __init__(self, per_channel: bool, is_weights: bool, input_shape: list[int], ref_scale_shape: tuple[int, ...]): - self.per_channel = per_channel - self.is_weights = is_weights - self.input_shape = input_shape - self.ref_scale_shape = ref_scale_shape - - -QRISSTS = QuantizeRangeInitScaleShapeTestStruct - -QUANTIZER_RANGE_INIT_TEST_CASES = [ - QRISSTS(per_channel=False, is_weights=False, input_shape=[41, 42, 43, 44], ref_scale_shape=(1,)), - QRISSTS(per_channel=True, is_weights=False, input_shape=[41, 42, 43, 44], ref_scale_shape=(1, 42, 1, 1)), - QRISSTS(per_channel=False, is_weights=True, input_shape=[41, 42, 43, 44], ref_scale_shape=(1,)), - QRISSTS(per_channel=True, is_weights=True, input_shape=[41, 42, 43, 44], ref_scale_shape=(41, 1, 1, 1)), -] - - -def quantizer_range_init_scale_shape_idfn(fixture_value): - test_struct: QRISSTS = fixture_value[0] - postfix = "" - if test_struct.is_weights: - postfix += "-W" - else: - postfix += "-A" - - if test_struct.per_channel: - postfix += "-PC" - else: - postfix += "-PT" - return fixture_value[1] + postfix - - -@pytest.fixture( - params=itertools.product(QUANTIZER_RANGE_INIT_TEST_CASES, QUANTIZER_RANGE_INITIALIZERS), - ids=quantizer_range_init_scale_shape_idfn, -) -def quantizer_range_init_test_struct(request): - return request.param - - -def test_quantize_range_init_sets_correct_scale_shapes(quantizer_range_init_test_struct: tuple[QRISSTS, str]): - test_struct = quantizer_range_init_test_struct[0] - initializer_type = quantizer_range_init_test_struct[1] - for quantization_mode in [QuantizationMode.SYMMETRIC, QuantizationMode.ASYMMETRIC]: - qconfig = PTQuantizerSpec( - num_bits=8, - mode=quantization_mode, - signedness_to_force=None, - scale_shape=tuple(test_struct.ref_scale_shape), - narrow_range=test_struct.is_weights, - half_range=False, - logarithm_scale=False, - ) - q_cls = QUANTIZATION_MODULES.get(quantization_mode) - quantizer: BaseQuantizer = q_cls(qconfig) - range_init_config = RangeInitConfig(init_type=initializer_type, num_init_samples=1) - - if test_struct.is_weights: - channel_idx = 0 # channel dim for weights - else: - channel_idx = 1 # channel dim for activations - - collector_params = PTRangeInitCollectorParams( - test_struct.is_weights, - quantization_mode, - test_struct.per_channel, - 
tuple(test_struct.input_shape), - channel_idx, - ) - - collector = StatCollectorGenerator.generate_stat_collector_for_range_init_config( - range_init_config, tuple(quantizer.scale_shape), collector_params - ) - collector.register_input_for_all_reducers(Tensor(torch.ones(test_struct.input_shape))) - stat = collector.get_statistics() - minmax_values = pt_convert_stat_to_min_max_tensor_stat(stat) - quantizer.apply_minmax_init(min_values=minmax_values.min_values.data, max_values=minmax_values.max_values.data) - - assert quantizer.scale_shape == test_struct.ref_scale_shape - if quantization_mode == QuantizationMode.SYMMETRIC: - assert tuple(quantizer.scale.shape) == test_struct.ref_scale_shape - elif quantization_mode == QuantizationMode.ASYMMETRIC: - assert tuple(quantizer.input_low.shape) == test_struct.ref_scale_shape - assert tuple(quantizer.input_range.shape) == test_struct.ref_scale_shape - else: - assert False # options above should be exhaustive - - -def test_range_initialization_in_train_mode(): - """ - Check that if a model in train mode is being compressed, - the range initialization statistic collection still runs in eval mode - """ - - class Model(nn.Module): - def forward(self, x): - # This forward produces different number of operations depending on - # the self.training state. If statistics collection was run in - # training mode it would fail with StatisticsNotCollectedError, - # because it wouldn't find some nodes discovered during model graph - # building, which runs in eval mode. - if self.training: - return x - return x * x * x - - config = get_empty_config() - config["compression"] = {"algorithm": "quantization", "initializer": {"range": {"num_init_samples": 1}}} - data_loader = wrap_dataloader_for_init(create_ones_mock_dataloader(config, 1)) - - config.register_extra_structs([QuantizationRangeInitArgs(data_loader=data_loader)]) - - model = Model() - model.train() - _, _ = create_compressed_model_and_algo_for_test(model, config) diff --git a/tests/torch/quantization/test_scheduler.py b/tests/torch/quantization/test_scheduler.py deleted file mode 100644 index 4f2e05f8770..00000000000 --- a/tests/torch/quantization/test_scheduler.py +++ /dev/null @@ -1,292 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-import pytest -import torch -from torch import nn -from torch.utils.data import DataLoader - -from nncf.common.statistics import NNCFStatistics -from nncf.config.structures import QuantizationRangeInitArgs -from nncf.torch import register_default_init_args -from nncf.torch.dynamic_graph.io_handling import FillerInputInfo -from nncf.torch.initialization import wrap_dataloader_for_init -from nncf.torch.quantization.base_ctrl import QuantizationControllerBase -from nncf.torch.quantization.schedulers import StagedQuantizationScheduler -from tests.torch.helpers import OnesDatasetMock -from tests.torch.helpers import create_compressed_model_and_algo_for_test -from tests.torch.helpers import register_bn_adaptation_init_args -from tests.torch.quantization.test_algo_quantization import get_squeezenet_quantization_config -from tests.torch.test_models import squeezenet1_1 - -pytestmark = pytest.mark.legacy - - -def create_staged_scheduler(ctrl_spy, w_start=2, a_start=1): - params = {"activations_quant_start_epoch": a_start, "weights_quant_start_epoch": w_start} - scheduler = StagedQuantizationScheduler(ctrl_spy.get_mocked_algo(), params) - return scheduler - - -class QuantizationControllerBaseForTest(QuantizationControllerBase): - @property - def loss(self): - pass - - @property - def scheduler(self): - pass - - def statistics(self, quickly_collected_only: bool = False): - return NNCFStatistics() - - def compression_stage(self): - pass - - -class QuantizationCtrlBaseSpy: - def __init__(self, mocker): - self._mocked_ctrl = QuantizationControllerBaseForTest(mocker.stub) - mocker.patch.object(self._mocked_ctrl, "enable_weight_quantization") - mocker.patch.object(self._mocked_ctrl, "enable_activation_quantization") - mocker.patch.object(self._mocked_ctrl, "disable_weight_quantization") - mocker.patch.object(self._mocked_ctrl, "disable_activation_quantization") - mocker.patch.object(self._mocked_ctrl, "init_range") - - def enable_weight_count(self): - return self._mocked_ctrl.enable_weight_quantization.call_count - - def enable_activation_count(self): - return self._mocked_ctrl.enable_activation_quantization.call_count - - def disable_weight_count(self): - return self._mocked_ctrl.disable_weight_quantization.call_count - - def disable_activation_count(self): - return self._mocked_ctrl.disable_activation_quantization.call_count - - def init_range_count(self): - return self._mocked_ctrl.init_range.call_count - - def get_mocked_algo(self): - return self._mocked_ctrl - - def check_call_counts( - self, - enable_weight_count: int, - enable_activation_count: int, - disable_weight_count: int, - disable_activation_count: int, - init_range_count: int, - ): - assert self.enable_weight_count() == enable_weight_count, "enable weight count mismatch" - assert self.enable_activation_count() == enable_activation_count, "enable activation count mismatch" - assert self.disable_weight_count() == disable_weight_count, "disable weight count mismatch" - assert self.disable_activation_count() == disable_activation_count, "disable activation count mismatch" - assert self.init_range_count() == init_range_count, "init range count mismatch" - - -def test_scheduler_not_enables_quantizations__by_default(mocker): - ctrl_spy = QuantizationCtrlBaseSpy(mocker) - StagedQuantizationScheduler(ctrl_spy.get_mocked_algo()) - ctrl_spy.check_call_counts(0, 0, 1, 1, 0) - - -def test_staged_scheduler_enables_quantizations__with_zero(mocker): - ctrl_spy = QuantizationCtrlBaseSpy(mocker) - create_staged_scheduler(ctrl_spy, 0, 0) - 
ctrl_spy.check_call_counts(1, 1, 0, 0, 0) - - -def test_staged_scheduler_enables_quantizations_on_epoch_step(mocker): - ctrl_spy = QuantizationCtrlBaseSpy(mocker) - scheduler = create_staged_scheduler(ctrl_spy) - ctrl_spy.check_call_counts(0, 0, 1, 1, 0) - - scheduler.epoch_step() - ctrl_spy.check_call_counts(0, 0, 1, 1, 0) - - scheduler.epoch_step() - ctrl_spy.check_call_counts(0, 1, 1, 1, 1) - - scheduler.epoch_step() - ctrl_spy.check_call_counts(1, 1, 1, 1, 2) - - -def test_staged_scheduler_enables_quantizations_on_epoch_step__at_the_same_time(mocker): - ctrl_spy = QuantizationCtrlBaseSpy(mocker) - scheduler = create_staged_scheduler(ctrl_spy, 1, 1) - ctrl_spy.check_call_counts(0, 0, 1, 1, 0) - - scheduler.epoch_step() - ctrl_spy.check_call_counts(0, 0, 1, 1, 0) - - scheduler.epoch_step() - ctrl_spy.check_call_counts(1, 1, 1, 1, 1) - - -def test_staged_scheduler_enables_quantizations_on_load(mocker): - old_ctrl_spy = QuantizationCtrlBaseSpy(mocker) - old_scheduler = create_staged_scheduler(old_ctrl_spy) - old_scheduler.epoch_step() - old_scheduler.epoch_step() - old_scheduler.epoch_step() - scheduler_state = old_scheduler.get_state() - - ctrl_spy = QuantizationCtrlBaseSpy(mocker) - scheduler = create_staged_scheduler(ctrl_spy, 1, 3) - ctrl_spy.check_call_counts(0, 0, 1, 1, 0) - - scheduler.load_state(scheduler_state) - ctrl_spy.check_call_counts(1, 0, 1, 2, 0) - - scheduler.epoch_step() - ctrl_spy.check_call_counts(1, 1, 1, 2, 1) - - -def test_staged_scheduler_with_empty_quantization(): - config = get_squeezenet_quantization_config() - config["compression"].update( - { - "params": { - "activations_quant_start_epoch": 1, - "weights_quant_start_epoch": 2, - } - } - ) - register_bn_adaptation_init_args(config) - model = squeezenet1_1(num_classes=10, dropout=0) - - model, algo = create_compressed_model_and_algo_for_test(model, config) - scheduler = algo.scheduler - for module in algo.all_quantizations.values(): - assert not module.is_enabled_quantization() - - scheduler.epoch_step() - for module in algo.all_quantizations.values(): - assert not module.is_enabled_quantization() - scheduler.epoch_step() - for wq_info in algo.weight_quantizers.values(): - assert not wq_info.quantizer_module_ref.is_enabled_quantization() - for aq_info in algo.non_weight_quantizers.values(): - assert aq_info.quantizer_module_ref.is_enabled_quantization() - - scheduler.epoch_step() - for module in algo.all_quantizations.values(): - assert module.is_enabled_quantization() - - -def test_staged_scheduler_with_range_init(): - config = get_squeezenet_quantization_config() - config["compression"].update( - { - "params": { - "activations_quant_start_epoch": 1, - "weights_quant_start_epoch": 2, - }, - "initializer": {"range": {"num_init_samples": 1}}, - } - ) - register_bn_adaptation_init_args(config) - model = squeezenet1_1(num_classes=10, dropout=0) - - input_infos_list = FillerInputInfo.from_nncf_config(config) - input_sample_size = input_infos_list.elements[0].shape - data_loader = DataLoader( - OnesDatasetMock(input_sample_size[1:]), - batch_size=1, - num_workers=0, # Workaround for PyTorch MultiprocessingDataLoader issues - shuffle=False, - ) - config.register_extra_structs([QuantizationRangeInitArgs(wrap_dataloader_for_init(data_loader))]) - - model, algo = create_compressed_model_and_algo_for_test(model, config) - scheduler = algo.scheduler - - for module in algo.all_quantizations.values(): - assert not module.is_enabled_quantization() - - scheduler.epoch_step() - for module in algo.all_quantizations.values(): - 
assert not module.is_enabled_quantization() - - scheduler.epoch_step() - - for wq_info in algo.weight_quantizers.values(): - assert not wq_info.quantizer_module_ref.is_enabled_quantization() - for aq_info in algo.non_weight_quantizers.values(): - assert aq_info.quantizer_module_ref.is_enabled_quantization() - - scheduler.epoch_step() - for module in algo.all_quantizations.values(): - assert module.is_enabled_quantization() - - -class HawqDatasetMock: - def __init__(self, input_size, num_classes): - self.input_size = input_size - self.num_classes = num_classes - super().__init__() - - def __getitem__(self, index): - return torch.ones(self.input_size), torch.LongTensor([1]).squeeze_() - - def __len__(self): - return 1 - - -@pytest.mark.xfail(reason="Ticket: 175018") -def test_staged_scheduler_with_hawq(): - config = get_squeezenet_quantization_config() - config["compression"].update( - { - "params": { - "activations_quant_start_epoch": 1, - "weights_quant_start_epoch": 2, - }, - "initializer": { - "range": {"num_init_samples": 1}, - "precision": {"type": "hawq", "num_data_points": 1, "iter_number": 1, "tolerance": 1}, - }, - } - ) - num_classes = 10 - model = squeezenet1_1(num_classes=num_classes, dropout=0) - - input_infos_list = FillerInputInfo.from_nncf_config(config) - input_sample_size = input_infos_list.elements[0].shape - data_loader = DataLoader( - HawqDatasetMock(input_sample_size[1:], num_classes), - batch_size=1, - num_workers=0, # Workaround for PyTorch MultiprocessingDataLoader issues - shuffle=False, - ) - criterion = nn.CrossEntropyLoss().cuda() - config = register_default_init_args(config, data_loader, criterion=criterion) - - model, algo = create_compressed_model_and_algo_for_test(model, config) - scheduler = algo.scheduler - - for module in algo.all_quantizations.values(): - assert not module.is_enabled_quantization() - - scheduler.epoch_step() - for module in algo.all_quantizations.values(): - assert not module.is_enabled_quantization() - - scheduler.epoch_step() - for wq_info in algo.weight_quantizers.values(): - assert not wq_info.quantizer_module_ref.is_enabled_quantization() - for aq_info in algo.non_weight_quantizers.values(): - assert aq_info.quantizer_module_ref.is_enabled_quantization() - - scheduler.epoch_step() - for module in algo.all_quantizations.values(): - assert module.is_enabled_quantization() diff --git a/tests/torch/quantization/test_strip.py b/tests/torch/quantization/test_strip.py index fd1367719ac..4b8414bdc88 100644 --- a/tests/torch/quantization/test_strip.py +++ b/tests/torch/quantization/test_strip.py @@ -9,21 +9,15 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-from typing import Any import numpy as np import pytest import torch -from torch import nn -from torch.quantization.fake_quantize import FakeQuantize -import nncf from nncf.common.quantization.quantizers import calculate_asymmetric_level_ranges from nncf.common.quantization.quantizers import calculate_symmetric_level_ranges from nncf.common.quantization.quantizers import get_num_levels from nncf.common.quantization.structs import QuantizationScheme as QuantizationMode -from nncf.config import NNCFConfig -from nncf.torch.graph.transformations.commands import ExtraCompressionModuleType from nncf.torch.quantization.layers import AsymmetricLoraQuantizer from nncf.torch.quantization.layers import AsymmetricQuantizer from nncf.torch.quantization.layers import PTLoraSpec @@ -34,69 +28,19 @@ from nncf.torch.quantization.strip import convert_to_torch_fakequantizer from nncf.torch.quantization.strip import sym_fq_to_decompressor from tests.common.quantization.data_generators import check_outputs -from tests.common.quantization.data_generators import generate_lazy_sweep_data from tests.common.quantization.data_generators import generate_random_low_and_range_by_input_size from tests.common.quantization.data_generators import generate_random_scale_by_input_size from tests.common.quantization.data_generators import generate_sweep_data from tests.common.quantization.data_generators import get_quant_len_by_range -from tests.torch.helpers import BasicConvTestModel -from tests.torch.helpers import create_compressed_model_and_algo_for_test -from tests.torch.helpers import register_bn_adaptation_init_args from tests.torch.quantization.test_functions import get_test_data -def _get_config_for_algo(input_size, quant_mode="symmetric", overflow_fix="enable", bits=8): - config = NNCFConfig() - config.update({"model": "model", "input_info": {"sample_size": input_size}, "compression": []}) - config["target_device"] = "TRIAL" - config["compression"].append( - { - "algorithm": "quantization", - "initializer": {"range": {"num_init_samples": 0}}, - "weights": {"mode": quant_mode, "bits": bits}, - "activations": {"mode": quant_mode, "bits": bits}, - "overflow_fix": overflow_fix, - } - ) - return config - - def _idfn(val): if isinstance(val, tuple): return "{}".format("-".join([str(v) for v in val])) return None -def check_quantizer_operators(model, levels=255): - """Check that model contains only 8bit FakeQuantize operators.""" - - compression_module_type = ExtraCompressionModuleType.EXTERNAL_QUANTIZER - if model.nncf.is_compression_module_registered(compression_module_type): - external_quantizers = model.nncf.get_compression_modules_by_type(compression_module_type) - for key in list(external_quantizers.keys()): - op = external_quantizers[key] - assert isinstance(external_quantizers[key], FakeQuantize) - assert op.quant_max - op.quant_min == levels - - for node in model.nncf.get_original_graph().get_all_nodes(): - if node.node_type in ["nncf_model_input", "nncf_model_output"]: - continue - - nncf_module = model.nncf.get_containing_module(node.node_name) - - if hasattr(nncf_module, "pre_ops"): - for key in list(nncf_module.pre_ops.keys()): - op = nncf_module.get_pre_op(key).op - assert isinstance(op, FakeQuantize) - assert op.quant_max - op.quant_min == levels - - if hasattr(nncf_module, "post_ops"): - for key in list(nncf_module.post_ops.keys()): - op = nncf_module.post_ops(key).op - assert isinstance(op, FakeQuantize) - assert op.quant_max - op.quant_min == levels - - INPUT_TEST_SCALES = ( (1, 16, 24, 24), (16, 8, 12, 
12), @@ -285,77 +229,6 @@ def test_converting_asymmetric_quantizer(input_size, is_per_channel, is_weights, check_outputs(x_nncf.detach().numpy(), x_torch.detach().numpy(), np_is_near_mid_point, quant_lens) -@pytest.mark.parametrize("mode", ("asymmetric", "symmetric")) -@pytest.mark.parametrize("overflow_fix", ("disable", "enable"), ids=("overflow_fix_disable", "overflow_fix_enable")) -def test_strip_quantization(mode, overflow_fix, tmp_path): - num_bits = 8 - model = BasicConvTestModel() - - config = _get_config_for_algo(model.INPUT_SIZE, mode, overflow_fix, bits=num_bits) - register_bn_adaptation_init_args(config) - compressed_model, compression_ctrl = create_compressed_model_and_algo_for_test(model, config) - - input_tensor = torch.Tensor(generate_lazy_sweep_data(model.INPUT_SIZE)) - x_nncf = compressed_model(input_tensor) - - inference_model = compression_ctrl.strip() - x_torch = inference_model(input_tensor) - check_quantizer_operators(inference_model, 2**num_bits - 1) - - assert torch.all(torch.isclose(x_nncf, x_torch)), f"{x_nncf.view(-1)} != {x_torch.view(-1)}" - - torch.onnx.export(inference_model, input_tensor, f"{tmp_path}/model.onnx", dynamo=False) - - -@pytest.mark.parametrize("strip_type", ("nncf", "torch", "nncf_interfere")) -@pytest.mark.parametrize("do_copy", (True, False), ids=["copy", "inplace"]) -def test_nncf_strip_api(strip_type, do_copy): - model = BasicConvTestModel() - config = _get_config_for_algo(model.INPUT_SIZE) - - quantized_model, compression_ctrl = create_compressed_model_and_algo_for_test(model, config) - - if strip_type == "nncf": - strip_model = nncf.strip(quantized_model, do_copy=do_copy) - elif strip_type == "torch": - strip_model = nncf.torch.strip(quantized_model, do_copy=do_copy) - elif strip_type == "nncf_interfere": - strip_model = quantized_model.nncf.strip(do_copy) - - if do_copy: - assert id(strip_model) != id(quantized_model) - else: - assert id(strip_model) == id(quantized_model) - - assert id(quantized_model) == id(compression_ctrl.model) - - assert isinstance(strip_model.conv.get_pre_op("0").op, FakeQuantize) - assert isinstance(strip_model.nncf.external_quantizers["/nncf_model_input_0|OUTPUT"], FakeQuantize) - - -def check_compression_modules( - model_: nn.Module, - expected_module_type: ExtraCompressionModuleType, - not_expected_module_type: ExtraCompressionModuleType, - expected_class: Any, -) -> None: - """ - Checks if the given model has the expected compression module registered and not the unexpected one. - Also verifies that the compression module is of the expected class type. - - :param model_: The model to be checked, which should have an 'nncf' attribute with compression module methods. - :param expected_module_type: The type of the compression module that is expected to be registered. - :param not_expected_module_type: The type of the compression module that is not expected to be registered. - :param expected_class: The class type that the expected compression module should be an instance of. 
- """ - assert model_.nncf.is_compression_module_registered(expected_module_type) - assert not model_.nncf.is_compression_module_registered(not_expected_module_type) - compression_modules_dict = model_.nncf.get_compression_modules_by_type(expected_module_type) - assert len(compression_modules_dict) == 1 - compression_module = next(iter(compression_modules_dict.values())) - assert isinstance(compression_module, expected_class) - - SIGNED_WEIGHT_SAMPLE = [-1.0, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75] SCALE_SAMPLE = [2.0] diff --git a/tests/torch/quantization/test_unified_scales.py b/tests/torch/quantization/test_unified_scales.py deleted file mode 100644 index 2559a5291bc..00000000000 --- a/tests/torch/quantization/test_unified_scales.py +++ /dev/null @@ -1,731 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import itertools -from collections import Counter -from functools import partial - -import onnx -import pytest -import torch -import torch.nn - -import nncf -from nncf.common.graph import NNCFNodeName -from nncf.common.graph.transformations.commands import TargetType -from nncf.common.hardware.config import HWConfigType -from nncf.common.quantization.quantizer_propagation.solver import QuantizerPropagationSolver -from nncf.common.quantization.structs import NonWeightQuantizerId -from nncf.torch.dynamic_graph.operation_address import OperationAddress -from nncf.torch.graph.transformations.commands import PTTargetPoint -from nncf.torch.model_creation import wrap_model -from nncf.torch.nncf_network import NNCFNetwork -from nncf.torch.quantization.layers import AsymmetricQuantizer -from tests.cross_fw.test_templates.test_unified_scales import TemplateTestUnifiedScales -from tests.torch.helpers import create_compressed_model_and_algo_for_test -from tests.torch.helpers import get_nodes_by_type -from tests.torch.helpers import register_bn_adaptation_init_args -from tests.torch.helpers import resolve_constant_node_inputs_to_values -from tests.torch.quantization.quantization_helpers import get_quantization_config_without_range_init -from tests.torch.quantization.test_onnx_export import get_successors - - -def make_op_address_for_coalescing_test(scope_str: str) -> OperationAddress: - op_address = OperationAddress.from_str(scope_str) - return op_address - - -def make_insertion_point_for_coalescing_test(node_name: NNCFNodeName, input_port_id: int = None) -> PTTargetPoint: - retval = PTTargetPoint(TargetType.OPERATOR_POST_HOOK, target_node_name=node_name, input_port_id=input_port_id) - return retval - - -@pytest.mark.parametrize( - "input_insertion_points, linked_scopes_groups_list, ref_coalesced_ip_lists", - # ref_coalesced_ip_lists == None means that the coalescing should raise an exception - [ - # 0 - Empty linked scopes list - ( - [ - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/conv2d_0"), - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/__add___0", input_port_id=1), - ], - [], - # Each coalesced list has one entry - [ - [ - 
make_insertion_point_for_coalescing_test("Foo/Baz[bar]/conv2d_0"), - ], - [ - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/__add___0", input_port_id=1), - ], - ], - ), - # 1 - Linked scope only affects 1 operation - ( - [ - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/conv2d_0", input_port_id=0), - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/__add___0"), - ], - [["Foo/Baz[bar]/conv2d_0"]], - # Each coalesced list has one entry - [ - [ - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/conv2d_0", input_port_id=0), - ], - [ - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/__add___0"), - ], - ], - ), - # 2 - Same as 1 but with multiple groups - ( - [ - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/conv2d_0", input_port_id=0), - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/__add___0", input_port_id=1), - ], - [["Foo/Baz[bar]/conv2d_0"], ["Foo/Xyz[leet]/__add___0"]], - # Each coalesced list has one entry again - [ - [ - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/conv2d_0", input_port_id=0), - ], - [ - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/__add___0", input_port_id=1), - ], - ], - ), - # 3 - Single group affecting some of the scopes - ( - [ - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/conv2d_0"), - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/linear_0", input_port_id=0), - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/__add___0", input_port_id=1), - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/matmul_0", input_port_id=1), - ], - [["Foo/Xyz[leet]/matmul_0", "Foo/Xyz[leet]/__add___0", "Foo/Baz[bar]/linear_0"]], - [ - [ - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/matmul_0", input_port_id=1), - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/linear_0", input_port_id=0), - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/__add___0", input_port_id=1), - ], - [ - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/conv2d_0"), - ], - ], - ), - # 4 - Multiple groups, each affecting one operation - ( - [ - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/conv2d_0", input_port_id=0), - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/linear_0"), - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/__add___0", input_port_id=0), - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/matmul_0", input_port_id=0), - make_insertion_point_for_coalescing_test("Foo/Asdf[jkl]/softmax_0"), - ], - [["Foo/Baz[bar]/linear_0"], ["Foo/Asdf[jkl]/softmax_0"]], - [ - # Each coalesced list has one entry again - [ - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/linear_0"), - ], - [ - make_insertion_point_for_coalescing_test("Foo/Asdf[jkl]/softmax_0"), - ], - [ - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/conv2d_0", input_port_id=0), - ], - [ - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/__add___0", input_port_id=0), - ], - [ - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/matmul_0", input_port_id=0), - ], - ], - ), - # 5 - Multiple groups affecting multiple operations without overlapping - ( - [ - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/conv2d_0"), - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/linear_0", input_port_id=0), - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/__add___0", input_port_id=1), - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/matmul_0"), - make_insertion_point_for_coalescing_test("Foo/Asdf[jkl]/softmax_0"), - 
make_insertion_point_for_coalescing_test("Foo/Asdf[jkl]/softmax_1", input_port_id=0), - ], - [ - ["Foo/Baz[bar]/conv2d_0", "Foo/Baz[bar]/linear_0"], - ["Foo/Asdf[jkl]/softmax_1", "Foo/Xyz[leet]/__add___0"], - ], - [ - [ - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/conv2d_0"), - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/linear_0", input_port_id=0), - ], - [ - make_insertion_point_for_coalescing_test("Foo/Asdf[jkl]/softmax_1", input_port_id=0), - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/__add___0", input_port_id=1), - ], - [ - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/matmul_0"), - ], - [ - make_insertion_point_for_coalescing_test("Foo/Asdf[jkl]/softmax_0"), - ], - ], - ), - # 6 - A variation of 5 - ( - [ - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/conv2d_0"), - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/linear_0"), - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/__add___0"), - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/matmul_0"), - make_insertion_point_for_coalescing_test("Foo/Asdf[jkl]/softmax_0"), - make_insertion_point_for_coalescing_test( - "Foo/Asdf[jkl]/Qwer[tyu]/conv2d_0", - input_port_id=0, - ), - ], - [ - ["Foo/Baz[bar]/conv2d_0", "Foo/Baz[bar]/linear_0", "Foo/Xyz[leet]/matmul_0"], - ["Foo/Asdf[jkl]/softmax_0", "Foo/Asdf[jkl]/Qwer[tyu]/conv2d_0"], - ], - [ - [ - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/conv2d_0"), - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/linear_0"), - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/matmul_0"), - ], - [ - make_insertion_point_for_coalescing_test("Foo/Asdf[jkl]/softmax_0"), - make_insertion_point_for_coalescing_test("Foo/Asdf[jkl]/Qwer[tyu]/conv2d_0", input_port_id=0), - ], - [ - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/__add___0"), - ], - ], - ), - # 7 - Overlapping groups - ( - [ - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/conv2d_0"), - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/linear_0"), - make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/__add___0"), - make_insertion_point_for_coalescing_test( - "Foo/Xyz[leet]/matmul_0", - input_port_id=1, - ), - make_insertion_point_for_coalescing_test("Foo/Asdf[jkl]/softmax_0"), - make_insertion_point_for_coalescing_test("Foo/Asdf[jkl]/Qwer[tyu]/conv2d_0"), - ], - [ - ["Foo/Baz[bar]/conv2d_0", "Foo/Baz[bar]/linear_0", "Foo/Xyz[leet]/matmul_0"], - ["Foo/Xyz[leet]/matmul_0", "Foo/Asdf[jkl]/Qwer[tyu]/conv2d_0"], - ], - None, - ), - # 8 - More than 1 match for the operation specified in the group - ( - [ - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/conv2d_0"), - make_insertion_point_for_coalescing_test( - "Foo/Baz[bar]/conv2d_0", - input_port_id=0, - ), - make_insertion_point_for_coalescing_test( - "Foo/Baz[bar]/linear_0", - ), - make_insertion_point_for_coalescing_test( - "Foo/Xyz[leet]/__add___0", - ), - make_insertion_point_for_coalescing_test( - "Foo/Xyz[leet]/matmul_0", - input_port_id=1, - ), - make_insertion_point_for_coalescing_test("Foo/Asdf[jkl]/softmax_0"), - make_insertion_point_for_coalescing_test("Foo/Asdf[jkl]/Qwer[tyu]/conv2d_0"), - ], - [ - ["Foo/Baz[bar]/conv2d_0", "Foo/Xyz[leet]/matmul_0"], - ["Foo/Xyz[leet]/matmul_0", "Foo/Asdf[jkl]/Qwer[tyu]/conv2d_0"], - ], - None, - ), - # 9 - No match for an operation specified in the group - ( - [ - make_insertion_point_for_coalescing_test( - "Foo/Baz[bar]/conv2d_0", - input_port_id=0, - ), - make_insertion_point_for_coalescing_test("Foo/Baz[bar]/linear_0"), - 
make_insertion_point_for_coalescing_test("Foo/Xyz[leet]/__add___0"), - make_insertion_point_for_coalescing_test( - "Foo/Xyz[leet]/matmul_0", - input_port_id=1, - ), - make_insertion_point_for_coalescing_test("Foo/Asdf[jkl]/softmax_0"), - make_insertion_point_for_coalescing_test("Foo/Asdf[jkl]/Qwer[tyu]/conv2d_0"), - ], - [ - ["Foo/Baz[bar]/conv2d_0", "Foo/Xyz[leet]/matmul_1"], - ["Foo/Xyz[leet]/matmul_0", "Foo/Asdf[jkl]/Qwer[tyu]/conv2d_0"], - ], - None, - ), - ], -) -def test_insertion_point_coalescing( - input_insertion_points: list[PTTargetPoint], - linked_scopes_groups_list: list[list[str]], - ref_coalesced_ip_lists: list[list[PTTargetPoint]], -): - if ref_coalesced_ip_lists is None: - with pytest.raises((nncf.InternalError, nncf.ValidationError)): - _ = QuantizerPropagationSolver.coalesce_insertion_points(input_insertion_points, linked_scopes_groups_list) - else: - test_coalesced_ip_lists = QuantizerPropagationSolver.coalesce_insertion_points( - input_insertion_points, linked_scopes_groups_list - ) - assert len(test_coalesced_ip_lists) == len(ref_coalesced_ip_lists) - for idx, test_list in enumerate(test_coalesced_ip_lists): - assert Counter(test_list) == Counter(ref_coalesced_ip_lists[idx]) - - -class EltwiseQuantizerLinkingTestModel(torch.nn.Module): - def __init__(self): - super().__init__() - - class Path(torch.nn.Module): - def forward(self, input_1, input_2): - retval0 = input_1 + input_2 - retval1 = retval0 * input_2 - retval2 = retval0 + retval1 - # __add___0, __mul___0, __add___1 results respectively - return retval0, retval1, retval2 - - self.path1 = Path() - self.path2 = Path() - - def forward(self, input_1, input_2): - path1_results = self.path1(input_1, input_2) - path2_results = self.path2(input_1, input_2) - return tuple(x + y for x, y in zip(path1_results, path2_results)) - - -def test_quantizer_scale_linking(mocker): - nncf_config = get_quantization_config_without_range_init(model_size=1) - nncf_config["input_info"] = [ - { - "sample_size": [1, 1, 1, 1], - }, - { - "sample_size": [1, 1, 1, 1], - }, - ] - nncf_config["compression"]["activations"] = { - "unified_scale_ops": [ - [ - # Note: Assuming that quantizers are attached as a post-op to the specified operation - "EltwiseQuantizerLinkingTestModel/Path[path2]/__mul___0", - "EltwiseQuantizerLinkingTestModel/Path[path2]/__add___0", - ] - ], - "ignored_scopes": [ - # Ignore path output averaging operations - "EltwiseQuantizerLinkingTestModel/__add___0", - "EltwiseQuantizerLinkingTestModel/__add___1", - "EltwiseQuantizerLinkingTestModel/__add___2", - ], - } - register_bn_adaptation_init_args(nncf_config) - - compressed_model, compression_ctrl = create_compressed_model_and_algo_for_test( - EltwiseQuantizerLinkingTestModel(), nncf_config - ) - - # 18 inputs to quantize (14 regular + 4 linked), - # 8 quantization points left after propagation, out of these 3 are linked - assert len(compression_ctrl.non_weight_quantizers) == 6 - - shared_quantizer_id = NonWeightQuantizerId(target_node_name="/nncf_model_input_0") - - non_shared_spies = [] - for aq_id, aq_info in compression_ctrl.non_weight_quantizers.items(): - quantizer = aq_info.quantizer_module_ref - spy = mocker.spy(quantizer, "forward") - if aq_id == shared_quantizer_id: - shared_spy = spy - else: - non_shared_spies.append(spy) - - test_input1 = torch.ones([1, 1, 1, 1]) - test_input2 = 2 * test_input1 - compressed_model(test_input1, test_input2) - - assert shared_spy.call_count == 3 - for non_shared_spy in non_shared_spies: - assert non_shared_spy.call_count == 1 - - 
-def test_eltwise_unified_scales_for_npu(): - nncf_config = get_quantization_config_without_range_init(model_size=1) - nncf_config["input_info"] = [ - { - "sample_size": [1, 1, 1, 1], - }, - { - "sample_size": [1, 1, 1, 1], - }, - ] - nncf_config["target_device"] = "NPU" - register_bn_adaptation_init_args(nncf_config) - - _, compression_ctrl = create_compressed_model_and_algo_for_test(EltwiseQuantizerLinkingTestModel(), nncf_config) - - assert len(compression_ctrl.non_weight_quantizers) == 2 - - total_quantizations = sum(len(info.affected_insertions) for info in compression_ctrl.non_weight_quantizers.values()) - assert total_quantizations == 8 - - -class SingleCatModel(torch.nn.Module): - def __init__(self): - super().__init__() - self.conv = torch.nn.Conv2d(4, 1, 1) - - def forward(self, x, y): - x = x * x - y = y * y - z = torch.cat([x, y]) - v = self.conv(z) - return v - - -class DoubleCatModel(torch.nn.Module): - def __init__(self): - super().__init__() - self.conv = torch.nn.Conv2d(4, 1, 1) - - def forward(self, x, y): - x = x * x - y = y * y - z = torch.cat([x, y]) - v = torch.cat([x, z]) - w = self.conv(v) - return w - - -class UNetLikeModel(torch.nn.Module): - def __init__(self): - super().__init__() - self.conv_1 = torch.nn.Conv2d(4, 8, 1) - self.conv_2 = torch.nn.Conv2d(8, 16, 1) - self.conv_3 = torch.nn.Conv2d(16, 32, 1) - self.conv_t_3 = torch.nn.ConvTranspose2d(32, 16, 1) - self.conv_t_2 = torch.nn.ConvTranspose2d(16, 8, 1) - self.conv_t_1 = torch.nn.ConvTranspose2d(8, 4, 1) - - def forward(self, x, y): - y1 = self.conv_1(x) - y2 = self.conv_2(y1) - y3 = self.conv_3(y2) - z3 = self.conv_t_3(y3) - z3 = torch.cat([z3, y2]) - z2 = self.conv_t_2(z3) - z2 = torch.cat([z2, y1]) - z1 = self.conv_t_1(z2) - return z1 - - -CAT_UNIFIED_SCALE_TEST_STRUCTS = [(SingleCatModel, 3, 4), (DoubleCatModel, 3, 4), (UNetLikeModel, 4, 6)] - - -@pytest.mark.parametrize( - "target_device, model_creator, ref_aq_module_count, ref_quantizations", - [ - (t_dev,) + rest - for t_dev, rest in itertools.product([x.value for x in HWConfigType], CAT_UNIFIED_SCALE_TEST_STRUCTS) - ], -) -def test_unified_scales_with_concat(target_device, model_creator, ref_aq_module_count, ref_quantizations): - nncf_config = get_quantization_config_without_range_init(model_size=1) - nncf_config["input_info"] = [ - { - "sample_size": [1, 4, 1, 1], - }, - { - "sample_size": [1, 4, 1, 1], - }, - ] - - nncf_config["target_device"] = target_device - register_bn_adaptation_init_args(nncf_config) - - _, compression_ctrl = create_compressed_model_and_algo_for_test(model_creator(), nncf_config) - - assert len(compression_ctrl.non_weight_quantizers) == ref_aq_module_count - - total_quantizations = sum(len(info.affected_insertions) for info in compression_ctrl.non_weight_quantizers.values()) - assert total_quantizations == ref_quantizations - - -class SimplerModelForUnifiedScalesTesting(torch.nn.Module): - def __init__(self): - super().__init__() - self.conv2d_1 = torch.nn.Conv2d(1, 1, 1) - self.conv2d_2 = torch.nn.Conv2d(1, 1, 1) - self.conv2d_3 = torch.nn.Conv2d(1, 1, 1) - self.conv2d_4 = torch.nn.Conv2d(1, 1, 1) - self.conv2d_5 = torch.nn.Conv2d(1, 1, 1) - self.conv2d_6 = torch.nn.Conv2d(1, 1, 1) - - def forward(self, x): - in_1, in_2 = x.chunk(dim=-1, chunks=2) - in_1 = self.conv2d_1(in_1) - in_2 = self.conv2d_2(in_2) - x = in_1 + in_2 - x = torch.stack([x, x], dim=-1) - x = x.squeeze(dim=0) - in1, in2 = x.chunk(dim=-1, chunks=2) - in1 = self.conv2d_3(in1) - in2 = self.conv2d_3(in2) - x = torch.cat([in1, in2], dim=-1) - in_1, in_2 = 
x.chunk(dim=-1, chunks=2) - in_1 = self.conv2d_5(in_1) - in_2 = self.conv2d_6(in_2) - x = in_1 * in_2 - return x - - -class TwoEmbeddingAddModel(torch.nn.Module): - EMBEDDING_IO_SHAPE = [10, 10] - - def __init__(self): - super().__init__() - self.embedding1 = torch.nn.Embedding(*self.EMBEDDING_IO_SHAPE) - self.embedding2 = torch.nn.Embedding(*self.EMBEDDING_IO_SHAPE) - - def forward(self, x): - y1 = self.embedding1(x) - y2 = self.embedding2(x) - return y1 + y2 - - -class TestsWithONNXInspection: - @staticmethod - def get_fq_nodes(onnx_model: onnx.ModelProto) -> list[onnx.NodeProto]: - return get_nodes_by_type(onnx_model, "FakeQuantize") - - @staticmethod - def immediately_dominates_add_or_mul(node: onnx.NodeProto, graph: onnx.GraphProto) -> bool: - if len(node.output) != 1: - return False - output_tensor_id = node.output[0] - matches = [x for x in graph.node if output_tensor_id in x.input] - for match in matches: - if match.op_type in ["Add", "Mul"]: - return True - return False - - @staticmethod - def immediately_dominates_cat(node: onnx.NodeProto, graph: onnx.GraphProto) -> bool: - if len(node.output) != 1: - return False - output_tensor_id = node.output[0] - matches = [x for x in graph.node if output_tensor_id in x.input] - for match in matches: - if match.op_type in ["Concat"]: - return True - return False - - @staticmethod - def immediately_dominates_embedding(node: onnx.NodeProto, graph: onnx.GraphProto) -> bool: - if len(node.output) != 1: - return False - output_tensor_id = node.output[0] - matches = [x for x in graph.node if output_tensor_id in x.input] - for match in matches: - if match.op_type in ["Gather"]: - return True - return False - - @staticmethod - def group_nodes_by_output_target(nodes: list[onnx.NodeProto], graph: onnx.GraphProto) -> list[list[onnx.NodeProto]]: - output_nodes: dict[str, list[onnx.NodeProto]] = {} - for node in nodes: - succs = get_successors(node, graph) - assert len(succs) == 1 - target_node_name = next(iter(succs)).name - if target_node_name not in output_nodes: - output_nodes[target_node_name] = [] - output_nodes[target_node_name].append(node) - return list(output_nodes.values()) - - def test_unified_scales_are_identical_in_onnx(self, tmp_path): - nncf_config = get_quantization_config_without_range_init(model_size=1) - nncf_config["compression"]["quantize_outputs"] = True - nncf_config["input_info"] = [ - { - "sample_size": [1, 1, 1, 2], - }, - ] - nncf_config["target_device"] = "NPU" - register_bn_adaptation_init_args(nncf_config) - - compressed_model, compression_ctrl = create_compressed_model_and_algo_for_test( - SimplerModelForUnifiedScalesTesting(), nncf_config - ) - - with torch.no_grad(): - for quant_info in compression_ctrl.non_weight_quantizers.values(): - if isinstance(quant_info.quantizer_module_ref, AsymmetricQuantizer): - quant_info.quantizer_module_ref.input_range *= torch.abs( - torch.rand_like(quant_info.quantizer_module_ref.input_range) - ) - else: - quant_info.quantizer_module_ref.scale *= torch.abs( - torch.rand_like(quant_info.quantizer_module_ref.scale) - ) - - test_input1 = torch.ones([1, 1, 1, 2]) - compressed_model.forward(test_input1) - - onnx_path = str(tmp_path / "model.onnx") - # Exporting the operator ::chunk to ONNX opset version 9 is not supported. 
- # Support for this operator was added in version 11 - compression_ctrl.export_model(onnx_path, save_format="onnx_11") - - onnx_model = onnx.load(onnx_path) - - fq_nodes = TestsWithONNXInspection.get_fq_nodes(onnx_model) - eltwise_dominator_predicate = partial( - TestsWithONNXInspection.immediately_dominates_add_or_mul, graph=onnx_model.graph - ) - eltwise_fq_nodes = list(filter(eltwise_dominator_predicate, fq_nodes)) - - cat_dominator_predicate = partial(TestsWithONNXInspection.immediately_dominates_cat, graph=onnx_model.graph) - cat_fq_nodes = list(filter(cat_dominator_predicate, fq_nodes)) - - fq_nodes_grouped_by_output = TestsWithONNXInspection.group_nodes_by_output_target( - eltwise_fq_nodes + cat_fq_nodes, onnx_model.graph - ) - - for unified_scale_group in fq_nodes_grouped_by_output: - inputs = [ - resolve_constant_node_inputs_to_values(fq_node, onnx_model.graph) for fq_node in unified_scale_group - ] - for inputs_dict in inputs[1:]: - curr_values = list(inputs_dict.values()) - ref_values = list(inputs[0].values()) - assert curr_values == ref_values # All inputs for unified scale quantizers must be equal - - def test_weight_and_act_quantizer_scale_unification(self, tmp_path): - nncf_config = get_quantization_config_without_range_init(model_size=1) - nncf_config["input_info"] = [ - {"sample_size": [1, 5], "type": "long", "filler": "zeros"}, - ] - nncf_config["target_device"] = "NPU" - register_bn_adaptation_init_args(nncf_config) - - compressed_model, compression_ctrl = create_compressed_model_and_algo_for_test( - TwoEmbeddingAddModel(), nncf_config - ) - - with torch.no_grad(): - for quant_module in compression_ctrl.all_quantizations.values(): - if isinstance(quant_module, AsymmetricQuantizer): - quant_module.input_range *= torch.abs(torch.rand_like(quant_module.input_range)) - else: - quant_module.scale *= torch.abs(torch.rand_like(quant_module.scale)) - - test_input1 = torch.ones([1, 5], dtype=torch.long) - compressed_model.forward(test_input1) - onnx_path = str(tmp_path / "model.onnx") - compression_ctrl.export_model(onnx_path) - - onnx_model = onnx.load(onnx_path) - - fq_nodes = TestsWithONNXInspection.get_fq_nodes(onnx_model) - eltwise_dominator_predicate = partial( - TestsWithONNXInspection.immediately_dominates_add_or_mul, graph=onnx_model.graph - ) - embedding_dominator_predicate = partial( - TestsWithONNXInspection.immediately_dominates_embedding, graph=onnx_model.graph - ) - eltwise_fq_nodes = list(filter(eltwise_dominator_predicate, fq_nodes)) - embedding_weight_fq_nodes = list(filter(embedding_dominator_predicate, fq_nodes)) - - fq_nodes_with_expected_unified_scales = embedding_weight_fq_nodes + eltwise_fq_nodes - - unified_fq_node_inputs = [ - resolve_constant_node_inputs_to_values(fq_node, onnx_model.graph) - for fq_node in fq_nodes_with_expected_unified_scales - ] - - # delete weights from input dict - for inputs_for_single_fq in unified_fq_node_inputs: - weight_input_names = [] - for input_name, input_tensor in inputs_for_single_fq.items(): - if list(input_tensor.shape) == TwoEmbeddingAddModel.EMBEDDING_IO_SHAPE: - weight_input_names.append(input_name) - for weight_input_name in weight_input_names: - inputs_for_single_fq.pop(weight_input_name) - - ref_values = list(unified_fq_node_inputs[0].values()) - for inputs_dict in unified_fq_node_inputs[1:]: - curr_values = list(inputs_dict.values()) - assert curr_values == ref_values # All inputs for unified scale quantizers must be equal - - -class SharedEmbeddingAddModel(torch.nn.Module): - def __init__(self): - 
super().__init__() - self.shared_embedding = torch.nn.Embedding(10, 10) - - def forward(self, x): - y1 = self.shared_embedding(x) - y2 = self.shared_embedding(x) - return y1 + y2 - - -def test_unified_scales_with_shared_nodes(): - nncf_config = get_quantization_config_without_range_init(model_size=1) - nncf_config["input_info"] = [ - {"sample_size": [1, 5], "type": "long", "filler": "zeros"}, - ] - nncf_config["target_device"] = "NPU" - register_bn_adaptation_init_args(nncf_config) - - _, compression_ctrl = create_compressed_model_and_algo_for_test(SharedEmbeddingAddModel(), nncf_config) - - assert len(compression_ctrl.weight_quantizers) == 1 # The two embedding nodes point to a single shared layer - assert len(compression_ctrl.non_weight_quantizers) == 0 # The "add" operation has its inputs already quantized - - -class TestUnifiedScales(TemplateTestUnifiedScales): - def get_backend_specific_model(self, model: torch.nn.Module) -> NNCFNetwork: - q_input_shape = model.Q_INPUT_SHAPE - kv_input_shape = model.KV_INPUT_SHAPE - backend_model = wrap_model( - model, - ( - torch.ones(q_input_shape), - torch.ones(q_input_shape), - torch.ones(kv_input_shape), - torch.ones(kv_input_shape), - ), - trace_parameters=True, - ) - - return backend_model diff --git a/tests/torch/test_api_behavior.py b/tests/torch/test_api_behavior.py deleted file mode 100644 index edd97a79943..00000000000 --- a/tests/torch/test_api_behavior.py +++ /dev/null @@ -1,144 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import pytest -import torch -from torch.utils.data import DataLoader - -import nncf -from nncf import NNCFConfig -from nncf.common.quantization.quantizer_setup import SingleConfigQuantizerSetup -from nncf.torch import create_compressed_model -from nncf.torch import register_default_init_args -from nncf.torch.tensor_statistics.algo import TensorStatisticsCollectionBuilder -from nncf.torch.tensor_statistics.algo import TensorStatisticsCollectionController -from tests.torch.helpers import BasicConvTestModel -from tests.torch.helpers import OnesDatasetMock -from tests.torch.helpers import TwoConvTestModel -from tests.torch.helpers import create_compressed_model_and_algo_for_test -from tests.torch.nncf_network.helpers import SimplestModel - -pytestmark = pytest.mark.legacy - -INPUT_SAMPLE_SIZE = [1, 1, 4, 4] -CONFIG_WITH_ALL_INIT_TYPES = { - "model": "basic_quant_conv", - "input_info": { - "sample_size": INPUT_SAMPLE_SIZE, - }, - "compression": { - "algorithm": "quantization", - "initializer": { - "precision": {"type": "hawq", "bits": [4, 8, 6], "num_data_points": 1, "iter_number": 1, "tolerance": 1e-2}, - "range": {"num_init_samples": 1}, - "batchnorm_adaptation": {"num_bn_adaptation_samples": 5}, - }, - }, -} - - -@pytest.fixture(name="nncf_config_with_default_init_args") -def nncf_config_with_default_init_args_(mocker): - config = NNCFConfig.from_dict(CONFIG_WITH_ALL_INIT_TYPES) - - train_loader = DataLoader( - OnesDatasetMock(INPUT_SAMPLE_SIZE[1:]), - batch_size=1, - num_workers=0, # Workaround for PyTorch MultiprocessingDataLoader issues - shuffle=False, - ) - mocker_criterion = mocker.stub() - mocker_criterion.batch_size = 1 - - config = register_default_init_args(config, train_loader, criterion=mocker_criterion) - return config - - -@pytest.mark.parametrize( - ("config_cutter", "tensor_statistics_collection_count", "precision_init_call_count", "bn_adaptation_call_count"), - [ - # 1 stat collection for setting up an experimental quantization setup for precision init, - # + 1 stat collection for implicit range initialization with default parameters - (lambda x: x["initializer"].pop("range"), 2, 1, 1), - (lambda x: x.pop("initializer"), 1, 0, 1), - (lambda x: x["initializer"].pop("precision"), 1, 0, 1), - (lambda x: x["initializer"]["range"].update({"num_init_samples": 0}), 0, 1, 1), - ], - ids=["precision_init_only", "no_init_params", "range_init_only", "skip_range_init"], -) -def test_range_init_is_called( - nncf_config_with_default_init_args, - config_cutter, - tensor_statistics_collection_count, - precision_init_call_count, - bn_adaptation_call_count, - mocker, -): - config = nncf_config_with_default_init_args - model = BasicConvTestModel() - - _ = mocker.patch("nncf.torch.initialization.SimpleDataLoaderRunner.run") - stat_builder_apply_to_spy = mocker.spy(TensorStatisticsCollectionBuilder, "apply_to") - stat_builder_build_controller_mm = mocker.patch( - "nncf.torch.tensor_statistics.algo.TensorStatisticsCollectionBuilder.build_controller" - ) - stat_builder_build_controller_mm.return_value = TensorStatisticsCollectionController(None, {}) - - precision_init_spy = mocker.patch( - "nncf.torch.quantization.precision_init.hawq_init.HAWQPrecisionInitializer.apply_init", autospec=True - ) # autospec=True will patch the function as an instance method - bn_adaptation_spy = mocker.patch("nncf.torch.initialization.DataLoaderBNAdaptationRunner.run") - - def fn(self) -> SingleConfigQuantizerSetup: - return self._algo.get_quantizer_setup_for_current_state() - - precision_init_spy.side_effect = fn 
- - config_cutter(config["compression"]) - create_compressed_model_and_algo_for_test(model, config) - - assert stat_builder_apply_to_spy.call_count == tensor_statistics_collection_count - assert precision_init_spy.call_count == precision_init_call_count - assert bn_adaptation_spy.call_count == bn_adaptation_call_count - - -class DeviceCheckingModel(torch.nn.Module): - def __init__(self, device): - super().__init__() - self.model = TwoConvTestModel() - self.original_device = device - self.model.to(device) - - def forward(self, x): - for param in self.model.parameters(): - # 'in' to handle the situation when .to('cuda') results in 'cuda:0' actual device - assert self.original_device in str(param.device) - return self.model.forward(x) - - -@pytest.mark.xfail(reason="Ticket: 175018") -@pytest.mark.parametrize( - "original_device", - ["cpu", pytest.param("cuda", marks=pytest.mark.cuda), pytest.param("cuda:0", marks=pytest.mark.cuda)], -) -def test_model_is_inited_with_own_device_by_default(nncf_config_with_default_init_args, original_device): - if not torch.cuda.is_available() and "cuda" in original_device: - pytest.skip("Skipping for CPU-only setups") - model = DeviceCheckingModel(original_device) - create_compressed_model_and_algo_for_test(model, nncf_config_with_default_init_args) - - -def test_repeat_compression_fails(): - model = SimplestModel() - nncf_config = NNCFConfig.from_dict({"input_info": {"sample_size": SimplestModel.INPUT_SIZE}}) - _ = create_compressed_model(model, nncf_config) - with pytest.raises(nncf.InternalError, match="The model object has already been compressed."): - _ = create_compressed_model(model, nncf_config) diff --git a/tests/torch/test_compressed_graph.py b/tests/torch/test_compressed_graph.py index 594269d2faf..68b03a00794 100644 --- a/tests/torch/test_compressed_graph.py +++ b/tests/torch/test_compressed_graph.py @@ -42,17 +42,11 @@ from nncf.torch.layers import NNCF_WRAPPED_USER_MODULES_DICT from nncf.torch.layers import LSTMCellNNCF from nncf.torch.nncf_network import NNCFNetwork -from nncf.torch.quantization.algo import QuantizationBuilder -from nncf.torch.utils import get_all_modules_by_type from nncf.torch.utils import get_model_device from tests.cross_fw.shared.nx_graph import compare_nx_graph_with_reference from tests.cross_fw.shared.paths import TEST_ROOT from tests.torch import test_models -from tests.torch.helpers import create_compressed_model_and_algo_for_test from tests.torch.helpers import get_empty_config -from tests.torch.helpers import register_bn_adaptation_init_args -from tests.torch.modules.seq2seq.gnmt import GNMT -from tests.torch.modules.test_rnn import replace_lstm from tests.torch.test_models.synthetic import ArangeModel from tests.torch.test_models.synthetic import Baddbmm from tests.torch.test_models.synthetic import ConvBNLeakyReLU @@ -80,18 +74,6 @@ pytestmark = pytest.mark.legacy -def get_basic_quantization_config( - quantization_type="symmetric", input_sample_sizes=None, input_info: Union[list, dict] = None -): - config = get_empty_config(input_sample_sizes=input_sample_sizes, input_info=input_info) - config["compression"] = { - "algorithm": "quantization", - "activations": {"mode": quantization_type}, - "weights": {"mode": quantization_type}, - } - return config - - def get_basic_quantization_config_with_hw_config_type(hw_config_type, input_sample_size): config = get_empty_config(input_sample_sizes=input_sample_size) config["target_device"] = hw_config_type @@ -283,122 +265,6 @@ def get_sparsifiable_modules(self, algo_name): 
sparsifiable_modules.append(module_cls.__name__) return sparsifiable_modules - @pytest.mark.parametrize( - "algo", - ( - "rb_sparsity", - "magnitude_sparsity", - "const_sparsity", - ), - ids=["RB", "Magnitude", "Const"], - ) - def test_sparse_network(self, desc: ModelDesc, algo): - model = desc.model_builder() - - config = get_empty_config(input_sample_sizes=desc.input_sample_sizes) - config["compression"] = {"algorithm": algo} - - compressed_model, compression_ctrl = create_compressed_model_and_algo_for_test( - model, config, dummy_forward_fn=desc.dummy_forward_fn, wrap_inputs_fn=desc.wrap_inputs_fn - ) - - sparsifiable_modules = self.get_sparsifiable_modules(algo) - ref_num_sparsed = len(get_all_modules_by_type(model, sparsifiable_modules)) - assert ref_num_sparsed == len(compression_ctrl.sparsified_module_info) - check_model_graph(compressed_model, desc.dot_filename(), algo) - - def test_quantize_network(self, desc: ModelDesc, _case_config): - model = desc.model_builder() - - config = get_basic_quantization_config(_case_config.quant_type, input_sample_sizes=desc.input_sample_sizes) - register_bn_adaptation_init_args(config) - compressed_model, _ = create_compressed_model_and_algo_for_test( - model, config, dummy_forward_fn=desc.dummy_forward_fn, wrap_inputs_fn=desc.wrap_inputs_fn - ) - check_model_graph(compressed_model, desc.dot_filename(), _case_config.graph_dir) - - def test_sparse_quantize_network(self, desc: ModelDesc): - model = desc.model_builder() - - config = get_empty_config(input_sample_sizes=desc.input_sample_sizes) - config["compression"] = [{"algorithm": "rb_sparsity"}, {"algorithm": "quantization"}] - register_bn_adaptation_init_args(config) - - compressed_model, compression_ctrl = create_compressed_model_and_algo_for_test( - model, config, dummy_forward_fn=desc.dummy_forward_fn, wrap_inputs_fn=desc.wrap_inputs_fn - ) - - sparsifiable_modules = self.get_sparsifiable_modules("rb_sparsity") - ref_num_sparsed = len(get_all_modules_by_type(compressed_model, sparsifiable_modules)) - - assert ref_num_sparsed == len(compression_ctrl.child_ctrls[0].sparsified_module_info) - check_model_graph(compressed_model, desc.dot_filename(), "quantized_rb_sparsity") - - -@pytest.mark.skip(reason="Sporadic failures") -def test_gnmt_quantization(_case_config): - model = GNMT(vocab_size=32) - model = replace_lstm(model) - forward_fn_ = gnmt_forward_fn(seq_len=10, batch_size=3, vocab_size=32) - - config = get_basic_quantization_config(_case_config.quant_type) - config["input_info"] = [ - {"sample_size": [3, 10], "type": "long"}, - {"sample_size": [3], "type": "long"}, - {"sample_size": [3, 10], "type": "long"}, - ] - config["compression"].update( - { - "ignored_scopes": [ - "GNMT/ResidualRecurrentEncoder[encoder]/Embedding[embedder]", - "GNMT/ResidualRecurrentDecoder[decoder]/Embedding[embedder]", - ] - } - ) - - compressed_model = NNCFNetwork( - model, - input_info=FillerInputInfo.from_nncf_config(config), - dummy_forward_fn=forward_fn_, - wrap_inputs_fn=gnmt_wrap_inputs_fn, - scopes_without_shape_matching=[ - "GNMT/ResidualRecurrentDecoder[decoder]/RecurrentAttention[att_rnn]/BahdanauAttention[attn]" - ], - ) - - builder = QuantizationBuilder(config, should_init=False) - builder.apply_to(compressed_model) - - check_model_graph(compressed_model, "gnmt_variable.dot", _case_config.graph_dir) - - -def test_resnet18__with_not_qinput(_case_config): - model = test_models.ResNet18() - input_shape = [1, 3, 32, 32] - - config = get_basic_quantization_config(_case_config.quant_type, 
input_sample_sizes=input_shape) - config["compression"].update({"quantize_inputs": False}) - register_bn_adaptation_init_args(config) - - compressed_model, _ = create_compressed_model_and_algo_for_test(model, config) - check_model_graph(compressed_model, "resnet18_no_qinput.dot", _case_config.graph_dir) - - -def test_resnet18__with_ignore(_case_config): - model = test_models.ResNet18() - input_shape = [1, 3, 32, 32] - - config = get_basic_quantization_config(_case_config.quant_type, input_sample_sizes=input_shape) - ignored_scopes = [ - "{re}ResNet/Sequential\\[layer3\\].*", - ] - config.update({"ignored_scopes": ignored_scopes}) # Global config ignored_scopes for NNCF module replacement - config["compression"].update({"ignored_scopes": ignored_scopes}) # Local ignored_scopes for quantization - register_bn_adaptation_init_args(config) - - compressed_model, _ = create_compressed_model_and_algo_for_test(model, config) - check_model_graph(compressed_model, "resnet18_ignore.dot", _case_config.graph_dir) - def n_inputs_fn(model_args, model_kwargs, nargs=2): model_args = tuple(nncf_model_input(model_args[i]) for i in range(nargs)) @@ -815,40 +681,6 @@ def forward(self, x): ] -@pytest.mark.parametrize( - "synthetic_model_desc", SYNTHETIC_MODEL_DESC_LIST, ids=[m.model_name for m in SYNTHETIC_MODEL_DESC_LIST] -) -def test_synthetic_model_quantization(synthetic_model_desc: IModelDesc): - model = synthetic_model_desc.get_model() - if isinstance(model, MultiOutputSameTensorModel): - pytest.xfail("The MultiOutputSameTensorModel is skipped, ticket 110944.") - - config = get_basic_quantization_config( - input_sample_sizes=synthetic_model_desc.get_input_sample_sizes(), input_info=synthetic_model_desc.input_info - ) - register_bn_adaptation_init_args(config) - - compressed_model, _ = create_compressed_model_and_algo_for_test( - model, config, wrap_inputs_fn=synthetic_model_desc.get_wrap_inputs_fn() - ) - - check_model_graph( - compressed_model, synthetic_model_desc.get_dot_filename(), os.path.join("quantized", "synthetic_model") - ) - - -def test_output_quantization(_case_config): - model = test_models.UNet() - input_shape = [1, 3, 360, 480] - - config = get_basic_quantization_config(_case_config.quant_type, input_sample_sizes=input_shape) - config["compression"].update({"quantize_outputs": True}) - register_bn_adaptation_init_args(config) - - compressed_model, _ = create_compressed_model_and_algo_for_test(model, config) - check_model_graph(compressed_model, "unet_qoutput.dot", _case_config.graph_dir) - - TEST_HW_MODELS_DESC = [ ModelDesc("resnet50", test_models.ResNet50, [1, 3, 32, 32]), ModelDesc("inception_v3", partial(test_models.Inception3, aux_logits=True, transform_input=True), [2, 3, 299, 299]), @@ -858,35 +690,6 @@ def test_output_quantization(_case_config): TYPE_HW = [(HWConfigType.CPU), (HWConfigType.GPU), (HWConfigType.NPU)] -@pytest.fixture(scope="function", params=TYPE_HW) -def hw_config_type(request): - type_hw = request.param - return type_hw - - -@pytest.mark.parametrize("desc", TEST_HW_MODELS_DESC, ids=[m.model_name for m in TEST_HW_MODELS_DESC]) -def test_compressed_graph_models_hw(desc, hw_config_type): - model = desc.model_builder() - config = get_basic_quantization_config_with_hw_config_type( - hw_config_type.value, input_sample_size=desc.input_sample_sizes - ) - input_info = FillerInputInfo.from_nncf_config(config) - compressed_model = NNCFNetwork(model, input_info=input_info) - - quantization_builder = QuantizationBuilder(config, should_init=False) - single_config_quantizer_setup = 
quantization_builder._get_single_config_quantizer_setup(compressed_model) - sketch_graph = compressed_model.nncf.get_original_graph() - - potential_quantizer_graph = prepare_potential_quantizer_graph(sketch_graph, single_config_quantizer_setup) - path_to_dot = get_full_path_to_the_graph(desc.dot_filename(), _case_dir(hw_config_type.value)) - compare_nx_graph_with_reference(potential_quantizer_graph, path_to_dot, sort_dot_graph=False) - - -def _case_dir(type_hw_config): - graph_dir = os.path.join("quantized", "hw", type_hw_config) - return graph_dir - - def prepare_potential_quantizer_graph(graph: PTNNCFGraph, quantizer_setup: SingleConfigQuantizerSetup) -> nx.DiGraph: quantizers_weights_attr = {} pre_hooked_quantizers_activations_attr: dict[NNCFNodeName, tuple[int, str]] = {} @@ -965,18 +768,3 @@ def prepare_potential_quantizer_graph(graph: PTNNCFGraph, quantizer_setup: Singl nx_graph.add_edge(weight_quantizer_node_key, node_key) return nx_graph - - -def test_output_quantization_with_user_forward(_case_config): - desc = TEST_MODELS_DESC[-1] - model = desc.model_builder() - - input_shape = desc.input_sample_sizes - - config = get_basic_quantization_config(_case_config.quant_type, input_sample_sizes=input_shape) - config["compression"].update({"quantize_outputs": True}) - register_bn_adaptation_init_args(config) - compressed_model, _ = create_compressed_model_and_algo_for_test( - model, config, dummy_forward_fn=desc.dummy_forward_fn, wrap_inputs_fn=desc.wrap_inputs_fn - ) - check_model_graph(compressed_model, "sr_small_model_qoutput.dot", _case_config.graph_dir) diff --git a/tests/torch/test_compression_lr_multiplier.py b/tests/torch/test_compression_lr_multiplier.py deleted file mode 100644 index 57bcccc74ff..00000000000 --- a/tests/torch/test_compression_lr_multiplier.py +++ /dev/null @@ -1,518 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import copy -from typing import Callable - -import pytest -import torch -from torch import nn -from torch.nn import functional as F -from torch.optim import SGD -from torch.utils.data import DataLoader - -from nncf import NNCFConfig -from nncf.torch.layer_utils import CompressionParameter -from tests.torch.helpers import LeNet -from tests.torch.helpers import PTTensorListComparator -from tests.torch.helpers import RandomDatasetMock -from tests.torch.helpers import create_initialized_compressed_model -from tests.torch.helpers import create_random_mock_dataloader -from tests.torch.helpers import get_grads -from tests.torch.helpers import set_torch_seed -from tests.torch.quantization.test_algo_quantization import get_quantization_config_without_range_init - -pytestmark = pytest.mark.legacy - -ALGO_NAME_TO_PATH_MAP = { - "quantization": "nncf.torch.quantization", - "rb_sparsity": "nncf.torch.sparsity.rb", -} - - -def get_quantization_config() -> NNCFConfig: - config = get_quantization_config_without_range_init(LeNet.INPUT_SIZE[-1]) - config["compression"]["initializer"] = {"range": {"num_init_samples": 10}} - return config - - -def get_config_algorithms(config: NNCFConfig) -> list[dict]: - if isinstance(config["compression"], list): - algorithms = config["compression"] - else: - algorithms = [config["compression"]] - return algorithms - - -def add_multiplier_to_config( - config: NNCFConfig, local_multiplier: float = None, global_multiplier: float = None -) -> NNCFConfig: - config = copy.deepcopy(config) - - if local_multiplier is not None: - algorithms = get_config_algorithms(config) - - for algo in algorithms: - algo.update({"compression_lr_multiplier": local_multiplier}) - - if global_multiplier is not None: - config["compression_lr_multiplier"] = global_multiplier - - return config - - -def get_multipliers_from_config(config: NNCFConfig) -> dict[str, float]: - algo_to_multipliers = {} - - algorithms = get_config_algorithms(config) - global_multiplier = config.get("compression_lr_multiplier", 1) - for algo in algorithms: - algo_name = algo["algorithm"] - algo_to_multipliers[algo_name] = algo.get("compression_lr_multiplier", global_multiplier) - - return algo_to_multipliers - - -def merge_configs(configs: list[NNCFConfig], use_algo_list: bool = True) -> NNCFConfig: - res_config = None - algorithms = [] - - for source_config in configs: - source_config = copy.deepcopy(source_config) - - algorithms.extend(get_config_algorithms(source_config)) - del source_config["compression"] - - if res_config is None: - res_config = source_config - res_config.update(source_config) - - if not use_algo_list: - if len(algorithms) > 1: - msg = "If there is more than one algorithm you could use only use_algo_list=True" - raise Exception(msg) - res_config["compression"] = algorithms[0] - else: - res_config["compression"] = algorithms - - res_config["model"] = "merged_model" - return res_config - - -def get_configs_building_params() -> list[dict]: - res = [] - get_orig_config_fns = [get_quantization_config] - num_orig_configs = len(get_orig_config_fns) - - for global_multiplier in [0, 1, 10]: - res.append( - { - "get_orig_config_fns": get_orig_config_fns, - "multipliers": [None] * num_orig_configs, - "global_multiplier": global_multiplier, - "use_algo_list": True, - } - ) - - global_multiplier = 10 - multipliers = [global_multiplier * (1.1**i) for i in range(num_orig_configs)] - - res.append( - { - "get_orig_config_fns": get_orig_config_fns, - "multipliers": multipliers, - "global_multiplier": global_multiplier, 
- "use_algo_list": True, - } - ) - - for i in range(num_orig_configs): - cur_multipliers = copy.deepcopy(multipliers) - cur_multipliers[i] = None - res.append( - { - "get_orig_config_fns": get_orig_config_fns, - "multipliers": cur_multipliers, - "global_multiplier": None, - "use_algo_list": True, - } - ) - - for get_orig_config_fn in get_orig_config_fns: - for use_algo_list in [False, True]: - for global_multiplier, multiplier in [(11, 10), (11, None), (None, 10)]: - res.append( - { - "get_orig_config_fns": [get_orig_config_fn], - "multipliers": [multiplier], - "global_multiplier": global_multiplier, - "use_algo_list": use_algo_list, - } - ) - - return res - - -def create_initialized_lenet_model_and_dataloader(config: NNCFConfig) -> tuple[nn.Module, DataLoader]: - with set_torch_seed(): - train_loader = create_random_mock_dataloader(config, num_samples=10) - model = LeNet() - for param in model.parameters(): - nn.init.normal_(param) - model = create_initialized_compressed_model(model, config, train_loader) - return model, train_loader - - -@pytest.fixture(name="configs_building_params", params=get_configs_building_params()) -def configs_building_params_(request) -> dict: - return request.param - - -@pytest.fixture(name="ref_configs") -def ref_configs_(configs_building_params: dict) -> list[NNCFConfig]: - return [get_ref_config_fn() for get_ref_config_fn in configs_building_params["get_orig_config_fns"]] - - -@pytest.fixture(name="ref_config") -def ref_config_(ref_configs, configs_building_params) -> NNCFConfig: - return merge_configs(ref_configs, configs_building_params["use_algo_list"]) - - -@pytest.fixture(name="target_configs") -def target_configs_(ref_configs: list[NNCFConfig], configs_building_params: dict) -> list[NNCFConfig]: - return [ - add_multiplier_to_config(config, local_multiplier=multiplier) - for config, multiplier in zip(ref_configs, configs_building_params["multipliers"]) - ] - - -@pytest.fixture(name="target_config") -def target_config_(target_configs: list[NNCFConfig], configs_building_params: dict) -> NNCFConfig: - target_config = merge_configs(target_configs, configs_building_params["use_algo_list"]) - return add_multiplier_to_config(target_config, global_multiplier=configs_building_params["global_multiplier"]) - - -@pytest.fixture(name="get_ref_lenet_model_and_dataloader") -def get_ref_lenet_model_and_dataloader_(ref_config: NNCFConfig) -> Callable[[], tuple[nn.Module, DataLoader]]: - def f(): - return create_initialized_lenet_model_and_dataloader(ref_config) - - return f - - -@pytest.fixture(name="get_target_lenet_model_and_dataloader") -def get_target_lenet_model_and_dataloader_(target_config: NNCFConfig) -> Callable[[], tuple[nn.Module, DataLoader]]: - def f(): - return create_initialized_lenet_model_and_dataloader(target_config) - - return f - - -class OneParameterModel(nn.Module): - INPUT_SIZE = (0,) - - def __init__(self, param): - super().__init__() - self.param = param - - def forward(self, _x): - return self.param.sum() - - -def get_one_parameter_model_creation_params(for_training: bool = False) -> list[dict]: - params = [] - for init_requires_grad in [False, True]: - requires_grad_settings_list = [ - [], - [("attr", False)], - [("attr", True)], - [("fn", False)], - [("fn", True)], - [("attr", not init_requires_grad), ("attr", True)], - [("fn", not init_requires_grad), ("fn", True)], - [("attr", not init_requires_grad), ("fn", True)], - [("fn", not init_requires_grad), ("attr", True)], - ] - - for requires_grad_settings in requires_grad_settings_list: - 
trainable = init_requires_grad if len(requires_grad_settings) == 0 else requires_grad_settings[-1][1] - if for_training and not trainable: - continue - multipliers = [0.1, 1, 10] if trainable else [0.1] - - for multiplier in multipliers: - params.append( - { - "init_requires_grad": init_requires_grad, - "requires_grad_settings": requires_grad_settings, - "multiplier": multiplier, - } - ) - return params - - -def create_initialized_one_parameter_model_and_dataloader( - parameter_cls: type, - init_requires_grad: bool, - requires_grad_settings: list[tuple[str, bool]], - multiplier: float = None, -) -> [nn.Module, DataLoader]: - with set_torch_seed(): - data = torch.randn(size=(1, 1, 5, 5)) - if parameter_cls is nn.Parameter: - param = parameter_cls(data, requires_grad=init_requires_grad) - elif parameter_cls is CompressionParameter: - param = parameter_cls(data, requires_grad=init_requires_grad, compression_lr_multiplier=multiplier) - else: - msg = f"Unsupported parameter type: {parameter_cls}" - raise Exception(msg) - - for setting_type, requires_grad in requires_grad_settings: - if setting_type == "attr": - param.requires_grad = requires_grad - elif setting_type == "fn": - param.requires_grad_(requires_grad) - else: - msg = f"Unsupported setting type: {setting_type}" - raise Exception(msg) - - model = OneParameterModel(param) - train_loader = DataLoader( - RandomDatasetMock(model.INPUT_SIZE), batch_size=1, shuffle=False, num_workers=0, drop_last=True - ) - return model, train_loader - - -@pytest.fixture(name="get_ref_one_parameter_model_and_dataloader") -def get_ref_one_parameter_model_and_dataloader_( - one_parameter_model_creation_params: dict, -) -> Callable[[], tuple[nn.Module, DataLoader]]: - def f(): - return create_initialized_one_parameter_model_and_dataloader( - nn.Parameter, **one_parameter_model_creation_params - ) - - return f - - -@pytest.fixture(name="get_target_one_parameter_model_and_dataloader") -def get_target_one_parameter_model_and_dataloader_( - one_parameter_model_creation_params: dict, -) -> Callable[[], tuple[nn.Module, DataLoader]]: - def f(): - return create_initialized_one_parameter_model_and_dataloader( - CompressionParameter, **one_parameter_model_creation_params - ) - - return f - - -def perform_model_training_steps(model: nn.Module, train_loader: DataLoader, num_steps: int = 1) -> nn.Module: - with set_torch_seed(): - train_loader = iter(train_loader) - optimizer = SGD(model.parameters(), lr=0.1) - - for _ in range(num_steps): - optimizer.zero_grad() - x, y_gt = next(train_loader) - y = model(x) - loss = F.mse_loss(y.sum(), y_gt) - - loss.backward() - optimizer.step() - - return model - - -def get_params_grouped_by_algorithms(model: nn.Module) -> dict[str, list[nn.Parameter]]: - cls_name_to_params = {} - for module in model.modules(): - params = list(module.parameters(recurse=False)) - full_cls_name = ".".join([module.__class__.__module__, module.__class__.__name__]) - if full_cls_name not in cls_name_to_params: - cls_name_to_params[full_cls_name] = [] - cls_name_to_params[full_cls_name].extend(params) - - algo_name_to_params = {} - for cls_name, params in cls_name_to_params.items(): - params = [param for param in params if param.requires_grad] - if len(params) == 0: - continue - - algo_name = "regular" - for cur_algo_name, cur_algo_path in ALGO_NAME_TO_PATH_MAP.items(): - if cur_algo_path in cls_name: - algo_name = cur_algo_name - - if algo_name not in algo_name_to_params: - algo_name_to_params[algo_name] = [] - algo_name_to_params[algo_name].extend(params) - 
- return algo_name_to_params - - -def get_lenet_params_after_training_steps( - model: nn.Module, train_loader: DataLoader, num_steps: int = 1 -) -> dict[str, list[nn.Parameter]]: - with set_torch_seed(): - model = perform_model_training_steps(model, train_loader, num_steps) - return get_params_grouped_by_algorithms(model) - - -def get_one_parameter_model_params_after_training_steps( - model: nn.Module, train_loader: DataLoader, num_steps: int = 1 -) -> list[nn.Parameter]: - with set_torch_seed(): - model = perform_model_training_steps(model, train_loader, num_steps) - return list(model.parameters()) - - -def test_if_algorithms_add_params( - get_target_lenet_model_and_dataloader: Callable[[], tuple[nn.Module, DataLoader]], ref_config: NNCFConfig -): - algo_to_params = get_lenet_params_after_training_steps(*get_target_lenet_model_and_dataloader(), num_steps=0) - algo_names = get_multipliers_from_config(ref_config).keys() - - assert sorted(algo_to_params.keys()) == sorted(list(algo_names) + ["regular"]) - - -@pytest.mark.parametrize("one_parameter_model_creation_params", get_one_parameter_model_creation_params()) -def test_if_parameter_is_initialized_correctly( - get_ref_one_parameter_model_and_dataloader: Callable[[], tuple[nn.Module, DataLoader]], - get_target_one_parameter_model_and_dataloader: Callable[[], tuple[nn.Module, DataLoader]], -): - ref_model, _ref_loader = get_ref_one_parameter_model_and_dataloader() - target_model, target_loader = get_target_one_parameter_model_and_dataloader() - - assert pytest.approx(ref_model.param.data) == target_model.param.data - assert ref_model.param.requires_grad == target_model.param.requires_grad - - if ref_model.param.requires_grad: - get_one_parameter_model_params_after_training_steps(target_model, target_loader) - else: - with pytest.raises(Exception): - get_one_parameter_model_params_after_training_steps(target_model, target_loader) - - -def check_if_grads_are_multiplied(ref_params: list[nn.Parameter], target_params: list[nn.Parameter], multiplier: float): - ref_grads = get_grads(ref_params) - ref_grads = [multiplier * grad for grad in ref_grads] - target_grads = get_grads(target_params) - - PTTensorListComparator.check_equal(ref_grads, target_grads) - - -def test_if_setting_multipliers_in_config_multiplies_grads_values( - get_ref_lenet_model_and_dataloader: Callable[[], tuple[nn.Module, DataLoader]], - get_target_lenet_model_and_dataloader: Callable[[], tuple[nn.Module, DataLoader]], - target_config: NNCFConfig, -): - ref_params = get_lenet_params_after_training_steps(*get_ref_lenet_model_and_dataloader()) - target_params = get_lenet_params_after_training_steps(*get_target_lenet_model_and_dataloader()) - multipliers = get_multipliers_from_config(target_config) - multipliers["regular"] = 1 - - for algo, val in ref_params.items(): - check_if_grads_are_multiplied(val, target_params[algo], multipliers[algo]) - - -@pytest.mark.parametrize( - "one_parameter_model_creation_params", get_one_parameter_model_creation_params(for_training=True) -) -def test_if_setting_multiplier_in_parameter_multiplies_grads_values( - get_ref_one_parameter_model_and_dataloader: Callable[[], tuple[nn.Module, DataLoader]], - get_target_one_parameter_model_and_dataloader: Callable[[], tuple[nn.Module, DataLoader]], - one_parameter_model_creation_params: dict, -): - ref_params = get_one_parameter_model_params_after_training_steps(*get_ref_one_parameter_model_and_dataloader()) - target_params = get_one_parameter_model_params_after_training_steps( - 
*get_target_one_parameter_model_and_dataloader() - ) - - assert target_params[0].requires_grad - check_if_grads_are_multiplied(ref_params, target_params, one_parameter_model_creation_params["multiplier"]) - - -def check_if_zero_multiplier_freezes_training( - orig_params: list[nn.Parameter], params: list[nn.Parameter], multiplier: float -): - if multiplier == 0: - PTTensorListComparator.check_equal(orig_params, params) - else: - PTTensorListComparator.check_not_equal(orig_params, params) - - -def get_params_diff(orig_params: list[nn.Parameter], params: list[nn.Parameter]) -> list[torch.Tensor]: - param_diffs = [] - for param, orig_param in zip(params, orig_params): - param_diffs.append((param - orig_param).abs()) - return param_diffs - - -def check_params_affect_training_speed( - orig_params: list[nn.Parameter], - ref_params: list[nn.Parameter], - target_params: list[nn.Parameter], - compression_lr_multiplier: float, -): - assert len(ref_params) == len(orig_params) - assert len(target_params) == len(orig_params) - - ref_diff = get_params_diff(ref_params, orig_params) - target_diff = get_params_diff(target_params, orig_params) - - if pytest.approx(compression_lr_multiplier) == 1: - PTTensorListComparator.check_equal(target_diff, ref_diff) - elif compression_lr_multiplier < 1: - PTTensorListComparator.check_less(target_diff, ref_diff) - else: - PTTensorListComparator.check_greater(target_diff, ref_diff) - - -def test_if_setting_multipliers_in_config_affect_training_speed( - get_ref_lenet_model_and_dataloader: Callable[[], tuple[nn.Module, DataLoader]], - get_target_lenet_model_and_dataloader: Callable[[], tuple[nn.Module, DataLoader]], - target_config: NNCFConfig, -): - orig_params = get_lenet_params_after_training_steps(*get_ref_lenet_model_and_dataloader(), num_steps=0) - target_params = get_lenet_params_after_training_steps(*get_target_lenet_model_and_dataloader(), num_steps=1) - multipliers = get_multipliers_from_config(target_config) - multipliers["regular"] = 1 - - for algo, val in orig_params.items(): - check_if_zero_multiplier_freezes_training(val, target_params[algo], multipliers[algo]) - - -@pytest.mark.parametrize( - "one_parameter_model_creation_params", get_one_parameter_model_creation_params(for_training=True) -) -def test_if_setting_multiplier_in_parameter_affect_training_speed( - get_ref_one_parameter_model_and_dataloader: Callable[[], tuple[nn.Module, DataLoader]], - get_target_one_parameter_model_and_dataloader: Callable[[], tuple[nn.Module, DataLoader]], - one_parameter_model_creation_params: dict, -): - orig_params = get_one_parameter_model_params_after_training_steps( - *get_ref_one_parameter_model_and_dataloader(), num_steps=0 - ) - ref_params = get_one_parameter_model_params_after_training_steps( - *get_ref_one_parameter_model_and_dataloader(), num_steps=1 - ) - target_params = get_one_parameter_model_params_after_training_steps( - *get_target_one_parameter_model_and_dataloader(), num_steps=1 - ) - - assert target_params[0].requires_grad - check_if_zero_multiplier_freezes_training( - orig_params, target_params, one_parameter_model_creation_params["multiplier"] - ) - check_params_affect_training_speed( - orig_params, ref_params, target_params, one_parameter_model_creation_params["multiplier"] - ) diff --git a/tests/torch/test_context_independence.py b/tests/torch/test_context_independence.py deleted file mode 100644 index 24d8b75ddd0..00000000000 --- a/tests/torch/test_context_independence.py +++ /dev/null @@ -1,49 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# 
Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import os - -import pytest - -from tests.torch import test_models -from tests.torch.helpers import create_compressed_model_and_algo_for_test -from tests.torch.helpers import register_bn_adaptation_init_args -from tests.torch.test_compressed_graph import QUANTIZERS -from tests.torch.test_compressed_graph import QuantizeTestCaseConfiguration -from tests.torch.test_compressed_graph import check_model_graph -from tests.torch.test_compressed_graph import get_basic_quantization_config - -pytestmark = pytest.mark.legacy - -TEST_MODELS = [ - (("alexnet.dot", "lenet.dot"), (test_models.AlexNet, test_models.LeNet), ([1, 3, 32, 32], [1, 3, 32, 32])) -] - - -@pytest.fixture(scope="function", params=QUANTIZERS) -def _case_config(request): - quantization_type = request.param - graph_dir = os.path.join("quantized", quantization_type) - return QuantizeTestCaseConfiguration(quantization_type, graph_dir) - - -@pytest.mark.parametrize("model_name, model_builder, input_size", TEST_MODELS) -def test_context_independence(model_name, model_builder, input_size, _case_config): - config = get_basic_quantization_config(_case_config.quant_type, input_sample_sizes=input_size[0]) - register_bn_adaptation_init_args(config) - - compressed_models = [ - create_compressed_model_and_algo_for_test(model_builder[0](), config)[0], - create_compressed_model_and_algo_for_test(model_builder[1](), config)[0], - ] - - for i, compressed_model in enumerate(compressed_models): - check_model_graph(compressed_model, model_name[i], _case_config.graph_dir) diff --git a/tests/torch/test_custom_modules.py b/tests/torch/test_custom_modules.py deleted file mode 100644 index 24949b07451..00000000000 --- a/tests/torch/test_custom_modules.py +++ /dev/null @@ -1,49 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import pytest -import torch -import torch.nn.functional - -from nncf import NNCFConfig -from nncf.torch import register_module -from tests.torch.helpers import create_compressed_model_and_algo_for_test - -pytestmark = pytest.mark.legacy - - -@register_module() -class CustomConvModule(torch.nn.Module): - def __init__(self): - super().__init__() - self.weight = torch.nn.Parameter(torch.ones([1, 1, 1, 1])) - - def forward(self, x): - return torch.nn.functional.conv2d(x, self.weight) - - -class ModelWithCustomConvModules(torch.nn.Module): - def __init__(self): - super().__init__() - self.regular_conv = torch.nn.Conv2d(1, 1, 1) - self.custom_conv = CustomConvModule() - - def forward(self, x): - x = self.regular_conv(x) - x = self.custom_conv(x) - return x - - -def test_custom_module_processing(): - nncf_config = NNCFConfig.from_dict({"input_info": {"sample_size": [1, 1, 1, 1]}}) - - # Should complete successfully without exceptions: - create_compressed_model_and_algo_for_test(ModelWithCustomConvModules(), nncf_config) diff --git a/tests/torch/test_distributed_data_parallel_mode.py b/tests/torch/test_distributed_data_parallel_mode.py deleted file mode 100644 index 0a83010d678..00000000000 --- a/tests/torch/test_distributed_data_parallel_mode.py +++ /dev/null @@ -1,101 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import time - -import pytest -import torch -import torch.multiprocessing as mp -from torch import nn - -from nncf import NNCFConfig -from nncf.torch import create_compressed_model -from nncf.torch import register_default_init_args -from tests.torch.helpers import create_random_mock_dataloader - -pytestmark = pytest.mark.legacy - - -class ModelWithChangedTrain(nn.Module): - def __init__( - self, in_out_channels: tuple[tuple[int, int]] = ((1, 3), (3, 5), (5, 7), (7, 10)), freezing_stages: int = -1 - ): - super().__init__() - self.freezing_stages = freezing_stages - self.features = nn.ModuleList() - for in_out_ch in in_out_channels: - block = nn.ModuleList() - block.append(nn.Conv2d(*in_out_ch, 3)) - block.append(nn.BatchNorm2d(in_out_ch[1])) - block.append(nn.ReLU()) - self.features.append(block) - - def forward(self, x: torch.Tensor) -> torch.Tensor: - for blocks in self.features: - for module in blocks: - x = module(x) - return x - - def train(self: nn.Module, mode: bool = True) -> nn.Module: - super().train(mode) - for i in range(self.freezing_stages): - for module in self.features[i]: - for p in module.parameters(): - p.requires_grad = False - - -def worker(rank: int, world_size: int) -> None: - torch.distributed.init_process_group( - backend="nccl", init_method="tcp://127.0.0.1:8999", world_size=world_size, rank=rank - ) - model = ModelWithChangedTrain(freezing_stages=1) - model.cuda() - model.to(rank) - - nncf_config = NNCFConfig() - nncf_config.update( - { - "input_info": {"sample_size": [1, 1, 30, 30]}, - "compression": { - "algorithm": "quantization", - "initializer": { - "range": {"num_init_samples": 10}, - "batchnorm_adaptation": {"num_bn_adaptation_samples": 10}, - }, - }, - } - ) - dataloader = create_random_mock_dataloader(nncf_config, num_samples=10) - register_default_init_args(nncf_config, dataloader) - - _, compressed_model = create_compressed_model(model, nncf_config) - - # At this part the additional processes may be freezing - - _ = torch.nn.parallel.DistributedDataParallel(compressed_model, device_ids=[rank]) - - -@pytest.mark.cuda -@pytest.mark.parametrize("waiting_time", [20.0]) -def test_is_ddp_freezing(waiting_time: float) -> None: - # Number of processes the same as GPU count - n_procs = torch.cuda.device_count() - ctx = mp.spawn(fn=worker, args=(n_procs,), nprocs=n_procs, join=False) - - start_time = time.monotonic() - while not ctx.join(waiting_time): - current_time = time.monotonic() - if current_time - start_time >= waiting_time: - for process in ctx.processes: - if process.is_alive(): - process.terminate() - msg = "DDP wrapper may be freezing" - raise TimeoutError(msg) diff --git a/tests/torch/test_graph_building.py b/tests/torch/test_graph_building.py deleted file mode 100644 index fdd98b798f6..00000000000 --- a/tests/torch/test_graph_building.py +++ /dev/null @@ -1,656 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-from copy import deepcopy -from dataclasses import dataclass -from typing import Any, Union -from unittest.mock import MagicMock - -import pytest -import torch -import torch.nn.functional as F -from torch import nn - -import nncf -from nncf import NNCFConfig -from nncf.common.graph import NNCFGraphEdge -from nncf.common.graph.definitions import MODEL_INPUT_OP_NAME -from nncf.common.graph.definitions import MODEL_OUTPUT_OP_NAME -from nncf.common.graph.definitions import NNCFGraphNodeType -from nncf.common.graph.layer_attributes import Dtype -from nncf.config.structures import BNAdaptationInitArgs -from nncf.config.structures import NNCFExtraConfigStruct -from nncf.config.structures import QuantizationRangeInitArgs -from nncf.torch import create_compressed_model -from nncf.torch.dynamic_graph.context import TracingContext -from nncf.torch.dynamic_graph.context import get_current_context -from nncf.torch.dynamic_graph.context import no_nncf_trace -from nncf.torch.dynamic_graph.graph import DynamicGraph -from nncf.torch.dynamic_graph.graph_tracer import GraphTracer -from nncf.torch.dynamic_graph.graph_tracer import create_dummy_forward_fn -from nncf.torch.dynamic_graph.io_handling import EXTRA_STRUCTS_WITH_DATALOADERS -from nncf.torch.dynamic_graph.io_handling import ExampleInputInfo -from nncf.torch.dynamic_graph.io_handling import FillerInputElement -from nncf.torch.dynamic_graph.io_handling import FillerInputInfo -from nncf.torch.dynamic_graph.io_handling import LoaderInputInfo -from nncf.torch.dynamic_graph.io_handling import ModelInputInfo -from nncf.torch.dynamic_graph.trace_functions import trace_tensors -from nncf.torch.graph.graph_builder import GraphBuilder -from nncf.torch.initialization import PTInitializingDataLoader -from nncf.torch.nested_objects_traversal import objwalk -from nncf.torch.nncf_network import NNCFNetwork -from nncf.torch.nncf_network import NNCFNetworkMeta -from nncf.torch.utils import is_tensor -from tests.torch.helpers import create_compressed_model_and_algo_for_test -from tests.torch.helpers import register_bn_adaptation_init_args -from tests.torch.test_compressed_graph import get_basic_quantization_config -from tests.torch.test_get_modules_by_type import ModelForNameTest -from tests.torch.test_models.synthetic import ModelForGraphBuildingTest - -TEST_TRACING_CONTEXT = "test" - - -def test_no_nncf_trace_context_manager(): - assert get_current_context() is None - context = TracingContext() - - with context: - assert get_current_context().is_tracing - with no_nncf_trace(): - assert not get_current_context().is_tracing - with no_nncf_trace(): - assert not get_current_context().is_tracing - assert not get_current_context().is_tracing - assert get_current_context().is_tracing - assert get_current_context() is None - - -def test_ambiguous_function(): - class Model(nn.Module): - def __init__(self): - super().__init__() - self.layers = nn.ModuleList([nn.Conv2d(1, 1, 1), nn.Conv2d(1, 1, 1)]) - - def forward(self, x): - for layer in self.layers: - x = F.relu(layer(x)) - - mod = Model() - - tracer = GraphTracer(custom_forward_fn=create_dummy_forward_fn(FillerInputInfo([FillerInputElement([1, 1, 1, 1])]))) - graph = tracer.trace_graph(mod) - - unique_op_exec_contexts = set() - - for node in graph._nx_graph.nodes.values(): - node_op_address = node[DynamicGraph.OP_EXEC_CONTEXT_NODE_ATTR].op_address - assert node_op_address not in unique_op_exec_contexts - unique_op_exec_contexts.add(node_op_address) - - -def test_forward_trace_function(): - from 
nncf.torch.dynamic_graph.trace_functions import forward_trace_only - from nncf.torch.dynamic_graph.trace_tensor import TensorMeta - from nncf.torch.dynamic_graph.trace_tensor import TracedTensor - - shape1, shape2 = ([32, 1, 4, 8], [1, 8, 12, 16]) - meta1, meta2 = (TensorMeta(5, 1, shape1), TensorMeta(3, 8, shape2)) - input_tensor1 = TracedTensor.from_torch_tensor(torch.Tensor(size=shape1), meta1) - input_tensor2 = TracedTensor.from_torch_tensor(torch.Tensor(size=shape2), meta2) - - # 1 -> 1 - output_tensor = forward_trace_only(torch.Tensor.view, input_tensor1, [-1]) - assert output_tensor.tensor_meta != input_tensor1.tensor_meta - assert output_tensor.tensor_meta.shape == (1024,) - - # 1 -> N - outputs = forward_trace_only(torch.Tensor.chunk, input_tensor1, 3) - for out in outputs: - assert out.tensor_meta == input_tensor1.tensor_meta - - # N -> N (2 -> 2) - outputs = forward_trace_only(lambda x: x + [5], [input_tensor1, input_tensor2]) - assert outputs[0].tensor_meta == input_tensor1.tensor_meta - assert outputs[1].tensor_meta == input_tensor2.tensor_meta - - # M -> N (2 -> 3) - with pytest.raises(nncf.ValidationError): - outputs = forward_trace_only(lambda x: x + [torch.Tensor(shape2)], [input_tensor1, input_tensor2]) - - # M -> N (2 -> 1) - with pytest.raises(nncf.ValidationError): - outputs = forward_trace_only(lambda x: x[0], [input_tensor1, input_tensor2]) - - -@pytest.mark.parametrize("input_shape", ModelForGraphBuildingTest.INPUT_SHAPES) -def test_activation_shape_tracing(input_shape: tuple[int, ...]): - model = ModelForGraphBuildingTest() - graph_builder = GraphBuilder( - create_dummy_forward_fn( - FillerInputInfo( - [ - FillerInputElement(input_shape), - ] - ), - with_input_tracing=True, - with_output_tracing=True, - ) - ) - graph = graph_builder.build_graph(model) - - shape1 = (input_shape[0], ModelForGraphBuildingTest.CONV1_OUT_CHANNELS, input_shape[2], input_shape[3]) - final_shape = (input_shape[0], ModelForGraphBuildingTest.OUT_CHANNELS, input_shape[2], input_shape[3]) - ref_maxpool_out_edge_shapes = [ - ( - shape1[0], - shape1[1], - shape1[2] // ModelForGraphBuildingTest.MAXPOOL_SIZE, - shape1[3] // ModelForGraphBuildingTest.MAXPOOL_SIZE, - ) - ] - ref_cat_out_edge_shapes = [ - (input_shape[0], ModelForGraphBuildingTest.CONV2_IN_CHANNELS, input_shape[2], input_shape[3]) - ] - ref_node_ids_and_io_edge_shapes = [ - (f"0 /{MODEL_INPUT_OP_NAME}_0", [], [input_shape]), - ("1 ModelForGraphBuildingTest/Conv2d[conv1]/conv2d_0", [input_shape], [shape1]), - ("2 ModelForGraphBuildingTest/BatchNorm2d[bn1]/batch_norm_0", [shape1], [shape1]), - ("3 ModelForGraphBuildingTest/ReLU[relu1]/relu_0", [shape1], [shape1, shape1]), - ("4 ModelForGraphBuildingTest/max_pool2d_0", [shape1], ref_maxpool_out_edge_shapes), - ( - "5 ModelForGraphBuildingTest/ConvTranspose2d[convt1]/conv_transpose2d_0", - ref_maxpool_out_edge_shapes, - [input_shape], - ), - ("6 ModelForGraphBuildingTest/cat_0", [shape1, input_shape], ref_cat_out_edge_shapes), - ("7 ModelForGraphBuildingTest/Conv2d[conv2]/conv2d_0", ref_cat_out_edge_shapes, [final_shape]), - (f"8 /{MODEL_OUTPUT_OP_NAME}_0", [final_shape], []), - ] - for node_id, ref_input_shapes, ref_output_shapes in ref_node_ids_and_io_edge_shapes: - input_edges = graph.get_nncf_graph_pattern_io( - [ - node_id, - ] - ).input_edges - output_edges = graph.get_nncf_graph_pattern_io( - [ - node_id, - ] - ).output_edges - input_tensor_shapes = [x.tensor_shape for x in input_edges] - output_tensor_shapes = [x.tensor_shape for x in output_edges] - assert input_tensor_shapes == 
ref_input_shapes, f"Failed for node ID: {node_id}" - assert output_tensor_shapes == ref_output_shapes, f"Failed for node ID: {node_id}" - - -class ParallelEdgesModel(nn.Module): - def forward(self, x): - mm_res = torch.mm(x, x) - return mm_res, x + mm_res - - -def test_parallel_edges_in_nncf_graph(): - def _get_default_nncf_graph_edge(from_node, to_node, input_port_id, output_port_id): - return NNCFGraphEdge( - from_node, - to_node, - input_port_id=input_port_id, - output_port_id=output_port_id, - tensor_shape=(3, 3), - dtype=Dtype.FLOAT, - parallel_input_port_ids=[], - ) - - input_shape = (3, 3) - model = ParallelEdgesModel() - graph_builder = GraphBuilder( - create_dummy_forward_fn( - FillerInputInfo( - [ - FillerInputElement(input_shape), - ] - ), - with_input_tracing=True, - with_output_tracing=True, - ) - ) - - nncf_graph = graph_builder.build_graph(model) - - input_node = nncf_graph.get_node_by_name("/nncf_model_input_0") - mm_node = nncf_graph.get_node_by_name("ParallelEdgesModel/mm_0") - ref_input_edges = { - _get_default_nncf_graph_edge(input_node, mm_node, input_port_id=0, output_port_id=0), - _get_default_nncf_graph_edge(input_node, mm_node, input_port_id=1, output_port_id=0), - } - mm_node_input_edges = nncf_graph.get_input_edges(mm_node) - assert set(mm_node_input_edges) == ref_input_edges - ref_output_edges = ref_input_edges.copy() - - add_node = nncf_graph.get_node_by_name("ParallelEdgesModel/__add___0") - ref_output_edges.add(_get_default_nncf_graph_edge(input_node, add_node, input_port_id=0, output_port_id=0)) - input_node_output_edges = nncf_graph.get_output_edges(input_node) - assert set(input_node_output_edges) == ref_output_edges - - -class MockModel(torch.nn.Module): - def __init__(self, stub_forward): - super().__init__() - self.param = torch.nn.Parameter(torch.ones([1])) - self.stub_forward = stub_forward - - def forward(self, *args, **kwargs): - return self.stub_forward(*args, **kwargs) - - -class RandomRefTensor: - def __init__(self, shape: list[int]): - self.tensor = torch.rand(shape) - - -def check_arg(test_arg: torch.Tensor, ref_arg: Union[torch.Tensor, RandomRefTensor]): - if isinstance(ref_arg, RandomRefTensor): - assert test_arg.shape == ref_arg.tensor.shape - assert not torch.allclose(test_arg, torch.ones_like(test_arg)) - assert not torch.allclose(test_arg, torch.zeros_like(test_arg)) - else: - assert torch.allclose(test_arg, ref_arg) - - -class MockInputInfo(ModelInputInfo): - MOCK_ARGS = (torch.Tensor([42.0]),) - MOCK_KWARGS = {"foo": torch.ones([1, 3])} - - def get_forward_inputs(self, device: str = None) -> tuple[tuple, dict]: - return MockInputInfo.MOCK_ARGS, MockInputInfo.MOCK_KWARGS - - -@pytest.fixture(scope="function") -def mock_model_with_stub_forward(mocker) -> MockModel: - stub_fn = mocker.stub() - mock_model = MockModel(stub_fn) - return mock_model - - -def test_input_info_args_are_passed_into_forward(mock_model_with_stub_forward: MockModel): - stub_fn = mock_model_with_stub_forward.stub_forward - - _ = NNCFNetwork(mock_model_with_stub_forward, input_info=MockInputInfo()) - forward_call_args = stub_fn.call_args[0] - forward_call_kwargs = stub_fn.call_args[1] - - ref_args, ref_kwargs = MockInputInfo.MOCK_ARGS, MockInputInfo.MOCK_KWARGS - - assert len(forward_call_args) == len(ref_args) - assert len(forward_call_kwargs) == len(ref_kwargs) - assert set(forward_call_kwargs.keys()) == set(ref_kwargs.keys()) - - for idx, arg in enumerate(forward_call_args): - check_arg(arg, ref_args[idx]) - - for keyword, arg in forward_call_kwargs.items(): - 
check_arg(arg, ref_kwargs[keyword]) - - -@dataclass -class FillerInputInfoGenerationTestStruct: - config_input_info_subdict: Union[list[dict], dict] - ref_args: tuple[torch.Tensor, ...] - ref_kwargs: dict[str, torch.Tensor] - - -TEST_KEYWORD_1 = "keyword1" -TEST_KEYWORD_2 = "keyword2" - -FILLER_GEN_TEST_STRUCTS = [ - FillerInputInfoGenerationTestStruct( - config_input_info_subdict={"sample_size": [2, 3, 300, 300], "type": "float", "filler": "zeros"}, - ref_args=(torch.zeros([2, 3, 300, 300]),), - ref_kwargs={}, - ), - FillerInputInfoGenerationTestStruct( - config_input_info_subdict=[ - {"sample_size": [1, 128], "type": "long", "filler": "ones"}, - {"sample_size": [1, 128], "type": "long", "filler": "ones"}, - {"sample_size": [1, 128], "type": "long", "filler": "zeros"}, - ], - ref_args=( - torch.ones([1, 128], dtype=torch.long), - torch.ones([1, 128], dtype=torch.long), - torch.zeros([1, 128], dtype=torch.long), - ), - ref_kwargs={}, - ), - FillerInputInfoGenerationTestStruct( - config_input_info_subdict=[ - {"sample_size": [2, 3, 300, 300], "type": "float", "filler": "zeros"}, - {"sample_size": [1, 128], "type": "long", "filler": "ones", "keyword": TEST_KEYWORD_1}, - ], - ref_args=(torch.zeros([2, 3, 300, 300]),), - ref_kwargs={TEST_KEYWORD_1: torch.ones([1, 128], dtype=torch.long)}, - ), - FillerInputInfoGenerationTestStruct( - config_input_info_subdict=[ - {"sample_size": [8, 7], "type": "float", "filler": "random", "keyword": TEST_KEYWORD_1}, - {"sample_size": [2, 3, 300, 300], "type": "float", "filler": "zeros"}, - {"sample_size": [1, 128], "type": "long", "filler": "ones", "keyword": TEST_KEYWORD_2}, - ], - ref_args=(torch.zeros([2, 3, 300, 300]),), - ref_kwargs={TEST_KEYWORD_1: RandomRefTensor([8, 7]), TEST_KEYWORD_2: torch.ones([1, 128], dtype=torch.long)}, - ), -] - - -@pytest.mark.parametrize("filler_gen_test_struct", FILLER_GEN_TEST_STRUCTS) -def test_filler_input_info_arg_generation(filler_gen_test_struct: FillerInputInfoGenerationTestStruct): - filler_input_info = FillerInputInfo.from_nncf_config( - NNCFConfig.from_dict({"input_info": filler_gen_test_struct.config_input_info_subdict}) - ) - test_args, test_kwargs = filler_input_info.get_forward_inputs() - - for test_arg, ref_arg in zip(test_args, filler_gen_test_struct.ref_args): - check_arg(test_arg, ref_arg) - - for test_kw_and_arg, ref_kw_and_arg in zip(test_kwargs.items(), filler_gen_test_struct.ref_kwargs.items()): - test_kw, test_kwarg = test_kw_and_arg - ref_kw, ref_kwarg = ref_kw_and_arg - assert test_kw == ref_kw - check_arg(test_kwarg, ref_kwarg) - - -@pytest.mark.parametrize( - "input_info", - [ - FillerInputInfo([FillerInputElement([1, 3, 3, 3])]), - ExampleInputInfo((torch.Tensor([1]), torch.Tensor([1])), {"a": torch.Tensor([1]), "b": torch.Tensor([1])}), - LoaderInputInfo((torch.Tensor([1]), torch.Tensor([1])), {"a": torch.Tensor([1]), "b": torch.Tensor([1])}), - ], - ids=["filler", "example", "loader"], -) -def test_input_infos_respect_device_setting(input_info: ModelInputInfo, use_cuda: bool): - if use_cuda and not torch.cuda.is_available(): - pytest.skip("Skipped checking CUDA device test cases on CPU-only hosts") - device = "cuda" if use_cuda else "cpu" - forward_inputs = input_info.get_forward_inputs(device) - - def assert_on_device(x: torch.Tensor): - assert device in str(x.device) - - objwalk(forward_inputs, is_tensor, assert_on_device) - - -class MockInitDataLoader(PTInitializingDataLoader): - def get_inputs(self, dataloader_output: Any) -> tuple[tuple, dict]: - return dataloader_output[0], 
dataloader_output[1] - - def get_target(self, dataloader_output: Any) -> Any: - return torch.ones([1]) - - -class MockDataset(torch.utils.data.Dataset): - def __init__(self): - super().__init__() - self._length = 2 - - def __getitem__(self, idx): - if idx >= self._length: - raise StopIteration - return MockInputInfo.MOCK_ARGS, MockInputInfo.MOCK_KWARGS - - def __len__(self): - return self._length - - -STRUCTS_FOR_TEST = [ - QuantizationRangeInitArgs(data_loader=MockInitDataLoader(torch.utils.data.DataLoader(MockDataset()))), - BNAdaptationInitArgs(data_loader=MockInitDataLoader(torch.utils.data.DataLoader(MockDataset()))), -] - - -@pytest.mark.parametrize("extra_struct_for_test", STRUCTS_FOR_TEST) -def test_compressed_model_creation_can_build_exact_input_infos_from_dataloader_in_config( - extra_struct_for_test: NNCFExtraConfigStruct, mock_model_with_stub_forward: MockModel, mocker -): - checked_types_set = {type(x) for x in STRUCTS_FOR_TEST} - assert checked_types_set == set( - EXTRA_STRUCTS_WITH_DATALOADERS - ) # all future structs with suitable dataloaders must be tested - config = NNCFConfig() - config.register_extra_structs([extra_struct_for_test]) - nncf_network_init_spy = mocker.spy(NNCFNetworkMeta, "__call__") # sic! - - _ = create_compressed_model(mock_model_with_stub_forward, config) - input_info_received_by_nncf_network_init = nncf_network_init_spy.call_args.kwargs["input_info"] # input_info - assert isinstance(input_info_received_by_nncf_network_init, LoaderInputInfo) - test_args, test_kwargs = input_info_received_by_nncf_network_init.get_forward_inputs() - - for idx, arg in enumerate(test_args): - check_arg(arg, MockInputInfo.MOCK_ARGS[idx]) - - for keyword, arg in test_kwargs.items(): - check_arg(arg, MockInputInfo.MOCK_KWARGS[keyword]) - - -def create_model_and_control_with_defaults(): - model = ModelForGraphBuildingTest() - config = get_basic_quantization_config( - "symmetric", input_sample_sizes=list(ModelForGraphBuildingTest.INPUT_SHAPES[0]) - ) - register_bn_adaptation_init_args(config) - compressed_model, compression_ctrl = create_compressed_model_and_algo_for_test(model, config) - return compressed_model, compression_ctrl - - -def create_model_with_user_dummy(): - model = ModelForGraphBuildingTest() - config = get_basic_quantization_config( - "symmetric", input_sample_sizes=list(ModelForGraphBuildingTest.INPUT_SHAPES[0]) - ) - register_bn_adaptation_init_args(config) - compressed_model, compression_ctrl = create_compressed_model_and_algo_for_test( - model, - config, - dummy_forward_fn=ModelForGraphBuildingTest.simple_user_dummy_forward, - wrap_inputs_fn=ModelForGraphBuildingTest.simple_wrap_fn, - ) - return compressed_model, compression_ctrl - - -def create_model_with_user_wrap_inputs_fn(): - model = ModelForGraphBuildingTest() - config = get_basic_quantization_config( - "symmetric", input_sample_sizes=list(ModelForGraphBuildingTest.INPUT_SHAPES[0]) - ) - register_bn_adaptation_init_args(config) - compressed_model, compression_ctrl = create_compressed_model_and_algo_for_test( - model, - config, - dummy_forward_fn=ModelForGraphBuildingTest.simple_user_dummy_forward, - wrap_inputs_fn=ModelForGraphBuildingTest.simple_wrap_fn, - ) - return compressed_model, compression_ctrl - - -class TestGraphStability: - MODEL_CREATORS_AND_IDS = [ - (create_model_and_control_with_defaults, "default"), - (create_model_with_user_dummy, "user_dummy"), - (create_model_with_user_wrap_inputs_fn, "user_wrap_inputs_fn"), - ] - - @pytest.fixture(params=[x[0] for x in MODEL_CREATORS_AND_IDS], 
ids=[x[1] for x in MODEL_CREATORS_AND_IDS]) - def model_and_ctrl_creator(self, request): - return request.param - - def test_dynamic_graph_does_not_inflate_during_multiple_forwards(self, model_and_ctrl_creator): - compressed_model, _ = model_and_ctrl_creator() - input_tensor = torch.zeros(ModelForGraphBuildingTest.INPUT_SHAPES[0]) - ref_graph = deepcopy(compressed_model.nncf.get_dynamic_graph()) - for _ in range(0, 10): - _ = compressed_model(input_tensor) - curr_graph = compressed_model.nncf.get_dynamic_graph() - assert curr_graph == ref_graph - - def test_dynamic_graph_is_the_same_after_export(self, model_and_ctrl_creator, tmp_path): - compressed_model, ctrl = model_and_ctrl_creator() - ref_graph = deepcopy(compressed_model.nncf.get_dynamic_graph()) - ctrl.export_model("tmp.onnx") - curr_graph = compressed_model.nncf.get_dynamic_graph() - assert curr_graph == ref_graph - - def test_dummy_forwards_do_not_inflate_dynamic_graph(self, model_and_ctrl_creator): - compressed_model, _ = model_and_ctrl_creator() - ref_graph = deepcopy(compressed_model.nncf.get_dynamic_graph()) - compressed_model.nncf.do_dummy_forward() - curr_graph = deepcopy(compressed_model.nncf.get_dynamic_graph()) - assert curr_graph == ref_graph - - def test_compressed_graph_with_user_wrap_fn(self): - # Create a model with a dummy forward analogous to - # the default dummy forward, compare original and compressed model graphs afterwards - - comp_model_wo_wrap, _ = create_model_and_control_with_defaults() - comp_model_w_wrap, _ = create_model_with_user_wrap_inputs_fn() - - ref_original_graph = comp_model_wo_wrap.nncf.get_graph() - ref_compressed_graph = comp_model_wo_wrap.nncf.get_graph() - - original_graph_with_wrap = comp_model_w_wrap.nncf.get_graph() - compressed_graph_with_wrap = comp_model_w_wrap.nncf.get_graph() - - assert ref_original_graph == original_graph_with_wrap - assert ref_compressed_graph == compressed_graph_with_wrap - - def test_compressed_graph_with_user_dummy_forward(self): - # Create a model with a dummy forward analogous to - # the default dummy forward, compare original and compressed model graphs afterwards - - comp_model_wo_dummy, _ = create_model_and_control_with_defaults() - comp_model_w_dummy, _ = create_model_with_user_dummy() - - ref_original_graph = comp_model_wo_dummy.nncf.get_graph() - ref_compressed_graph = comp_model_wo_dummy.nncf.get_graph() - - original_graph_with_dummy = comp_model_w_dummy.nncf.get_graph() - compressed_graph_with_dummy = comp_model_w_dummy.nncf.get_graph() - - assert ref_original_graph == original_graph_with_dummy - assert ref_compressed_graph == compressed_graph_with_dummy - - -def test_nncf_graph_auxiliary_node_structure(): - model = ModelForGraphBuildingTest() - config = get_basic_quantization_config( - "symmetric", input_sample_sizes=list(ModelForGraphBuildingTest.INPUT_SHAPES[0]) - ) - register_bn_adaptation_init_args(config) - compressed_model, _ = create_compressed_model_and_algo_for_test(model, config) - - nncf_graph = compressed_model.nncf.get_graph() - - input_nodes = nncf_graph.get_input_nodes() - output_nodes = nncf_graph.get_output_nodes() - - assert len(input_nodes) == 1 - assert len(output_nodes) == 1 - - assert input_nodes[0].node_type == NNCFGraphNodeType.INPUT_NODE - assert output_nodes[0].node_type == NNCFGraphNodeType.OUTPUT_NODE - - -def test_get_all_nodes(): - model = ModelForNameTest() - ref_list = [ - "ModelForNameTest/Conv2d[conv1]/conv2d_0", - "ModelForNameTest/BatchNorm2d[bn1]/batch_norm_0", - "ModelForNameTest/ReLU/relu_0", - 
"ModelForNameTest/relu_0", - "ModelForNameTest/BatchNorm2d[bn2]/batch_norm_0", - "ModelForNameTest/Sequential[layer2]/Sequential[layer1]/Conv2d[conv01]/conv2d_0", - "ModelForNameTest/Sequential[layer2]/Sequential[layer1]/BatchNorm2d[norm01]/batch_norm_0", - "ModelForNameTest/Sequential[layer2]/Sequential[layer1]/ReLU[relu01]/relu_0", - "ModelForNameTest/Sequential[layer2]/Sequential[layer1]/MaxPool2d[pool01]/max_pool2d_0", - "ModelForNameTest/Sequential[layer2]/Conv2d[conv02]/conv2d_0", - "ModelForNameTest/Sequential[layer2]/ReLU[relu02]/relu_0", - "ModelForNameTest/Sequential[layer2]/BatchNorm2d[norm02]/batch_norm_0", - "ModelForNameTest/Sequential[layer2]/MaxPool2d[pool02]/max_pool2d_0", - "ModelForNameTest/AvgPool2d[avgpool]/avg_pool2d_0", - ] - - builder = GraphBuilder( - create_dummy_forward_fn( - FillerInputInfo( - [ - FillerInputElement((1, 1, 4, 4)), - ] - ) - ) - ) - graph = builder.build_graph(model) - test_list = [node_name.split(" ", 1)[1] for node_name in graph.get_all_node_keys()] - assert ref_list == test_list - - -class ModelWithIntegerPaths(torch.nn.Module): - INPUT_SHAPE = [2, 2, 2, 2] - - def __init__(self): - super().__init__() - self.conv1 = torch.nn.Conv2d(2, 2, 1) - self.linear = torch.nn.Linear(2, 2) - - def forward(self, x: torch.Tensor): - x = self.conv1(x) - sz = torch.tensor(x.shape).to(x.device) - sz_tensor = torch.cat([sz]) - idx_tensor = sz_tensor // sz_tensor - single_idx = idx_tensor[0] - y = x[single_idx][single_idx] * torch.ones([1, 1]).to(x.device) - z = self.linear(y) - return z - - -def test_integer_path_marking(): - input_infos = FillerInputInfo( - [ - FillerInputElement(ModelWithIntegerPaths.INPUT_SHAPE), - ] - ) - builder = GraphBuilder(create_dummy_forward_fn(input_infos)) - nncf_graph = builder.build_graph(ModelWithIntegerPaths()) - edges = list(nncf_graph.get_all_edges()) - num_integer_edges = sum([1 for edge in edges if edge.dtype is Dtype.INTEGER]) - # cat -> __floordiv__, __floordiv__ -> __getitem__0 (to get single_idx), - # __getitem__0 -> __getitem__1 (first indexing by tensor), __getitem__0 -> __getitem__2 (second indexing by tensor) - assert num_integer_edges == 4 - - -def test_trace_output_with_no_tensors(): - output = None - with TracingContext() as ctx: - trace_tensors(output, MagicMock(), ctx) - - -class ModelWithRepeatInputs(torch.nn.Module): - def forward(self, x): - y = x * 2 - return torch.stack([x, y, x, y]) - - -def test_dynamic_graph_assigns_contiguous_input_ports_for_edges_with_multiplicity(): - input_info = FillerInputInfo([FillerInputElement([1, 3, 3, 3])]) - tracer = GraphTracer(create_dummy_forward_fn(input_info, with_input_tracing=True, with_output_tracing=True)) - dynamic_graph = tracer.trace_graph(ModelWithRepeatInputs()) - stack_in_edges = [e for e in dynamic_graph.get_all_edges() if e.to_node_id == 2] # node id 2 == torch.stack - all_input_port_ids = set() - for edge in stack_in_edges: - all_input_port_ids.add(edge.input_port_id) - all_input_port_ids.update(edge.parallel_input_port_ids) - assert all_input_port_ids == {0, 1, 2, 3} diff --git a/tests/torch/test_input_management.py b/tests/torch/test_input_management.py index e1ef7a6259b..6e10313ab62 100644 --- a/tests/torch/test_input_management.py +++ b/tests/torch/test_input_management.py @@ -22,9 +22,6 @@ from nncf.torch.dynamic_graph.io_handling import ModelInputInfo from tests.torch.helpers import MockModel from tests.torch.helpers import ModelWithReloadedForward -from tests.torch.helpers import create_compressed_model_and_algo_for_test -from tests.torch.helpers 
import register_bn_adaptation_init_args -from tests.torch.test_compressed_graph import get_basic_quantization_config pytestmark = pytest.mark.legacy @@ -184,26 +181,6 @@ def forward(self, x, y): return torch.cat([x, y]) -def test_same_input_tensor_replication(mocker): - config = get_basic_quantization_config( - input_info=[ - {"sample_size": [1, 1]}, - {"sample_size": [1, 1]}, - ] - ) - register_bn_adaptation_init_args(config) - model = CatModel() - model, _ = create_compressed_model_and_algo_for_test(model, config) - - test_tensor = torch.ones([1, 1]) - clone_spy = mocker.spy(test_tensor, "clone") - cat_spy = mocker.spy(torch, "cat") - _ = model(test_tensor, test_tensor) - assert clone_spy.call_count == 1 - cat_arg = cat_spy.call_args[0][0] - assert cat_arg[0] is not cat_arg[1] - - @pytest.mark.parametrize("use_kwargs", (False, True)) def test_reloaded_forward_inputs_wrapping(use_kwargs, mocker): """ diff --git a/tests/torch/test_no_compression_algorithm.py b/tests/torch/test_no_compression_algorithm.py deleted file mode 100644 index 0371218a4b3..00000000000 --- a/tests/torch/test_no_compression_algorithm.py +++ /dev/null @@ -1,33 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import pytest - -from nncf import NNCFConfig -from tests.torch.helpers import PTTensorListComparator -from tests.torch.helpers import TwoConvTestModel -from tests.torch.helpers import create_compressed_model_and_algo_for_test - -pytestmark = pytest.mark.legacy - -EPS = 1e-9 -INPUT_SIZE = [1, 4, 4] - -NO_COMPRESSION_NNCF_CONFIG = NNCFConfig({"model": "basic_config", "input_info": {"sample_size": [1] + INPUT_SIZE}}) - - -def test_no_compression_algo_not_change_model_params(): - orig_model = TwoConvTestModel() - model, _algo = create_compressed_model_and_algo_for_test(orig_model, NO_COMPRESSION_NNCF_CONFIG) - - orig_model_state = orig_model.state_dict() - model_state = model.state_dict() - PTTensorListComparator.check_equal(list(orig_model_state.values()), list(model_state.values())) diff --git a/tests/torch/test_no_nncf_trace_patching.py b/tests/torch/test_no_nncf_trace_patching.py deleted file mode 100644 index 3fc0bb7db1c..00000000000 --- a/tests/torch/test_no_nncf_trace_patching.py +++ /dev/null @@ -1,62 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import pytest -import torch -from torch import nn -from torch.nn import functional as F - -from nncf.torch import create_compressed_model -from nncf.torch import disable_tracing -from nncf.torch.dynamic_graph.context import get_current_context -from tests.torch.helpers import create_conv -from tests.torch.helpers import get_empty_config - -pytestmark = pytest.mark.legacy - - -class SimpleModel(nn.Module): - """ - A test model with an operation resulting in an ambiguous graph. - Ambiguous operation output is put into the model output for testing convenience. - """ - - def __init__(self, correct_is_tracing): - super().__init__() - self.conv = create_conv(in_channels=1, out_channels=1, kernel_size=2) - self.correct_is_tracing = correct_is_tracing - - def forward(self, x): - x = F.sigmoid(self.conv(x - 0.5)) - output = self.ambiguous_op(x) - return x, output - - def ambiguous_op(self, x): - assert get_current_context().is_tracing == self.correct_is_tracing - - output = torch.zeros_like(x) - for _ in range(torch.greater(x, 0.5).sum()): - output = output + x - return output - - -def test_no_trace_model_patching(): - config = get_empty_config() - config["input_info"] = {"sample_size": [1, 1, 4, 4], "filler": "random"} - - # Not patching anything: all output nodes are traced - _, compressed_model = create_compressed_model(SimpleModel(True), config) - assert len(compressed_model.nncf.get_original_graph().get_output_nodes()) == 2 - - # Patching a function results with no_nncf_trace in method not producing an output node - disable_tracing(SimpleModel.ambiguous_op) - _, compressed_model = create_compressed_model(SimpleModel(False), get_empty_config()) - assert len(compressed_model.nncf.get_original_graph().get_output_nodes()) == 1 diff --git a/tests/torch/test_onnx_export.py b/tests/torch/test_onnx_export.py deleted file mode 100644 index fe4dc911111..00000000000 --- a/tests/torch/test_onnx_export.py +++ /dev/null @@ -1,200 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from typing import Any - -import onnx -import pytest -import torch -from torch import nn - -from nncf import NNCFConfig -from nncf.torch import patch_torch_operators -from nncf.torch.dynamic_graph.patch_pytorch import unpatch_torch_operators -from nncf.torch.exporter import PTExporter -from tests.torch.helpers import MockModel -from tests.torch.helpers import create_bn -from tests.torch.helpers import create_compressed_model_and_algo_for_test -from tests.torch.helpers import create_conv -from tests.torch.helpers import get_nodes_by_type -from tests.torch.helpers import load_exported_onnx_version -from tests.torch.helpers import register_bn_adaptation_init_args -from tests.torch.test_compressed_graph import SingleLayerModelDesc -from tests.torch.test_compressed_graph import get_basic_quantization_config - -pytestmark = pytest.mark.legacy - - -class ModelForIONamingTest(torch.nn.Module): - def __init__(self): - super().__init__() - self.conv = torch.nn.Conv1d(1, 1, 1) - self.linear = torch.nn.Linear(1, 1) - self.embedding = torch.nn.Embedding(1, 1) - - def forward(self, conv_input, linear_input, embedding_input): - return [ - self.conv(conv_input), - {"linear": self.linear(linear_input), "embedding": self.embedding(embedding_input)}, - ] - - -def test_io_nodes_naming_scheme(tmp_path): - config = NNCFConfig.from_dict( - { - "input_info": [ - { - "sample_size": [1, 1, 1], - }, - { - "sample_size": [1, 1], - }, - {"sample_size": [1, 1], "type": "long", "filler": "zeros"}, - ] - } - ) - onnx_model_proto = load_exported_onnx_version(config, ModelForIONamingTest(), tmp_path) - conv_node = next(iter(get_nodes_by_type(onnx_model_proto, "Conv"))) - linear_node = next(iter(get_nodes_by_type(onnx_model_proto, "Gemm"))) - embedding_node = next(iter(get_nodes_by_type(onnx_model_proto, "Gather"))) - - for idx, node in enumerate([conv_node, linear_node, embedding_node]): - input_tensor_ids = [x for x in node.input if "input" in x] - assert len(input_tensor_ids) == 1 - assert input_tensor_ids[0] == f"input.{idx}" - - assert len(node.output) == 1 - assert node.output[0] == f"output.{idx}" - - -@pytest.mark.parametrize( - "save_format, refs", - ( - ("onnx", ("onnx", {"opset_version": PTExporter._ONNX_DEFAULT_OPSET})), - ("onnx_9", ("onnx", {"opset_version": 9})), - ("onnx_10", ("onnx", {"opset_version": 10})), - ("onnx_11", ("onnx", {"opset_version": 11})), - ("onnx_0", ValueError), - ("onnx_onnx", ValueError), - ), -) -def test_exporter_parser_format(save_format: str, refs: Any): - try: - save_format, args = PTExporter.parse_format(save_format) - except Exception as e: - if not isinstance(refs, tuple): - assert isinstance(e, refs) - return - - assert save_format == refs[0] - assert args == refs[1] - - -@pytest.mark.parametrize( - "save_format, ref_opset", (("onnx", 14), ("onnx_9", 9), ("onnx_10", 10), ("onnx_11", 11), ("onnx_13", 13)) -) -def test_exported_version(tmp_path: str, save_format: str, ref_opset: int): - model = MockModel() - config = NNCFConfig() - config.update({"input_info": {"sample_size": [1, 1, 1, 1]}}) - - _, compression_ctrl = create_compressed_model_and_algo_for_test(model, config) - onnx_checkpoint_path = tmp_path / "model.onnx" - compression_ctrl.export_model(onnx_checkpoint_path, save_format) - model_proto = onnx.load_model(onnx_checkpoint_path) - - assert model_proto.opset_import[0].version == ref_opset - - -class MultiParamForwardModel(torch.nn.Module): - def forward(self, param1, param2, param3=None): - return param1, param2 - - -def test_can_export_single_batch_bn(tmp_path): - 
test_path = tmp_path.joinpath("test.onnx") - synthetic_model_desc = SingleLayerModelDesc(layer=nn.BatchNorm2d(4), input_sample_sizes=([1, 4, 1, 1])) - config = get_basic_quantization_config( - input_sample_sizes=synthetic_model_desc.get_input_sample_sizes(), - input_info=synthetic_model_desc.create_input_info(), - ) - register_bn_adaptation_init_args(config) - model = synthetic_model_desc.get_model() - _, compression_ctrl = create_compressed_model_and_algo_for_test(model, config) - compression_ctrl.export_model(str(test_path)) - assert test_path.exists() - - -def test_can_export_with_model_args(tmp_path): - # Torch now parses the function signature and sets up default parameters for unprovided - # arguments on its own. Need to rethink and possibly deprecate model_args parameter. - test_path = tmp_path.joinpath("test.onnx") - model = MultiParamForwardModel() - config = get_basic_quantization_config(input_info=[{"sample_size": [1, 1, 1, 1]}, {"sample_size": [1, 1, 1, 1]}]) - register_bn_adaptation_init_args(config) - _, compression_ctrl = create_compressed_model_and_algo_for_test(model, config) - compression_ctrl.export_model(str(test_path), model_args=({"param3": 42},)) - assert test_path.exists() - - -class LinearTestModel(nn.Module): - def __init__(self): - super().__init__() - self.conv1 = create_conv(3, 3, 1) - self.bn1 = create_bn(3) - self.relu = nn.ReLU() - self.avg_pool = nn.AdaptiveAvgPool2d(1) - self.conv2 = create_conv(3, 1, 1) - self.bn2 = create_bn(1) - - def forward(self, x): - # input_shape = [1, 3, 32, 32] - x = self.relu(self.conv1(x)) - x = self.bn1(x) - x = self.avg_pool(x) - x = self.relu(self.conv2(x)) - x = self.bn2(x) - return x - - -@pytest.mark.parametrize( - "compression_section", - [{}, {"compression": {"algorithm": "quantization"}}], - ids=["none", "quantization"], -) -def test_preserves_onnx_node_name_format(tmp_path, compression_section): - model = LinearTestModel() - model.eval().cpu() - try: - unpatch_torch_operators() - without_nncf_path = tmp_path / "without_nncf.onnx" - torch.onnx.export( - model, - torch.ones([1, 3, 32, 32]), - without_nncf_path, - export_params=True, - opset_version=13, - do_constant_folding=False, - dynamo=False, - ) - original_model_proto = onnx.load_model(str(without_nncf_path)) - patch_torch_operators() - - config = NNCFConfig.from_dict({"input_info": {"sample_size": [1, 3, 32, 32]}, **compression_section}) - compressed_model_proto = load_exported_onnx_version(config, model, tmp_path) - - compressed_model_onnx_node_names = {node.name for node in compressed_model_proto.graph.node} - for node in original_model_proto.graph.node: - if not node.name.startswith("Identity_"): - # Since torch==2.2.0 identity nodes have different indexes - assert node.name in compressed_model_onnx_node_names - finally: - patch_torch_operators() diff --git a/tests/torch/test_pytorch_patch.py b/tests/torch/test_pytorch_patch.py index 5c90c3bd16d..efec4047c07 100644 --- a/tests/torch/test_pytorch_patch.py +++ b/tests/torch/test_pytorch_patch.py @@ -9,18 +9,13 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-import inspect -import os import pytest import torch -from nncf.common.utils.os import is_windows -from nncf.config import NNCFConfig from nncf.torch import wrap_model from nncf.torch.dynamic_graph.context import TracingContext from nncf.torch.dynamic_graph.context import get_current_context -from nncf.torch.dynamic_graph.patch_pytorch import _ORIG_JIT_SCRIPT from nncf.torch.dynamic_graph.patch_pytorch import MagicFunctionsToPatch from nncf.torch.dynamic_graph.patch_pytorch import disable_patching from nncf.torch.dynamic_graph.patch_pytorch_state import PATCHING_STATE @@ -28,13 +23,6 @@ from nncf.torch.dynamic_graph.trace_tensor import TensorMeta from nncf.torch.dynamic_graph.trace_tensor import TracedTensor from nncf.torch.graph.operator_metatypes import PT_OPERATOR_METATYPES -from tests.cross_fw.shared.isolation_runner import run_pytest_case_function_in_separate_process -from tests.torch.helpers import BasicConvTestModel -from tests.torch.helpers import create_compressed_model_and_algo_for_test -from tests.torch.helpers import register_bn_adaptation_init_args -from tests.torch.pytorch_patch_isolated import test_compile -from tests.torch.pytorch_patch_isolated import test_jit_if_tracing_script_source_equals -from tests.torch.pytorch_patch_isolated import test_jit_script_exception_preserves_patching_isolated pytestmark = pytest.mark.legacy @@ -104,94 +92,6 @@ def forward(self, x: torch.Tensor): torch.onnx.export(TestModel(), (torch.zeros((1,)),), str(tmp_path / "jit_if_tracing_test_model.onnx"), dynamo=False) -def test_jit_if_tracing_script_source(): - # Run test case in a separate process to track patching of torch by NNCF - run_pytest_case_function_in_separate_process(test_jit_if_tracing_script_source_equals) - - -def test_jit_script_exception_preserves_patching(): - # Run test case in a separate process to track patching of torch by NNCF - run_pytest_case_function_in_separate_process(test_jit_script_exception_preserves_patching_isolated) - - -@pytest.mark.skipif(is_windows(), reason="https://github.com/pytorch/pytorch/issues/122094") -@pytest.mark.parametrize("compile_forward", [False, True]) -def test_torch_compile(compile_forward): - # Run test case in a separate process to track patching of torch by NNCF - os.environ["COMPILE_FORWARD"] = f"{int(compile_forward)}" - run_pytest_case_function_in_separate_process(test_compile) - - -def test_torch_compile_on_nncf_model(): - # Calling torch.compile on a regular torch model should work fine - model = BasicConvTestModel() - compiled_model = torch.compile(model) - compiled_model(torch.ones(model.INPUT_SIZE)) - - model = BasicConvTestModel() - config = get_test_quantization_config(model) - compressed_model, compression_ctrl = create_compressed_model_and_algo_for_test(model, config) - with pytest.raises( - TypeError, match="At the moment torch\\.compile\\(\\) is not supported for models optimized by NNCF\\." - ): - torch.compile(compressed_model) - - stripped_model = compression_ctrl.strip() - with pytest.raises( - TypeError, match="At the moment torch\\.compile\\(\\) is not supported for models optimized by NNCF\\." - ): - torch.compile(stripped_model) - - with pytest.raises( - TypeError, match="At the moment torch\\.compile\\(\\) is not supported for models optimized by NNCF\\." 
- ): - # Compiling this model would actually work, but inference of the compiled model will fail - torch.compile(model) - - -def test_jit_script_signature(): - # Check that torch.jit.script has the same signature as the wrapper was designed for - signature = inspect.signature(_ORIG_JIT_SCRIPT) - assert "obj" in signature.parameters and "_rcb" in signature.parameters and "_frames_up" in signature.parameters - - -def test_jit_script_class(): - # Define an outside function to test custom resolution callback inside torch_jit_script_wrapper - def outside_function(x): - return x + torch.tensor(1.0) - - class TestClass: - def class_method(self, x): - return outside_function(x) - - # Scripting a class instead of a method to trigger custom resolution callback usage - torch.jit.script(TestClass) - - -def test_jit_trace_model(): - model = BasicConvTestModel() - config = get_test_quantization_config(model) - - compressed_model, compression_ctrl = create_compressed_model_and_algo_for_test(model, config) - torch.jit.trace(compressed_model, example_inputs=torch.rand(model.INPUT_SIZE)) - - model = compression_ctrl.strip() - torch.jit.trace(model, example_inputs=torch.rand(model.INPUT_SIZE)) - - -def get_test_quantization_config(model): - config = NNCFConfig() - config.update( - { - "model": "model", - "input_info": {"sample_size": model.INPUT_SIZE}, - "compression": {"algorithm": "quantization"}, - } - ) - register_bn_adaptation_init_args(config) - return config - - def test_operator_unpatching(): unwrapped_operator_expected = False diff --git a/tests/torch/test_resume_from_checkpoint.py b/tests/torch/test_resume_from_checkpoint.py deleted file mode 100644 index c78711c57a9..00000000000 --- a/tests/torch/test_resume_from_checkpoint.py +++ /dev/null @@ -1,360 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-import itertools -from copy import deepcopy -from functools import partial - -import pytest -import torch -import torchvision -from torch.nn import DataParallel - -import nncf -from nncf.common.logging import nncf_logger -from nncf.common.quantization.structs import NonWeightQuantizerId -from nncf.common.quantization.structs import WeightQuantizerId -from nncf.torch import load_state -from nncf.torch import register_default_init_args -from nncf.torch.compression_method_api import PTCompressionAlgorithmBuilder -from nncf.torch.quantization.algo import QuantizationBuilder -from nncf.torch.utils import safe_thread_call -from tests.torch.helpers import BasicConvTestModel -from tests.torch.helpers import create_compressed_model_and_algo_for_test -from tests.torch.helpers import create_ones_mock_dataloader -from tests.torch.helpers import get_empty_config -from tests.torch.helpers import register_bn_adaptation_init_args -from tests.torch.quantization.quantization_helpers import get_quantization_config_without_range_init -from tests.torch.test_models.synthetic import AddTwoConv - -pytestmark = pytest.mark.legacy - - -def load_model(model, pretrained=True, num_classes=1000, model_params=None) -> torch.nn.Module: - if model_params is None: - model_params = {} - if model in torchvision.models.__dict__: - load_model_fn = partial( - torchvision.models.__dict__[model], num_classes=num_classes, pretrained=pretrained, **model_params - ) - else: - msg = "Undefined model name" - raise Exception(msg) - loaded_model = safe_thread_call(load_model_fn) - - return loaded_model - - -class BitwidthDistributionStatistics: - def __init__(self, num_wq_per_bitwidth, num_aq_per_bitwidth): - self.num_wq_per_bitwidth = num_wq_per_bitwidth - self.num_aq_per_bitwidth = num_aq_per_bitwidth - - -class PrecisionInitTestDesc: - def __init__(self): - self.model_creator = AddTwoConv - config = get_quantization_config_without_range_init() - config["compression"]["initializer"].update( - { - "precision": { - "bitwidth_per_scope": [ - [2, "AddTwoConv/NNCFConv2d[conv1]/conv2d_0|WEIGHT"], - [4, "AddTwoConv/NNCFConv2d[conv2]/conv2d_0|WEIGHT"], - ] - } - } - ) - config["target_device"] = "TRIAL" - config["compression"]["activations"] = {"bits": 6} - self.config = config - self.ref_bits = [ - (WeightQuantizerId(target_node_name="AddTwoConv/NNCFConv2d[conv1]/conv2d_0"), 2), - (WeightQuantizerId(target_node_name="AddTwoConv/NNCFConv2d[conv2]/conv2d_0"), 4), - (NonWeightQuantizerId(target_node_name="AddTwoConv/NNCFConv2d[conv2]/conv2d_0"), 6), - (NonWeightQuantizerId(target_node_name="AddTwoConv/NNCFConv2d[conv1]/conv2d_0"), 6), - (NonWeightQuantizerId("/nncf_model_input_0"), 6), - ] - self.expected_stats = BitwidthDistributionStatistics( - num_wq_per_bitwidth={4: 1, 2: 1}, num_aq_per_bitwidth={6: 3} - ) - self.config_to_resume = None - - def __str__(self): - return "resume_with_same" if self.config == self.config_to_resume else "resume_without_init" - - def config_with_all_inits(self): - self.config["compression"]["initializer"].update( - {"range": {"num_init_samples": 1}, "batchnorm_adaptation": {"num_bn_adaptation_samples": 1}} - ) - return self - - def resume_with_the_same_config(self): - self.config_to_resume = deepcopy(self.config) - return self - - def resume_with_the_same_config_without_init(self): - self.config_to_resume = deepcopy(self.config) - self.config_to_resume["compression"]["initializer"] = {} - return self - - @staticmethod - def setup_init_spies(mocker): - from nncf.common.initialization.batchnorm_adaptation import 
BatchnormAdaptationAlgorithm - from nncf.torch.quantization.algo import QuantizationBuilder - from nncf.torch.quantization.precision_init.manual_init import ManualPrecisionInitializer - - parse_range_init = mocker.spy(QuantizationBuilder, "_parse_range_init_params") - get_stats = mocker.spy(QuantizationBuilder, "_get_statistics_for_final_range_init") - run_bn_adapt = mocker.spy(BatchnormAdaptationAlgorithm, "run") - apply_manual_precision_init = mocker.spy(ManualPrecisionInitializer, "apply_init") - return [get_stats, parse_range_init, run_bn_adapt, apply_manual_precision_init] - - def check_precision_init(self, compression_ctrl): - for qid, quantizer in compression_ctrl.all_quantizations.items(): - expected_bit = [ref_bit for (ref_qid, ref_bit) in self.ref_bits if ref_qid == qid][0] - assert quantizer.num_bits == expected_bit, f"Unexpected number of bits for {str(qid)}" - - nncf_stats = compression_ctrl.statistics() - actual_stats = nncf_stats.quantization - - assert self.expected_stats.num_wq_per_bitwidth == actual_stats.num_wq_per_bitwidth - assert self.expected_stats.num_aq_per_bitwidth == actual_stats.num_aq_per_bitwidth - - -@pytest.fixture() -def _nncf_caplog(caplog): - nncf_logger.propagate = True - yield caplog - nncf_logger.propagate = False - - -LIST_MANUAL_INIT_CASES = [ - PrecisionInitTestDesc().config_with_all_inits().resume_with_the_same_config(), - PrecisionInitTestDesc().config_with_all_inits().resume_with_the_same_config_without_init(), -] - - -@pytest.mark.parametrize("desc", LIST_MANUAL_INIT_CASES, ids=map(str, LIST_MANUAL_INIT_CASES)) -def test_can_resume_with_manual_init(mocker, desc, _nncf_caplog): - config = desc.config - config_to_resume = desc.config_to_resume - - config = register_default_init_args(config, train_loader=create_ones_mock_dataloader(config)) - all_spies = desc.setup_init_spies(mocker) - init_spy = mocker.spy(PTCompressionAlgorithmBuilder, "__init__") - get_setup_spy = mocker.spy(QuantizationBuilder, "_get_quantizer_setup") - - _, compression_ctrl = create_compressed_model_and_algo_for_test(desc.model_creator(), config) - desc.check_precision_init(compression_ctrl) - - for m in all_spies: - m.assert_called() - m.reset_mock() - get_setup_spy.assert_called() - get_setup_spy.reset_mock() - - compression_state = compression_ctrl.get_compression_state() - register_bn_adaptation_init_args(config_to_resume) - _, compression_ctrl = create_compressed_model_and_algo_for_test( - desc.model_creator(), config_to_resume, compression_state=compression_state - ) - - if config_to_resume is not None and config_to_resume["compression"]["initializer"]: - assert not init_spy.call_args[0][2] - - for m in all_spies: - m.assert_not_called() - get_setup_spy.assert_not_called() - - desc.check_precision_init(compression_ctrl) - - -QUANTIZATION = "quantization" -SPARSITY_TYPES = ["magnitude", "rb", "const"] -SPARSITY_ALGOS = {"_".join([type, "sparsity"]) for type in SPARSITY_TYPES} # 3S - -LOAD_ALGOS = list(itertools.product([QUANTIZATION], SPARSITY_ALGOS)) # Q + 3S -LOAD_ALGOS += itertools.product(SPARSITY_ALGOS, [QUANTIZATION]) # 3S + Q - -SAVE_ALGOS = [[algo] for algo in SPARSITY_ALGOS] # 3S -SAVE_ALGOS += [[QUANTIZATION]] # Q -SAVE_ALGOS += LOAD_ALGOS # Q , 3S, 3S + Q, Q+3S - -ALGOS = list(sorted(itertools.product(SAVE_ALGOS, LOAD_ALGOS), key=lambda x: "_".join(x[0]) + "_".join(x[1]))) - - -@pytest.fixture( - scope="module", params=ALGOS, ids=["__".join(["save:" + "_".join(a[0]), "load:" + "_".join(a[1])]) for a in ALGOS] -) -def _algos(request): - pair_algos = 
request.param - save_algos = pair_algos[0] - load_algos = pair_algos[1] - resume_ok = False - # resume expects the same list of algorithms - if save_algos == load_algos: - resume_ok = True - - if len(save_algos) == len(load_algos): - for s, v in zip(save_algos, load_algos): - if s != v and ("magnitude" in s and "const" in v or "const" in s and "magnitude" in v): - resume_ok = True - - # Priority mechanism ensures that algo permutations are irrelevant - if set(save_algos) == set(load_algos): - resume_ok = True - else: - saved_sparsity = filter(lambda x: x != QUANTIZATION, save_algos) - loaded_sparsity = filter(lambda x: x != QUANTIZATION, load_algos) - - for s, v in zip(saved_sparsity, loaded_sparsity): - # resume works fine for magnitude <-> const combo, because they have similar parameters - if s != v and ("magnitude" in s and "const" in v or "const" in s and "magnitude" in v): - resume_ok = True - - return {"save_algos": save_algos, "load_algos": load_algos, "is_resume_ok": resume_ok} - - -MODEL_WRAPPER = ["CPU", "GPU"] -WRAPPERS = list(sorted(itertools.product(MODEL_WRAPPER, MODEL_WRAPPER), key=lambda x: "_".join(x))) - - -@pytest.fixture(scope="function", params=WRAPPERS, ids=["_".join(["from:" + w[0], "to:" + w[1]]) for w in WRAPPERS]) -def _model_wrapper(request): - modes = request.param - - def wrap_model(mode, model): - if mode == "GPU": - model = DataParallel(model, [0]) - return model - - return { - "save_model": partial(wrap_model, modes[0]), - "resume_model": partial(wrap_model, modes[1]), - } - - -@pytest.mark.parametrize("is_resume", (True, False), ids=["resume", "load_weights"]) -def test_load_state_interoperability(_algos, _model_wrapper, is_resume): - config_save = get_empty_config() - config_save["compression"] = [{"algorithm": algo} for algo in _algos["save_algos"]] - register_bn_adaptation_init_args(config_save) - compressed_model_save, _ = create_compressed_model_and_algo_for_test(BasicConvTestModel(), config_save) - model_save = _model_wrapper["save_model"](compressed_model_save) - saved_model_state = model_save.state_dict() - ref_num_loaded = len(saved_model_state) - - config_resume = get_empty_config() - config_resume["compression"] = [{"algorithm": algo} for algo in _algos["load_algos"]] - register_bn_adaptation_init_args(config_resume) - compressed_model_resume, _ = create_compressed_model_and_algo_for_test(BasicConvTestModel(), config_resume) - model_resume = _model_wrapper["resume_model"](compressed_model_resume) - - if not is_resume or (is_resume and _algos["is_resume_ok"]): - act_num_loaded = load_state(model_resume, saved_model_state, is_resume) - - if ( - "magnitude_sparsity" in _algos["load_algos"] or "const_sparsity" in _algos["load_algos"] - ) and "rb_sparsity" in _algos["save_algos"]: - # no need to load _mask and _uniform - ref_num_loaded -= 2 - assert act_num_loaded == ref_num_loaded - else: - with pytest.raises(nncf.InternalError): - load_state(model_resume, saved_model_state, is_resume) - - -RESUME_ALGOS = list(itertools.product([QUANTIZATION], SPARSITY_ALGOS)) # Q + 3S -RESUME_ALGOS += [[algo] for algo in SPARSITY_ALGOS] # 3S -RESUME_ALGOS += [[QUANTIZATION]] # Q -RESUME_ALGOS += [["EMPTY"]] # No Compression -RESUME_ALGOS = list( - sorted(itertools.product(RESUME_ALGOS, RESUME_ALGOS), key=lambda x: "_".join(x[0]) + "_".join(x[1])) -) -NUM_PARAMS_PER_ALGO = {QUANTIZATION: 8, "magnitude_sparsity": 1, "const_sparsity": 1, "rb_sparsity": 3, "EMPTY": 0} - - -@pytest.fixture( - scope="module", - params=RESUME_ALGOS, - ids=["__".join(["save:" + 
"_".join(a[0]), "load:" + "_".join(a[1])]) for a in RESUME_ALGOS], -) -def _resume_algos(request): - pair_algos = request.param - save_algos = pair_algos[0] - load_algos = pair_algos[1] - is_strict = True - - sparsity_on_save = SPARSITY_ALGOS.intersection(save_algos) - sparsity_on_load = SPARSITY_ALGOS.intersection(load_algos) - common_algos = set(save_algos).intersection(set(load_algos)) - different_algos = set(save_algos).symmetric_difference(set(load_algos)) - if different_algos: - is_strict = False - - ref_num_compression_params = sum(map(lambda x: NUM_PARAMS_PER_ALGO[x], common_algos)) - if not SPARSITY_ALGOS.intersection(common_algos) and (sparsity_on_save and sparsity_on_load): - ref_num_compression_params += 1 - - return { - "save_algos": save_algos, - "load_algos": load_algos, - "is_strict": is_strict, - "ref_num_compression_params": ref_num_compression_params, - } - - -def test_load_state__with_resume_checkpoint(_resume_algos, _model_wrapper, mocker): - config_save = get_empty_config() - config_save["compression"] = [{"algorithm": algo} for algo in _resume_algos["save_algos"] if algo != "EMPTY"] - register_bn_adaptation_init_args(config_save) - orig_model = BasicConvTestModel() - num_model_params = len(orig_model.state_dict()) - model_save, compressed_ctrl_save = create_compressed_model_and_algo_for_test(orig_model, config_save) - saved_model_state = model_save.state_dict() - saved_checkpoint = compressed_ctrl_save.get_compression_state() - ref_num_loaded = _resume_algos["ref_num_compression_params"] + num_model_params + 1 # padding_value - - config_resume = get_empty_config() - config_resume["compression"] = [{"algorithm": algo} for algo in _resume_algos["load_algos"] if algo != "EMPTY"] - register_bn_adaptation_init_args(config_resume) - from nncf.torch.checkpoint_loading import KeyMatcher - - key_matcher_run_spy = mocker.spy(KeyMatcher, "run") - model, _ = create_compressed_model_and_algo_for_test( - BasicConvTestModel(), config_resume, compression_state=saved_checkpoint - ) - load_state(model, saved_model_state, _resume_algos["is_strict"]) - key_matcher_run_spy.assert_called_once() - act_num_loaded = len(key_matcher_run_spy.spy_return) - assert act_num_loaded == ref_num_loaded - - -LIST_ALGOS = sorted(["", QUANTIZATION] + list(SPARSITY_ALGOS)) - - -@pytest.mark.parametrize("is_resume", (True, False), ids=["resume", "load_weights"]) -@pytest.mark.parametrize("algo", tuple(LIST_ALGOS)) -def test_ordinary_load(algo, _model_wrapper, is_resume): - config = get_empty_config() - if algo: - config["compression"] = {"algorithm": algo} - register_bn_adaptation_init_args(config) - - compressed_model_save, _ = create_compressed_model_and_algo_for_test(BasicConvTestModel(), config) - model_save = _model_wrapper["save_model"](compressed_model_save) - - compressed_model_resume, _ = create_compressed_model_and_algo_for_test(BasicConvTestModel(), config) - model_resume = _model_wrapper["resume_model"](compressed_model_resume) - - num_loaded = load_state(model_resume, model_save.state_dict(), is_resume) - - assert num_loaded == len(model_save.state_dict()) diff --git a/tests/torch/test_telemetry.py b/tests/torch/test_telemetry.py deleted file mode 100644 index aad30a34c62..00000000000 --- a/tests/torch/test_telemetry.py +++ /dev/null @@ -1,23 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from nncf import NNCFConfig -from tests.cross_fw.shared.helpers import telemetry_send_event_test_driver -from tests.torch.helpers import TwoConvTestModel -from tests.torch.helpers import create_compressed_model_and_algo_for_test - - -def test_telemetry_is_sent(mocker): - def use_nncf_fn(): - config = NNCFConfig({"input_info": {"sample_size": [1, 1, 32, 32]}}) - create_compressed_model_and_algo_for_test(TwoConvTestModel(), config) - - telemetry_send_event_test_driver(mocker, use_nncf_fn) diff --git a/tests/torch/test_utils.py b/tests/torch/test_utils.py index d763e0b3954..5f3e0699a67 100644 --- a/tests/torch/test_utils.py +++ b/tests/torch/test_utils.py @@ -28,8 +28,54 @@ from tests.torch.helpers import EmptyModel from tests.torch.helpers import MockModel from tests.torch.helpers import TwoConvTestModel -from tests.torch.quantization.test_overflow_issue_export import DepthWiseConvTestModel -from tests.torch.quantization.test_overflow_issue_export import EightConvTestModel +from tests.torch.helpers import create_conv + + +class EightConvTestModel(nn.Module): + def __init__(self, in_out_ch=((1, 3), (3, 5), (5, 7), (7, 10))): + super().__init__() + self.features = [] + self.features.append(create_conv(*in_out_ch[0], 2, -1, -2)) + self.features.append(nn.BatchNorm2d(in_out_ch[0][1])) + self.features.append(nn.ReLU()) + self.features.append(create_conv(*in_out_ch[1], 5, 1, 1)) + self.features.append(nn.BatchNorm2d(in_out_ch[1][1])) + self.features.append(nn.ReLU()) + self.features.append(create_conv(*in_out_ch[2], 1, 2, 2)) + self.features.append(nn.BatchNorm2d(in_out_ch[2][1])) + self.features.append(nn.ReLU()) + self.features.append(create_conv(*in_out_ch[3], 9, -1, 0)) + self.features.append(nn.BatchNorm2d(in_out_ch[3][1])) + self.features.append(nn.ReLU()) + self.features.append(create_conv(*reversed(in_out_ch[3]), 3, 0, 1)) + self.features.append(nn.BatchNorm2d(in_out_ch[3][0])) + self.features.append(nn.ReLU()) + self.features.append(create_conv(*reversed(in_out_ch[2]), 1, -1, 9)) + self.features.append(nn.BatchNorm2d(in_out_ch[2][0])) + self.features.append(nn.ReLU()) + self.features.append(create_conv(*reversed(in_out_ch[1]), 2, 10, 1)) + self.features.append(nn.BatchNorm2d(in_out_ch[1][0])) + self.features.append(nn.ReLU()) + self.features.append(create_conv(*reversed(in_out_ch[0]), 1, 1, 1)) + self.features.append(nn.BatchNorm2d(in_out_ch[0][0])) + self.features.append(nn.ReLU()) + self.features = nn.Sequential(*self.features) + + def forward(self, x): + return self.features(x) + + +class DepthWiseConvTestModel(nn.Module): + def __init__(self): + super().__init__() + self.features = [] + self.features.append(nn.Conv2d(1, 3, 3, groups=1)) + self.features.append(nn.Conv2d(3, 30, 3, groups=3)) + self.features.append(nn.Conv2d(30, 1, 3)) + self.features = nn.Sequential(*self.features) + + def forward(self, x): + return self.features(x) def compare_saved_model_state_and_current_model_state(model: nn.Module, model_state: _ModuleState): diff --git a/tests/torch2/function_hook/helpers.py b/tests/torch2/function_hook/helpers.py index 2e126e64277..d4240513a90 100644 --- 
a/tests/torch2/function_hook/helpers.py +++ b/tests/torch2/function_hook/helpers.py @@ -154,6 +154,19 @@ def forward(self, x): return self.module1(x) + self.module2(x) +class SharedLayersModel(torch.nn.Module): + def __init__(self): + super().__init__() + self.shared_conv = torch.nn.Conv2d(1, 1, 1) + + def forward(self, x): + x = self.shared_conv(x) + x = x + x + x = self.shared_conv(x) + x = x * x + return x + + class CounterHook(nn.Module): def __init__(self): super().__init__() diff --git a/tests/torch2/function_hook/quantization/test_quantized_graphs.py b/tests/torch2/function_hook/quantization/test_quantized_graphs.py index f39284c5318..731e7f47fa3 100644 --- a/tests/torch2/function_hook/quantization/test_quantized_graphs.py +++ b/tests/torch2/function_hook/quantization/test_quantized_graphs.py @@ -26,8 +26,8 @@ from tests.cross_fw.test_templates.helpers import RoPEModel from tests.cross_fw.test_templates.helpers import ScaledDotProductAttentionModel from tests.torch import test_models -from tests.torch.quantization.test_algo_quantization import SharedLayersModel from tests.torch.test_compressed_graph import ModelDesc +from tests.torch2.function_hook.helpers import SharedLayersModel from tests.torch2.utils import compare_with_reference_file from tests.torch2.utils import to_comparable_nx_graph diff --git a/tests/torch2/test_patching.py b/tests/torch2/test_patching.py deleted file mode 100644 index 61cd4477947..00000000000 --- a/tests/torch2/test_patching.py +++ /dev/null @@ -1,29 +0,0 @@ -# Copyright (c) 2025 Intel Corporation -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import pytest -import torch - -import nncf -from nncf.torch import create_compressed_model - - -def test_patching(): - # Check that patching torch functions is disabled - import nncf.torch # noqa: F401 - - with pytest.raises(AttributeError): - getattr(torch.relu, "_original_op") - - -def test_create_compressed_model_error(): - with pytest.raises(nncf.InternalError, match="NNCF_TORCH_LEGACY_TRACING=1"): - create_compressed_model(None, {})