1. [Why does the size of the quantized model remain the same as the original model size?](#1-why-does-the-size-of-the-quantized-model-remain-the-same-as-the-original-model-size)
2. [Why does loading a quantized exported model from a file fail?](#2-why-does-loading-a-quantized-exported-model-from-a-file-fail)
3. [Why am I getting a torch.fx error?](#3-why-am-i-getting-a-torchfx-error)
4. [Does MCT support both per-tensor and per-channel quantization?](#4-does-mct-support-both-per-tensor-and-per-channel-quantization)


### 1. Why does the size of the quantized model remain the same as the original model size?
Despite these limitations, some adjustments can be made to facilitate MCT quantization:

Check the `torch.fx` error, and search for an equivalent replacement. Some examples:
* An `if` statement in a module's `forward` method can often be easily skipped or refactored away.
* The `list()` Python built-in can be replaced with a concatenation operation, e.g. `[A, B, C]`.
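As a concrete illustration of such a rewrite, here is a minimal sketch (hypothetical module names, not from the MCT codebase) of replacing a data-dependent `if` with a traceable tensor op:

```python
import torch
import torch.fx

class WithBranch(torch.nn.Module):
    def forward(self, x):
        # Data-dependent control flow: torch.fx symbolic tracing cannot
        # evaluate an `if` on tensor values and raises a trace error here.
        if x.sum() > 0:
            return x * 2
        return x

class WithoutBranch(torch.nn.Module):
    def forward(self, x):
        # Equivalent branch-free version built from tensor ops,
        # which torch.fx can trace.
        return torch.where(x.sum() > 0, x * 2, x)

traced = torch.fx.symbolic_trace(WithoutBranch())  # traces successfully
```

The same idea applies to other untraceable constructs: express the logic with tensor operations so the control flow disappears from the Python code that `torch.fx` has to trace.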

### 4. Does MCT support both per-tensor and per-channel quantization?

Yes. MCT supports both per-tensor and per-channel weights quantization, as [defined in the TPC](https://sonysemiconductorsolutions.github.io/mct-model-optimization/api/api_docs/modules/target_platform_capabilities.html#model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema.AttributeQuantizationConfig.weights_per_channel_threshold).

**Solution**: Switch between per-tensor and per-channel quantization with the `weights_per_channel_threshold` parameter, as described below.

The quantizer is configured by the following object:
* `model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema.AttributeQuantizationConfig`

Set the following parameter:
* `weights_per_channel_threshold` (bool): indicates whether to quantize the weights per-channel (`True`) or per-tensor (`False`).

For more details, please refer to [this page](https://sonysemiconductorsolutions.github.io/mct-model-optimization/api/api_docs/modules/target_platform_capabilities.html#model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema.AttributeQuantizationConfig.weights_per_channel_threshold).

In QAT, the following object is used to set up a weight-learnable quantizer:
* `model_compression_toolkit.trainable_infrastructure.TrainableQuantizerWeightsConfig`

Set the same parameter:
* `weights_per_channel_threshold` (bool): whether to quantize the weights per-channel (`True`) or per-tensor (`False`).

For more details, please refer to [this page](https://sonysemiconductorsolutions.github.io/mct-model-optimization/api/api_docs/modules/trainable_infrastructure.html#trainablequantizerweightsconfig).
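
The practical difference between the two modes can be seen in a toy, library-free sketch (a conceptual illustration of symmetric max-abs thresholds, not MCT's actual threshold computation):

```python
# Conceptual sketch: per-tensor vs per-channel symmetric thresholds.

def max_abs(values):
    """Symmetric threshold for a group of weights: the largest magnitude."""
    return max(abs(v) for v in values)

# A toy 2-channel weight tensor: channel 0 has small weights, channel 1 large.
weights = [
    [0.1, -0.2, 0.15],  # channel 0
    [2.0, -1.5, 1.8],   # channel 1
]

# Per-tensor: a single threshold shared by all channels, so the small
# channel is quantized on a grid sized for the large one.
per_tensor_threshold = max_abs(v for ch in weights for v in ch)

# Per-channel: one threshold per channel, so channel 0 keeps a much
# finer quantization grid.
per_channel_thresholds = [max_abs(ch) for ch in weights]

print(per_tensor_threshold)    # 2.0
print(per_channel_thresholds)  # [0.2, 2.0]
```

Per-channel quantization usually yields better accuracy for exactly this reason: channels with small weight magnitudes are not forced onto the coarse grid dictated by the largest channel.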