Merged
24 changes: 24 additions & 0 deletions FAQ.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
1. [Why does the size of the quantized model remain the same as the original model size?](#1-why-does-the-size-of-the-quantized-model-remain-the-same-as-the-original-model-size)
2. [Why does loading a quantized exported model from a file fail?](#2-why-does-loading-a-quantized-exported-model-from-a-file-fail)
3. [Why am I getting a torch.fx error?](#3-why-am-i-getting-a-torchfx-error)
4. [Does MCT support both per-tensor and per-channel quantization?](#4-does-mct-support-both-per-tensor-and-per-channel-quantization)


### 1. Why does the size of the quantized model remain the same as the original model size?
Expand Down Expand Up @@ -54,3 +55,26 @@ Despite these limitations, some adjustments can be made to facilitate MCT quanti
Check the `torch.fx` error, and search for an identical replacement. Some examples:
* An `if` statement in a module's `forward` method can often be skipped.
* The `list()` Python method can be replaced with a concatenation operation [A, B, C].

### 4. Does MCT support both per-tensor and per-channel quantization?

Yes. MCT supports both per-tensor and per-channel weight quantization, as [defined in TPC](https://sonysemiconductorsolutions.github.io/mct-model-optimization/api/api_docs/modules/target_platform_capabilities.html#model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema.AttributeQuantizationConfig.weights_per_channel_threshold). To choose between them, set the parameter described below.

**Solution**: Switch between per-tensor and per-channel quantization by setting the `weights_per_channel_threshold` parameter, as described below.

In the following object, which configures the quantizer:
* `model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema.AttributeQuantizationConfig()`

Set the following parameter:
* `weights_per_channel_threshold` (bool) – Indicates whether to quantize the weights per-channel or per-tensor.

For more details, please refer to [this page](https://sonysemiconductorsolutions.github.io/mct-model-optimization/api/api_docs/modules/target_platform_capabilities.html#model_compression_toolkit.target_platform_capabilities.schema.mct_current_schema.AttributeQuantizationConfig.weights_per_channel_threshold).


In QAT, the following object is used to set up a weight-learnable quantizer:
* `model_compression_toolkit.trainable_infrastructure.TrainableQuantizerWeightsConfig()`

Set the following parameter:
* `weights_per_channel_threshold` (bool) – Whether to quantize the weights per-channel or not (per-tensor).

For more details, please refer to [this page](https://sonysemiconductorsolutions.github.io/mct-model-optimization/api/api_docs/modules/trainable_infrastructure.html#trainablequantizerweightsconfig).
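
To make the difference concrete, here is a minimal plain-Python sketch of what the two modes mean for a weight tensor. It is not MCT code and does not call any MCT API: the helper names (`per_tensor_threshold`, `per_channel_thresholds`, `quantize`, `channel_error`), the max-absolute-value threshold, and the example weights are all assumptions made for illustration only.

```python
# Illustrative only: plain-Python sketch of per-tensor vs per-channel
# thresholds. None of these helpers are part of MCT.

def per_tensor_threshold(weights):
    """One threshold (max absolute value) shared by the whole tensor."""
    return max(abs(w) for channel in weights for w in channel)


def per_channel_thresholds(weights):
    """One threshold (max absolute value) for each output channel."""
    return [max(abs(w) for w in channel) for channel in weights]


def quantize(value, threshold, n_bits=8):
    """Symmetric uniform quantization to n_bits, followed by dequantization."""
    levels = 2 ** (n_bits - 1)
    step = threshold / levels
    q = max(-levels, min(levels - 1, round(value / step)))
    return q * step


def channel_error(channel, threshold):
    """Sum of absolute quantization errors over one channel."""
    return sum(abs(w - quantize(w, threshold)) for w in channel)


# Hypothetical weights: channel 0 has a small range, channel 1 a large one.
weights = [[0.01, -0.02, 0.015], [1.0, -2.0, 1.5]]

t_tensor = per_tensor_threshold(weights)      # shared threshold for all channels
t_channels = per_channel_thresholds(weights)  # one threshold per channel

# Error on the small-range channel under each mode.
err_tensor = channel_error(weights[0], t_tensor)
err_channel = channel_error(weights[0], t_channels[0])
print(f"per-tensor error: {err_tensor:.6f}, per-channel error: {err_channel:.6f}")
```

In this sketch the small-range channel is quantized far more accurately with its own threshold, which is the usual motivation for per-channel quantization; per-tensor quantization stores a single threshold instead of one per channel.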
2 changes: 1 addition & 1 deletion docs/api/api_docs/classes/BitWidthConfig.html
Expand Up @@ -7,7 +7,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

<title>BitWidthConfig &#8212; MCT Documentation: ver 2.6.0</title>
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=fa44fd50" />
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=03e43079" />
<link rel="stylesheet" type="text/css" href="../../../static/bizstyle.css?v=5283bb3d" />
<link rel="stylesheet" type="text/css" href="../../../static/css/custom.css?v=01243f34" />

Expand Down
2 changes: 1 addition & 1 deletion docs/api/api_docs/classes/DataGenerationConfig.html
Expand Up @@ -7,7 +7,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

<title>Data Generation Configuration &#8212; MCT Documentation: ver 2.6.0</title>
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=fa44fd50" />
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=03e43079" />
<link rel="stylesheet" type="text/css" href="../../../static/bizstyle.css?v=5283bb3d" />
<link rel="stylesheet" type="text/css" href="../../../static/css/custom.css?v=01243f34" />

Expand Down
16 changes: 8 additions & 8 deletions docs/api/api_docs/classes/DefaultDict.html
Expand Up @@ -7,7 +7,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

<title>DefaultDict Class &#8212; MCT Documentation: ver 2.6.0</title>
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=fa44fd50" />
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=03e43079" />
<link rel="stylesheet" type="text/css" href="../../../static/bizstyle.css?v=5283bb3d" />
<link rel="stylesheet" type="text/css" href="../../../static/css/custom.css?v=01243f34" />

Expand Down Expand Up @@ -60,15 +60,15 @@ <h3>Navigation</h3>
<dd><p>Get the value of the inner dictionary by the given key. If the key is not in the dictionary,
it uses the default_factory to return a default value.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><p><strong>key</strong> – Key to use in inner dictionary.</p>
<dt class="field-odd">Return type<span class="colon">:</span></dt>
<dd class="field-odd"><p><span class="sphinx_autodoc_typehints-type"><code class="xref py py-data docutils literal notranslate"><span class="pre">Any</span></code></span></p>
</dd>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>Value of the inner dictionary by the given key, or a default value if the key does not exist.
If default_factory was not passed at initialization, it returns None.</p>
<dt class="field-even">Parameters<span class="colon">:</span></dt>
<dd class="field-even"><p><strong>key</strong> – Key to use in inner dictionary.</p>
</dd>
<dt class="field-odd">Return type<span class="colon">:</span></dt>
<dd class="field-odd"><p><code class="xref py py-data docutils literal notranslate"><span class="pre">Any</span></code></p>
<dt class="field-odd">Returns<span class="colon">:</span></dt>
<dd class="field-odd"><p>Value of the inner dictionary by the given key, or a default value if the key does not exist.
If default_factory was not passed at initialization, it returns None.</p>
</dd>
</dl>
</dd></dl>
Expand Down
4 changes: 2 additions & 2 deletions docs/api/api_docs/classes/FrameworkInfo.html
Expand Up @@ -7,7 +7,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

<title>FrameworkInfo Class &#8212; MCT Documentation: ver 2.6.0</title>
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=fa44fd50" />
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=03e43079" />
<link rel="stylesheet" type="text/css" href="../../../static/bizstyle.css?v=5283bb3d" />
<link rel="stylesheet" type="text/css" href="../../../static/css/custom.css?v=01243f34" />

Expand Down Expand Up @@ -66,7 +66,7 @@ <h3>Navigation</h3>
<p class="rubric">Examples</p>
<p>When quantizing a Keras model, if we want to quantize only the kernels of Conv2D layers,
and we know that their kernel out/in channel indices are (3, 2) respectively, we can set:</p>
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">tensorflow</span> <span class="k">as</span> <span class="nn">tf</span>
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span><span class="w"> </span><span class="nn">tensorflow</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="nn">tf</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">kernel_ops</span> <span class="o">=</span> <span class="p">[</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Conv2D</span><span class="p">]</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">kernel_channels_mapping</span> <span class="o">=</span> <span class="n">DefaultDict</span><span class="p">({</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Conv2D</span><span class="p">:</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="mi">2</span><span class="p">)})</span>
</pre></div>
Expand Down
2 changes: 1 addition & 1 deletion docs/api/api_docs/classes/GradientPTQConfig.html
Expand Up @@ -7,7 +7,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

<title>GradientPTQConfig Class &#8212; MCT Documentation: ver 2.6.0</title>
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=fa44fd50" />
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=03e43079" />
<link rel="stylesheet" type="text/css" href="../../../static/bizstyle.css?v=5283bb3d" />
<link rel="stylesheet" type="text/css" href="../../../static/css/custom.css?v=01243f34" />

Expand Down
Expand Up @@ -7,7 +7,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

<title>MixedPrecisionQuantizationConfig &#8212; MCT Documentation: ver 2.6.0</title>
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=fa44fd50" />
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=03e43079" />
<link rel="stylesheet" type="text/css" href="../../../static/bizstyle.css?v=5283bb3d" />
<link rel="stylesheet" type="text/css" href="../../../static/css/custom.css?v=01243f34" />

Expand Down
2 changes: 1 addition & 1 deletion docs/api/api_docs/classes/PruningConfig.html
Expand Up @@ -7,7 +7,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

<title>Pruning Configuration &#8212; MCT Documentation: ver 2.6.0</title>
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=fa44fd50" />
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=03e43079" />
<link rel="stylesheet" type="text/css" href="../../../static/bizstyle.css?v=5283bb3d" />
<link rel="stylesheet" type="text/css" href="../../../static/css/custom.css?v=01243f34" />

Expand Down
8 changes: 1 addition & 7 deletions docs/api/api_docs/classes/PruningInfo.html
Expand Up @@ -7,7 +7,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

<title>Pruning Information &#8212; MCT Documentation: ver 2.6.0</title>
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=fa44fd50" />
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=03e43079" />
<link rel="stylesheet" type="text/css" href="../../../static/bizstyle.css?v=5283bb3d" />
<link rel="stylesheet" type="text/css" href="../../../static/css/custom.css?v=01243f34" />

Expand Down Expand Up @@ -65,9 +65,6 @@ <h3>Navigation</h3>
<dt class="field-even">Return type<span class="colon">:</span></dt>
<dd class="field-even"><p>Dict[BaseNode, np.ndarray]</p>
</dd>
<dt class="field-odd">Return type<span class="colon">:</span></dt>
<dd class="field-odd"><p><code class="xref py py-class docutils literal notranslate"><span class="pre">Dict</span></code>[<code class="xref py py-class docutils literal notranslate"><span class="pre">BaseNode</span></code>, <code class="xref py py-class docutils literal notranslate"><span class="pre">ndarray</span></code>]</p>
</dd>
</dl>
</dd></dl>

Expand All @@ -82,9 +79,6 @@ <h3>Navigation</h3>
<dt class="field-even">Return type<span class="colon">:</span></dt>
<dd class="field-even"><p>Dict[BaseNode, np.ndarray]</p>
</dd>
<dt class="field-odd">Return type<span class="colon">:</span></dt>
<dd class="field-odd"><p><code class="xref py py-class docutils literal notranslate"><span class="pre">Dict</span></code>[<code class="xref py py-class docutils literal notranslate"><span class="pre">BaseNode</span></code>, <code class="xref py py-class docutils literal notranslate"><span class="pre">ndarray</span></code>]</p>
</dd>
</dl>
</dd></dl>

Expand Down
4 changes: 2 additions & 2 deletions docs/api/api_docs/classes/QuantizationConfig.html
Expand Up @@ -7,7 +7,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

<title>QuantizationConfig &#8212; MCT Documentation: ver 2.6.0</title>
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=fa44fd50" />
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=03e43079" />
<link rel="stylesheet" type="text/css" href="../../../static/bizstyle.css?v=5283bb3d" />
<link rel="stylesheet" type="text/css" href="../../../static/css/custom.css?v=01243f34" />

Expand Down Expand Up @@ -50,7 +50,7 @@ <h3>Navigation</h3>
activations using thresholds, with weight threshold selection based on MSE and activation threshold selection
using NOCLIPPING (min/max), while enabling relu_bound_to_power_of_2 and weights_bias_correction,
you can instantiate a quantization configuration like this:</p>
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">model_compression_toolkit</span> <span class="k">as</span> <span class="nn">mct</span>
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span><span class="w"> </span><span class="nn">model_compression_toolkit</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="nn">mct</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">qc</span> <span class="o">=</span> <span class="n">mct</span><span class="o">.</span><span class="n">core</span><span class="o">.</span><span class="n">QuantizationConfig</span><span class="p">(</span><span class="n">activation_error_method</span><span class="o">=</span><span class="n">mct</span><span class="o">.</span><span class="n">core</span><span class="o">.</span><span class="n">QuantizationErrorMethod</span><span class="o">.</span><span class="n">NOCLIPPING</span><span class="p">,</span> <span class="n">weights_error_method</span><span class="o">=</span><span class="n">mct</span><span class="o">.</span><span class="n">core</span><span class="o">.</span><span class="n">QuantizationErrorMethod</span><span class="o">.</span><span class="n">MSE</span><span class="p">,</span> <span class="n">relu_bound_to_power_of_2</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">weights_bias_correction</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</pre></div>
</div>
Expand Down
40 changes: 36 additions & 4 deletions docs/api/api_docs/classes/QuantizationErrorMethod.html
Expand Up @@ -7,7 +7,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

<title>QuantizationErrorMethod &#8212; MCT Documentation: ver 2.6.0</title>
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=fa44fd50" />
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=03e43079" />
<link rel="stylesheet" type="text/css" href="../../../static/bizstyle.css?v=5283bb3d" />
<link rel="stylesheet" type="text/css" href="../../../static/css/custom.css?v=01243f34" />

Expand Down Expand Up @@ -45,12 +45,44 @@ <h3>Navigation</h3>
<dt class="sig sig-object py" id="model_compression_toolkit.core.QuantizationErrorMethod">
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">model_compression_toolkit.core.</span></span><span class="sig-name descname"><span class="pre">QuantizationErrorMethod</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">value</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#model_compression_toolkit.core.QuantizationErrorMethod" title="Link to this definition">¶</a></dt>
<dd><p>Method for quantization threshold selection:</p>
<p>NOCLIPPING - Use min/max values as thresholds.</p>
<p>MSE - Use mean square error for minimizing quantization noise.</p>
<p>NOCLIPPING - Use min/max values as thresholds. This avoids clipping bias but reduces quantization resolution.</p>
<p>MSE - <strong>(default)</strong> Use mean square error for minimizing quantization noise.</p>
<p>MAE - Use mean absolute error for minimizing quantization noise.</p>
<p>KL - Use KL-divergence to make the signal distributions as similar as possible.</p>
<p>Lp - Use the Lp-norm to minimize quantization noise.</p>
<p>Lp - Use the Lp-norm to minimize quantization noise. The parameter p is specified by QuantizationConfig.l_p_value (default: 2; integer only). It equals MAE when p = 1 and MSE when p = 2. Use this method when you want p ≥ 3.</p>
<p>HMSE - Use Hessian-based mean squared error for minimizing quantization noise. This method uses Hessian scores to give more weight to the more valuable parameters when computing the error induced by quantization.</p>
<p><strong>How to select QuantizationErrorMethod</strong></p>
<table class="docutils align-default">
<colgroup>
<col style="width: 20.0%" />
<col style="width: 80.0%" />
</colgroup>
<thead>
<tr class="row-odd"><th class="head"><p>Method</p></th>
<th class="head"><p>Recommended Situations</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>NOCLIPPING</p></td>
<td><p>Research and debugging phases where you want to observe behavior across the entire range. This is effective when you want to maintain the entire range, especially when the data is biased (for example, when there is an extremely small amount of data on the minimum side).</p></td>
</tr>
<tr class="row-odd"><td><p>MSE</p></td>
<td><p><strong>Use this method in most cases.</strong> It is effective when the data distribution is close to normal and there are few outliers, and when you want stable results, such as in regression tasks.</p></td>
</tr>
<tr class="row-even"><td><p>MAE</p></td>
<td><p>Effective for data with a lot of noise and outliers.</p></td>
</tr>
<tr class="row-odd"><td><p>KL</p></td>
<td><p>Useful for tasks where the output distribution is important (such as anomaly detection).</p></td>
</tr>
<tr class="row-even"><td><p>Lp</p></td>
<td><p>p ≥ 3 is effective when you want to be more sensitive to outliers than MSE (for example, with sparse data).</p></td>
</tr>
<tr class="row-odd"><td><p>HMSE</p></td>
<td><p>Recommended when using GPTQ. This is effective for models where specific layers strongly influence the overall accuracy (such as Transformers).</p></td>
</tr>
</tbody>
</table>
</dd></dl>

</section>
Expand Down
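The relationship between the error methods described in this hunk can be sketched in plain Python. This is only an illustration under stated assumptions, not MCT's implementation: `lp_error`, `quantize`, `select_threshold`, the candidate thresholds, the 3-bit width, and the sample data are all hypothetical. It shows that the mean Lp error reduces to MAE at p = 1 and MSE at p = 2, and how an error-minimizing threshold can differ from the NOCLIPPING (max-absolute-value) threshold when the data contains an outlier.

```python
# Illustrative only: these helpers are hypothetical and not part of MCT.

def quantize(values, threshold, n_bits=3):
    """Symmetric uniform quantization to n_bits, followed by dequantization."""
    levels = 2 ** (n_bits - 1)
    step = threshold / levels
    return [max(-levels, min(levels - 1, round(x / step))) * step for x in values]


def lp_error(values, quantized, p=2):
    """Mean of |x - q|^p. p=1 is the MAE, p=2 is the MSE."""
    return sum(abs(x - q) ** p for x, q in zip(values, quantized)) / len(values)


def select_threshold(values, candidates, p=2):
    """Pick the candidate threshold with the smallest mean Lp error."""
    return min(candidates, key=lambda t: lp_error(values, quantize(values, t), p))


# Hypothetical data: many small values plus a single outlier.
data = [0.1, -0.1] * 50 + [1.0]

# NOCLIPPING-style threshold: cover the full range of the data.
no_clipping = max(abs(x) for x in data)

# Error-minimizing threshold chosen from power-of-two fractions of the range.
candidates = [no_clipping / 2 ** k for k in range(3)]
mse_threshold = select_threshold(data, candidates, p=2)

print("NOCLIPPING threshold:", no_clipping)
print("MSE-selected threshold:", mse_threshold)
```

With this data the MSE search prefers a smaller threshold than the full range: it accepts clipping the single outlier in exchange for finer resolution on the many small values, which is the trade-off between NOCLIPPING and the error-minimizing methods.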
2 changes: 1 addition & 1 deletion docs/api/api_docs/classes/ResourceUtilization.html
Expand Up @@ -7,7 +7,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

<title>ResourceUtilization &#8212; MCT Documentation: ver 2.6.0</title>
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=fa44fd50" />
<link rel="stylesheet" type="text/css" href="../../../static/pygments.css?v=03e43079" />
<link rel="stylesheet" type="text/css" href="../../../static/bizstyle.css?v=5283bb3d" />
<link rel="stylesheet" type="text/css" href="../../../static/css/custom.css?v=01243f34" />

Expand Down