|
6 | 6 | "id": "ag0MtvPUkc8i" |
7 | 7 | }, |
8 | 8 | "source": [ |
9 | | - "# Quantization Troubleshooting with the Model Compression Toolkit (MCT) Using the XQuant Extension Tool\n", |
| 9 | + "# Quantization Troubleshooting with the Model Compression Toolkit (MCT) Using the XQuant Extension Tool (Judgeable Troubleshoot)\n", |
10 | 10 | "\n", |
11 | 11 | "[Run this tutorial in Google Colab](https://colab.research.google.com/github/SonySemiconductorSolutions/mct-model-optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_XQuant_Extension_Tool.ipynb)\n", |
12 | 12 | "\n", |
13 | 13 | "## Overview\n", |
14 | 14 | "This notebook provides practical guidance for improving the quality of post‑training quantization for PyTorch models using the XQuant Extension Tool. \n", |
15 | 15 | "It computes the error for each layer by comparing the floating‑point model and the quantized model, and combines these results with the quantization log. The analysis is presented as a report that highlights the causes of detected errors and suggests appropriate corrective actions for each.\n", |
16 | 16 | "\n", |
| 17 | + "General corrective actions are listed [here](https://colab.research.google.com/github/SonySemiconductorSolutions/mct-model-optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_XQuant_Extension_Tool_2.ipynb).\n", |
| 18 | + "\n", |
17 | 19 | "## Summary\n", |
18 | 20 | "We will cover the following steps:\n", |
19 | 21 | "\n", |
|
24 | 26 | " - Quantization Troubleshooting for MCT \n", |
25 | 27 | "4. Judgeable Troubleshoot\n", |
26 | 28 | " - Outlier Removal\n", |
27 | | - "5. General Troubleshoot\n", |
28 | | - " - Representative Dataset Size & Diversity\n", |
29 | | - " - Bias Correction\n", |
30 | | - " - Using More Samples in Mixed Precision Quantization\n", |
31 | | - " - Threshold Selection Error Method\n", |
32 | | - " - Enabling Hessian Based Mixed Precision\n", |
33 | | - " - GPTQ - Gradient-Based Post Training Quantization\n", |
34 | | - "6. Conclusion\n", |
| 29 | + "5. Conclusion\n", |
35 | 30 | "\n", |
36 | | - " \n", |
37 | 31 | "## Setup\n", |
38 | 32 | "Install the relevant packages:" |
39 | 33 | ] |
|
187 | 181 | "cell_type": "markdown", |
188 | 182 | "metadata": {}, |
189 | 183 | "source": [ |
190 | | - "## General Troubleshoots\n", |
191 | | - "If there is no significant improvement, comprehensively evaluate other areas for improvement.\n", |
192 | | - "The following items are general troubleshoots for quantization accuracy improvement." |
193 | | - ] |
194 | | - }, |
195 | | - { |
196 | | - "cell_type": "markdown", |
197 | | - "metadata": {}, |
198 | | - "source": [ |
199 | | - "### Representative Dataset Size & Diversity\n", |
200 | | - "The representative dataset is used by MCT to derive the threshold for the model's activation tensor.\n", |
201 | | - "If the representative dataset is too small or not diverse enough, accuracy may decrease.\n", |
202 | | - "Increase the number of samples in the representative dataset or increase the diversity of the samples.\n", |
203 | | - "\n", |
204 | | - "See [TroubleShooting Documentation>>Representative Dataset Size & Diversity](https://sonysemiconductorsolutions.github.io/mct-model-optimization/docs_troubleshoot/troubleshoots/representative_dataset_size_and_diversity.html#ug-representative-dataset-size-and-diversity)" |
205 | | - ] |
206 | | - }, |
207 | | - { |
208 | | - "cell_type": "code", |
209 | | - "execution_count": null, |
210 | | - "metadata": {}, |
211 | | - "outputs": [], |
212 | | - "source": [ |
213 | | - "# Change representative dataset and re-run the quantization process." |
214 | | - ] |
215 | | - }, |
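The stub cell above only says to change the representative dataset and re-run quantization. As a concrete illustration of what "more samples, more diversity" can mean, here is a minimal sketch of a calibration-data generator that draws randomly shuffled batches. The `images` array, batch sizes, and shapes are hypothetical placeholders for your own preprocessed inputs; MCT consumes such a generator as its representative dataset.

```python
import numpy as np

def make_representative_dataset(images, n_iter=32, batch_size=16, seed=0):
    """Build a generator that yields randomly drawn calibration batches.

    `images` is a hypothetical array of preprocessed inputs (N, C, H, W).
    Raising `n_iter` (more batches) and sampling across the whole dataset
    (more diversity) gives MCT better statistics for activation thresholds.
    """
    rng = np.random.default_rng(seed)

    def representative_dataset():
        for _ in range(n_iter):
            idx = rng.choice(len(images), size=batch_size, replace=False)
            yield [images[idx]]  # MCT expects a list of input tensors per call

    return representative_dataset
```

The returned callable can then be passed to the quantization entry point in place of the original, smaller dataset generator.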
216 | | - { |
217 | | - "cell_type": "markdown", |
218 | | - "metadata": {}, |
219 | | - "source": [ |
220 | | - "### Bias Correction\n", |
221 | | - "MCT applies bias correction by default to overcome induced bias shift caused by weights quantization.\n", |
222 | | - "\n", |
223 | | - "You can check if the bias correction causes a degradation in accuracy, by disabling the bias correction (setting weights_bias_correction to False of the QuantizationConfig in CoreConfig).\n", |
224 | | - "\n", |
225 | | - "See [TroubleShooting Documentation>>Bias Correction](https://sonysemiconductorsolutions.github.io/mct-model-optimization/docs_troubleshoot/troubleshoots/bias_correction.html#ug-bias-correction)" |
226 | | - ] |
227 | | - }, |
228 | | - { |
229 | | - "cell_type": "code", |
230 | | - "execution_count": null, |
231 | | - "metadata": {}, |
232 | | - "outputs": [], |
233 | | - "source": [ |
234 | | - "# Change weights_bias_correction and re-run the quantization process." |
235 | | - ] |
236 | | - }, |
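The stub above leaves the actual change implicit. A minimal configuration sketch for disabling bias correction, assuming the `model_compression_toolkit` package is installed and that `float_model` and `representative_dataset` are placeholders for objects defined earlier in the notebook:

```python
import model_compression_toolkit as mct

# Disable bias correction to check whether it degrades accuracy
# (weights_bias_correction defaults to True).
q_config = mct.core.QuantizationConfig(weights_bias_correction=False)
core_config = mct.core.CoreConfig(quantization_config=q_config)

# Re-run PTQ with the modified config (float_model and
# representative_dataset are assumed from earlier cells):
# quantized_model, _ = mct.ptq.pytorch_post_training_quantization(
#     float_model, representative_dataset, core_config=core_config)
```

If accuracy improves with bias correction off, the correction itself was the source of the degradation for this model.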
237 | | - { |
238 | | - "cell_type": "markdown", |
239 | | - "metadata": {}, |
240 | | - "source": [ |
241 | | - "### Using More Samples in Mixed Precision Quantization\n", |
242 | | - "\n", |
243 | | - "In Mixed Precision quantization, MCT will assign a different bit width to each weight in the model, depending on the weight’s layer sensitivity and a resource constraint defined by the user, such as target model size.\n", |
244 | | - "\n", |
245 | | - "By default, MCT employs 32 samples from the provided representative dataset for the Mixed Precision search. Leveraging a larger dataset could enhance results, particularly when dealing with datasets exhibiting high variance.\n", |
246 | | - "\n", |
247 | | - "Set the num_of_images attribute to a larger value of the MixedPrecisionQuantizationConfig in CoreConfig.\n", |
248 | | - "\n", |
249 | | - "See [TroubleShooting Documentation>>Using More Samples in Mixed Precision Quantization](https://sonysemiconductorsolutions.github.io/mct-model-optimization/docs_troubleshoot/troubleshoots/using_more_samples_in_mixed_precision_quantization.html#ug-using-more-samples-in-mixed-precision-quantization" |
250 | | - ] |
251 | | - }, |
252 | | - { |
253 | | - "cell_type": "code", |
254 | | - "execution_count": null, |
255 | | - "metadata": {}, |
256 | | - "outputs": [], |
257 | | - "source": [ |
258 | | - "# Change num_of_images and re-run the quantization process." |
259 | | - ] |
260 | | - }, |
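As a configuration sketch of the change described above (assuming `model_compression_toolkit` is installed), raising `num_of_images` from its default of 32 looks like:

```python
import model_compression_toolkit as mct

# Use more calibration samples during the mixed-precision search;
# 64 here is an illustrative value, not a recommendation.
mp_config = mct.core.MixedPrecisionQuantizationConfig(num_of_images=64)
core_config = mct.core.CoreConfig(mixed_precision_config=mp_config)
```

The resulting `core_config` is then passed to the PTQ call together with the resource-utilization target used for mixed precision.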
261 | | - { |
262 | | - "cell_type": "markdown", |
263 | | - "metadata": {}, |
264 | | - "source": [ |
265 | | - "### Threshold Selection Error Method\n", |
266 | | - "MCT defaults to employing the Mean-Squared Error (MSE) metric for threshold optimization, however, it offers a range of alternative error metrics (e.g. using min/max values, KL-divergence, etc.) to accommodate different network requirements.\n", |
267 | | - "\n", |
268 | | - "We advise you to consider other error metrics if your model is suffering from significant accuracy degradation, especially if it contains unorthodox activation layers.\n", |
269 | | - "\n", |
270 | | - "For example, set NOCLIPPING to the activation_error_method attribute of the QuantizationConfig in CoreConfig.\n", |
271 | | - "\n", |
272 | | - "See [TroubleShooting Documentation>>Threshold Selection Error Method](https://sonysemiconductorsolutions.github.io/mct-model-optimization/docs_troubleshoot/troubleshoots/threhold_selection_error_method.html#ug-threshold-selection-error-method)" |
273 | | - ] |
274 | | - }, |
275 | | - { |
276 | | - "cell_type": "code", |
277 | | - "execution_count": null, |
278 | | - "metadata": {}, |
279 | | - "outputs": [], |
280 | | - "source": [ |
281 | | - "# Change activation_error_method and re-run the quantization process." |
282 | | - ] |
283 | | - }, |
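A minimal configuration sketch of the swap described above (assuming `model_compression_toolkit` is installed), selecting the min/max-based `NOCLIPPING` metric instead of the default MSE:

```python
import model_compression_toolkit as mct

# Pick a different threshold-selection metric for activations;
# other options include MAE and KL-divergence.
q_config = mct.core.QuantizationConfig(
    activation_error_method=mct.core.QuantizationErrorMethod.NOCLIPPING)
core_config = mct.core.CoreConfig(quantization_config=q_config)
```

Trying a few metrics and comparing the quantized model's accuracy is usually cheaper than any gradient-based remedy.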
284 | | - { |
285 | | - "cell_type": "markdown", |
286 | | - "metadata": {}, |
287 | | - "source": [ |
288 | | - "### Enabling Hessian Based Mixed Precision\n", |
289 | | - "MCT offers a Hessian-based scoring mechanism to assess the importance of layers during the Mixed Precision search.\n", |
290 | | - "This feature can notably enhance Mixed Precision outcomes for certain network architectures.\n", |
291 | | - "\n", |
292 | | - "Set the use_hessian_based_scores flag to True in the MixedPrecisionQuantizationConfig of the CoreConfig.\n", |
293 | | - "\n", |
294 | | - "See [TroubleShooting Documentation>>Enabling Hessian-Based Mixed Precision](https://sonysemiconductorsolutions.github.io/mct-model-optimization/docs_troubleshoot/troubleshoots/enabling_hessian-based_mixed_precision.html#ug-enabling-hessian-based-mixed-precision)\n" |
295 | | - ] |
296 | | - }, |
297 | | - { |
298 | | - "cell_type": "code", |
299 | | - "execution_count": null, |
300 | | - "metadata": {}, |
301 | | - "outputs": [], |
302 | | - "source": [ |
303 | | - "# Change use_hessian_based_scores and re-run the quantization process." |
304 | | - ] |
305 | | - }, |
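The flag change described above can be sketched as follows (a configuration fragment, assuming `model_compression_toolkit` is installed):

```python
import model_compression_toolkit as mct

# Score layer sensitivity with Hessian information during the
# mixed-precision search (off by default).
mp_config = mct.core.MixedPrecisionQuantizationConfig(
    use_hessian_based_scores=True)
core_config = mct.core.CoreConfig(mixed_precision_config=mp_config)
```

Note that Hessian scoring adds computation to the search, so expect the mixed-precision step to take longer.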
306 | | - { |
307 | | - "cell_type": "markdown", |
308 | | - "metadata": {}, |
309 | | - "source": [ |
310 | | - "### GPTQ - Gradient-Based Post Training Quantization\n", |
311 | | - "When PTQ (either with or without Mixed Precision) fails to deliver the required accuracy, GPTQ is potentially the remedy.\n", |
312 | | - "\n", |
313 | | - "MCT can configure GPTQ optimization options, such as the number of epochs for the optimization process.\n", |
314 | | - "\n", |
315 | | - "See [GPTQ - Gradient-Based Post Training Quantization](https://sonysemiconductorsolutions.github.io/mct-model-optimization/docs_troubleshoot/troubleshoots/gptq-gradient_based_post_training_quantization.html#ug-gptq-gradient-based-post-training-quantization)" |
316 | | - ] |
317 | | - }, |
318 | | - { |
319 | | - "cell_type": "code", |
320 | | - "execution_count": null, |
321 | | - "metadata": {}, |
322 | | - "outputs": [], |
323 | | - "source": [ |
324 | | - "# Re-run the quantization process using GPTQ." |
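The GPTQ stub above can be fleshed out as a sketch (assuming `model_compression_toolkit` is installed; `float_model` and `representative_dataset` are placeholders for objects defined earlier in the notebook):

```python
import model_compression_toolkit as mct

# GPTQ fine-tunes the quantized weights with gradients; n_epochs is the
# main knob for the optimization budget (5 is an illustrative value).
gptq_config = mct.gptq.get_pytorch_gptq_config(n_epochs=5)

# quantized_model, _ = mct.gptq.pytorch_gradient_post_training_quantization(
#     float_model, representative_dataset, gptq_config=gptq_config)
```

GPTQ is slower than plain PTQ, so it is best reserved for cases where the cheaper troubleshoots above have been exhausted.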
| 184 | + "## Conclusion\n", |
| 185 | + "Through this XQuant analysis, accuracy improved by XX%.\n", |
| 186 | + "Following the same workflow, you can improve the accuracy of your own models with the XQuant Extension Tool." |
325 | 187 | ] |
326 | 188 | }, |
327 | 189 | { |
328 | 190 | "cell_type": "markdown", |
329 | 191 | "metadata": {}, |
330 | 192 | "source": [ |
331 | | - "## Conclusion\n", |
332 | | - "Through this XQuant analysis, accuracy improved by XX%\n", |
333 | | - "By doing this, you can improve the accuracy of your model with the XQuant Extension Tool.\n", |
334 | | - "For more information, please see [this document](https://sonysemiconductorsolutions.github.io/mct-model-optimization/guidelines/XQuant_Extension_Tool.html#overall-process-flow\n", |
335 | | - ")" |
| 193 | + "For general corrective actions and practical guidance on improving the post-training quantization quality of your PyTorch models, see [the companion notebook](https://colab.research.google.com/github/SonySemiconductorSolutions/mct-model-optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_XQuant_Extension_Tool_2.ipynb)." |
336 | 194 | ] |
337 | 195 | }, |
338 | 196 | { |
|