Commit 31faabe

author
zhangqi3
committed
[Update] Update docs and tests.
1 parent 3567c06 commit 31faabe

File tree

15 files changed

+322
-91
lines changed

docs/source/algorithm/index.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -96,7 +96,7 @@ PTQ Algorithms
9696

9797
\[ E[ L(x,y,\mathbf{w}+\Delta \mathbf{w}) - L(x,y,\mathbf{w}) ] \approx \Delta \mathbf{w}^T g^{(\mathbf{w})} + \frac12 \Delta \mathbf{w}^T H^{(\mathbf{w})} \Delta \mathbf{w} \approx \Delta \mathbf{w}_1^2 + \Delta \mathbf{w}_2^2 + \Delta \mathbf{w}_1 \Delta \mathbf{w}_2 \]
9898

99-
Hence, it's benifitial to learn a rounding mask for each layer. One well-designed object function is given by the authors:
99+
Hence, it's beneficial to learn a rounding mask for each layer. One well-designed objective function is given by the authors:
100100

101101
.. raw:: latex html
102102

docs/source/example/deploy.rst

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
Advanced instructions for different hardware platforms
2+
======================================================
3+
4+
.. toctree::
5+
:titlesonly:
6+
7+
TensorRT <platforms/tensorrt.rst>
8+
SNPE <platforms/snpe.rst>
9+
10+

docs/source/example/index.rst

Lines changed: 8 additions & 62 deletions
Original file line numberDiff line numberDiff line change
@@ -1,65 +1,11 @@
11
Get Started
2-
==========================
3-
We follow the `PyTorch official example <https://github.com/pytorch/examples/tree/master/imagenet/>`_ to build the example of Model Quantization Benchmark for ImageNet classification task.
4-
5-
Requirements
6-
-------------
2+
============
3+
This tutorial walks through the whole quantization workflow with MQBench, including:
74

8-
- Install PyTorch following `pytorch.org <http://pytorch.org/>`_
9-
- Install dependencies::
5+
.. toctree::
6+
:maxdepth: 1
7+
:titlesonly:
108

11-
pip install -r requirements.txt
12-
13-
- Download the ImageNet dataset from `the official website <http://www.image-net.org/>`_
14-
15-
- Then, and move validation images to labeled subfolders, using `the following shell script <https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh/>`_
16-
17-
- Install TensorRT=7.2.1.6 from `NVIDIA <https://developer.nvidia.com/tensorrt/>`_
18-
19-
Usage
20-
---------
21-
22-
- **Quantization-Aware Training:**
23-
24-
- Training hyper-parameters:
25-
26-
- batch size = 128
27-
- epochs = 1
28-
- lr = 1e-4
29-
- others like weight decay, momentum are kept as default.
30-
31-
- ResNet18 / ResNet50 / MobileNet_v2::
32-
33-
python main.py -a [model_name] --epochs 1 --lr 1e-4 --b 128 --seed 99 --pretrained
34-
35-
36-
- **Deployment**
37-
We provide the example to deploy the quantized model to TensorRT.
38-
39-
1. First export the quantized model to ONNX [tensorrt_deploy_model.onnx] and dump the clip ranges [tensorrt_clip_ranges.json] for activations.::
40-
41-
python main.py -a [model_name] --resume [model_save_path]
42-
43-
44-
2. Second build the TensorRT INT8 engine and evaluate, please make sure [dataset_path] contains subfolder [val]::
45-
46-
python onnx2trt.py --onnx [tensorrt_deploy_model.onnx] --trt [model_name.trt] --clip [tensorrt_clip_ranges.json] --data [dataset_path] --evaluate
47-
48-
3. If you don’t pass in external clip ranges [tensorrt_clip_ranges.json], TenosrRT will do calibration using default algorithm IInt8EntropyCalibrator2 with 100 images. So, please make sure [dataset_path] contains subfolder [cali]::
49-
50-
python onnx2trt.py --onnx [tensorrt_deploy_model.onnx] --trt [model_name.trt] --data [dataset_path] --evaluate
51-
52-
Results
53-
-----------
54-
55-
+-------------------+--------------------------------+------------------------------------------------------------------------------------------------------------------+
56-
| Model | accuracy\@fp32 | accuracy\@int8 |
57-
| | +----------------------------------------+---------------------------------+---------------------------------------+
58-
| | | TensoRT Calibration | MQBench QAT | TensorRT SetRange |
59-
+===================+================================+========================================+=================================+=======================================+
60-
| **ResNet18** | Acc\@1 69.758 Acc\@5 89.078 | Acc\@1 69.612 Acc\@5 88.980 | Acc\@1 69.912 Acc\@5 89.150 | Acc\@1 69.904 Acc\@5 89.182 |
61-
+-------------------+--------------------------------+----------------------------------------+---------------------------------+---------------------------------------+
62-
| **ResNet50** | Acc\@1 76.130 Acc\@5 92.862 | Acc\@1 76.074 Acc\@5 92.892 | Acc\@1 76.114 Acc\@5 92.946 | Acc\@1 76.320 Acc\@5 93.006 |
63-
+-------------------+--------------------------------+----------------------------------------+---------------------------------+---------------------------------------+
64-
| **MobileNet_v2** | Acc\@1 71.878 Acc\@5 90.286 | Acc\@1 70.700 Acc\@5 89.708 | Acc\@1 70.826 Acc\@5 89.874 | Acc\@1 70.724 Acc\@5 89.870 |
65-
+-------------------+--------------------------------+----------------------------------------+---------------------------------+---------------------------------------+
9+
setup
10+
quantization
11+
deploy
Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,53 @@
1+
SNPE
2+
=============
3+
Example of QAT and deployment on SNPE.
4+
5+
**Requirements**:
6+
7+
- Install the SNPE SDK from `Qualcomm <https://developer.qualcomm.com/sites/default/files/docs/snpe/setup.html>`_ (Ubuntu 18.04 is suggested)
8+
9+
**QAT**:
10+
11+
- Follow the QAT procedure to obtain a model checkpoint. We suggest a learning rate of 5e-5 with a cosine scheduler and the Adam optimizer, training for tens of epochs.
12+
13+
**Deployment**:
14+
15+
- Convert PyTorch checkpoint to `snpe_deploy.onnx` and dump clip ranges to `snpe_clip_ranges.json`::
16+
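    from mqbench.convert_deploy import convert_deploy
    from mqbench.prepare_by_platform import BackendType

    # `solver.model.module` stands for the trained QAT model to be deployed.
    input_dict = {'x': [1, 3, 224, 224]}
    convert_deploy(solver.model.module, BackendType.SNPE, input_dict)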
20+
21+
- Convert `.onnx` file to `.dlc` format (supported by SNPE)::
22+
23+
snpe-onnx-to-dlc --input_network ./snpe_deploy.onnx --output_path ./snpe_deploy.dlc --quantization_overrides ./snpe_clip_ranges.json
24+
25+
- Note that the `.json` file contains the activation ranges for quantization. It is required at this step even though the model has not been quantized yet.
26+
27+
- Quantize the model with parameters overridden::
28+
29+
snpe-dlc-quantize --input_dlc ./snpe_deploy.dlc --input_list ./data.txt --override_params --bias_bitwidth 32
30+
31+
- `data.txt` lists the paths to the image data used for calibration. Each file is loaded with `numpy.fromfile(dtype=np.float32)` and must have shape `(224, 224, 3)`. The calibration data itself is not important, since the parameters are overridden. This file is also required for the test.
32+
33+
- Now we get the final model `snpe_deploy_quantized.dlc`.
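
The input list described above can be prepared with a few lines of Python. The following is a minimal sketch and not part of MQBench or the SNPE SDK: the helper name `write_raw_inputs`, the file naming, and the assumption that each image is already preprocessed and flattened to 224 * 224 * 3 floats are all hypothetical. It relies on the standard `array` module, whose `'f'` type code stores 32-bit floats on common platforms, so the files should be readable with `numpy.fromfile(dtype=np.float32)`.

```python
import os
from array import array

EXPECTED_LEN = 224 * 224 * 3  # one (224, 224, 3) image, flattened


def write_raw_inputs(images, out_dir, list_path):
    """Write each image as a raw float32 file and list the paths in list_path.

    `images` is an iterable of flat float sequences (hypothetical input format).
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for idx, pixels in enumerate(images):
        if len(pixels) != EXPECTED_LEN:
            raise ValueError(
                f"image {idx} has {len(pixels)} values, expected {EXPECTED_LEN}"
            )
        path = os.path.join(out_dir, f"input_{idx:05d}.raw")
        with open(path, "wb") as fh:
            array("f", pixels).tofile(fh)  # raw float values, no header
        paths.append(path)
    # One file path per line, in the same style as data.txt.
    with open(list_path, "w") as fh:
        fh.writelines(p + "\n" for p in paths)
    return paths
```

The same helper could be used for the test list, provided the test images are preprocessed in the same way as the training data.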
34+
35+
**Results**:
36+
37+
The test is run with the SNPE SDK tools, using the quantized model and a text file listing the paths to test data of shape (224, 224, 3)::
38+
39+
snpe-net-run --container ./snpe_deploy_quantized.dlc --input_list ./test_data.txt
40+
41+
The results on several tested models:
42+
+------------------+----------------+------------------------+
| Model            | accuracy\@fp32 | accuracy\@int8         |
|                  |                +-------------+----------+
|                  |                | MQBench QAT | SNPE     |
+==================+================+=============+==========+
| **ResNet18**     | 70.65%         | 70.75%      | 70.74%   |
+------------------+----------------+-------------+----------+
| **ResNet50**     | 77.94%         | 77.75%      | 77.92%   |
+------------------+----------------+-------------+----------+
| **MobileNet_v2** | 72.67%         | 72.31%      | 72.65%   |
+------------------+----------------+-------------+----------+
Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,56 @@
1+
TensorRT
2+
================
3+
Example of QAT and deployment on TensorRT.
4+
5+
**Requirements**:
6+
7+
- Install TensorRT=7.2.1.6 from `NVIDIA <https://developer.nvidia.com/tensorrt/>`_
8+
9+
**QAT**:
10+
11+
- Training hyper-parameters:
12+
13+
- batch size = 128
14+
- epochs = 1
15+
- learning rate = 1e-4 (for ResNet series) / 1e-5 (for MobileNet series)
16+
- weight decay = 1e-4 (for ResNet series) / 0 (for MobileNet series)
17+
- optimizer: SGD (for ResNet series) / Adam (for MobileNet series)
18+
- others like momentum are kept as default.
19+
20+
- [model_name] = ResNet18 / ResNet50 / MobileNet_v2 / ... ::
21+
22+
git clone https://github.com/TheGreatCold/MQBench.git
23+
cd MQBench/application/imagenet_example
24+
python main.py -a [model_name] --epochs 1 --lr 1e-4 --b 128 --pretrained
25+
26+
27+
**Deployment**:
28+
29+
We provide the example to deploy the quantized model to TensorRT.
30+
31+
- First, export the quantized model to ONNX [tensorrt_deploy_model.onnx] and dump the clip ranges [tensorrt_clip_ranges.json] for activations. ::
32+
33+
python main.py -a [model_name] --resume [model_save_path]
34+
35+
- Second, build the TensorRT INT8 engine and evaluate it. Please make sure [dataset_path] contains the subfolder [val]. ::
36+
37+
python onnx2trt.py --onnx [tensorrt_deploy_model.onnx] --trt [model_name.trt] --clip [tensorrt_clip_ranges.json] --data [dataset_path] --evaluate
38+
39+
- If you don’t pass in external clip ranges [tensorrt_clip_ranges.json], TensorRT will calibrate with its default algorithm, IInt8EntropyCalibrator2, using 100 images. In that case, please make sure [dataset_path] contains the subfolder [cali]. ::
40+
41+
python onnx2trt.py --onnx [tensorrt_deploy_model.onnx] --trt [model_name.trt] --data [dataset_path] --evaluate
42+
43+
**Results**:
44+
45+
+------------------+-----------------------------+-----------------------------------------------------------------------------------------+
| Model            | accuracy\@fp32              | accuracy\@int8                                                                          |
|                  |                             +-----------------------------+-----------------------------+-----------------------------+
|                  |                             | TensorRT Calibration        | MQBench QAT                 | TensorRT SetRange           |
+==================+=============================+=============================+=============================+=============================+
| **ResNet18**     | Acc\@1 69.758 Acc\@5 89.078 | Acc\@1 69.612 Acc\@5 88.980 | Acc\@1 69.912 Acc\@5 89.150 | Acc\@1 69.904 Acc\@5 89.182 |
+------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
| **ResNet50**     | Acc\@1 76.130 Acc\@5 92.862 | Acc\@1 76.074 Acc\@5 92.892 | Acc\@1 76.114 Acc\@5 92.946 | Acc\@1 76.320 Acc\@5 93.006 |
+------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
| **MobileNet_v2** | Acc\@1 71.878 Acc\@5 90.286 | Acc\@1 70.700 Acc\@5 89.708 | Acc\@1 71.158 Acc\@5 89.990 | Acc\@1 71.102 Acc\@5 89.932 |
+------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
Lines changed: 54 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,54 @@
1+
How to do quantization with MQBench
2+
======================================
3+
4+
QAT (Quantization-Aware Training)
5+
---------------------------------------------
6+
QAT requires only a few additional operations compared to ordinary fine-tuning.
7+
8+
::

    import torchvision.models as models
    from mqbench.convert_deploy import convert_deploy
    from mqbench.prepare_by_platform import prepare_qat_fx_by_platform, BackendType
    from mqbench.utils.state import enable_calibration, enable_quantization

    # First, initialize the FP32 model with pretrained parameters.
    model = models.__dict__["resnet18"](pretrained=True)
    model.train()

    # Then trace the original model with torch.fx and insert fake-quantize
    # nodes according to the hardware backend (e.g. TensorRT).
    model = prepare_qat_fx_by_platform(model, BackendType.Tensorrt)

    # Before training, we recommend enabling the observers to calibrate on
    # several batches, and then enabling quantization.
    model.eval()
    enable_calibration(model)
    calibration_flag = True

    # Training loop (`data` is your own data loader).
    for i, batch in enumerate(data):
        # Do the forward pass.
        ...

        if calibration_flag:
            # Raise this threshold to calibrate on more than one batch.
            if i >= 0:
                calibration_flag = False
                model.zero_grad()
                model.train()
                enable_quantization(model)
            else:
                continue

        # Do the backward pass and the optimizer step.
        ...

    # Deploy the model: remove fake-quantize nodes and dump quantization
    # parameters such as clip ranges.
    convert_deploy(model.eval(), BackendType.Tensorrt, input_shape_dict={'data': [10, 3, 224, 224]})
47+
48+
49+
PTQ (Post-Training Quantization)
50+
---------------------------------------------
51+
To be finished.
52+
53+
54+

docs/source/example/setup.rst

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
Preparations
2+
===================================
3+
Generally, we follow the `PyTorch official example <https://github.com/pytorch/examples/tree/master/imagenet/>`_ to build the Model Quantization Benchmark example for the ImageNet classification task.
4+
5+
6+
- Install PyTorch following `pytorch.org <http://pytorch.org/>`_
7+
- Install dependencies ::
8+
9+
pip install -r requirements.txt
10+
11+
- Specific requirements for hardware platforms will be introduced later
12+
13+
- Download the ImageNet dataset from `the official website <http://www.image-net.org/>`_
14+
15+
- Then move the validation images into labeled subfolders, using `the following shell script <https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh/>`_
16+
17+
- Or process other datasets in a similar way.
18+
19+
- Full precision pretrained models are preferred, but sometimes it's possible to do QAT from scratch.
20+
21+

docs/source/hardware/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,3 +6,4 @@ Quantization Hardware
66

77
nnie
88
tensorrt
9+
snpe

docs/source/hardware/snpe.rst

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,29 @@
1+
SNPE
2+
=========
3+
4+
`Snapdragon Neural Processing Engine (SNPE) <https://developer.qualcomm.com/sites/default/files/docs/snpe//index.html/>`_ is a Qualcomm Snapdragon software accelerated runtime for the execution of deep neural networks.
5+
6+
.. _SNPE Quantization Scheme:
7+
8+
Quantization Scheme
9+
--------------------
10+
8/16 bit per-layer asymmetric linear quantization.
11+
12+
.. math::
13+
14+
\begin{equation}
15+
q = \mathtt{clamp}\left(\left\lfloor R * \dfrac{x - cmin}{cmax - cmin} \right\rceil, lb, ub\right)
16+
\end{equation}
17+
18+
where :math:`R` is the integer range after quantization, :math:`cmax` and :math:`cmin` are the calculated bounds of the floating-point value range, and :math:`lb` and :math:`ub` are the bounds of the integer range.
19+
Taking 8bit as an example, R=255, [lb, ub]=[0,255].
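
To make the formula concrete, here is a minimal pure-Python sketch of the quantization step for a single value. It is an illustration and not SNPE's implementation: the function name `snpe_style_quantize` is hypothetical, the bounds for other bit widths are assumed to generalize the 8-bit case as `[0, 2**bits - 1]`, and ties are assumed to round up, which the formula itself does not specify.

```python
import math


def snpe_style_quantize(x, cmin, cmax, bits=8):
    """Quantize one float with the asymmetric linear formula above."""
    lb, ub = 0, 2 ** bits - 1  # integer bounds, [0, 255] for 8 bit
    R = ub - lb                # integer range, 255 for 8 bit
    scaled = R * (x - cmin) / (cmax - cmin)
    q = math.floor(scaled + 0.5)  # round to nearest (assumption: ties round up)
    return max(lb, min(ub, q))    # clamp to [lb, ub]


if __name__ == "__main__":
    for value in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(value, snpe_style_quantize(value, cmin=-1.0, cmax=1.0))
```

Values outside `[cmin, cmax]` saturate at the integer bounds, which is the role of the clamp in the equation.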
20+
21+
22+
In fact, when a model is built for SNPE with the official tools, it is first converted into a full-precision *.dlc* model file, which can then optionally be converted into a quantized version.
23+
24+
.. attention::
25+
Users can provide a .json file to override the parameters.
26+
27+
The values of *scale* and *offset* are not required, but can be overridden.
28+
29+
SNPE will adjust the values of *cmin* and *cmax* to ensure zero is representable.
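
One common way to make zero exactly representable is to widen the range so that it contains zero and then shift it onto the quantization grid. The sketch below illustrates that idea under stated assumptions. It is not taken from the SNPE SDK, and SNPE's actual adjustment may differ. The function name `adjust_range_for_zero` and the rounding rule are hypothetical.

```python
import math


def adjust_range_for_zero(cmin, cmax, bits=8):
    """Return (new_min, new_max, zero_point) with 0.0 on the integer grid."""
    levels = 2 ** bits - 1
    # Assumption: the range is first widened so that it contains zero.
    cmin = min(cmin, 0.0)
    cmax = max(cmax, 0.0)
    if cmax == cmin:
        raise ValueError("the range must have non-zero width")
    scale = (cmax - cmin) / levels
    # Integer level that should represent 0.0 (assumption: round to nearest).
    zero_point = math.floor(-cmin / scale + 0.5)
    # Shift the range so that 0.0 maps exactly to zero_point.
    new_min = -zero_point * scale
    new_max = new_min + levels * scale
    return new_min, new_max, zero_point
```

The width of the range is preserved. Only its position moves, by at most half a quantization step.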
