Commit f65f407

Tracin, zhangqi3, 李普, xuexingyuan, and weixiuying authored
TensorRT explicit mode && ConvFreezeBN && Brecq and QDrop (#45)
* [Update] Add ConvFreezeBn.
* Add vitis doc
* Add new features and fix bugs:
* Change default value of leaf_module from dict to list
* Support specifying output_names for onnx conversion
* Add additional_node_name for ModelQuantizer
* Support conversion to ONNXQLinear for TensorRT Backend (So that model can be run with ONNX-runtime)
* Set dynamic range calculation to scale \* max(-qmin, qmax) for TensorRT Backend
* [Doc] Update benchmark.
* [Feature] Update Brecq qdrop.

Co-authored-by: zhangqi3 <zhangqi3@sensetime.com>
Co-authored-by: 李普 <SENSETIME\lipu1@cn0214002439l.domain.sensetime.com>
Co-authored-by: xuexingyuan <xuexingyuan@sensetime.com>
Co-authored-by: weixiuying <weixiuying@sensetime.com>
1 parent 2db4acf commit f65f407

27 files changed

+1018
-95
lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@
66
------------
77
[![Documentation Status](https://readthedocs.org/projects/mqbench/badge/?version=latest)](https://mqbench.readthedocs.io/en/latest/?badge=latest)
88
[![Lint and test.](https://github.com/ModelTC/MQBench/actions/workflows/python-package-conda.yml/badge.svg?branch=main)](https://github.com/ModelTC/MQBench/actions/workflows/python-package-conda.yml)
9+
[![license](https://img.shields.io/github/license/ModelTC/MQBench)](https://github.com/ModelTC/MQBench/blob/main/LICENSE)
910

1011
## Introduction
1112

docs/source/benchmark/ImageClassification/Benchmark.rst

Lines changed: 31 additions & 5 deletions
@@ -30,14 +30,40 @@ Generally, we follow the `PyTorch official example <https://github.com/pytorch/e
3030
| AdaRound | EMAMSE | Academic | 4 | 8 | 70.35 | 76.87 | 71.82 | 72.32 | 73.58 |
3131
+---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
3232

33-
- Backend: TensorRT
3433

3534
+---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
36-
| W_calibration | A_calibration | Backend | wbit | abit | resnet18 | resnet50 | mobilenetv2_1.0 | regnetx600m | regnetx800m |
35+
| W_calibration | A_calibration | Backend | wbit | abit | resnet18 | resnet50 | mobilenetv2_1.0 | regnetx600m | regnetx800m |
3736
+===============+===============+==========+======+======+==========+==========+=================+=============+=============+
38-
| None | None | TensorRT | 32 | 32 | 70.63 | 77.94 | 72.68 | 73.60 | 74.83 |
37+
| None | None | Academic | 32 | 32 | 71.06 | 77.00 | 72.68 | 73.60 | 74.83 |
38+
+---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
39+
| AdaRound | EMAMSE | Academic | 4 | 4 | 68.67 | 74.21 | 65.11 | 70.24 | 71.54 |
40+
+---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
41+
| BRECQ | EMAMSE | Academic | 4 | 4 | 68.52 | 74.47 | 66.90 | 70.30 | 72.04 |
42+
+---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
43+
| QDrop | EMAMSE | Academic | 4 | 4 | 68.84 | 74.97 | 67.60 | 70.85 | 72.62 |
44+
+---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
45+
| AdaRound | EMAMSE | Academic | 2 | 4 | 62.31 | 65.23 | 34.14 | 57.14 | 58.33 |
46+
+---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
47+
| BRECQ | EMAMSE | Academic | 2 | 4 | 63.56 | 68.64 | 49.18 | 62.36 | 64.53 |
3948
+---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
40-
| MinMax | EMAMinMax | TensorRT | 8 | 8 | 70.33 | 76.72 | 72.50 | 73.28 | 74.75 |
49+
| QDrop | EMAMSE | Academic | 2 | 4 | 64.49 | 69.30 | 51.37 | 63.51 | 65.84 |
4150
+---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
42-
| MSE | EMAMSE | TensorRT | 8 | 8 | 70.55 | 77.79 | 72.56 | 73.41 | 74.70 |
51+
| AdaRound | EMAMSE | Academic | 3 | 3 | 64.18 | 66.76 | 28.41 | 59.57 | 61.45 |
4352
+---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
53+
| BRECQ | EMAMSE | Academic | 3 | 3 | 64.24 | 70.09 | 48.65 | 62.83 | 65.49 |
54+
+---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
55+
| QDrop | EMAMSE | Academic | 3 | 3 | 65.42 | 70.81 | 53.09 | 64.78 | 67.45 |
56+
+---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
57+
58+
.. note::
59+
Although AdaRound and BRECQ first learn the weight rounding with FP32 activations and then determine the quantization parameters,
60+
we find that exposing the weights to activation quantization works better,
61+
especially at ultra-low bit widths, as proposed in QDrop.
62+
Therefore, we adopt the same training strategy as QDrop here for a fair comparison among the three methods.
63+
Hyperparameters are also kept the same, except that AdaRound uses 10000 iterations for layer reconstruction
64+
and BRECQ and QDrop use 20000 iterations for block reconstruction.
65+
66+
.. note::
67+
For BRECQ and QDrop, our block partition of MobileNetV2 differs somewhat from that of the original papers,
68+
in favor of a more general and automatic partitioning scheme.
69+
Lines changed: 30 additions & 2 deletions
@@ -1,5 +1,9 @@
1-
AdaRound
1+
Advanced PTQ
22
============
3+
In this part, we introduce several advanced post-training quantization methods: AdaRound, BRECQ, and QDrop.
4+
Fair experimental comparisons can be found in Benchmark.
5+
6+
**AdaRound**
37

48
`AdaRound <https://arxiv.org/pdf/2004.10568.pdf>`_ aims to find the globally optimal strategy for rounding the quantized values. Intuitively, rounding-to-nearest is optimal for each individual value, but theoretical analysis of the quantization loss shows that this is not the case for a whole layer or the entire network. The second-order term in the loss difference contains cross terms of the rounding errors, as illustrated for a layer with two weights:
59

@@ -62,4 +66,28 @@ where :math:`h(\mathbf{V}_{i,j})=clip(\sigma(\mathbf{V}_{i,j})(\zeta-\gamma)+\ga
6266
...
6367
6468
# deploy model, remove fake quantize nodes and dump quantization params like clip ranges.
65-
convert_deploy(model.eval(), BackendType.Tensorrt, input_shape_dict={'data': [10, 3, 224, 224]})
69+
convert_deploy(model.eval(), BackendType.Tensorrt, input_shape_dict={'data': [10, 3, 224, 224]})
70+
71+
72+
**BRECQ**
73+
74+
Unlike AdaRound, which learns to reconstruct the output and tune the weights layer by layer,
75+
BRECQ discusses different granularities of output reconstruction, including layer, block, stage, and net.
76+
Combining experimental results with theoretical analysis, BRECQ recommends learning the weight rounding block by block,
77+
where a block is viewed as a collection of layers.
78+
79+
Here, we follow these rules to determine a block:
80+
81+
1. A layer is a Conv or Linear module; BN and ReLU are attached to that layer.
82+
83+
2. A residual connection should stay within a block, such as BasicBlock in ResNet.
84+
85+
3. If there is no residual connection, single layers should be combined unless there are already 3 single layers or the next layer meets condition 2.
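
One possible reading of these rules can be sketched in plain Python. The function name `partition_blocks`, the `(name, has_residual)` representation of a unit, and the `max_single` parameter are assumptions made for illustration and are not part of MQBench's API. The sketch treats every residual unit as its own block and merges consecutive single layers until three are collected or a residual unit follows.

```python
def partition_blocks(units, max_single=3):
    """Group units into blocks (hypothetical helper, not MQBench code).

    `units` is a list of (name, has_residual) pairs in network order, where
    each unit already bundles a Conv/Linear layer with its BN and ReLU (rule 1).
    """
    blocks, pending = [], []
    for name, has_residual in units:
        if has_residual:
            # Rule 2: a residual unit stays intact as one block.
            if pending:
                blocks.append(pending)
                pending = []
            blocks.append([name])
        else:
            # Rule 3: merge single layers, closing the block at max_single.
            pending.append(name)
            if len(pending) == max_single:
                blocks.append(pending)
                pending = []
    if pending:
        blocks.append(pending)
    return blocks


example = [("conv1", False), ("layer1.0", True), ("layer1.1", True),
           ("fc_a", False), ("fc_b", False), ("fc_c", False), ("fc_d", False)]
print(partition_blocks(example))
```

The actual partition in the library operates on the traced graph, so this list-based version only conveys the grouping logic.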
86+
87+
**QDrop**
88+
89+
Building on BRECQ, QDrop first compares different orders of optimizing weights and activations and concludes that
90+
optimizing weights first and activations afterwards performs poorly, especially at ultra-low bit widths. It recommends exposing the weights to activation quantization,
91+
for example by learning the activation step size and the weight rounding together. However, it also points out that there are better ways to handle
92+
activation quantization when searching for well-calibrated weights. Finally, QDrop randomly replaces quantized activation values with their FP32 counterparts at the neuron level
93+
during reconstruction, dropping activation quantization with probability 0.5.
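
The random dropping step can be illustrated with a minimal sketch in plain Python. The helper name `qdrop_mix` and its arguments are hypothetical; the sketch assumes the FP32 and quantized activations are available as equal-length lists and only shows the element-wise selection, not the reconstruction loop itself.

```python
import random


def qdrop_mix(fp_values, quant_values, drop_prob=0.5, rng=None):
    """Keep the FP32 value with probability drop_prob, else the quantized one.

    Hypothetical helper for illustration; each element is decided independently.
    """
    rng = rng or random.Random()
    return [fp if rng.random() < drop_prob else q
            for fp, q in zip(fp_values, quant_values)]


fp_act = [0.12, -0.57, 0.93, 0.31]
quant_act = [0.125, -0.5, 1.0, 0.25]
mixed = qdrop_mix(fp_act, quant_act, drop_prob=0.5, rng=random.Random(0))
print(mixed)
```

Setting `drop_prob` to 0 recovers ordinary activation quantization, while 1 leaves all activations in FP32.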
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
1+
Vitis
2+
========
3+
4+
Introduction
5+
^^^^^^^^^^^^
6+
7+
`Xilinx Vitis <https://github.com/Xilinx/Vitis-AI/>`_ is a platform for high-performance deep learning inference on Xilinx FPGA devices.
8+
9+
.. _Vitis Quantization Scheme:
10+
11+
**Quantization Scheme**
12+
13+
8-bit per-tensor power-of-two symmetric linear quantization.
14+
15+
.. math::
16+
17+
\begin{equation}
18+
q = \mathtt{clamp}(\lfloor x / s \rceil, lb, ub) * s
19+
\end{equation}
20+
21+
22+
where :math:`s` is a power-of-two scaling factor that maps a value from the floating-point range to the integer range, and :math:`lb` and :math:`ub` are the bounds of the integer range.
23+
For weights and activations, [lb, ub] = [-128, 127].
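
The formula can be tried on scalars with a short sketch. The function name `fake_quantize_pow2` and the `exponent` parameter (so that the scale is `2 ** exponent`) are assumptions for illustration. Python's built-in `round` resolves ties to the nearest even integer, which may differ from the rounding mode used on the actual hardware.

```python
def fake_quantize_pow2(x, exponent, lb=-128, ub=127):
    """Quantize and dequantize x with a power-of-two scale (illustrative only)."""
    s = 2.0 ** exponent                      # power-of-two scaling factor
    q_int = max(lb, min(ub, round(x / s)))   # round, then clamp to [lb, ub]
    return q_int * s                         # map back to the floating-point range


print(fake_quantize_pow2(0.3, -2))     # scale 0.25
print(fake_quantize_pow2(100.0, -2))   # saturates at ub
print(fake_quantize_pow2(-100.0, -2))  # saturates at lb
```

Because the scale is a power of two, the division and multiplication correspond to bit shifts on integer hardware.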
24+
25+
Deploy on Vitis
26+
^^^^^^^^^^^^^^^^^^
27+
28+
**Requirements**:
29+
30+
- Build Docker from /docker
31+
32+
**Deployment**:
33+
34+
We provide an example of deploying a quantized EOD model to Vitis.
35+
36+
- First, modify the configuration file to add quantization. Taking yolox_tiny as an example, save the new configuration file as "yolox_tiny_quant.yaml".
37+
38+
.. code-block:: yaml
39+
:linenos:
40+
41+
quant:
42+
deploy_backend: vitis
43+
cali_batch_size: 50
44+
45+
- Second, change the optimizer and lr_scheduler.
46+
47+
.. code-block:: yaml
48+
:linenos:
49+
50+
trainer:
51+
max_epoch: &max_epoch 5
52+
save_freq: 1
53+
test_freq: 1
54+
optimizer:
55+
register_type: qat_weights
56+
type: Adam
57+
kwargs:
58+
lr: 0.00000015625
59+
weight_decay: 0.000
60+
lr_scheduler:
61+
type: MultiStepLR
62+
kwargs:
63+
milestones: [1,2]
64+
gamma: 0.1
65+
66+
- Third, quantize the model.
67+
68+
.. code-block:: shell
69+
:linenos:
70+
71+
python -m eod train --config configs/det/yolox/yolox_tiny_quant.yaml --nm 1 --ng 1 --launch pytorch
72+
73+
- Fourth, use the function deploy() in ./eod/runner/quantexport to export the deployed xmodel [mqbench_qmodel.xmodel].
74+
75+
.. code-block:: python
76+
:linenos:
77+
78+
from mqbench.convert_deploy import convert_deploy
79+
deploy_backend = self.config['quant']['deploy_backend']
80+
dummy_input = self.get_batch('train')
81+
self.model.eval()
82+
convert_deploy(self.model, self.backend_type[deploy_backend], dummy_input={'image': dummy_input['image']})
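
As a rough check of the optimizer settings in the second step, the learning rate implied by the MultiStepLR schedule can be worked out by hand. The sketch below assumes the usual step-decay rule (the base rate is multiplied by gamma once for every milestone already reached) and an epoch index that starts at 0; the helper name `multistep_lr` is hypothetical.

```python
from bisect import bisect_right


def multistep_lr(base_lr, milestones, gamma, epoch):
    """Closed-form step decay: base_lr * gamma ** (milestones reached so far)."""
    return base_lr * gamma ** bisect_right(milestones, epoch)


# Values taken from the configuration above: lr, milestones, gamma, max_epoch.
for epoch in range(5):
    print(epoch, multistep_lr(1.5625e-7, [1, 2], 0.1, epoch))
```

Under these assumptions the rate is 1.5625e-7 in epoch 0, about 1.5625e-8 in epoch 1, and about 1.5625e-9 from epoch 2 onward.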

docs/source/user_guide/howtoptq.rst

Lines changed: 1 addition & 1 deletion
@@ -5,4 +5,4 @@ How to conduct PTQ
55
:titlesonly:
66

77
Naive PTQ <PTQ/naive.rst>
8-
AdaRound <PTQ/adaround.rst>
8+
Advanced PTQ <PTQ/advanced.rst>
