ModelTC
diff --git a/‎README.md‎
Lines changed: 1 addition & 0 deletions b/‎README.md‎
diff --git a/‎docs/source/benchmark/ImageClassification/Benchmark.rst‎
Lines changed: 31 additions & 5 deletions b/‎docs/source/benchmark/ImageClassification/Benchmark.rst‎
diff --git a/‎docs/source/user_guide/PTQ/adaround.rst‎ ‎docs/source/user_guide/PTQ/advanced.rst‎docs/source/user_guide/PTQ/adaround.rst renamed to docs/source/user_guide/PTQ/advanced.rst
Lines changed: 30 additions & 2 deletions b/‎docs/source/user_guide/PTQ/adaround.rst‎ ‎docs/source/user_guide/PTQ/advanced.rst‎docs/source/user_guide/PTQ/adaround.rst renamed to docs/source/user_guide/PTQ/advanced.rst
diff --git a/‎docs/source/user_guide/deploy/vitis.rst‎
Lines changed: 82 additions & 0 deletions b/‎docs/source/user_guide/deploy/vitis.rst‎
diff --git a/‎docs/source/user_guide/howtoptq.rst‎
Lines changed: 1 addition & 1 deletion b/‎docs/source/user_guide/howtoptq.rst‎
@@ -6,6 +6,7 @@
 ------------
 [![Documentation Status](https://readthedocs.org/projects/mqbench/badge/?version=latest)](https://mqbench.readthedocs.io/en/latest/?badge=latest)
 [![Lint and test.](https://github.com/ModelTC/MQBench/actions/workflows/python-package-conda.yml/badge.svg?branch=main)](https://github.com/ModelTC/MQBench/actions/workflows/python-package-conda.yml)
+[![license](https://img.shields.io/github/license/ModelTC/MQBench)](https://github.com/ModelTC/MQBench/blob/main/LICENSE)
 
 ## Introduction
 
 
@@ -30,14 +30,40 @@ Generally, we follow the `PyTorch official example <https://github.com/pytorch/e
 | AdaRound      | EMAMSE        | Academic | 4    | 8    | 70.35    | 76.87    | 71.82           | 72.32       | 73.58       |
 +---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
 
-- Backend: TensorRT
 
 +---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
-| W_calibration | A_calibration | Backend  | wbit | abit | resnet18 | resnet50 | mobilenetv2_1.0 | regnetx600m | regnetx800m |
+| W_calibration | A_calibration | Backend  | wbit | abit | resnet18 | resnet50 | mobilenetv2_1.0 | regnetx600m | regnetx800m |   
 +===============+===============+==========+======+======+==========+==========+=================+=============+=============+
-| None          | None          | TensorRT | 32   | 32   | 70.63    | 77.94    | 72.68           | 73.60       | 74.83       |
+| None          | None          | Academic | 32   | 32   | 71.06    | 77.00    | 72.68           | 73.60       | 74.83       |     
++---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
+| AdaRound      | EMAMSE        | Academic | 4    | 4    | 68.67    | 74.21    | 65.11           | 70.24       | 71.54       |    
++---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
+| BRECQ         | EMAMSE        | Academic | 4    | 4    | 68.52    | 74.47    | 66.90           | 70.30       | 72.04       |     
++---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
+| QDrop         | EMAMSE        | Academic | 4    | 4    | 68.84    | 74.97    | 67.60           | 70.85       | 72.62       |    
++---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
+| AdaRound      | EMAMSE        | Academic | 2    | 4    | 62.31    | 65.23    | 34.14           | 57.14       | 58.33       |     
++---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
+| BRECQ         | EMAMSE        | Academic | 2    | 4    | 63.56    | 68.64    | 49.18           | 62.36       | 64.53       |     
 +---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
-| MinMax        | EMAMinMax     | TensorRT | 8    | 8    | 70.33    | 76.72    | 72.50           | 73.28       | 74.75       |
+| QDrop         | EMAMSE        | Academic | 2    | 4    | 64.49    | 69.30    | 51.37           | 63.51       | 65.84       |     
 +---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
-| MSE           | EMAMSE        | TensorRT | 8    | 8    | 70.55    | 77.79    | 72.56           | 73.41       | 74.70       |
+| AdaRound      | EMAMSE        | Academic | 3    | 3    | 64.18    | 66.76    | 28.41           | 59.57       | 61.45       |    
 +---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
+| BRECQ         | EMAMSE        | Academic | 3    | 3    | 64.24    | 70.09    | 48.65           | 62.83       | 65.49       |   
++---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
+| QDrop         | EMAMSE        | Academic | 3    | 3    | 65.42    | 70.81    | 53.09           | 64.78       | 67.45       |
++---------------+---------------+----------+------+------+----------+----------+-----------------+-------------+-------------+
+
+.. note::
+  Although AdaRound and BRECQ first learn the weight rounding with FP32 activations and then determine the quantization parameters,
+  we find that letting the weights face activation quantization during reconstruction works better,
+  especially at ultra-low bit-widths, as proposed in QDrop.
+  Therefore, we adopt the same training strategy as QDrop here for a fair comparison among the three methods.
+  Hyperparameters are also kept the same, except that AdaRound uses 10000 iterations for layer reconstruction
+  and BRECQ and QDrop use 20000 iterations for block reconstruction.
+
+.. note::
+  For the block partition of MobileNetV2 in BRECQ and QDrop, our implementation differs somewhat from the original papers
+  in favor of a more general and automatic partitioning scheme.
+
@@ -1,5 +1,9 @@
-AdaRound
+Advanced PTQ
 ========
+In this part, we introduce some advanced post-training quantization methods, including AdaRound, BRECQ and QDrop.
+Fair experimental comparisons can be found in the Benchmark.
+
+**AdaRound**
 
 `AdaRound <https://arxiv.org/pdf/2004.10568.pdf>`_ aims to find the global optimal strategy of rounding the quantized values. In common sense, rounding-to-nearest is optimal for each individual value, but through threoretical analysis on the quantization loss, it's not the case for the entire network or the whole layer. The second order term in the difference contains cross term of the round error, illustrated in a layer of two weights:
 
@@ -62,4 +66,28 @@ where :math:`h(\mathbf{V}_{i,j})=clip(\sigma(\mathbf{V}_{i,j})(\zeta-\gamma)+\ga
     ...
 
     # deploy model, remove fake quantize nodes and dump quantization params like clip ranges.
-    convert_deploy(model.eval(), BackendType.Tensorrt, input_shape_dict={'data': [10, 3, 224, 224]})
+    convert_deploy(model.eval(), BackendType.Tensorrt, input_shape_dict={'data': [10, 3, 224, 224]})
+
+
+**BRECQ**
+
+Unlike AdaRound, which learns to reconstruct the output and tunes the weights layer by layer,
+BRECQ discusses different granularities of output reconstruction, including layer, block, stage and net.
+Combining experimental results with theoretical analysis, BRECQ recommends learning the weight rounding block by block,
+where a block is viewed as a collection of layers.
+
+Here, we obey the following rules to determine a block:
+
+    1. A layer is a Conv or Linear module; BN and ReLU are attached to that layer.
+
+    2. A residual connection should be contained in a block, such as BasicBlock in ResNet.
+
+    3. If there is no residual connection, single layers should be combined unless there are 3 single layers or the next layer meets condition 2.
+
+**QDrop**
+
+Based on BRECQ, QDrop first compares different orders of the optimization procedure (weight and activation) and concludes that
+optimizing the weights first and then the activations performs poorly, especially at ultra-low bit-widths. It recommends letting the weights face activation quantization,
+for example by learning the activation step size and the weight rounding together. However, it also points out that there are better ways to apply
+activation quantization in order to find a well-calibrated weight. Finally, the authors randomly replace quantized activation values with their FP32 counterparts at the neuron level
+during reconstruction, dropping activation quantization with probability 0.5.
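
The random dropping described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not MQBench's or the QDrop authors' implementation: the function name `qdrop_activations`, its parameters, and the use of flat Python lists in place of tensors are all assumptions made for this example.

```python
import random


def qdrop_activations(x_fp32, x_quant, drop_prob=0.5, rng=None):
    """Mix FP32 and quantized activations element by element.

    Hypothetical helper: with probability `drop_prob`, quantization is
    "dropped" for an element and its FP32 value is kept; otherwise the
    quantized value is used.
    """
    rng = rng or random.Random()
    return [
        fp if rng.random() < drop_prob else q
        for fp, q in zip(x_fp32, x_quant)
    ]


# Toy activations and their (already fake-quantized) counterparts.
fp = [0.11, -0.52, 0.93, 0.27]
q = [0.125, -0.5, 1.0, 0.25]
mixed = qdrop_activations(fp, q, drop_prob=0.5, rng=random.Random(0))
```

With `drop_prob=0.0` the result is fully quantized, and with `drop_prob=1.0` it is fully FP32; the value 0.5 matches the probability stated above.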
@@ -0,0 +1,82 @@
+Vitis
+========
+
+Introduction
+^^^^^^^^^^^^
+
+`Xilinx Vitis <https://github.com/Xilinx/Vitis-AI/>`_ is a platform for high-performance deep learning inference on Xilinx FPGA devices.
+
+.. _Vitis Quantization Scheme:
+
+**Quantization Scheme**
+
+8-bit, per-tensor, power-of-two, symmetric linear quantization.
+
+.. math::
+
+    \begin{equation}
+        q = \mathtt{clamp}(\lfloor x / s \rceil, lb, ub) * s
+    \end{equation}
+
+
+where :math:`s` is a power-of-two scaling factor that maps a number from the floating-point range to the integer range, and :math:`lb` and :math:`ub` are the bounds of the integer range.
+For weights and activations, [lb, ub] = [-128, 127].
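
As a rough illustration of the scheme above, the following pure-Python sketch fake-quantizes scalars with a power-of-two scale. The helper names `pot_scale` and `fake_quantize`, the rule for choosing the scale from the maximum absolute value, and the round-half-up tie-breaking are assumptions made for this example; they are not taken from MQBench or Vitis.

```python
import math


def pot_scale(max_abs, ub=127):
    """Smallest power-of-two scale s with max_abs / s <= ub (assumed rule)."""
    if max_abs == 0:
        return 1.0
    return 2.0 ** math.ceil(math.log2(max_abs / ub))


def fake_quantize(x, s, lb=-128, ub=127):
    """q = clamp(round(x / s), lb, ub) * s, as in the formula above."""
    q_int = math.floor(x / s + 0.5)   # round to nearest, ties rounded up (assumption)
    q_int = max(lb, min(ub, q_int))   # clamp to the signed 8-bit range
    return q_int * s                  # map back to the floating-point range


s = pot_scale(1.0)                    # 2 ** -6 for values in [-1, 1]
inside = fake_quantize(0.1, s)        # representable value near 0.1
clipped = fake_quantize(10.0, s)      # saturates at ub * s
```

Because the scale is a power of two, dividing and multiplying by it can be carried out as bit shifts in integer hardware.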
+
+Deploy on Vitis
+^^^^^^^^^^^^^^^^^^
+
+**Requirements**:
+
+- Build Docker from /docker
+
+**Deployment**:
+
+We provide an example of deploying a quantized EOD model to Vitis.
+
+- First, modify the configuration file to add quantization settings. Taking yolox_tiny as an example, save the new configuration file as "yolox_tiny_quant.yaml".
+
+    .. code-block:: yaml
+        :linenos:
+
+        quant:
+          deploy_backend: vitis
+          cali_batch_size: 50
+
+- Second, change the optimizer and lr_scheduler.
+
+    .. code-block:: yaml
+        :linenos:
+
+        trainer:
+          max_epoch: &max_epoch 5
+          save_freq: 1
+          test_freq: 1
+          optimizer:
+            register_type: qat_weights
+            type: Adam
+            kwargs:
+              lr: 0.00000015625
+              weight_decay: 0.000
+          lr_scheduler:
+            type: MultiStepLR
+            kwargs:
+              milestones: [1,2]
+              gamma: 0.1
+
+- Third, quantize the model.
+
+    .. code-block:: shell
+        :linenos:
+
+        python -m eod train --config configs/det/yolox/yolox_tiny_quant.yaml --nm 1 --ng 1 --launch pytorch
+
+- Fourth, use the function deploy() in ./eod/runner/quantexport to export the deployed xmodel (mqbench_qmodel.xmodel).
+
+    .. code-block:: python
+        :linenos:
+
+        from mqbench.convert_deploy import convert_deploy
+
+        # This excerpt runs inside a runner method, so `self` is the runner instance.
+        deploy_backend = self.config['quant']['deploy_backend']
+        # Take one training batch as the dummy input used for tracing.
+        dummy_input = self.get_batch('train')
+        self.model.eval()
+        convert_deploy(self.model, self.backend_type[deploy_backend], dummy_input={'image': dummy_input['image']})
@@ -5,4 +5,4 @@ How to conduct PTQ
    :titlesonly:
 
    Naive PTQ <PTQ/naive.rst>
-   AdaRound <PTQ/adaround.rst>
+   Advanced PTQ <PTQ/advanced.rst>