# Offline INT8 Calibration Tool

PaddlePaddle supports offline INT8 calibration to accelerate inference. This document describes how to enable INT8 calibration and presents the accuracy results for ResNet-50 and MobileNet-V1.

## 0. Prerequisite
You need to install at least the PaddlePaddle 1.3 Python package: `pip install paddlepaddle==1.3`.

## 1. How to generate INT8 model
You can refer to the unit test in [test_calibration.py](../tests/test_calibration.py). Basically, there are three steps:
* Construct the calibration object.

```python
calibrator = int8_utility.Calibrator( # Step 1
    program=infer_program, # required, FP32 program
    pretrained_model=model_path, # required, FP32 pretrained model
    algo=algo, # required, calibration algorithm; default is max, the alternative is KL (Kullback–Leibler divergence)
    exe=exe, # required, executor
    output=int8_model, # required, output path of the INT8 model
    feed_var_names=feed_dict, # required, names of the feed variables
    fetch_list=fetch_targets) # required, fetch targets
```

* Call `calibrator.sample_data()` after each executor run.
```python
_, acc1, _ = exe.run(
    program,
    feed={feed_dict[0]: image,
          feed_dict[1]: label},
    fetch_list=fetch_targets)

calibrator.sample_data() # Step 2
```

* Call `calibrator.save_int8_model()` after sampling over the specified number of iterations (e.g., iterations = 50).
```python
calibrator.save_int8_model() # Step 3
```
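
Putting the three steps together, calibration is a short loop over sample batches. Below is a minimal sketch; the `reader` generator, the 50-iteration count, and the helper name `calibrate` are illustrative assumptions, not part of the PaddlePaddle API.

```python
CALIBRATION_ITERATIONS = 50  # assumed number of sample batches

def calibrate(calibrator, exe, program, feed_dict, fetch_targets, reader):
    """Run the FP32 program over calibration batches, sampling tensor
    statistics after each run, then write out the INT8 model."""
    for i, (image, label) in enumerate(reader()):
        exe.run(program,
                feed={feed_dict[0]: image, feed_dict[1]: label},
                fetch_list=fetch_targets)
        calibrator.sample_data()            # Step 2: collect statistics
        if i + 1 >= CALIBRATION_ITERATIONS:
            break
    calibrator.save_int8_model()            # Step 3: emit the INT8 model
```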

## 2. How to run INT8 model
You can load the INT8 model with the load_inference_model [API](https://github.com/PaddlePaddle/Paddle/blob/8b50ad80ff6934512d3959947ac1e71ea3fb9ea3/python/paddle/fluid/io.py#L991) and run INT8 inference in the same way as [FP32](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/object_detection/eval.py "FP32").

```python
import paddle.fluid as fluid

[infer_program, feed_dict,
 fetch_targets] = fluid.io.load_inference_model(model_path, exe)
```
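
After loading, the INT8 program is executed through the same `exe.run` call as the FP32 one. The helper below is an illustrative sketch only; the function name and the assumption that the first fetch target is top-1 accuracy are not part of the PaddlePaddle API.

```python
def evaluate(exe, program, feed_names, fetch_targets, reader):
    """Average the first fetch target (e.g., top-1 accuracy) over all batches."""
    total, batches = 0.0, 0
    for image, label in reader():
        results = exe.run(program,
                          feed={feed_names[0]: image, feed_names[1]: label},
                          fetch_list=fetch_targets)
        total += float(results[0])
        batches += 1
    return total / batches if batches else 0.0
```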

## 3. Result
We provide the accuracy results measured on the [Intel® Xeon® Gold 6148 Processor](https://ark.intel.com/products/120489/Intel-Xeon-Gold-6148-Processor-27-5M-Cache-2-40-GHz- "Intel® Xeon® Gold 6148 Processor") (also known as Intel® Xeon® Skylake 6148).

| Model | Dataset | FP32 Accuracy | INT8 Accuracy | Accuracy Diff |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| ResNet-50 | Small | 72.00% | 72.00% | 0.00% |
| MobileNet-V1 | Small | 62.00% | 62.00% | 0.00% |
| ResNet-50 | Full ImageNet Val | 76.63% | 76.17% | 0.46% |
| MobileNet-V1 | Full ImageNet Val | 70.78% | 70.49% | 0.29% |

Please note that [Small](http://paddle-inference-dist.cdn.bcebos.com/int8/calibration_test_data.tar.gz "Small") is a subset of the [full ImageNet validation dataset](http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_img_val.tar "full ImageNet validation dataset").

Notes:
* The accuracy measurement requires a model that takes `label` as an input.
* The theoretical INT8 speedup is ~1.33X (i.e., 4/3) on Intel® Xeon® Skylake servers (please refer to `This allows for 4x more input at the cost of 3x more instructions or 33.33% more compute` in [Reference](https://software.intel.com/en-us/articles/lower-numerical-precision-deep-learning-inference-and-training "Reference")).

## 4. How to reproduce the results
* Small dataset
```bash
python python/paddle/fluid/contrib/tests/test_calibration.py
```

* Full dataset
```bash
DATASET=full python python/paddle/fluid/contrib/tests/test_calibration.py
```