> Compared with the fp32 BERT-Squad model, BERT-Squad-int8 shows an accuracy drop of 0.29% and a performance improvement of 1.81x.
>
> Note that performance depends on the test hardware.
>
> Performance data here was collected with an Intel® Xeon® Platinum 8280 Processor (1 socket, 4 cores per instance), on CentOS Linux 8.3, with a batch size of 1.
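As a rough way to reproduce such a comparison on your own hardware, the sketch below times both models with onnxruntime. It is a minimal illustration, not the official benchmark: the model file names are placeholders for the fp32 and int8 models from this page, and the dummy feed assumes all model inputs are int64 (as they are for BERT-Squad).

```python
# Minimal latency-comparison sketch (not the official benchmark script);
# file names are placeholders for the fp32 and int8 models from this page.
import time

import numpy as np
import onnxruntime as ort


def mean_latency(model_path, runs=50):
    sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    # Build a dummy feed from the model's declared inputs (all int64 for
    # BERT-Squad); dynamic dimensions such as batch are set to 1.
    feed = {
        i.name: np.zeros([d if isinstance(d, int) else 1 for d in i.shape],
                         dtype=np.int64)
        for i in sess.get_inputs()
    }
    sess.run(None, feed)  # warm-up run before timing
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, feed)
    return (time.perf_counter() - start) / runs


fp32 = mean_latency("bertsquad-12.onnx")
int8 = mean_latency("bertsquad-12-int8.onnx")
print(f"fp32 {fp32 * 1e3:.1f} ms, int8 {int8 * 1e3:.1f} ms, speedup {fp32 / int8:.2f}x")
```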
## Dependencies
* [tokenization.py](dependencies/tokenization.py)
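The bundled tokenization.py is the WordPiece tokenizer from the original Google BERT release. As a quick orientation, a minimal sketch of how it is typically used to build `input_ids` (the `vocab.txt` path is a placeholder, and `do_lower_case` must match the checkpoint):

```python
# Minimal tokenizer sketch; vocab.txt is a placeholder for the vocabulary
# file shipped with the BERT checkpoint.
import tokenization  # i.e. dependencies/tokenization.py

tokenizer = tokenization.FullTokenizer(vocab_file="vocab.txt", do_lower_case=True)

# Wrap the question in the [CLS]/[SEP] markers BERT expects, then map
# WordPiece tokens to vocabulary ids.
tokens = ["[CLS]"] + tokenizer.tokenize("Who wrote the paper?") + ["[SEP]"]
input_ids = tokenizer.convert_tokens_to_ids(tokens)
print(tokens, input_ids)
```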
Metric is Exact Matching (EM) of 80.7, computed over SQuAD v1.1 dev data, for this model.
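Exact Match counts a prediction as correct only if, after SQuAD's answer normalization (lowercasing; stripping punctuation, articles, and extra whitespace), it equals one of the reference answers. A self-contained sketch of the metric, following the normalization used by the official `evaluate-v1.1.py` script:

```python
# Sketch of SQuAD v1.1 Exact Match (EM) scoring.
import re
import string


def normalize(text):
    # Lowercase, drop punctuation, drop articles, collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(predictions, references):
    # predictions: question_id -> predicted answer string
    # references:  question_id -> list of acceptable ground-truth answers
    hits = sum(
        any(normalize(predictions[qid]) == normalize(gt) for gt in gts)
        for qid, gts in references.items()
    )
    return 100.0 * hits / len(references)


print(exact_match({"q1": "The Beatles!"}, {"q1": ["the beatles"]}))  # 100.0
```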
## Training
The model was fine-tuned on the SQuAD v1.1 dataset. See [BertTutorial.ipynb](https://github.com/onnx/tensorflow-onnx/blob/master/tutorials/BertTutorial.ipynb) for details on converting the model from TensorFlow to ONNX and on fine-tuning.
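For orientation, below is a minimal sketch of a TensorFlow-to-ONNX export with the `tf2onnx` Python API. The tiny Keras model is a stand-in (an assumption for illustration); the notebook works from the actual fine-tuned BERT checkpoint and may use a different conversion path:

```python
# Minimal tf2onnx export sketch; the tiny Keras model below is only a
# stand-in for the fine-tuned BERT model used in the tutorial.
import tensorflow as tf
import tf2onnx

dummy_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(30522, 64),  # BERT-sized vocab, toy width
    tf.keras.layers.Dense(2),              # start/end logits, as in SQuAD heads
])

spec = (tf.TensorSpec((None, 256), tf.int32, name="input_ids"),)
model_proto, _ = tf2onnx.convert.from_keras(
    dummy_model,
    input_signature=spec,
    opset=12,  # matches the opset of the model quantized below
    output_path="model.onnx",
)
print([o.name for o in model_proto.graph.output])
```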
## Quantization
BERT-Squad-int8 is obtained by quantizing the BERT-Squad model (opset 12). We use [Intel® Neural Compressor](https://github.com/intel/neural-compressor) with the onnxruntime backend to perform quantization. See the [instructions](https://github.com/intel-innersource/frameworks.ai.lpot.intel-lpot/blob/master/examples/onnxrt/onnx_model_zoo/bert-squad/readme.md) for how to use Intel® Neural Compressor for quantization.
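A sketch with Intel® Neural Compressor's post-training quantization API (2.x) is below. The calibration dataloader and evaluation function are placeholders, and the model file names are assumptions; the linked instructions define the actual SQuAD setup used to produce BERT-Squad-int8.

```python
# Post-training static quantization sketch with Intel Neural Compressor 2.x.
# calib_dataloader and eval_func are placeholders; the linked instructions
# define the real SQuAD calibration and evaluation for BERT-Squad-int8.
from neural_compressor import PostTrainingQuantConfig, quantization

config = PostTrainingQuantConfig(approach="static")  # static int8 quantization

q_model = quantization.fit(
    model="bertsquad-12.onnx",          # fp32 input model (opset 12)
    conf=config,
    calib_dataloader=calib_dataloader,  # placeholder: yields SQuAD feature batches
    eval_func=eval_func,                # placeholder: returns EM on SQuAD v1.1 dev
)
q_model.save("bertsquad-12-int8.onnx")
```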