Commit 2063d79

add int8 bert model (#481)
* add int8 bert model

Signed-off-by: mengniwa <[email protected]>

* update readme

Signed-off-by: mengniwa <[email protected]>

Co-authored-by: Wenbing Li <[email protected]>
1 parent 5f7b9ca commit 2063d79

5 files changed: +50 -5 lines changed

text/machine_comprehension/bert-squad/README.md

Lines changed: 38 additions & 5 deletions
@@ -10,10 +10,17 @@ BERT (Bidirectional Encoder Representations from Transformers) applies Transform
 
 ## Model
 
-|Model |Download |Download (with sample test data)| ONNX version |Opset version|
-| ------------- | ------------- | ------------- | ------------- | ------------- |
-|BERT-Squad| [416 MB](model/bertsquad-8.onnx) | [385 MB](model/bertsquad-8.tar.gz) | 1.3 | 8|
-|BERT-Squad| [416 MB](model/bertsquad-10.onnx) | [384 MB](model/bertsquad-10.tar.gz) | 1.5 | 10|
+|Model |Download |Download (with sample test data)| ONNX version |Opset version| Accuracy|
+| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
+|BERT-Squad| [416 MB](model/bertsquad-8.onnx) | [385 MB](model/bertsquad-8.tar.gz) | 1.3 | 8| |
+|BERT-Squad| [416 MB](model/bertsquad-10.onnx) | [384 MB](model/bertsquad-10.tar.gz) | 1.5 | 10| |
+|BERT-Squad| [416 MB](model/bertsquad-12.onnx) | [384 MB](model/bertsquad-12.tar.gz) | 1.9 | 12| 80.67171|
+|BERT-Squad-int8| [119 MB](model/bertsquad-12-int8.onnx) | [101 MB](model/bertsquad-12-int8.tar.gz) | 1.9 | 12| 80.43519|
+> Compared with the fp32 BERT-Squad, BERT-Squad-int8's accuracy drop ratio is 0.29%, performance improvement is 1.81x.
+>
+> Note the performance depends on the test hardware.
+>
+> Performance data here is collected with Intel® Xeon® Platinum 8280 Processor, 1s 4c per instance, CentOS Linux 8.3, data batch size is 1.
 
 Dependencies
 * [tokenization.py](dependencies/tokenization.py)
@@ -110,13 +117,39 @@ Metric is Exact Matching (EM) of 80.7, computed over SQuAD v1.1 dev data, for th
 ## Training
 Fine-tuned the model using SQuAD-1.1 dataset. Look at [BertTutorial.ipynb](https://github.com/onnx/tensorflow-onnx/blob/master/tutorials/BertTutorial.ipynb) for more information for converting the model from tensorflow to onnx and for fine-tuning
 
+## Quantization
+BERT-Squad-int8 is obtained by quantizing BERT-Squad model (opset=12). We use [Intel® Neural Compressor](https://github.com/intel/neural-compressor) with onnxruntime backend to perform quantization. View the [instructions](https://github.com/intel-innersource/frameworks.ai.lpot.intel-lpot/blob/master/examples/onnxrt/onnx_model_zoo/bert-squad/readme.md) to understand how to use Intel® Neural Compressor for quantization.
+
+### Environment
+onnx: 1.9.0
+onnxruntime: 1.8.0
+
+### Prepare model
+```shell
+wget https://github.com/onnx/models/raw/master/text/machine_comprehension/bert-squad/model/bertsquad-12.onnx
+```
+
+### Model quantize
+```bash
+# --input_model takes the model path as *.onnx
+bash run_tuning.sh --input_model=/path/to/model \
+                   --output_model=/path/to/model_tune \
+                   --dataset_location=/path/to/SQuAD/dataset \
+                   --config=bert.yaml
+```
 
 ## References
 * **BERT** Model from the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
 
 * [BERT Tutorial](https://github.com/onnx/tensorflow-onnx/blob/master/tutorials/BertTutorial.ipynb)
+
+* [Intel® Neural Compressor](https://github.com/intel/neural-compressor)
+
 ## Contributors
-[Kundana Pillari](https://github.com/kundanapillari)
+* [Kundana Pillari](https://github.com/kundanapillari)
+* [mengniwang95](https://github.com/mengniwang95) (Intel)
+* [airMeng](https://github.com/airMeng) (Intel)
+* [ftian1](https://github.com/ftian1) (Intel)
+* [hshen14](https://github.com/hshen14) (Intel)
 
 ## License
 Apache 2.0
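
The 0.29% figure in the README note can be reproduced from the Accuracy column, assuming "accuracy drop ratio" means the drop relative to the fp32 score. The sketch below is only a sanity check of that arithmetic; the constant and function names are illustrative and not part of the commit, and the size ratio is approximate because the table lists rounded download sizes in MB.

```python
# Accuracy values copied from the Model table in the README diff.
FP32_ACCURACY = 80.67171   # BERT-Squad, opset 12
INT8_ACCURACY = 80.43519   # BERT-Squad-int8, opset 12

# Download sizes in MB, also copied from the table (rounded values).
FP32_SIZE_MB = 416
INT8_SIZE_MB = 119


def relative_drop_percent(baseline: float, quantized: float) -> float:
    """Return the accuracy drop as a percentage of the baseline score.

    Assumption: this is what the README calls "accuracy drop ratio".
    """
    return (baseline - quantized) / baseline * 100.0


drop = relative_drop_percent(FP32_ACCURACY, INT8_ACCURACY)
size_ratio = FP32_SIZE_MB / INT8_SIZE_MB

print(f"relative accuracy drop: {drop:.2f}%")   # prints 0.29%
print(f"model size reduction:   {size_ratio:.2f}x")
```

The 1.81x performance figure cannot be checked this way, since it depends on the hardware and settings listed in the note.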
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:24325bdf732256859e07a385d2f363a6839b20b2bb4a57ac362b3982fc8ce121
+size 124565601
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5f4bd418f2ee55310788fed78842297ec0b3ecda6669c563196f20396cb4d401
+size 105881973
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5f0d96a9e6b8a4a2e59f9636f8bbad09ee8a0c58c8212027cc17c5fcc9659e55
+size 435852736
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4cd041010ab4ad11c23c0eb7f056c4e4286a894b6d712ef09445b955763fb1b1
+size 403082198
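
Each of the four added files above is a Git LFS pointer: a `version` line, an `oid` line with the SHA-256 of the real file, and a `size` line in bytes. A downloaded model or archive can be checked against its pointer along the following lines. This is a minimal sketch using only the Python standard library; the helper names are hypothetical, and which pointer belongs to which file is not shown in this diff, so the pairing is left to the reader.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: str, pointer_text: str) -> bool:
    """Return True if the file at `path` has the size and SHA-256 in the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algorithm, _, expected_digest = fields["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")

    # Cheap check first: a size mismatch means hashing is unnecessary.
    if os.path.getsize(path) != int(fields["size"]):
        return False

    # Hash in 1 MiB chunks so large model files are not read into memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_digest


# First pointer from the diff above, copied verbatim.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:24325bdf732256859e07a385d2f363a6839b20b2bb4a57ac362b3982fc8ce121
size 124565601
"""

fields = parse_lfs_pointer(POINTER)
print(fields["oid"], fields["size"])
```

With a local copy of the corresponding file, `matches_pointer("/path/to/downloaded/file", POINTER)` returns `True` only when both the byte size and the digest agree; the path here is a placeholder.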
