
Commit e49c41d

yuwenzho and jcwchen authored

Add bidaf and arcface int8 model (#598)

* upload arcface and bidaf int8 onnx model

  Signed-off-by: yuwenzho <[email protected]>

* add bidaf and arcface int8 model

  Signed-off-by: yuwenzho <[email protected]>

---------

Signed-off-by: yuwenzho <[email protected]>
Co-authored-by: Chun-Wei Chen <[email protected]>

1 parent 3d125fa · commit e49c41d

File tree

7 files changed: +292 −3 lines


ONNX_HUB_MANIFEST.json

Lines changed: 190 additions & 0 deletions
@@ -311,6 +311,154 @@
             "model_with_data_bytes": 403400046
         }
     },
+    {
+        "model": "BiDAF-int8",
+        "model_path": "text/machine_comprehension/bidirectional_attention_flow/model/bidaf-11-int8.onnx",
+        "onnx_version": "1.13.1",
+        "opset_version": 11,
+        "metadata": {
+            "model_sha": "c2bbfd7568f4f19c8db82395c81d8d6199f3c0237f49e0f669d47c82643ef29e",
+            "model_bytes": 12452924,
+            "tags": [
+                "text",
+                "machine comprehension",
+                "bidirectional attention flow"
+            ],
+            "io_ports": {
+                "inputs": [
+                    {
+                        "name": "context_word",
+                        "shape": [
+                            "c",
+                            1
+                        ],
+                        "type": "tensor(string)"
+                    },
+                    {
+                        "name": "context_char",
+                        "shape": [
+                            "c",
+                            1,
+                            1,
+                            16
+                        ],
+                        "type": "tensor(string)"
+                    },
+                    {
+                        "name": "query_word",
+                        "shape": [
+                            "q",
+                            1
+                        ],
+                        "type": "tensor(string)"
+                    },
+                    {
+                        "name": "query_char",
+                        "shape": [
+                            "q",
+                            1,
+                            1,
+                            16
+                        ],
+                        "type": "tensor(string)"
+                    }
+                ],
+                "outputs": [
+                    {
+                        "name": "start_pos",
+                        "shape": [
+                            1
+                        ],
+                        "type": "tensor(int32)"
+                    },
+                    {
+                        "name": "end_pos",
+                        "shape": [
+                            1
+                        ],
+                        "type": "tensor(int32)"
+                    }
+                ]
+            },
+            "model_with_data_path": "text/machine_comprehension/bidirectional_attention_flow/model/bidaf-11-int8.tar.gz",
+            "model_with_data_sha": "571410c31445882ea9ed7b9f48fe8c2ed6ccb72b925281a1be82a75c0c12b6ab",
+            "model_with_data_bytes": 9086295
+        }
+    },
+    {
+        "model": "BiDAF",
+        "model_path": "text/machine_comprehension/bidirectional_attention_flow/model/bidaf-9.onnx",
+        "onnx_version": "1.4",
+        "opset_version": 9,
+        "metadata": {
+            "model_sha": "dfc317b56d065a3e297240a9e9b9118ff2260790b5850f4be2bc6ea1bcc65e80",
+            "model_bytes": 43522228,
+            "tags": [
+                "text",
+                "machine comprehension",
+                "bidirectional attention flow"
+            ],
+            "io_ports": {
+                "inputs": [
+                    {
+                        "name": "context_word",
+                        "shape": [
+                            "c",
+                            1
+                        ],
+                        "type": "tensor(string)"
+                    },
+                    {
+                        "name": "context_char",
+                        "shape": [
+                            "c",
+                            1,
+                            1,
+                            16
+                        ],
+                        "type": "tensor(string)"
+                    },
+                    {
+                        "name": "query_word",
+                        "shape": [
+                            "q",
+                            1
+                        ],
+                        "type": "tensor(string)"
+                    },
+                    {
+                        "name": "query_char",
+                        "shape": [
+                            "q",
+                            1,
+                            1,
+                            16
+                        ],
+                        "type": "tensor(string)"
+                    }
+                ],
+                "outputs": [
+                    {
+                        "name": "start_pos",
+                        "shape": [
+                            1
+                        ],
+                        "type": "tensor(int32)"
+                    },
+                    {
+                        "name": "end_pos",
+                        "shape": [
+                            1
+                        ],
+                        "type": "tensor(int32)"
+                    }
+                ]
+            },
+            "model_with_data_path": "text/machine_comprehension/bidirectional_attention_flow/model/bidaf-9.tar.gz",
+            "model_with_data_sha": "c74387eec257f2cb37cefc2846e1c4078bfebf06cd6486e9dafe6c9f7cdc1ef3",
+            "model_with_data_bytes": 39092248
+        }
+    },
     {
         "model": "GPT-2",
         "model_path": "text/machine_comprehension/gpt-2/model/gpt2-10.onnx",
@@ -841,6 +989,48 @@
             "model_with_data_bytes": 194535656
         }
     },
+    {
+        "model": "LResNet100E-IR-int8",
+        "model_path": "vision/body_analysis/arcface/model/arcfaceresnet100-11-int8.onnx",
+        "onnx_version": "1.13.1",
+        "opset_version": 11,
+        "metadata": {
+            "model_sha": "c625ca68a422418c48aa84f73341337e0a92b111f327909005d1eec07c95f936",
+            "model_bytes": 65764892,
+            "tags": [
+                "vision",
+                "body analysis",
+                "arcface"
+            ],
+            "io_ports": {
+                "inputs": [
+                    {
+                        "name": "data",
+                        "shape": [
+                            1,
+                            3,
+                            112,
+                            112
+                        ],
+                        "type": "tensor(float)"
+                    }
+                ],
+                "outputs": [
+                    {
+                        "name": "fc1",
+                        "shape": [
+                            1,
+                            512
+                        ],
+                        "type": "tensor(float)"
+                    }
+                ]
+            },
+            "model_with_data_path": "vision/body_analysis/arcface/model/arcfaceresnet100-11-int8.tar.gz",
+            "model_with_data_sha": "d560f59c57fa4784771ba520b5b2f380097d7b2210e6c8b02ca203c2e9784f8a",
+            "model_with_data_bytes": 47945269
+        }
+    },
     {
         "model": "LResNet100E-IR",
         "model_path": "vision/body_analysis/arcface/model/arcfaceresnet100-8.onnx",
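Each manifest entry pins its artifact with a SHA-256 digest (`model_sha` / `model_with_data_sha`) and a byte count (`model_bytes` / `model_with_data_bytes`). As an illustrative sketch (not part of this commit), a downloaded file can be checked against those two fields before use; the function name below is hypothetical:

```python
import hashlib
import os

def verify_artifact(path, expected_sha256, expected_bytes):
    """Check a downloaded model file against its manifest digest and size."""
    # Cheap check first: the byte count from the manifest.
    if os.path.getsize(path) != expected_bytes:
        return False
    # Then hash the file in chunks so large models don't need to fit in memory.
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```

For example, a downloaded `bidaf-11-int8.onnx` should hash to the `model_sha` value shown in the hunk above and weigh in at exactly `model_bytes` bytes.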

text/machine_comprehension/bidirectional_attention_flow/README.md

Lines changed: 49 additions & 2 deletions
@@ -9,7 +9,11 @@ This model is a neural network for answering a query about a given context parag

 |Model |Download |Download (with sample test data)|ONNX version|Opset version|Accuracy |
 |-------------|:--------------|:--------------|:--------------|:--------------|:--------------|
-|BiDAF |[41.5 MB](model/bidaf-9.onnx) |[37.3 MB](model/bidaf-9.tar.gz)|1.4 |ONNX 9, ONNX.ML 1 |EM of 68.1 in SQuAD v1.1 |
+|BiDAF |[41.5 MB](model/bidaf-9.onnx) |[37.3 MB](model/bidaf-9.tar.gz)|1.4 |9 |EM of 68.1 in SQuAD v1.1 |
+|BiDAF-int8 |[12 MB](model/bidaf-11-int8.onnx) |[8.7 MB](model/bidaf-11-int8.tar.gz)|1.13.1 |11 |EM of 65.93 in SQuAD v1.1 |
+> Compared with the fp32 BiDAF model, the int8 model's accuracy drop ratio is 0.23% and its performance improvement is 0.89x on SQuAD v1.1.
+>
+> Performance depends on the test hardware. The data here was collected on an Intel® Xeon® Platinum 8280 processor (1 socket, 4 cores per instance) running CentOS Linux 8.3, with a data batch size of 1.

 <hr>

@@ -77,6 +81,40 @@ The model is trained with [SQuAD v1.1](https://rajpurkar.github.io/SQuAD-explore

 ## Validation accuracy
 Metric is Exact Matching (EM) of 68.1, computed over SQuAD v1.1 dev data.
+<hr>
+
+## Quantization
+BiDAF-int8 is obtained by quantizing the fp32 BiDAF model. We use [Intel® Neural Compressor](https://github.com/intel/neural-compressor) with the onnxruntime backend to perform quantization. See the [instructions](https://github.com/intel/neural-compressor/blob/master/examples/onnxrt/nlp/onnx_model_zoo/BiDAF/quantization/ptq_dynamic/README.md) for how to use Intel® Neural Compressor for quantization.
+
+### Prepare Model
+Download the model from the [ONNX Model Zoo](https://github.com/onnx/models).
+
+```shell
+wget https://github.com/onnx/models/raw/main/text/machine_comprehension/bidirectional_attention_flow/model/bidaf-9.onnx
+```
+
+Convert the opset version to 11 for broader quantization support.
+
+```python
+import onnx
+from onnx import version_converter
+
+model = onnx.load('bidaf-9.onnx')
+model = version_converter.convert_version(model, 11)
+onnx.save_model(model, 'bidaf-11.onnx')
+```
+
+### Quantize Model
+
+Dynamic quantization:
+
+```bash
+# --input_model: model path as *.onnx
+bash run_tuning.sh --input_model=path/to/model \
+                   --dataset_location=path/to/squad/dev-v1.1.json \
+                   --output_model=path/to/model_tune
+```
 <hr>

 ## Publication/Attribution
@@ -85,7 +123,16 @@ Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, Hannaneh Hajishirzi. Bidirectional
 <hr>

 ## References
-This model is converted from a CNTK model trained from [this implementation](https://github.com/microsoft/CNTK/tree/nikosk/bidaf/Examples/Text/BidirectionalAttentionFlow/squad).
+* This model is converted from a CNTK model trained from [this implementation](https://github.com/microsoft/CNTK/tree/nikosk/bidaf/Examples/Text/BidirectionalAttentionFlow/squad).
+* [Intel® Neural Compressor](https://github.com/intel/neural-compressor)
+<hr>
+
+## Contributors
+* [mengniwang95](https://github.com/mengniwang95) (Intel)
+* [yuwenzho](https://github.com/yuwenzho) (Intel)
+* [airMeng](https://github.com/airMeng) (Intel)
+* [ftian1](https://github.com/ftian1) (Intel)
+* [hshen14](https://github.com/hshen14) (Intel)
 <hr>

 ## License
text/machine_comprehension/bidirectional_attention_flow/model/bidaf-11-int8.onnx

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c2bbfd7568f4f19c8db82395c81d8d6199f3c0237f49e0f669d47c82643ef29e
+size 12452924
text/machine_comprehension/bidirectional_attention_flow/model/bidaf-11-int8.tar.gz

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:571410c31445882ea9ed7b9f48fe8c2ed6ccb72b925281a1be82a75c0c12b6ab
+size 9086295
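As the manifest hunk earlier in this commit shows, BiDAF takes string tensors: context/query words of shape `(n, 1)` and per-word characters of shape `(n, 1, 1, 16)`. A minimal preprocessing sketch of shaping text into those tensors (illustrative only: the regex tokenizer and padding scheme here are assumptions, not this repo's official preprocessing):

```python
import re

def tokenize_for_bidaf(text, max_chars=16):
    """Turn raw text into word and char arrays matching the manifest's
    (n, 1) word and (n, 1, 1, 16) char input shapes."""
    # Naive tokenizer: words and standalone punctuation, lowercased.
    words = [w.lower() for w in re.findall(r"\w+|[^\w\s]", text)]
    # Per-word character lists, truncated/padded to max_chars entries.
    chars = [list(w[:max_chars]) + [''] * (max_chars - len(w[:max_chars]))
             for w in words]
    word_tensor = [[w] for w in words]        # shape (n, 1)
    char_tensor = [[[c]] for c in chars]      # shape (n, 1, 1, 16)
    return word_tensor, char_tensor
```

The same function would be applied to both the context paragraph and the query before feeding `context_word`, `context_char`, `query_word`, and `query_char`.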

vision/body_analysis/arcface/README.md

Lines changed: 41 additions & 1 deletion
@@ -14,6 +14,10 @@ The model LResNet100E-IR is an ArcFace model that uses ResNet100 as a backend wi
 |Model |Download |Download (with sample test data)|ONNX version|Opset version|LFW * accuracy (%)|CFP-FF * accuracy (%)|CFP-FP * accuracy (%)|AgeDB-30 * accuracy (%)|
 |-------------|:--------------|:--------------|:--------------|:--------------|:--------------|:--------------|:--------------|:--------------|
 |LResNet100E-IR| [248.9 MB](model/arcfaceresnet100-8.onnx)|[226.6 MB](model/arcfaceresnet100-8.tar.gz) | 1.3 |8|99.77 | 99.83 | 94.21 | 97.87|
+|LResNet100E-IR-int8| [63 MB](model/arcfaceresnet100-11-int8.onnx)|[46 MB](model/arcfaceresnet100-11-int8.tar.gz) | 1.13.1 |11|99.80 | | | |
+> Compared with the fp32 LResNet100E-IR model, the int8 model's accuracy drop ratio is 0% and its performance improvement is 1.78x on the LFW dataset.
+>
+> Performance depends on the test hardware. The data here was collected on an Intel® Xeon® Platinum 8280 processor (1 socket, 4 cores per instance) running CentOS Linux 8.3, with a data batch size of 1.

 \* each of the accuracy metrics correspond to accuracies on different [validation sets](#val_data) each with their own [validation methods](#val_method).

@@ -66,13 +70,49 @@ The validation techniques for the three validation sets are described below:

 We used MXNet as framework to perform validation. Use the notebook [arcface_validation](dependencies/arcface_validation.ipynb) to verify the accuracy of the model on the validation set. Make sure to specify the appropriate model name in the notebook.

+## Quantization
+LResNet100E-IR-int8 is obtained by quantizing the fp32 LResNet100E-IR model. We use [Intel® Neural Compressor](https://github.com/intel/neural-compressor) with the onnxruntime backend to perform quantization. See the [instructions](https://github.com/intel/neural-compressor/blob/master/examples/onnxrt/body_analysis/onnx_model_zoo/arcface/quantization/ptq_static/README.md) for how to use Intel® Neural Compressor for quantization.
+
+### Prepare Model
+Download the model from the [ONNX Model Zoo](https://github.com/onnx/models).
+
+```shell
+wget https://github.com/onnx/models/raw/main/vision/body_analysis/arcface/model/arcfaceresnet100-8.onnx
+```
+
+Convert the opset version to 11 for broader quantization support.
+
+```python
+import onnx
+from onnx import version_converter
+
+model = onnx.load('arcfaceresnet100-8.onnx')
+model = version_converter.convert_version(model, 11)
+onnx.save_model(model, 'arcfaceresnet100-11.onnx')
+```
+
+### Quantize Model
+
+```bash
+cd neural-compressor/examples/onnxrt/body_analysis/onnx_model_zoo/arcface/quantization/ptq_static
+# --input_model: model path as *.onnx
+bash run_tuning.sh --input_model=path/to/model \
+                   --dataset_location=/path/to/faces_ms1m_112x112/task.bin \
+                   --output_model=path/to/save
+```
+
 ## References
 * All models are from the paper [ArcFace: Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698).
 * Original training dataset from the paper [MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition](https://arxiv.org/abs/1607.08221).
 * [InsightFace repo](https://github.com/deepinsight/insightface), [MXNet](http://mxnet.incubator.apache.org)
+* [Intel® Neural Compressor](https://github.com/intel/neural-compressor)

 ## Contributors
-[abhinavs95](https://github.com/abhinavs95) (Amazon AI)
+* [abhinavs95](https://github.com/abhinavs95) (Amazon AI)
+* [mengniwang95](https://github.com/mengniwang95) (Intel)
+* [yuwenzho](https://github.com/yuwenzho) (Intel)
+* [airMeng](https://github.com/airMeng) (Intel)
+* [ftian1](https://github.com/ftian1) (Intel)
+* [hshen14](https://github.com/hshen14) (Intel)

 ## License
 Apache 2.0
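The `fc1` output listed in the manifest for both ArcFace models is a 512-dimensional face embedding; ArcFace-style verification typically compares two such embeddings by cosine similarity. A minimal, dependency-free sketch (illustrative; the threshold value is an assumption for this example, not a value from this repo):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (e.g. fc1 outputs)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def same_identity(emb1, emb2, threshold=0.3):
    # The threshold is a placeholder; real systems tune it on a validation set
    # (the LFW protocol referenced in the table above does exactly this).
    return cosine_similarity(emb1, emb2) >= threshold
```

Quantization should leave this comparison logic unchanged: only the network producing the embeddings differs between LResNet100E-IR and LResNet100E-IR-int8.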
vision/body_analysis/arcface/model/arcfaceresnet100-11-int8.onnx

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c625ca68a422418c48aa84f73341337e0a92b111f327909005d1eec07c95f936
+size 65764892
vision/body_analysis/arcface/model/arcfaceresnet100-11-int8.tar.gz

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d560f59c57fa4784771ba520b5b2f380097d7b2210e6c8b02ca203c2e9784f8a
+size 47945269
