Commit a026198: Update README (parent: a258158)
1 file changed: quantization/nlp/bert/migraphx/README.md (20 additions, 6 deletions)
@@ -9,7 +9,6 @@ The **e2e_migraphx_bert_example.py** is an end-to-end example for you to referen
 
 ## Requirements
 * Please build from latest ONNX Runtime source (see [here](https://onnxruntime.ai/docs/build/eps.html#migraphx)) for now.
-We plan to include TensorRT QDQ support later in ONNX Runtime 1.11 for [ORT Python GPU Package](https://pypi.org/project/onnxruntime-gpu/)
 * MIGraphX 2.8 and above
 * ROCm 5.7 and above (For calibration data)
 * Python 3+
@@ -20,24 +19,39 @@ We plan to include TensorRT QDQ support later in ONNX Runtime 1.11 for [ORT Pyth
 Some utility functions for dataset processing, data reader and evaluation are from Nvidia TensorRT demo BERT repo,
 https://github.com/NVIDIA/TensorRT/tree/master/demo/BERT
 
-Code TensorRT example has been reused for the MIGraphX Execution Provider to showcase how simple it is to convert over CUDA and TensorRT code into MIGraphX and ROCm within Onnxruntime. Just change the desired Execution Provider and install the proper requirements (ROCm and MIGraphX) and run your script as you did with CUDA.
+Code from the TensorRT example has been reused for the MIGraphX Execution Provider to showcase how simple it is to convert CUDA and TensorRT code to MIGraphX and ROCm within ONNX Runtime. Just change the desired Execution Provider, install the proper requirements (ROCm and MIGraphX), and run your script as you did with CUDA.
 
 We've also added a few more input args to the script to help fine-tune the inference you'd like to run. Feel free to use --help when running.
 
-
-usage: e2e_migraphx_bert_example.py [-h] [--fp16] [--int8] [--model] [--version VERSION] [--batch BATCH] [--seq_len SEQ_LEN] [--doc_stride DOC_STRIDE] [--cal_num CAL_NUM] [--verbose]
+usage: e2e_migraphx_bert_example.py [-h] [--fp16] [--int8] [--ep EP] [--cal_ep CAL_EP] [--model MODEL]
+                                    [--vocab VOCAB] [--token TOKEN] [--version VERSION] [--no_eval]
+                                    [--ort_verbose] [--ort_quant] [--save_load] [--batch BATCH]
+                                    [--seq_len SEQ_LEN] [--query_len QUERY_LEN] [--doc_stride DOC_STRIDE]
+                                    [--cal_num CAL_NUM] [--samples SAMPLES] [--verbose]
 
 options:
   -h, --help            show this help message and exit
   --fp16                Perform fp16 quantization on the model before running inference
   --int8                Perform int8 quantization on the model before running inference
-  --model               Path to the desired model to be run. Default ins ./model.onnx
+  --ep EP               The desired execution provider; options are [MIGraphX, ROCm]. Default is MIGraphX
+  --cal_ep CAL_EP       The desired execution provider for int8 quantization; options are [MIGraphX, ROCm,
+                        CPU]. Default is MIGraphX
+  --model MODEL         Path to the desired model to be run. Default is ./model.onnx
+  --vocab VOCAB         Path to the vocab of the model. Default is ./squad/vocab.txt
+  --token TOKEN         Path to the tokenized inputs. Default is None and will be taken from the vocab file
   --version VERSION     SQuAD dataset version. Default is 1.1. Choices are 1.1 and 2.0
+  --no_eval             Turn off evaluating the output for f1 and exact match scores. Default is False
+  --ort_verbose         Turn on onnxruntime verbose flags
+  --ort_quant           Use the Onnxruntime Quantizer instead of the MIGraphX Quantizer
+  --save_load           Turn on Onnxruntime model save/load to speed up inference
   --batch BATCH         Batch size per inference
   --seq_len SEQ_LEN     Sequence length of the model. Default is 384
+  --query_len QUERY_LEN
+                        Max query length of the model. Default is 64
   --doc_stride DOC_STRIDE
                         Document stride of the model. Default is 128
   --cal_num CAL_NUM     Number of calibration samples for QDQ quantization in int8. Default is 100
+  --samples SAMPLES     Number of samples to test with. Default is 0 (all the samples in the dataset)
   --verbose             Show verbose output
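The paragraph added in this hunk claims that moving between CUDA/TensorRT and MIGraphX/ROCm is mostly a matter of which execution provider the onnxruntime session is created with. A minimal sketch of that idea, assuming an ONNX Runtime build that includes the MIGraphX EP; the model file name and the exact provider list are illustrative, not taken from the script:

```python
import onnxruntime as ort

# Provider preference mirrors the script's --ep flag:
# "MIGraphX" -> MIGraphXExecutionProvider, "ROCm" -> ROCMExecutionProvider.
# ONNX Runtime falls through this list in order, so CPUExecutionProvider
# keeps the sketch runnable on a machine without a supported GPU.
providers = [
    "MIGraphXExecutionProvider",
    "ROCMExecutionProvider",
    "CPUExecutionProvider",
]

session = ort.InferenceSession("model.onnx", providers=providers)
print("Active providers:", session.get_providers())
```

On a CUDA machine the same code would list CUDAExecutionProvider instead; nothing else in the inference code has to change, which is the point the README is making.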

@@ -60,7 +74,7 @@ In order to get best performance from MIGraphX, there are some optimizations bei
 Once QDQ model generation is done, the qdq_model.onnx will be saved.
 
 ## QDQ Model Evaluation
-Remember to set env variables, ORT_TENSORRT_FP16_ENABLE=1 and ORT_TENSORRT_INT8_ENABLE=1, to run QDQ model.
+Remember to set the env variables ORT_MIGRAPHX_FP16_ENABLE=1 and ORT_MIGRAPHX_INT8_ENABLE=1 to run the QDQ model.
 We use the evaluation tool from the Nvidia TensorRT demo BERT repo to evaluate the result based on SQuAD v1.0 and SQuAD v2.0.
 
 Note: The input names of the model in the e2e example are based on the Hugging Face model's naming. If the input names in your model are different, please modify the ort_session.run(["output_start_logits","output_end_logits"], inputs) call in the example.
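To make the evaluation setup above concrete, here is a minimal sketch, assuming a saved qdq_model.onnx. The BERT input names and shapes below are illustrative Hugging Face-style guesses; only the two output names come straight from the example:

```python
import os

import numpy as np
import onnxruntime as ort

# Set the MIGraphX EP flags before the session is created so the
# provider sees them when it compiles the QDQ model.
os.environ["ORT_MIGRAPHX_FP16_ENABLE"] = "1"
os.environ["ORT_MIGRAPHX_INT8_ENABLE"] = "1"

ort_session = ort.InferenceSession(
    "qdq_model.onnx", providers=["MIGraphXExecutionProvider"]
)

# Hugging Face-style BERT inputs; rename these if your model differs.
batch, seq_len = 1, 384
inputs = {
    "input_ids": np.zeros((batch, seq_len), dtype=np.int64),
    "attention_mask": np.ones((batch, seq_len), dtype=np.int64),
    "token_type_ids": np.zeros((batch, seq_len), dtype=np.int64),
}

# As the note above says, change the output names here if your
# model does not follow the Hugging Face naming.
start_logits, end_logits = ort_session.run(
    ["output_start_logits", "output_end_logits"], inputs
)
print(start_logits.shape, end_logits.shape)
```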
