Code from the TensorRT example has been reused for the MIGraphX Execution Provider to showcase how simple it is to convert CUDA and TensorRT code to MIGraphX and ROCm within ONNX Runtime. Just change the desired Execution Provider, install the proper requirements (ROCm and MIGraphX), and run your script as you did with CUDA.
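In most scripts this is a one-line change when the session is created. A minimal sketch, with a placeholder model path:

```python
import onnxruntime as ort

# Previously: providers=["CUDAExecutionProvider"] or ["TensorrtExecutionProvider"]
session = ort.InferenceSession(
    "model.onnx",  # placeholder path; point this at your own model
    providers=["MIGraphXExecutionProvider"],  # or "ROCMExecutionProvider"
)
```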
We've also added a few more input args to the script to help fine-tune the inference you'd like to run; an example invocation follows the list below. Feel free to use --help when running:
--fp16                Perform fp16 quantization on the model before running inference
--int8                Perform int8 quantization on the model before running inference
--ep EP               The desired execution provider; options are [MIGraphX, ROCm]. Default is MIGraphX
--cal_ep CAL_EP       The desired execution provider for int8 quantization; options are [MIGraphX, ROCm, CPU]. Default is MIGraphX
--model MODEL         Path to the desired model to be run. Default is ./model.onnx
--vocab VOCAB         Path to the vocab of the model. Default is ./squad/vocab.txt
--token TOKEN         Path to the tokenized inputs. Default is None and will be taken from the vocab file
--version VERSION     SQuAD dataset version. Default is 1.1. Choices are 1.1 and 2.0
--no_eval             Turn off evaluation of output results for f1 and exact match score. Default is False
--ort_verbose         Turn on ONNX Runtime verbose flags
--ort_quant           Turn on the ONNX Runtime quantizer instead of the MIGraphX quantizer
--save_load           Turn on ONNX Runtime model save/load to speed up inference
--batch BATCH         Batch size per inference
--seq_len SEQ_LEN     Sequence length of the model. Default is 384
--query_len QUERY_LEN
                      Max query length of the model. Default is 64
--doc_stride DOC_STRIDE
                      Document stride of the model. Default is 128
--cal_num CAL_NUM     Number of calibration samples for QDQ quantization in int8. Default is 100
--samples SAMPLES     Number of samples to test with. Default is 0 (all the samples in the dataset)
--verbose             Show verbose output
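For example, to run fp16 inference on the MIGraphX EP with a batch size of 1 (the script name here is an assumption; substitute the actual example script in this directory):

```bash
python e2e_migraphx_bert_example.py --ep MIGraphX --fp16 --batch 1 --seq_len 384
```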
Once QDQ model generation is done, the qdq_model.onnx will be saved.
## QDQ Model Evaluation
Remember to set the env variables ORT_MIGRAPHX_FP16_ENABLE=1 and ORT_MIGRAPHX_INT8_ENABLE=1 to run the QDQ model.
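A minimal sketch of setting these from Python, assuming they are set before the session is created:

```python
import os

# The MIGraphX EP reads these at session creation time,
# so set them before constructing the InferenceSession.
os.environ["ORT_MIGRAPHX_FP16_ENABLE"] = "1"
os.environ["ORT_MIGRAPHX_INT8_ENABLE"] = "1"
```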
We use the evaluation tool from the Nvidia TensorRT demo BERT repo to evaluate the results on SQuAD v1.1 and SQuAD v2.0.
Note: The input names of the model in the e2e example are based on the Hugging Face model's naming. If the input names are not correct for your model, please modify the code `ort_session.run(["output_start_logits","output_end_logits"], inputs)` in the example.
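If you are unsure which names your model actually uses, you can list them from the session first; a small sketch, reusing the ort_session from the example:

```python
# Print the tensor names the loaded model actually exposes.
print([i.name for i in ort_session.get_inputs()])
print([o.name for o in ort_session.get_outputs()])

# Then pass the matching output names to run():
outputs = ort_session.run(["output_start_logits", "output_end_logits"], inputs)
```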