Code from the TensorRT example has been reused for the MIGraphX Execution Provider to showcase how simple it is to convert CUDA and TensorRT code to MIGraphX and ROCm within ONNX Runtime. Just change the desired Execution Provider, install the proper requirements (ROCm and MIGraphX), and run your script as you did with CUDA.
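In most scripts this is a one-line change when the session is created. A minimal sketch, with a placeholder model path:

```python
import onnxruntime as ort

# Previously: providers=["CUDAExecutionProvider"] or ["TensorrtExecutionProvider"]
session = ort.InferenceSession(
    "model.onnx",  # placeholder path; point this at your own model
    providers=["MIGraphXExecutionProvider"],  # or "ROCMExecutionProvider"
)
```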
We've also added a few more input args to the script to help fine-tune the inference you'd like to run; an example invocation follows the list below. Feel free to use --help when running:
--fp16                Perform fp16 quantization on the model before running inference
--int8                Perform int8 quantization on the model before running inference
--ep EP               The desired execution provider; options are [MIGraphX, ROCm]. Default is MIGraphX
--cal_ep CAL_EP       The desired execution provider for int8 quantization; options are [MIGraphX, ROCm, CPU]. Default is MIGraphX
--model MODEL         Path to the desired model to be run. Default is ./model.onnx
--vocab VOCAB         Path to the vocab of the model. Default is ./squad/vocab.txt
--token TOKEN         Path to the tokenized inputs. Default is None and will be taken from the vocab file
--version VERSION     SQuAD dataset version. Default is 1.1. Choices are 1.1 and 2.0
--no_eval             Turn off evaluation of output results for f1 and exact match score. Default is False
--ort_verbose         Turn on ONNX Runtime verbose flags
--ort_quant           Turn on the ONNX Runtime quantizer instead of the MIGraphX quantizer
--save_load           Turn on ONNX Runtime model save/load to speed up inference
--batch BATCH         Batch size per inference
--seq_len SEQ_LEN     Sequence length of the model. Default is 384
--query_len QUERY_LEN
                      Max query length of the model. Default is 64
--doc_stride DOC_STRIDE
                      Document stride of the model. Default is 128
--cal_num CAL_NUM     Number of calibration samples for QDQ quantization in int8. Default is 100
--samples SAMPLES     Number of samples to test with. Default is 0 (all the samples in the dataset)
--verbose             Show verbose output
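For example, to run fp16 inference on the MIGraphX EP with a batch size of 1 (the script name here is an assumption; substitute the actual example script in this directory):

```bash
python e2e_migraphx_bert_example.py --ep MIGraphX --fp16 --batch 1 --seq_len 384
```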
Once QDQ model generation is done, the qdq_model.onnx will be saved.
## QDQ Model Evaluation
Remember to set the env variables ORT_MIGRAPHX_FP16_ENABLE=1 and ORT_MIGRAPHX_INT8_ENABLE=1 to run the QDQ model.
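A minimal sketch of setting these from Python, assuming they are set before the session is created:

```python
import os

# The MIGraphX EP reads these at session creation time,
# so set them before constructing the InferenceSession.
os.environ["ORT_MIGRAPHX_FP16_ENABLE"] = "1"
os.environ["ORT_MIGRAPHX_INT8_ENABLE"] = "1"
```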
We use the evaluation tool from the Nvidia TensorRT demo BERT repo to evaluate the results on SQuAD v1.1 and SQuAD v2.0.
Note: The input names of the model in the e2e example are based on the Hugging Face model's naming. If the input names are not correct for your model, please modify the code `ort_session.run(["output_start_logits","output_end_logits"], inputs)` in the example.
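If you are unsure which names your model actually uses, you can list them from the session first; a small sketch, reusing the ort_session from the example:

```python
# Print the tensor names the loaded model actually exposes.
print([i.name for i in ort_session.get_inputs()])
print([o.name for o in ort_session.get_outputs()])

# Then pass the matching output names to run():
outputs = ort_session.run(["output_start_logits", "output_end_logits"], inputs)
```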