CHANGELOG-Windows.rst (8 additions, 0 deletions)
@@ -2,6 +2,14 @@
 Model Optimizer Changelog (Windows)
 ===================================

+0.33 (2025-09-03)
+^^^^^^^^^^^^^^^^^
+
+**New Features**
+
+- TensorRT Model Optimizer for Windows now supports the `NvTensorRtRtx <https://onnxruntime.ai/docs/execution-providers/TensorRTRTX-ExecutionProvider.html>`_ execution provider.
docs/source/getting_started/windows/_installation_with_olive.rst (9 additions, 3 deletions)
@@ -24,8 +24,9 @@ Setup Steps for Olive with ModelOpt-Windows
     $ pip install onnxruntime-genai-directml>=0.4.0
     $ pip install onnxruntime-directml==1.20.0

+- The above onnxruntime and onnxruntime-genai packages enable the Olive workflow with the DirectML Execution Provider (EP). To use other EPs, install the corresponding packages.

-Additionally, ensure that dependencies for TensorRT Model Optimizer - Windows are met as mentioned in the :ref:`Install-Page-Standalone-Windows`.
+- Additionally, ensure that the dependencies for TensorRT Model Optimizer - Windows are met, as mentioned in the :ref:`Install-Page-Standalone-Windows`.

 **2. Configure Olive for TensorRT Model Optimizer – Windows**
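To illustrate the "other EPs" note above, here is a minimal sketch assuming the CUDA builds of the same two packages are wanted instead of the DirectML ones; the package names and version pins below are unverified assumptions and should be checked against the ONNX Runtime and ONNX Runtime GenAI documentation.

```bash
# Sketch only: CUDA-EP counterparts of the DirectML packages shown above.
# Package names and versions are assumptions -- verify before use.
pip install "onnxruntime-genai-cuda>=0.4.0"
pip install onnxruntime-gpu
```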
@@ -36,7 +37,11 @@ Setup Steps for Olive with ModelOpt-Windows
 - **Add Other Passes:** Add additional passes to the Olive configuration file as needed for the desired Olive workflow of your input model. [Refer to the `phi3 <https://github.com/microsoft/Olive/tree/main/examples/phi3#quantize-models-with-nvidia-tensorrt-model-optimizer>`_ Olive example.]

-**4. Run the Optimization**
+**4. Install other dependencies**
+
+- Install any other requirements needed by the Olive scripts and config.
+
+**5. Run the Optimization**

 - **Execute Optimization:** To start the optimization process, run the following commands:
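As a rough sketch of what step 5 typically involves (the guide's actual commands appear later in this file and may differ), an Olive workflow is usually launched by pointing Olive's runner at the configuration file prepared in the steps above; the config filename here is a hypothetical placeholder.

```bash
# Minimal sketch: run the Olive workflow described by a prepared config file.
# "olive_config.json" is a hypothetical placeholder for your actual config.
python -m olive.workflows.run --config olive_config.json
```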
@@ -56,4 +61,5 @@ Setup Steps for Olive with ModelOpt-Windows
 **Note**:

-#. Currently, the TensorRT-Model Optimizer - Windows only supports Onnx Runtime GenAI based models in the Olive workflow.
+#. Currently, TensorRT Model Optimizer - Windows only supports ONNX Runtime GenAI based LLM models in the Olive workflow.
+#. To try out different LLMs and EPs in the Olive workflow of ModelOpt-Windows, refer to the details provided in the `phi3 <https://github.com/microsoft/Olive/tree/main/examples/phi3#quantize-models-with-nvidia-tensorrt-model-optimizer>`_ Olive example.
-For more details, please refer to the [detailed installation instructions](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html).
+For more details, please refer to the [detailed installation instructions](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/windows/_installation_for_Windows.html).
examples/windows/accuracy_benchmark/README.md (1 addition, 1 deletion)
@@ -29,7 +29,7 @@ This repository provides scripts, popular third-party benchmarks, and instructions
 The MMLU benchmark assesses LLM performance across a wide range of tasks, producing a score between 0 and 1, where a higher score indicates better accuracy. Please refer to the [MMLU Paper](https://arxiv.org/abs/2009.03300) for more details.

-### MMLU Setup
+### Setup

 The table below lists the setup steps to prepare your environment for evaluating LLMs using the MMLU benchmark.
 |`--awqclip_bsz_col`| 1024 (default) | Chunk size in columns during weight clipping, user-defined |
-|`--calibration_eps`| dml, cuda, cpu, NvTensorRtRtx (default: [dml,cpu]) | List of calibration endpoints. |
+|`--calibration_eps`| dml, cuda, cpu, NvTensorRtRtx (default: [dml,cpu]) | List of execution providers to use for the session run during calibration |
+|`--no_position_ids`| Default: position_ids input enabled | Use this option to disable the position_ids input in calibration data |

 Run the following command to view all available parameters in the script:

 ```bash
 python quantize.py --help
 ```

+Note:
+
+1. For the `algo` argument, the following options are available to choose from: awq_lite, awq_clip, rtn, rtn_dq.
+   - The 'awq_lite' option performs the core AWQ scale search and INT4 quantization.
+   - The 'awq_clip' option primarily performs weight clipping and INT4 quantization.
+   - The 'rtn' option performs INT4 RTN quantization with Q->DQ nodes for weights.
+   - The 'rtn_dq' option performs INT4 RTN quantization with only DQ nodes for weights.
+1. The RTN algorithms (rtn, rtn_dq) don't use calibration data.
+1. If needed for the input base model, use the `--no_position_ids` command-line option to disable
+   generating the position_ids calibration input. GenAI-built LLM models produced with the DML EP have
+   a position_ids input, but ones produced with the CUDA EP or NvTensorRtRtx EP don't.
+   Use `--help` or the command-line options table above to inspect default values.
+
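To make the options above concrete, here is a hedged sketch of a possible INT4 AWQ invocation. The `--algo` flag spelling is inferred from the note's reference to the `algo` argument, the model path is a placeholder, and the exact way to pass multiple calibration EPs should be confirmed with `python quantize.py --help`.

```bash
# Hedged sketch of an awq_lite (INT4 AWQ) run; the flag spellings and the
# model path are assumptions -- confirm with `python quantize.py --help`.
python quantize.py \
    --onnx_path ./model/model.onnx \
    --algo awq_lite \
    --calibration_eps NvTensorRtRtx \
    --no_position_ids
```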

 Please refer to `quantize.py` for further details on command-line parameters.

 ### Evaluate the Quantized Model

-To evaluate the quantized model, please refer to the [accuracy benchmarking](../accuracy_benchmark/README.md) and [onnxruntime-genai performance benchmarking](https://github.com/microsoft/onnxruntime-genai/tree/main/benchmark/python).
+To evaluate the quantized model, please refer to the [accuracy benchmarking](../../accuracy_benchmark/README.md) and [onnxruntime-genai performance benchmarking](https://github.com/microsoft/onnxruntime-genai/tree/main/benchmark/python).

 ### Deployment

@@ -107,3 +121,11 @@ Please refer to [support matrix](https://nvidia.github.io/TensorRT-Model-Optimiz
 1. **Check Input Model**

    During INT4 AWQ execution, the input ONNX model (the one specified by the `--onnx_path` argument) will be run with onnxruntime (ORT) for calibration, using the ORT EP specified by the `--calibration_eps` argument. So, make sure that the input ONNX model runs fine with the specified ORT EP.
+
+1. **Config availability for calibration with the NvTensorRtRtx EP**
+
+   Note that when using `NvTensorRtRtx` for INT4 AWQ quantization, a profile (min/max/opt ranges) of the model's input shapes is created internally from the model's config (e.g., config.json in the Hugging Face model card). This input-shape profile is used during onnxruntime session creation. Make sure that config.json is available in the model directory if `model_name` is a local model path (instead of a Hugging Face model name).
+
+1. **Error - Invalid Position-IDs input to the ONNX model**
+
+   ONNX models produced with ONNX Runtime GenAI have different IO bindings depending on which execution provider (EP) they were built with. For instance, a model built with the DML EP has a position_ids input, but models built with the CUDA EP or NvTensorRtRtx EP don't. So, if the base model requires it, use the `--no_position_ids` command-line argument to disable the position_ids calibration input, or set the "add_position_ids" variable to `False` (hard-coded) in the quantize script if required.
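Tying the troubleshooting items above together, one quick way to confirm that the input model loads under the intended ORT EP, and to see whether it exposes a position_ids input, is a one-off onnxruntime session. The model path and provider list below are placeholders; the exact provider string for the NvTensorRtRtx EP should be taken from its documentation.

```bash
# Sanity-check sketch: load the model with the intended EP and list its inputs.
# 'model/model.onnx' and the provider list are placeholders.
python -c "import onnxruntime as ort; s = ort.InferenceSession('model/model.onnx', providers=['DmlExecutionProvider', 'CPUExecutionProvider']); print([i.name for i in s.get_inputs()])"
```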