Commit 845bae6

Add Windows 0.33.0 release changes
1 parent 55b9106 commit 845bae6

11 files changed: +257 −70 lines changed


CHANGELOG-Windows.rst

Lines changed: 8 additions & 0 deletions
@@ -2,6 +2,14 @@
 Model Optimizer Changelog (Windows)
 ===================================

+0.33 (2025-09-03)
+^^^^^^^^^^^^^^^^^
+
+**New Features**
+
+- TensorRT Model Optimizer for Windows now supports the `NvTensorRtRtx <https://onnxruntime.ai/docs/execution-providers/TensorRTRTX-ExecutionProvider.html>`_ execution provider.
+
+
 0.27 (2025-04-30)
 ^^^^^^^^^^^^^^^^^

docs/source/getting_started/windows/_installation_with_olive.rst

Lines changed: 9 additions & 3 deletions
@@ -24,8 +24,9 @@ Setup Steps for Olive with ModelOpt-Windows
     $ pip install onnxruntime-genai-directml>=0.4.0
     $ pip install onnxruntime-directml==1.20.0

+- The above onnxruntime and onnxruntime-genai packages enable the Olive workflow with the DirectML Execution Provider (EP). To use other EPs, install the corresponding packages.

-Additionally, ensure that dependencies for TensorRT Model Optimizer - Windows are met as mentioned in the :ref:`Install-Page-Standalone-Windows`.
+- Additionally, ensure that dependencies for TensorRT Model Optimizer - Windows are met as mentioned in the :ref:`Install-Page-Standalone-Windows`.

 **2. Configure Olive for TensorRT Model Optimizer – Windows**
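The first added bullet tells readers to install the packages that match whichever EP they plan to use. As a rough sketch only (the CUDA-EP package names below are assumptions based on the public onnxruntime and onnxruntime-genai release channels, not something this commit specifies), switching the same setup to the CUDA EP might look like:

```bash
# Sketch: CUDA EP counterparts of the DirectML packages above; verify names and
# versions against the onnxruntime-genai release notes for your configuration.
pip install "onnxruntime-genai-cuda>=0.4.0"
pip install onnxruntime-gpu
```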

@@ -36,7 +37,11 @@ Setup Steps for Olive with ModelOpt-Windows

 - **Add Other Passes:** Add additional passes to the Olive configuration file as needed for the desired Olive workflow of your input model. [Refer to the `phi3 <https://github.com/microsoft/Olive/tree/main/examples/phi3#quantize-models-with-nvidia-tensorrt-model-optimizer>`_ Olive example]

-**4. Run the Optimization**
+**4. Install other dependencies**
+
+- Install any other requirements needed by the Olive scripts and configuration.
+
+**5. Run the Optimization**

 - **Execute Optimization:** To start the optimization process, run the following commands:
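The commands themselves live in the unchanged part of the file, so they are not shown in this hunk. As a hedged sketch only (assuming a typical JSON-driven Olive workflow; `config.json` and `requirements.txt` are placeholder names, and older Olive releases use the module form `python -m olive.workflows.run`):

```bash
# Sketch: install the example's remaining requirements, then launch the Olive workflow.
pip install -r requirements.txt   # placeholder; use the requirements file of your Olive example
olive run --config config.json    # or: python -m olive.workflows.run --config config.json
```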

@@ -56,4 +61,5 @@ Setup Steps for Olive with ModelOpt-Windows

 **Note**:

-#. Currently, the TensorRT-Model Optimizer - Windows only supports Onnx Runtime GenAI based models in the Olive workflow.
+#. Currently, TensorRT Model Optimizer - Windows only supports ONNX Runtime GenAI based LLM models in the Olive workflow.
+#. To try out different LLMs and EPs in the ModelOpt-Windows Olive workflow, refer to the details provided in the `phi3 <https://github.com/microsoft/Olive/tree/main/examples/phi3#quantize-models-with-nvidia-tensorrt-model-optimizer>`_ Olive example.

examples/windows/README.md

Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@
 #### A Library to Quantize and Compress Deep Learning Models for Optimized Inference on Native Windows RTX GPUs

 [![Documentation](https://img.shields.io/badge/Documentation-latest-brightgreen.svg?style=flat)](https://nvidia.github.io/TensorRT-Model-Optimizer/)
-[![version](https://img.shields.io/badge/v0.27.0-orange?label=Release)](https://pypi.org/project/nvidia-modelopt/0.27.0/)
+[![version](https://img.shields.io/badge/v0.33.0-orange?label=Release)](https://pypi.org/project/nvidia-modelopt/)
 [![license](https://img.shields.io/badge/License-Apache%202.0-blue)](../../LICENSE)

 [Examples](#examples) |

@@ -59,7 +59,7 @@ pip install onnxruntime-genai-directml>=0.4.0
 pip install onnxruntime-directml==1.20.0
 ```

-For more details, please refer to the [detailed installation instructions](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html).
+For more details, please refer to the [detailed installation instructions](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/windows/_installation_for_Windows.html).

 ## Techniques

examples/windows/accuracy_benchmark/README.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ This repository provides scripts, popular third-party benchmarks, and instructions

 The MMLU benchmark assesses LLM performance across a wide range of tasks, producing a score between 0 and 1, where a higher score indicates better accuracy. Please refer to the [MMLU Paper](https://arxiv.org/abs/2009.03300) for more details.

-### MMLU Setup
+### Setup

 The table below lists the setup steps to prepare your environment for evaluating LLMs using the MMLU benchmark.

examples/windows/onnx_ptq/genai_llm/README.md

Lines changed: 29 additions & 7 deletions
@@ -8,7 +8,7 @@ This example takes an ONNX model as input, along with the necessary quantization

 ### Setup

-1. Install ModelOpt-Windows. Refer [installation instructions](../README.md).
+1. Install ModelOpt-Windows. Refer to the [installation instructions](../../README.md).

 1. Install required dependencies
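For completeness, a minimal sketch of the first setup step, assuming the standard `nvidia-modelopt` package on PyPI with its ONNX extra as described on the linked installation page (confirm the exact extra name there):

```bash
# Sketch: install ModelOpt-Windows with ONNX quantization support (extra name assumed).
pip install "nvidia-modelopt[onnx]"
```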

@@ -43,30 +43,44 @@ The table below lists key command-line arguments of the ONNX PTQ example script.
 |---------------------------|------------------------------------------------------|-------------------------------------------------------------|
 | `--calib_size` | 32 (default), 64, 128 | Specifies the calibration size. |
 | `--dataset` | cnn (default), pilevel | Choose calibration dataset: cnn_dailymail or pile-val. |
-| `--algo` | awq_lite (default), awq_clip | Select the quantization algorithm. |
+| `--algo` | awq_lite (default), awq_clip, rtn, rtn_dq | Select the quantization algorithm. |
 | `--onnx_path` | input .onnx file path | Path to the input ONNX model. |
 | `--output_path` | output .onnx file path | Path to save the quantized ONNX model. |
-| `--use_zero_point` | True, False (default) | Enable zero-point based quantization. |
+| `--use_zero_point` | Default: zero-point is disabled | Use this option to enable zero-point based quantization. |
 | `--block-size` | 32, 64, 128 (default) | Block size for AWQ. |
 | `--awqlite_alpha_step` | 0.1 (default) | Step-size for AWQ scale search, user-defined |
-| `--awqlite_run_per_subgraph` | True, False (default) | If True, runs AWQ scale search at the subgraph level |
-| `--awqlite_fuse_nodes` | True (default), False | If True, fuses input scales in parent nodes. |
+| `--awqlite_run_per_subgraph` | Default: run_per_subgraph is disabled | Use this option to run AWQ scale search at the subgraph level. |
+| `--awqlite_disable_fuse_nodes` | Default: fuse_nodes is enabled | Use this option to disable fusion of input scales into parent nodes. |
 | `--awqclip_alpha_step` | 0.05 (default) | Step-size for AWQ weight clipping, user-defined |
 | `--awqclip_alpha_min` | 0.5 (default) | Minimum AWQ weight-clipping threshold, user-defined |
 | `--awqclip_bsz_col` | 1024 (default) | Chunk size in columns during weight clipping, user-defined |
-| `--calibration_eps` | dml, cuda, cpu, NvTensorRtRtx (default: [dml,cpu]) | List of calibration endpoints. |
+| `--calibration_eps` | dml, cuda, cpu, NvTensorRtRtx (default: [dml,cpu]) | List of execution providers to use for the session run during calibration. |
+| `--no_position_ids` | Default: position_ids input enabled | Use this option to disable the position_ids input in calibration data. |

 Run the following command to view all available parameters in the script:

 ```bash
 python quantize.py --help
 ```

+Note:
+
+1. For the `algo` argument, the following options are available: awq_lite, awq_clip, rtn, rtn_dq.
+   - The 'awq_lite' option performs the core AWQ scale search and INT4 quantization.
+   - The 'awq_clip' option primarily performs weight clipping and INT4 quantization.
+   - The 'rtn' option performs INT4 RTN quantization with Q->DQ nodes for weights.
+   - The 'rtn_dq' option performs INT4 RTN quantization with only DQ nodes for weights.
+1. The RTN algorithm doesn't use calibration data.
+1. If needed for the input base model, use the `--no_position_ids` command-line option to disable
+   generating the position_ids calibration input. GenAI-built LLM models produced with the DML EP have
+   a position_ids input, but ones produced with the CUDA EP or NvTensorRtRtx EP don't.
+   Use `--help` or the command-line options table above to inspect default values.
+
 Please refer to `quantize.py` for further details on command-line parameters.
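To make the table concrete, here is a hedged example invocation assembled only from the arguments documented above. The model paths are placeholders, and the exact list syntax accepted by `--calibration_eps` should be confirmed with `python quantize.py --help`:

```bash
# Sketch: INT4 AWQ quantization of a GenAI-exported ONNX LLM, calibrating via the NvTensorRtRtx EP.
# model/model.onnx and model_int4/model.onnx are placeholder paths.
python quantize.py \
    --onnx_path model/model.onnx \
    --output_path model_int4/model.onnx \
    --algo awq_lite \
    --dataset cnn \
    --calib_size 32 \
    --calibration_eps NvTensorRtRtx \
    --no_position_ids
```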
 ### Evaluate the Quantized Model

-To evaluate the quantized model, please refer to the [accuracy benchmarking](../accuracy_benchmark/README.md) and [onnxruntime-genai performance benchmarking](https://github.com/microsoft/onnxruntime-genai/tree/main/benchmark/python).
+To evaluate the quantized model, please refer to the [accuracy benchmarking](../../accuracy_benchmark/README.md) and [onnxruntime-genai performance benchmarking](https://github.com/microsoft/onnxruntime-genai/tree/main/benchmark/python).

 ### Deployment
@@ -107,3 +121,11 @@ Please refer to [support matrix](https://nvidia.github.io/TensorRT-Model-Optimiz
 1. **Check Input Model**

    During INT4 AWQ execution, the input onnx model (one mentioned in `--onnx_path` argument) will be run with onnxruntime (ORT) for calibration (using ORT EP mentioned in `--calibration_eps` argument). So, make sure that input onnx model is running fine with the specified ORT EP.
+
+1. **Config availability for calibration with NvTensorRtRtx EP**
+
+   Note that while using `NvTensorRtRtx` for INT4 AWQ quantization, a profile (min/max/opt ranges) of the model's input shapes is created internally using details from the model's config (e.g. config.json in the Hugging Face model card). This input-shapes profile is used during onnxruntime session creation. Make sure that config.json is available in the model directory if `model_name` is a local model path (instead of a Hugging Face model name).
+
+1. **Error - Invalid Position-IDs input to the ONNX model**
+
+   ONNX models produced with ONNX Runtime GenAI have different IO bindings depending on the execution provider (EP) used to build them. For instance, a model built with the DML EP has a position_ids input, but models built using the CUDA EP or NvTensorRtRtx EP don't. So, if the base model requires it, use the `--no_position_ids` command-line argument to disable the position_ids calibration input, or set the "add_position_ids" variable to `False` (hard-coded) in the quantize script if required.
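For the first and third items above, a quick sanity check is to open an ONNX Runtime session with the intended EP and list the model's inputs, which also shows whether a position_ids input is present. A minimal sketch, with `model.onnx` and `DmlExecutionProvider` as placeholders; substitute the provider that matches your `--calibration_eps` choice:

```bash
# Sketch: verify the input model loads with the intended ORT EP and inspect its input names.
python -c "import onnxruntime as ort; \
s = ort.InferenceSession('model.onnx', providers=['DmlExecutionProvider']); \
print('active providers:', s.get_providers()); \
print('model inputs:', [i.name for i in s.get_inputs()])"
```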
