Commit 2bca299: Update windows readme

Parent: 9affa87

4 files changed (+17, -11 lines)

windows/Benchmark.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,6 +1,6 @@
 # TensorRT Model Optimizer - Windows: Benchmark Reference
 
-This document provides a summary of the performance and accuracy measurements of [TensorRT Model Optimizer - Windows](https://github.com/NVIDIA/TensorRT-Model-Optimizer) for several popular models. The benchmark results in the following tables serve as reference points and **should not be viewed as the maximum performance** achievable by Model Optimizer - Windows.
+This document provides a summary of the performance and accuracy measurements of [TensorRT Model Optimizer - Windows](./README.md) for several popular models. The benchmark results in the following tables serve as reference points and **should not be viewed as the maximum performance** achievable by Model Optimizer - Windows.
 
 ### 1 Performance And Accuracy Comparison: ONNX INT4 vs ONNX FP16 Models
 
```
windows/README.md

Lines changed: 5 additions & 5 deletions

````diff
@@ -6,7 +6,7 @@
 
 [![Documentation](https://img.shields.io/badge/Documentation-latest-brightgreen.svg?style=flat)](https://nvidia.github.io/TensorRT-Model-Optimizer/)
 [![version](https://img.shields.io/pypi/v/nvidia-modelopt?label=Release)](https://pypi.org/project/nvidia-modelopt/)
-[![license](https://img.shields.io/badge/License-MIT-blue)](./LICENSE)
+[![license](https://img.shields.io/badge/License-MIT-blue)](../LICENSE)
 
 [Examples](#examples) |
 [Benchmark Results](#benchmark-results)
@@ -42,10 +42,10 @@ ModelOpt-Windows can be installed either as a standalone toolkit or through Micr
 
 ### Standalone Toolkit Installation (with CUDA 12.x)
 
-To install ModelOpt-Windows as a standalone toolkit with CUDA 12.x support, run the following commands:
+To install ModelOpt-Windows as a standalone toolkit on CUDA 12.x systems, run the following commands:
 
 ```bash
-pip install nvidia-modelopt[onnx]~=0.19.0 --extra-index-url https://pypi.nvidia.com
+pip install nvidia-modelopt[onnx] --extra-index-url https://pypi.nvidia.com
 pip install cupy-cuda12x
 ```
 
@@ -71,12 +71,12 @@ For more details, please refer to the [detailed quantization guide](https://nvid
 
 ## Examples
 
-- [PTQ for LLMs](./onnx_ptq/README.md) covers how to use Post-training quantization (PTQ) and deployment with DirectML
+- [PTQ for LLMs](./onnx_ptq/README.md) covers how to use ONNX Post-Training Quantization (PTQ) and deployment with DirectML
 - [MMLU Benchmark](./accuracy_benchmark/README.md) provides an example script for MMLU benchmark and demonstrates how to run it with various popular backends like DirectML, TensorRT-LLM\* and model formats like ONNX and PyTorch\*.
 
 ## Support Matrix
 
-Please refer to [feature support matrix](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/windows/_feature_support_matrix.html) for a full list of supported features.
+Please refer to [support matrix](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/windows/_feature_support_matrix.html) for a full list of supported features and models.
 
 ## Benchmark Results
 
````
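A note on the installation change above: removing the `~=0.19.0` pin lets pip resolve the latest published `nvidia-modelopt[onnx]` release from the NVIDIA index. The following is a minimal post-install sanity check, not part of the documented workflow; it assumes `modelopt` exposes the conventional `__version__` attribute, while the CuPy call is a documented API:

```python
# Hedged post-install check for the standalone ModelOpt-Windows toolkit.
# Assumes the two pip commands from the diff above completed successfully.
import modelopt  # top-level module installed by the nvidia-modelopt wheel
import cupy      # provided by cupy-cuda12x; requires a CUDA 12.x driver

# __version__ is a common packaging convention, assumed rather than guaranteed here.
print("ModelOpt version:", getattr(modelopt, "__version__", "unknown"))

# Confirms CuPy can reach the CUDA runtime and see at least one GPU.
print("CUDA devices visible:", cupy.cuda.runtime.getDeviceCount())
```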
windows/accuracy_benchmark/README.md

Lines changed: 0 additions & 5 deletions

```diff
@@ -6,11 +6,6 @@
 - [MMLU (Massive Multitask Language Understanding)](#mmlu-massive-multitask-language-understanding)
 - [Setup](#setup)
 - [Evaluation Methods](#evaluation-methods)
-  - [1. Evaluate with ORT-DML using GenAI APIs](#1-evaluate-with-ort-dml-using-genai-apis)
-  - [2. Evaluate with ORT-DML, CUDA, and CPU using Native ORT Path](#2-evaluate-with-ort-dml-cuda-and-cpu-using-native-ort-path)
-  - [3. Evaluate the PyTorch Model of HF Weights](#3-evaluate-the-pytorch-model-of-hf-weights)
-  - [4. Evaluate the TensorRT-LLM](#4-evaluate-the-tensorrt-llm)
-  - [5. Evaluate the PyTorch Model Quantized with AutoAWQ](#5-evaluate-the-pytorch-model-quantized-with-autoawq)
 
 ## Overview
 
```
windows/onnx_ptq/README.md

Lines changed: 11 additions & 0 deletions

```diff
@@ -77,6 +77,17 @@ Refer to the following example scripts and tutorials for deployment:
 1. [ORT GenAI examples](https://github.com/microsoft/onnxruntime-genai/tree/main/examples/python)
 1. [ONNX Runtime documentation](https://onnxruntime.ai/docs/api/python/)
 
+### Model Support Matrix
+
+Model | int4_awq
+--- | ---
+Llama3.1-8B-Instruct | Yes
+Phi3.5-mini-Instruct | Yes
+Mistral-7B-Instruct-v0.3 | Yes
+Llama3.2-3B-Instruct | Yes
+Gemma-2b-it | Yes
+Nemotron Mini 4B Instruct | Yes
+
 ### Troubleshoot
 
 1. **Configure Directories**
```
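The support matrix added above lists models validated for `int4_awq` quantization. As a companion to the ORT GenAI examples linked in the diff, here is a hedged sketch of running such a quantized ONNX model through `onnxruntime-genai`; the model directory is hypothetical, and the generation loop follows the upstream Python examples, whose exact API varies between releases:

```python
# Hedged sketch: generate text from an INT4 AWQ ONNX model with onnxruntime-genai.
# The model path is hypothetical; API calls mirror the upstream examples and
# may differ in newer releases of the package.
import onnxruntime_genai as og

model = og.Model("models/llama3.1-8b-instruct-int4-awq")  # hypothetical output dir
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=128)
params.input_ids = tokenizer.encode("What is post-training quantization?")

# Token-by-token decode loop, as in the ORT GenAI Python examples.
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```

On Windows, the DirectML build of the package (`onnxruntime-genai-directml`) is typically installed so the same script runs on the DirectML execution provider without code changes.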