
Commit dd39595

addressed PR comments

1 parent: cf054d2

File tree

1 file changed: +7 -1 lines changed

docs/source/en/quantization/modelopt.md

Lines changed: 7 additions & 1 deletion
@@ -11,7 +11,7 @@ specific language governing permissions and limitations under the License. -->
 
 # NVIDIA ModelOpt
 
-[nvidia_modelopt](https://github.com/NVIDIA/TensorRT-Model-Optimizer) is a unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed.
+[NVIDIA-ModelOpt](https://github.com/NVIDIA/TensorRT-Model-Optimizer) is a unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed.
 
 Before you begin, make sure you have nvidia_modelopt installed.
 

@@ -53,6 +53,12 @@ image = pipe(
 image.save("output.png")
 ```
 
+> **Note:**
+>
+> The quantization methods in NVIDIA-ModelOpt are designed to reduce the memory footprint of model weights using various QAT (Quantization-Aware Training) and PTQ (Post-Training Quantization) techniques while maintaining model performance. However, the actual performance gain during inference depends on the deployment framework (e.g., TRT-LLM, TensorRT) and the specific hardware configuration.
+>
+> More details can be found [here](https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples).
+
 ## NVIDIAModelOptConfig
 
 The `NVIDIAModelOptConfig` class accepts three parameters:
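
For context on the docs being edited, here is a minimal sketch of how a `NVIDIAModelOptConfig` is typically passed to `from_pretrained` in diffusers. The specific `quant_type` value, parameter names, and model checkpoint below are assumptions chosen for illustration; they are not taken from this diff.

```python
import torch
from diffusers import NVIDIAModelOptConfig, SanaTransformer2DModel

# Assumed for illustration: a ModelOpt PTQ config requesting FP8 weights.
# The exact argument names may differ in your diffusers version.
quant_config = NVIDIAModelOptConfig(quant_type="FP8")

# diffusers' from_pretrained accepts a quantization_config for supported
# backends; here only the transformer is loaded with quantization applied.
transformer = SanaTransformer2DModel.from_pretrained(
    "Efficient-Large-Model/Sana_600M_1024px_diffusers",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```

As the added note in the diff points out, loading a model this way shrinks the weight memory footprint, but end-to-end speedups depend on deploying through a framework such as TensorRT-LLM or TensorRT on suitable hardware.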
