
Commit 7af33d2

Update for 0.31.0 release
1 parent: 3039f76

378 files changed: +10128 additions, -5829 deletions


.dockerignore

Lines changed: 0 additions & 1 deletion
@@ -28,7 +28,6 @@ coverage.xml
 .pytest_cache/
 
 # Sphinx documentation
-docs/_build
 docs/build
 docs/source/reference/generated
 

.gitignore

Lines changed: 0 additions & 1 deletion
@@ -22,7 +22,6 @@ coverage.xml
 .pytest_cache/
 
 # Sphinx documentation
-docs/_build
 docs/build
 docs/source/reference/generated
 

.pre-commit-config.yaml

Lines changed: 15 additions & 2 deletions
@@ -12,6 +12,9 @@ repos:
         args: [--maxkb=500, --enforce-all]
         exclude: >
           (?x)^(
+            internal/experimental/GTC_2024_demo/SDXL_PTQ.ipynb|
+            internal/experimental/vae_training/.*|
+            internal/examples/diffusers/quantization/assets/.*.png|
             examples/diffusers/quantization/assets/.*.png|
             examples/diffusers/cache_diffusion/assets/.*.png|
           )$
@@ -22,6 +25,7 @@ repos:
       - id: check-toml
       - id: check-yaml
         args: [--allow-multiple-documents]
+        exclude: ^internal/.gitlab/ # !references are not supported
       - id: debug-statements
       - id: end-of-file-fixer
       - id: mixed-line-ending
@@ -36,7 +40,7 @@ repos:
         exclude: ^.github/
 
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.11.6
+    rev: v0.11.9
     hooks:
       - id: ruff
         args: [--fix, --exit-non-zero-on-fix]
@@ -110,7 +114,15 @@ repos:
             examples/llm_sparsity/finetune.py|
             examples/speculative_decoding/main.py|
             examples/speculative_decoding/medusa_utils.py|
-            examples/speculative_decoding/vllm_generate.py|
+            examples/speculative_decoding/server_generate.py|
+            internal/examples/diffusers/cache_diffusion/cache_diffusion/module.py|
+            internal/examples/diffusers/cache_diffusion/pipeline/models/sdxl.py|
+            internal/examples/mlperf/infer.py|
+            internal/examples/onnx_ptq/quantize_llama.py|
+            internal/examples/torchvision/modelopt_torchvision.py|
+            internal/examples/vlm_eval/data_utils.py|
+            internal/examples/vlm_eval/eval_utils.py|
+            internal/examples/vlm_eval/mmmu.py|
           )$
 
 # Default hook for Apache 2.0 in core c/c++/cuda files
@@ -155,3 +167,4 @@ repos:
       - id: lychee
         args: ["--no-progress", "--exclude-loopback"]
         stages: [manual] # Only run with `pre-commit run --all-files --hook-stage manual lychee`
+        exclude: internal/

CHANGELOG.rst

Lines changed: 28 additions & 0 deletions
@@ -1,6 +1,34 @@
 Model Optimizer Changelog (Linux)
 =================================
 
+0.31 (2025-06-04)
+^^^^^^^^^^^^^^^^^
+
+**Backward Breaking Changes**
+
+- NeMo and Megatron-LM distributed checkpoints (``torch-dist``) stored with a legacy version can no longer be loaded. The remedy is to load the legacy distributed checkpoint with 0.29, store a ``torch`` checkpoint, and resume with 0.31 to convert to the new format. The following changes only apply to storing and resuming distributed checkpoints.
+  - The ``quantizer_state`` of :class:`TensorQuantizer <modelopt.torch.quantization.nn.modules.TensorQuantizer>` is now stored in the ``extra_state`` of :class:`QuantModule <modelopt.torch.quantization.nn.module.QuantModule>`, where it used to be stored in the sharded ``modelopt_state``.
+  - The dtype and shape of ``amax`` and ``pre_quant_scale`` stored in the distributed checkpoint are now restored. Previously, some dtypes and shapes were changed to give all decoder layers a homogeneous structure in the checkpoint.
+  - Together with megatron.core-0.13, quantized models now store and resume distributed checkpoints in a heterogeneous format.
+- The ``auto_quantize`` API now accepts a list of quantization config dicts as the list of quantization choices.
+  - This API previously accepted a list of quantization format names (strings) and was therefore limited to pre-defined quantization formats, short of workarounds.
+  - With this change, users can now easily use their own custom quantization formats with ``auto_quantize``.
+  - In addition, ``quantization_formats`` now excludes ``None`` (indicating "do not quantize") as a valid format, because ``auto_quantize`` internally always adds "do not quantize" as an option anyway.
+- The model export config is refactored. The quantization config in ``hf_quant_config.json`` is converted and saved to ``config.json``. ``hf_quant_config.json`` will be deprecated soon.
+
+
+**Deprecations**
+
+- Deprecate ``Python 3.9`` support.
+
+**New Features**
+
+- Upgrade LLM examples to use TensorRT-LLM 0.19.
+- Add new model support in the ``llm_ptq`` example: Qwen3 MoE.
+- ModelOpt now supports advanced quantization algorithms such as AWQ, SVDQuant, and SmoothQuant for CPU-offloaded Hugging Face models.
+- Add the AutoCast tool to convert ONNX models to FP16 or BF16.
+- Add a ``--low_memory_mode`` flag in the ``llm_ptq`` example to initialize HF models with compressed weights and reduce the peak memory of PTQ and quantized checkpoint export.
+
 0.29 (2025-05-08)
 ^^^^^^^^^^^^^^^^^
 
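The first backward-breaking change above (``quantizer_state`` moving into ``extra_state``) builds on PyTorch's standard extra-state hooks. As an editor's illustration (a generic sketch, not ModelOpt's actual implementation; the ``QuantLinearSketch`` class and its ``quantizer_state`` dict are hypothetical), any module that defines the two hooks below gets its extra state serialized into the regular ``state_dict``:

    import torch.nn as nn

    class QuantLinearSketch(nn.Linear):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.quantizer_state = {"num_bits": 8, "axis": None}  # hypothetical metadata

        def get_extra_state(self):
            # Called by state_dict(); the returned object is stored next to
            # the parameters under the key "<prefix>._extra_state".
            return {"quantizer_state": self.quantizer_state}

        def set_extra_state(self, state):
            # Called by load_state_dict() with the object saved above.
            self.quantizer_state = state["quantizer_state"]

    m = QuantLinearSketch(4, 4)
    sd = m.state_dict()
    print(sorted(sd))  # ['_extra_state', 'bias', 'weight']

    m2 = QuantLinearSketch(4, 4)
    m2.load_state_dict(sd)  # quantizer_state round-trips with the checkpoint

Because the extra state travels inside the module's own ``state_dict``, it can be stored and resumed together with the weights, which is the property the checkpoint change above relies on.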

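Similarly, the ``auto_quantize`` change is easiest to see in code. Below is a minimal sketch of the new-style call with config dicts as the search choices; ``model`` and ``calib_loader`` are placeholders, and the keyword names are assumptions based on the 0.31 API that should be verified against the installed package:

    import modelopt.torch.quantization as mtq

    # Choices are now config dicts (including user-defined ones), not
    # pre-defined format-name strings; None is no longer a valid entry.
    quant_choices = [mtq.FP8_DEFAULT_CFG, mtq.INT4_AWQ_CFG]

    model, search_state = mtq.auto_quantize(
        model,                                # placeholder torch.nn.Module
        constraints={"effective_bits": 4.8},  # placeholder search constraint
        quantization_formats=quant_choices,
        data_loader=calib_loader,             # placeholder calibration loader
        forward_step=lambda mod, batch: mod(**batch),
        loss_func=lambda output, batch: output.loss,
    )

A custom config dict, for example a copy of ``mtq.INT4_AWQ_CFG`` with modified quantizer attributes, can be dropped into ``quant_choices`` the same way, which is the point of the change.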
README.md

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@
 
 ## Latest News
 
+- [2025/05/14] [NVIDIA TensorRT Unlocks FP4 Image Generation for NVIDIA Blackwell GeForce RTX 50 Series GPUs](https://developer.nvidia.com/blog/nvidia-tensorrt-unlocks-fp4-image-generation-for-nvidia-blackwell-geforce-rtx-50-series-gpus/)
 - [2025/04/21] [Adobe optimized deployment using TensorRT-Model-Optimizer + TensorRT leading to a 60% reduction in diffusion latency, a 40% reduction in total cost of ownership](https://developer.nvidia.com/blog/optimizing-transformer-based-diffusion-models-for-video-generation-with-nvidia-tensorrt/)
 - [2025/04/05] [NVIDIA Accelerates Inference on Meta Llama 4 Scout and Maverick](https://developer.nvidia.com/blog/nvidia-accelerates-inference-on-meta-llama-4-scout-and-maverick/). Check out how to quantize Llama4 for deployment acceleration [here](./examples/llm_ptq/README.md#llama-4)
 - [2025/03/18] [World's Fastest DeepSeek-R1 Inference with Blackwell FP4 & Increasing Image Generation Efficiency on Blackwell](https://developer.nvidia.com/blog/nvidia-blackwell-delivers-world-record-deepseek-r1-inference-performance/)

docker/Dockerfile

Lines changed: 44 additions & 35 deletions
@@ -1,47 +1,56 @@
-FROM nvidia/cuda:12.8.1-devel-ubuntu22.04
+FROM nvcr.io/nvidia/pytorch:25.03-py3
+
+ARG PIP_EXTRA_INDEX_URL="https://pypi.nvidia.com"
+ARG TRT_LLM_COMMIT=v0.19.0
+ARG REMOVE_TRT_LLM_SRC=1
+ARG CUDA_ARCH="89-real;90-real;100-real"
+
+ENV PIP_EXTRA_INDEX_URL=$PIP_EXTRA_INDEX_URL \
+    PIP_NO_CACHE_DIR=off \
+    PIP_CONSTRAINT= \
+    TORCH_CUDA_ARCH_LIST="8.0 8.6 8.7 8.9 9.0 10.0+PTX"
 
 WORKDIR /workspace
 
-RUN apt-get update && \
-    apt-get -y install python3.10 python3-pip python-is-python3 openmpi-bin libopenmpi-dev libgl1 libglib2.0-0 wget git git-lfs unzip jq cmake vim && \
-    rm -rf /var/lib/apt/lists/*
+# Install TensorRT-LLM from source
+RUN --mount=type=ssh,id=nvidia git clone https://github.com/NVIDIA/TensorRT-LLM.git tensorrt-llm \
+    && cd tensorrt-llm \
+    && git checkout ${TRT_LLM_COMMIT} \
+    && git submodule update --init --recursive
 
-ARG PIP_EXTRA_INDEX_URL="https://pypi.nvidia.com"
-ENV PIP_EXTRA_INDEX_URL=$PIP_EXTRA_INDEX_URL
-ENV PIP_NO_CACHE_DIR=off
-
-# Install the latest setuptools using pip
-RUN rm -rf /usr/lib/python3/dist-packages/setuptools* && \
-    pip install --upgrade pip setuptools
-
-# Install TensorRT-LLM
-ARG TRT_LLM_VERSION=0.18.1
-RUN pip install "tensorrt-llm~=$TRT_LLM_VERSION" -U
-RUN git clone --depth 1 --branch "v$TRT_LLM_VERSION" https://github.com/NVIDIA/TensorRT-LLM.git && \
-    mkdir tensorrt-llm && \
-    mv TensorRT-LLM/benchmarks/ tensorrt-llm && \
-    rm -rf TensorRT-LLM
-RUN cd /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs && ln -s libnvinfer_plugin_tensorrt_llm.so libnvinfer_plugin_tensorrt_llm.so.10
-ENV LD_LIBRARY_PATH=/usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs:$LD_LIBRARY_PATH
+# Install required dependencies
+RUN bash tensorrt-llm/docker/common/install_base.sh $(python --version 2>&1 | awk '{print $2}')
+RUN bash tensorrt-llm/docker/common/install_cmake.sh
+RUN bash tensorrt-llm/docker/common/install_mpi4py.sh
+RUN bash tensorrt-llm/docker/common/install_tensorrt.sh
+RUN bash tensorrt-llm/docker/common/install_cuda_toolkit.sh
+
+RUN cd tensorrt-llm && git lfs install && git lfs pull
+
+RUN cd tensorrt-llm \
+    && ./scripts/build_wheel.py --job_count $(nproc) --clean --python_bindings --benchmarks --install --cuda_architecture=${CUDA_ARCH} \
+    && git rev-parse --short HEAD > /workspace/tensorrt-llm.commit \
+    && chmod -R 777 .
+RUN pip install tensorrt-llm/build/tensorrt_llm*.whl
+
+# Remove TensorRT-LLM source code to reduce image size except for benchmarks and examples folders
+RUN if [ "$REMOVE_TRT_LLM_SRC" = "1" ]; then \
+        mkdir -p tensorrt-llm_keep; \
+        mv tensorrt-llm/benchmarks tensorrt-llm_keep/benchmarks; \
+        mv tensorrt-llm/examples tensorrt-llm_keep/examples; \
+        rm -rf tensorrt-llm; \
+        mv tensorrt-llm_keep tensorrt-llm; \
+    fi
+
+# Update PATH and LD_LIBRARY_PATH variables for the TensorRT binaries
+ENV LD_LIBRARY_PATH="/usr/local/tensorrt/targets/x86_64-linux-gnu/lib:${LD_LIBRARY_PATH}" \
+    PATH="/usr/local/tensorrt/targets/x86_64-linux-gnu/bin:${PATH}"
 
 # Export the path to 'libcudnn.so.X' needed by 'libonnxruntime_providers_tensorrt.so'
-ENV LD_LIBRARY_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/cudnn/lib:$LD_LIBRARY_PATH
-
-# Install TensorRT dev environment
-ARG TENSORRT_URL=https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.9.0/tars/TensorRT-10.9.0.34.Linux.x86_64-gnu.cuda-12.8.tar.gz
-RUN wget -q -O tensorrt.tar.gz $TENSORRT_URL && \
-    tar -xf tensorrt.tar.gz && \
-    cp TensorRT-*/bin/trtexec /usr/local/bin && \
-    cp TensorRT-*/include/* /usr/include/x86_64-linux-gnu && \
-    python -m pip install TensorRT-*/python/tensorrt-*-cp310-none-linux_x86_64.whl && \
-    cp -a TensorRT-*/targets/x86_64-linux-gnu/lib/* /usr/local/lib/python3.10/dist-packages/tensorrt_libs && \
-    rm -rf TensorRT-*.Linux.x86_64-gnu.cuda-*.tar.gz TensorRT-* tensorrt.tar.gz
-ENV TRT_LIB_PATH=/usr/local/lib/python3.10/dist-packages/tensorrt_libs
-ENV LD_LIBRARY_PATH=$TRT_LIB_PATH:$LD_LIBRARY_PATH
+ENV LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
 
 # Install modelopt with all optional dependencies and pre-compile CUDA extensions otherwise they take several minutes on every docker run
 RUN pip install -U "nvidia-modelopt[all,dev-test]"
-ENV TORCH_CUDA_ARCH_LIST="8.0 8.6 8.7 8.9 9.0 10.0+PTX"
 RUN python -c "import modelopt.torch.quantization.extensions as ext; ext.precompile()"
 
 # Find and install requirements.txt files for all examples excluding windows

docs/source/_ext/modelopt_autodoc_pydantic.py

Lines changed: 3 additions & 2 deletions
@@ -17,8 +17,9 @@
 
 import json
 import types
+from collections.abc import Callable
 from contextlib import contextmanager, nullcontext
-from typing import Any, Callable
+from typing import Any
 
 from sphinx.application import Sphinx
 from sphinxcontrib.autodoc_pydantic import __version__
@@ -112,7 +113,7 @@ def add_default_dict(self) -> None:
         # create valid rst lines from the config
         config_json = json.dumps(config, default=str, indent=3)
         lines = [f" {line}" for line in config_json.split("\n")]
-        lines = [":Default config (JSON):", "", ".. code-block:: json", ""] + lines + [""]
+        lines = [":Default config (JSON):", "", ".. code-block:: json", "", *lines, ""]
 
         # add lines to autodoc
         source_name = self.get_sourcename()

docs/source/_templates/autosummary/module.rst

Lines changed: 3 additions & 2 deletions
@@ -10,8 +10,9 @@
    :toctree:
    :recursive:
 {% for item in modules %}
-{% if '.plugins.' not in item or item == 'modelopt.torch.opt.plugins.huggingface' %}
-   {{ item }}
+{% set full_item = fullname + '.' + item.split('.')[-1] %}
+{% if '.plugins.' not in full_item or full_item == 'modelopt.torch.opt.plugins.huggingface' %}
+   {{ full_item }}
 {% endif %}
 {%- endfor %}
 {% endif %}

docs/source/conf.py

Lines changed: 4 additions & 34 deletions
@@ -32,29 +32,14 @@
 # sys.path.insert(0, os.path.abspath('.'))
 
 import os
-import shutil
 import sys
-import tempfile
 
 import sphinx.application
 from docutils import nodes
 from docutils.nodes import Element
-from pypandoc.pandoc_download import download_pandoc
 from sphinx.writers.html5 import HTML5Translator
 
-from modelopt import __version__  # noqa: E402
-
-if not shutil.which("pandoc"):
-    # Install pandoc if it is not installed.
-    # Default `targetfolder` for Mac (~/Applications/pandoc) is not in `$PATH` so use whatever is in PATH
-    # Pandoc is required by nbconvert but it is not included in the pypandoc pip package
-    with tempfile.TemporaryDirectory() as tmpdir:
-        download_pandoc(
-            version="3.1.13",
-            download_folder=tmpdir,
-            targetfolder=os.environ["PATH"].split(os.pathsep)[0],
-            delete_installer=True,
-        )
+from modelopt import __version__
 
 sys.path.insert(0, os.path.abspath("../../"))
 sys.path.append(os.path.abspath("./_ext"))
@@ -75,13 +60,11 @@
     "sphinx.ext.autodoc",
     "sphinx.ext.autosummary",
     "sphinx.ext.githubpages",
-    "sphinx.ext.napoleon",
-    # "sphinx.ext.viewcode",
+    "sphinx.ext.napoleon",  # Support for NumPy and Google style docstrings
+    "sphinxarg.ext",  # for command-line help documentation
     "sphinx_copybutton",  # line numbers getting copied so cannot use `:linenos:`
     "sphinx_inline_tabs",
-    "nbsphinx",  # rendering jupyter notebooks in docs
     "sphinx_togglebutton",
-    "IPython.sphinxext.ipython_console_highlighting",
     "sphinxcontrib.autodoc_pydantic",
     "modelopt_autodoc_pydantic",
 ]
@@ -132,7 +115,7 @@
 
 
 # Mock imports for autodoc
-autodoc_mock_imports = ["mpi4py", "tensorrt_llm"]
+autodoc_mock_imports = ["mpi4py", "tensorrt_llm", "triton"]
 
 autosummary_generate = True
 autosummary_imported_members = False
@@ -162,19 +145,6 @@
 autodoc_member_order = "alphabetical"  # can also use `bysource` or `groupwise` to sort members
 
 
-# Do not auto-execute notebooks where all outputs are empty
-nbsphinx_execute = "never"
-
-# Add link to download notebook on top of each notebook tutorial!
-nbsphinx_prolog = r"""
-.. raw:: html
-
-    <div class="admonition note">
-        This tutorial is available as a Jupyter Notebook!
-        <a href="{{ env.doc2path(env.docname, base=None).split('/')|last|e }}">Download notebook from here</a>.
-    </div>
-"""
-
 # autodoc_pydantic model settings
 autodoc_pydantic_model_show_config_summary = False
 autodoc_pydantic_model_show_validator_summary = False

docs/source/getting_started/_installation_for_Linux.rst

Lines changed: 2 additions & 2 deletions
@@ -12,15 +12,15 @@ Latest Model Optimizer (``nvidia-modelopt``) currently has the following system
 +-------------------------+-----------------------------+
 | Architecture            | x86_64, aarch64 (SBSA)      |
 +-------------------------+-----------------------------+
-| Python                  | >=3.9,<3.13                 |
+| Python                  | >=3.10,<3.13                |
 +-------------------------+-----------------------------+
 | CUDA                    | >=12.0                      |
 +-------------------------+-----------------------------+
 | PyTorch (Optional)      | >=2.4                       |
 +-------------------------+-----------------------------+
 | TensorRT-LLM (Optional) | 0.18                        |
 +-------------------------+-----------------------------+
-| ONNX Runtime (Optional) | 1.20 (Python>=3.10)         |
+| ONNX Runtime (Optional) | 1.22                        |
 +-------------------------+-----------------------------+
 | TensorRT (Optional)     | >=10.0                      |
 +-------------------------+-----------------------------+
