Changes from 1 commit
1 change: 0 additions & 1 deletion .github/CODEOWNERS
@@ -30,7 +30,6 @@ modelopt/torch/trace @NVIDIA/modelopt-torch-nas-prune-codeowners
modelopt/torch/utils @NVIDIA/modelopt-torch-utils-codeowners

# Examples
/docker @NVIDIA/modelopt-docker-codeowners
/README.md @NVIDIA/modelopt-examples-codeowners
/examples @NVIDIA/modelopt-examples-codeowners
/examples/chained_optimizations @NVIDIA/modelopt-torch-nas-prune-codeowners
6 changes: 4 additions & 2 deletions .github/workflows/example_tests.yml
@@ -68,12 +68,14 @@ jobs:
container: &example_container
image: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2
env:
LD_LIBRARY_PATH: "/usr/lib/x86_64-linux-gnu:/usr/local/tensorrt/targets/x86_64-linux-gnu/lib:${LD_LIBRARY_PATH}"
# PATH: "/usr/local/tensorrt/targets/x86_64-linux-gnu/bin:${PATH}"
PIP_CONSTRAINT: "" # Disable pip constraint for upgrading packages
steps: &example_steps
- uses: actions/checkout@v4
- uses: nv-gha-runners/setup-proxy-cache@main
- name: Setup environment variables
run: |
echo "LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:/usr/local/tensorrt/targets/x86_64-linux-gnu/lib:${LD_LIBRARY_PATH}" >> $GITHUB_ENV
echo "PATH=/usr/local/tensorrt/targets/x86_64-linux-gnu/bin:${PATH}" >> $GITHUB_ENV
- name: Run example tests
run: |
pip install ".[all,dev-test]"
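The change above moves `LD_LIBRARY_PATH` and `PATH` out of the container's `env:` mapping and into a step that appends them to `$GITHUB_ENV`. Values written to `$GITHUB_ENV` are expanded by the shell inside the running container and exported to every later step, whereas `${PATH}`-style references in the workflow's `env:` block are not expanded against the image's own environment (presumably why the `PATH` entry was previously commented out). A minimal sanity check, assuming it is added as a later step in the same job:

```bash
# Hypothetical follow-up step body (not part of the workflow): confirm the
# variables written to $GITHUB_ENV are visible to subsequent steps.
echo "$LD_LIBRARY_PATH"                 # should include the TensorRT lib dir
command -v trtexec || echo "trtexec not on PATH"
```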
46 changes: 30 additions & 16 deletions .gitlab/tests.yml
@@ -1,11 +1,12 @@
# NOTE: Make sure this file is consistent with .github/workflows/{unit,gpu}_tests.yml
# NOTE: Make sure this file is consistent with .github/workflows/{unit,gpu,example}_tests.yml
.tests-default:
variables:
PIP_CONSTRAINT: "" # Disable pip constraint for upgrading packages
stage: tests
rules:
- if: $CI_PIPELINE_SOURCE == "schedule"
when: always
- if: $CI_PIPELINE_SOURCE != "schedule"
when: manual
- if: $CI_COMMIT_TAG =~ /^\d+\.\d+\.\d+$/
- when: manual

##### Unit Tests #####
unit:
@@ -24,44 +25,57 @@ unit:
- tox -e py3$PYTHON-torch$TORCH-tf_$TRANSFORMERS-unit

##### GPU Tests #####
gpu:
.gpu-tests-default:
extends: .tests-default
timeout: 60m
image: nvcr.io/nvidia/pytorch:25.06-py3
variables:
GIT_DEPTH: 1000 # For correct version for tests/gpu/torch/quantization/plugins/test_megatron.py
LD_LIBRARY_PATH: "/usr/lib/x86_64-linux-gnu:${LD_LIBRARY_PATH}" # Add libcudnn*.so and libnv*.so to path.
PIP_CONSTRAINT: "" # Disable pip constraint for upgrading packages
tags: [docker, linux, 2-gpu]
before_script:
# Add libcudnn*.so and libnv*.so to path
- export LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:/usr/local/tensorrt/targets/x86_64-linux-gnu/lib:${LD_LIBRARY_PATH}"
# Add trtexec to path
- export PATH="/usr/local/tensorrt/targets/x86_64-linux-gnu/bin:$PATH"
# Install git-lfs for Daring-Anteater dataset
- apt-get update && apt-get install -y git-lfs
- git lfs install --system

multi-gpu:
extends: .gpu-tests-default
script:
# Use pre-installed packages without a new venv with tox-current-env
- pip install tox-current-env
- tox -e py312-cuda12-gpu --current-env

##### Example Tests #####
example:
extends: .tests-default
stage: tests
timeout: 45m
image: gitlab-master.nvidia.com:5005/omniml/modelopt/modelopt_examples:latest
extends: .gpu-tests-default
timeout: 30m
variables:
TEST_TYPE: pytest
tags: [docker, linux, 2-gpu, sm<89]
parallel:
matrix:
- EXAMPLE: [diffusers, llm_distill, llm_qat, llm_sparsity, onnx_ptq, speculative_decoding]
allow_failure: true # Allow to continue next stages even if job is canceled (e.g. during release)
before_script:
- pip install ".[all,dev-test]"
- EXAMPLE: [diffusers, llm_distill, llm_sparsity, onnx_ptq, speculative_decoding]
script:
- pip install ".[all,dev-test]"
# Uninstall apex since T5 Int8 (PixArt) + Apex is not supported as per https://github.com/huggingface/transformers/issues/21391
- if [ "$EXAMPLE" = "diffusers" ]; then pip uninstall -y apex; fi
- find examples/$EXAMPLE -name "requirements.txt" | while read req_file; do pip install -r "$req_file" || exit 1; done
- if [ "$TEST_TYPE" = "pytest" ]; then pytest -s tests/examples/$EXAMPLE; else bash tests/examples/test_$EXAMPLE.sh; fi

# TODO: Fix llm_qat test hang in GitLab CI
example-failing:
extends: example
allow_failure: true
parallel:
matrix:
- EXAMPLE: [llm_qat]

example-ada:
extends: example
timeout: 60m
image: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2
tags: [docker, linux, 2-gpu, sm>=89]
parallel:
matrix:
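The restructured `example` job installs the package with its test extras, pulls in each example's own requirements, and then runs the matching pytest suite. Roughly the same flow can be reproduced locally from the repository root inside the recommended container (a sketch; `EXAMPLE` below is one illustrative matrix entry):

```bash
# Approximate local equivalent of the GitLab `example` job script.
EXAMPLE=onnx_ptq                      # any entry from the matrix above
pip install ".[all,dev-test]"
find "examples/$EXAMPLE" -name requirements.txt | while read -r req; do
  pip install -r "$req" || exit 1     # abort the loop if an install fails
done
pytest -s "tests/examples/$EXAMPLE"
```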
1 change: 1 addition & 0 deletions CHANGELOG.rst
@@ -6,6 +6,7 @@ Model Optimizer Changelog (Linux)

**Deprecations**

- Deprecated ModelOpt's custom docker image. Please use the TensorRT-LLM docker image directly or refer to the `installation guide <https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html>`_ for more details.
- Deprecated ``quantize_mode`` argument in ``examples/onnx_ptq/evaluate.py`` to support strongly typing. Use ``engine_precision`` instead.
- Deprecated TRT-LLM's TRT backend in ``examples/llm_ptq`` and ``examples/vlm_ptq``. Tasks ``build`` and ``benchmark`` support are removed and replaced with ``quant``. For performance evaluation, please use ``trtllm-bench`` directly.
- ``--export_fmt`` flag in ``examples/llm_ptq`` is removed. By default we export to the unified Hugging Face checkpoint format.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -11,7 +11,7 @@ pip install -e ".[dev]"
```

If you are working on features that require dependencies like TensorRT-LLM or Megatron-Core, consider using a docker container to simplify the setup process.
See [docker README](./README.md#installation--docker) for more details.
Visit our [installation docs](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html) for more information.

## 🧹 Code linting and formatting

8 changes: 5 additions & 3 deletions README.md
@@ -61,10 +61,10 @@ Model Optimizer is also integrated with [NVIDIA NeMo](https://github.com/NVIDIA-
To install stable release packages for Model Optimizer with `pip` from [PyPI](https://pypi.org/project/nvidia-modelopt/):

```bash
pip install nvidia-modelopt[all]
pip install -U nvidia-modelopt[all]
```

To install from source in editable mode with all development dependencies or to test the latest changes, run:
To install from source in editable mode with all development dependencies or to use the latest features, run:

```bash
# Clone the Model Optimizer repository
@@ -74,7 +74,9 @@ cd TensorRT-Model-Optimizer
pip install -e .[dev]
```

Visit our [installation guide](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html) for more fine-grained control on installed dependencies or view our pre-made [dockerfiles](docker/README.md) for more information.
You can also directly use the [TensorRT-LLM docker images](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags)
(e.g., `nvcr.io/nvidia/tensorrt-llm/release:<version>`),
which have Model Optimizer pre-installed. Visit our [installation guide](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html) for more fine-grained control over installed dependencies, or for alternative docker images and the environment variables to set up.

## Techniques

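With the custom ModelOpt image gone, the TensorRT-LLM release containers become the batteries-included route. A hedged sketch of that flow (the tag is a placeholder to be picked from NGC; the `docker run` flags follow the example that previously appeared in the installation guide):

```bash
# Pull and enter a TensorRT-LLM release container (choose a real tag on NGC).
docker run --gpus all -it --shm-size 20g --rm \
    nvcr.io/nvidia/tensorrt-llm/release:<version> bash

# Inside the container: upgrade Model Optimizer if needed and verify the install.
pip install -U nvidia-modelopt[all]
python -c "import modelopt; print(modelopt.__version__)"
```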
27 changes: 0 additions & 27 deletions docker/Dockerfile

This file was deleted.

16 changes: 0 additions & 16 deletions docker/README.md

This file was deleted.

19 changes: 0 additions & 19 deletions docker/build.sh

This file was deleted.

32 changes: 11 additions & 21 deletions docs/source/getting_started/_installation_for_Linux.rst
@@ -30,39 +30,29 @@ Environment setup

.. tab:: Docker image (Recommended)

**Using ModelOpt's docker image**
To use Model Optimizer with full dependencies (e.g. TensorRT/TensorRT-LLM deployment), we recommend using the
`TensorRT-LLM docker image <https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags>`_,
e.g., ``nvcr.io/nvidia/tensorrt-llm/release:<version>``.

To use Model Optimizer with full dependencies (e.g. TensorRT/TensorRT-LLM deployment), we recommend using our provided docker image
which is based on the `TensorRT-LLM <https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags>`_
docker image with additional dependencies installed.
If Model Optimizer is not already at the latest version, you may upgrade it as described in the next section.

After installing the `NVIDIA Container Toolkit <https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html>`_,
please run the following commands to build the Model Optimizer docker container which has all the base
dependencies pre-installed. You may need to install additional dependencies from the examples's `requirements.txt` file.
You would also need to set up the appropriate environment variables for the TensorRT binaries as follows:

.. code-block:: shell

# Clone the ModelOpt repository
git clone [email protected]:NVIDIA/TensorRT-Model-Optimizer.git
cd TensorRT-Model-Optimizer
export LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:/usr/local/tensorrt/targets/x86_64-linux-gnu/lib:${LD_LIBRARY_PATH}"
export PATH="/usr/local/tensorrt/targets/x86_64-linux-gnu/bin:${PATH}"

# Build the docker (will be tagged `docker.io/library/modelopt_examples:latest`)
# You may customize `docker/Dockerfile` to include or exclude certain dependencies you may or may not need.
bash docker/build.sh
You may need to install additional dependencies from the respective example's `requirements.txt` file.

# Run the docker image
docker run --gpus all -it --shm-size 20g --rm docker.io/library/modelopt_examples:latest bash

# Check installation (inside the docker container)
python -c "import modelopt; print(modelopt.__version__)"

**Using alternative NVIDIA docker images**
**Alternative NVIDIA docker images**

For PyTorch, you can also use `NVIDIA NGC PyTorch container <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch/tags>`_
and for NVIDIA NeMo framework, you can use the `NeMo container <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo/tags>`_.
Both of these containers come with Model Optimizer pre-installed. Make sure to update the Model Optimizer to the latest version if not already.

For ONNX PTQ, you can use the optimized docker image from [onnx_ptq Dockerfile](https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/onnx_ptq/docker).
For ONNX PTQ, you can use the docker image from `onnx_ptq Dockerfile <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/onnx_ptq/docker>`_
which includes the latest publicly available TensorRT version, providing access to cutting-edge features and superior performance.

.. tab:: Local environment (PIP / Conda)

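Condensing the updated instructions into one runnable snippet: inside the TensorRT-LLM container, the two exports make the TensorRT libraries and `trtexec` discoverable, and each example may add its own requirements (the `llm_ptq` path below is only an illustration):

```bash
# Inside the TensorRT-LLM container: expose TensorRT libs and binaries.
export LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:/usr/local/tensorrt/targets/x86_64-linux-gnu/lib:${LD_LIBRARY_PATH}"
export PATH="/usr/local/tensorrt/targets/x86_64-linux-gnu/bin:${PATH}"

# Then install the extra dependencies of whichever example you run.
pip install -r examples/llm_ptq/requirements.txt
```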
2 changes: 1 addition & 1 deletion examples/diffusers/README.md
@@ -37,7 +37,7 @@ Each subsection (cache_diffusion, quantization, etc.) have their own `requiremen

You can find the latest TensorRT [here](https://developer.nvidia.com/tensorrt/download).

Visit our [installation guide](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html) or view our pre-made [dockerfiles](../../docker/Dockerfile) for more information.
Visit our [installation docs](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html) for more information.

## Getting Started

3 changes: 2 additions & 1 deletion examples/llm_ptq/README.md
@@ -32,7 +32,8 @@ pip install nvidia-modelopt[hf]
pip install -r requirements.txt
```

If you want to deploy the quantized model on TRT-LLM, you will also need to install the TRT-LLM dependencies as per the [TRT-LLM documentation](https://nvidia.github.io/TensorRT-LLM/quick-start-guide.html#installation). Alternatively you can use the ModelOpt docker image built from the [ModelOpt docker build step](../../docker/README.md) which has all the dependencies including TRT-LLM installed.
If you want to deploy the quantized model on TRT-LLM, you will also need to install the TRT-LLM dependencies as per the [TRT-LLM documentation](https://nvidia.github.io/TensorRT-LLM/quick-start-guide.html#installation).
Visit our [installation docs](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html) for more information.

For NeMo models, use the NeMo container `nvcr.io/nvidia/nemo:25.04` or later which has all the dependencies including TRT-LLM installed.

3 changes: 2 additions & 1 deletion examples/llm_qat/README.md
@@ -29,7 +29,8 @@ pip install nvidia-modelopt[hf]
pip install -r requirements.txt
```

If you want to deploy the quantized model on TRT-LLM, you will also need to install the TRT-LLM dependencies as per the [TRT-LLM documentation](https://nvidia.github.io/TensorRT-LLM/quick-start-guide.html#installation). Alternatively you can use the ModelOpt docker image built from the [ModelOpt docker build step](../../docker/Dockerfile) which has all the dependencies including TRT-LLM installed.
If you want to deploy the quantized model on TRT-LLM, you will also need to install the TRT-LLM dependencies as per the [TRT-LLM documentation](https://nvidia.github.io/TensorRT-LLM/quick-start-guide.html#installation).
Visit our [installation docs](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html) for more information.

For NeMo models, use the NeMo container `nvcr.io/nvidia/nemo:25.04` or later which has all the dependencies including TRT-LLM installed.

4 changes: 2 additions & 2 deletions examples/llm_sparsity/data_prep.py
@@ -57,9 +57,9 @@ def main():
os.makedirs(args.save_path, exist_ok=True)

with open(os.path.join(args.save_path, "cnn_train.json"), "w") as write_f:
json.dump(tokenized_dataset["train"]["text"], write_f, indent=4, ensure_ascii=False)
json.dump(list(tokenized_dataset["train"]["text"]), write_f, indent=4, ensure_ascii=False)
with open(os.path.join(args.save_path, "cnn_eval.json"), "w") as write_f:
json.dump(tokenized_dataset["test"]["text"], write_f, indent=4, ensure_ascii=False)
json.dump(list(tokenized_dataset["test"]["text"]), write_f, indent=4, ensure_ascii=False)


if __name__ == "__main__":
2 changes: 1 addition & 1 deletion examples/onnx_ptq/README.md
@@ -24,7 +24,7 @@ Model Optimizer enables highly performant quantization formats including NVFP4,

### Docker

Build from this [Dockerfile](./docker/Dockerfile) which includes the latest publicly available TensorRT version, providing access to cutting-edge features and superior performance compared to the `modelopt_examples` [Docker image](https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/docker/Dockerfile).
Build from this [Dockerfile](./docker/Dockerfile) which includes the latest publicly available TensorRT version, providing access to cutting-edge features and superior performance.

Build the Docker image (will be tagged `docker.io/library/onnx_ptq_examples:latest`)

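The exact build command lives further down in the onnx_ptq README (truncated here); a minimal equivalent, assuming a plain `docker build` from the repository root that produces the tag mentioned above, would look like:

```bash
# Assumed build/run flow for the onnx_ptq example image; the real README may
# wrap this in a helper script.
docker build -t onnx_ptq_examples:latest -f examples/onnx_ptq/docker/Dockerfile .
docker run --gpus all -it --rm onnx_ptq_examples:latest bash
```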
13 changes: 4 additions & 9 deletions tests/examples/README.md
@@ -9,22 +9,17 @@ Make sure to use as small models and less data as possible to keep the tests fas

## Running the tests

To run a test, use the [ModelOpt docker image](../../README.md#installation--docker) so all required dependencies are available.
and mount your local modelopt directory to `/workspace/TensorRT-Model-Optimizer` and run this from the root of the repository.
To run a test, start from the recommended docker image from our [installation docs](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html).
Then mount your local modelopt directory to `/workspace/TensorRT-Model-Optimizer` and run this from the root of the repository.

```bash
cd /workspace/TensorRT-Model-Optimizer
pip install -e ".[all,dev-test]"
pytest tests/examples/$TEST
```

NOTE: Some tests (e.g. `llm_ptq`) have an option to disable using a smaller proxy model, and instead use the original model by setting the `MODELOPT_FAST_TESTS` environment variable to `false`. This is useful in nightly tests to ensure the original model is used.

```bash
MODELOPT_FAST_TESTS=false ROOT_SAVE_PATH=/tmp/test_llm_ptq/ pytest tests/examples/llm_ptq/
```

## Environment variables

The following environment variables can be set to control the behavior of the tests:

- `MODELOPT_FAST_TESTS`: If set to `false`, the tests will use the original model instead of a smaller proxy model. Default is `true`.
- `MODELOPT_LOCAL_MODEL_ROOT`: If set, the tests will use the local model directory instead of downloading the model from the internet. Default is not set, which means the model will be downloaded.
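Both variables can be combined in a single invocation; for example, to run the `llm_ptq` suite against the original models from a local checkpoint directory (paths are placeholders):

```bash
# Use the full-size models and read them from a local directory instead of
# downloading (directory path is a placeholder).
MODELOPT_FAST_TESTS=false \
MODELOPT_LOCAL_MODEL_ROOT=/path/to/local/models \
pytest tests/examples/llm_ptq/
```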
1 change: 0 additions & 1 deletion tests/examples/llm_sparsity/test_llama_sparsify.py
@@ -39,7 +39,6 @@ def data_path(tmp_path_factory):
@pytest.mark.parametrize(
("sparsity_fmt", "dtype"),
[
("sparsegpt", "bf16"),
("sparse_magnitude", "bf16"),
],
)