Commit 11728b7

Add all example e2e tests for github PR merge / nightly (NVIDIA#617)
## What does this PR do?

**Type of change:** CI/CD infra improvement

- All example tests now run before a PR can be merged. (onnx_ptq-bash stays in GitLab because it needs models from internal scratch space; llm_eval / llm_autodeploy run only nightly to save per-PR GPU resources.)
- Users can also manually trigger a specific test from https://github.com/NVIDIA/TensorRT-Model-Optimizer/actions/workflows/example_tests.yml
- We no longer need to depend on internal GitLab infra for tests (except the nemo-megatron integration tests).

## Testing

- Tests run in the PR and manually via workflow_dispatch.

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: No
- **Did you update [Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**: No

---------

Signed-off-by: Keval Morabia <[email protected]>
1 parent e0a6efb commit 11728b7

8 files changed: +205 -117 lines changed

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
+# Reusable workflow for running example tests
+name: Example Tests Runner
+
+on:
+  workflow_call:
+    inputs:
+      docker_image:
+        description: "Docker image to use for tests"
+        required: true
+        type: string
+      example:
+        description: "Example name to test (e.g. 'llm_ptq')"
+        required: true
+        type: string
+      timeout_minutes:
+        description: "Timeout in minutes for the job"
+        required: false
+        type: number
+        default: 60
+      pip_install_extras:
+        description: "Pip install extras (e.g. '[hf,dev-test]' or '[all,dev-test]')"
+        required: false
+        type: string
+        default: "[all,dev-test]"
+      runner:
+        description: "GitHub runner to use"
+        required: false
+        type: string
+        default: "linux-amd64-gpu-h100-latest-1"
+
+jobs:
+  run-test:
+    runs-on: ${{ inputs.runner }}
+    timeout-minutes: ${{ inputs.timeout_minutes }}
+    container:
+      image: ${{ inputs.docker_image }}
+      env:
+        PIP_CONSTRAINT: "" # Disable pip constraint for upgrading packages
+        HF_TOKEN: ${{ secrets.HF_TOKEN }}
+    steps:
+      - uses: actions/checkout@v4
+      - uses: nv-gha-runners/setup-proxy-cache@main
+      - name: Setup environment variables
+        run: |
+          echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu:/usr/local/tensorrt/targets/x86_64-linux-gnu/lib" >> $GITHUB_ENV
+          echo "PATH=${PATH}:/usr/local/tensorrt/targets/x86_64-linux-gnu/bin" >> $GITHUB_ENV
+      - name: Install dependencies
+        run: |
+          # Install git-lfs for Daring-Anteater dataset
+          apt-get update && apt-get install -y git-lfs
+          git lfs install --system
+
+          pip install ".${{ inputs.pip_install_extras }}"
+
+          if [[ "${{ inputs.example }}" == *"diffusers"* ]]; then
+            echo "Uninstalling apex for diffusers: T5 Int8 (PixArt) + Apex is not supported as per https://github.com/huggingface/transformers/issues/21391"
+            pip uninstall -y apex || true
+          fi
+
+          find examples/${{ inputs.example }} -name "requirements.txt" | while read req_file; do pip install -r "$req_file" || exit 1; done
+      - name: Run tests
+        run: |
+          echo "Running tests for: ${{ inputs.example }}"
+          pytest tests/examples/${{ inputs.example }}
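
The `Install dependencies` step in the runner workflow leans on two shell idioms: a glob match on the example name, and a `find | while read` loop that stops at the first failed install. The sketch below isolates both. The function names `needs_apex_uninstall` and `install_requirements` are hypothetical, and passing the installer in as an argument is an assumption made here so the loop can be tried without pip; treat it as an illustration of the idioms, not as the workflow's actual code.

```shell
#!/usr/bin/env bash
# Illustrative helpers only; these names are hypothetical and not part of the PR.

# Status 0 when the example name contains "diffusers".
# `case` is the portable equivalent of: [[ "$example" == *"diffusers"* ]]
needs_apex_uninstall() {
  case "$1" in
    *diffusers*) return 0 ;;
    *) return 1 ;;
  esac
}

# Same loop shape as the workflow step. The install command is passed in as
# the remaining arguments (e.g. `pip install -r`), which is an assumption of
# this sketch.
install_requirements() {
  example_dir="$1"
  shift
  find "$example_dir" -name "requirements.txt" | sort | while read -r req_file; do
    # In bash the loop runs in the pipeline's subshell, so `exit 1` ends only
    # that subshell; the pipeline, and therefore this function, returns 1.
    "$@" "$req_file" || exit 1
  done
}
```

Under these assumptions, `install_requirements examples/llm_ptq pip install -r` would behave like the workflow's loop for a single example: the first failing install makes the whole step return non-zero.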

.github/workflows/example_tests.yml

Lines changed: 99 additions & 36 deletions
@@ -1,5 +1,4 @@
-# NOTE: Make sure this file is consistent with .gitlab/tests.yml
-name: E2E Example tests
+name: Example tests
 
 on:
   push:
@@ -41,10 +40,10 @@ jobs:
           sha: ${{ fromJSON(steps.get-pr-info.outputs.pr-info).head.sha }}
           files: |
             .github/workflows/example_tests.yml
-            examples/llm_ptq/**
-            modelopt/torch/**
-            tests/examples/llm_ptq/**
+            examples/**
+            modelopt/**
             setup.py
+            tests/examples/**
           fail_on_initial_diff_error: true
   wait-checks:
     needs: [check-file-changes]
@@ -56,46 +55,110 @@ jobs:
         with:
           match_pattern: "^DCO$|^linux$" # Wait for DCO and Unit tests / linux to pass
           delay: 300s
-  example-tests-pr:
+
+  ##### PyTorch Example Tests #####
+  torch-pr:
     needs: [check-file-changes, wait-checks]
-    if: needs.check-file-changes.outputs.any_changed == 'true'
-    runs-on: linux-amd64-gpu-h100-latest-1
-    timeout-minutes: 90
+    if: startsWith(github.ref, 'refs/heads/pull-request/') && needs.check-file-changes.outputs.any_changed == 'true'
     strategy:
+      fail-fast: false
       matrix:
-        EXAMPLE: [llm_ptq]
-    container: &example_container
-      image: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2
-      env:
-        PIP_CONSTRAINT: "" # Disable pip constraint for upgrading packages
-        HF_TOKEN: ${{ secrets.HF_TOKEN }}
-    steps: &example_steps
-      - uses: actions/checkout@v4
-      - uses: nv-gha-runners/setup-proxy-cache@main
-      - name: Setup environment variables
-        run: |
-          echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu:/usr/local/tensorrt/targets/x86_64-linux-gnu/lib" >> $GITHUB_ENV
-          echo "PATH=${PATH}:/usr/local/tensorrt/targets/x86_64-linux-gnu/bin" >> $GITHUB_ENV
-      - name: Run example tests
-        run: |
-          pip install ".[hf,dev-test]"
-          find examples/${{ matrix.EXAMPLE }} -name "requirements.txt" | while read req_file; do pip install -r "$req_file" || exit 1; done
-          pytest -s tests/examples/${{ matrix.EXAMPLE }}
-  example-tests-non-pr:
+        example: [llm_distill, llm_qat, llm_sparsity, speculative_decoding]
+    uses: ./.github/workflows/_example_tests_runner.yml
+    secrets: inherit
+    with:
+      docker_image: "nvcr.io/nvidia/pytorch:25.06-py3"
+      example: ${{ matrix.example }}
+      pip_install_extras: "[hf,dev-test]"
+      runner: linux-amd64-gpu-l4-latest-1
+
+  torch-non-pr:
     if: ${{ !startsWith(github.ref, 'refs/heads/pull-request/') }}
-    runs-on: linux-amd64-gpu-h100-latest-2
-    timeout-minutes: 90
     strategy:
+      fail-fast: false
+      matrix:
+        example: [llm_distill, llm_qat, llm_sparsity, speculative_decoding]
+    uses: ./.github/workflows/_example_tests_runner.yml
+    secrets: inherit
+    with:
+      docker_image: "nvcr.io/nvidia/pytorch:25.06-py3"
+      example: ${{ matrix.example }}
+      pip_install_extras: "[hf,dev-test]"
+      runner: linux-amd64-gpu-h100-latest-2
+
+  ##### TensorRT-LLM Example Tests #####
+  trtllm-pr:
+    needs: [check-file-changes, wait-checks]
+    if: startsWith(github.ref, 'refs/heads/pull-request/') && needs.check-file-changes.outputs.any_changed == 'true'
+    strategy:
+      fail-fast: false
       matrix:
-        EXAMPLE: [llm_ptq]
-    container: *example_container
-    steps: *example_steps
+        example: [llm_ptq]
+    uses: ./.github/workflows/_example_tests_runner.yml
+    secrets: inherit
+    with:
+      docker_image: "nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2"
+      example: ${{ matrix.example }}
+      pip_install_extras: "[hf,dev-test]"
+      runner: linux-amd64-gpu-h100-latest-1
+
+  trtllm-non-pr:
+    if: ${{ !startsWith(github.ref, 'refs/heads/pull-request/') }}
+    strategy:
+      fail-fast: false
+      matrix:
+        example: [llm_autodeploy, llm_eval, llm_ptq, vlm_ptq]
+    uses: ./.github/workflows/_example_tests_runner.yml
+    secrets: inherit
+    with:
+      docker_image: "nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2"
+      example: ${{ matrix.example }}
+      pip_install_extras: "[hf,dev-test]"
+      runner: linux-amd64-gpu-h100-latest-2
+
+  ##### ONNX/TensorRT Example Tests #####
+  onnx-pr:
+    needs: [check-file-changes, wait-checks]
+    if: startsWith(github.ref, 'refs/heads/pull-request/') && needs.check-file-changes.outputs.any_changed == 'true'
+    strategy:
+      fail-fast: false
+      matrix:
+        example: [diffusers]
+    uses: ./.github/workflows/_example_tests_runner.yml
+    secrets: inherit
+    with:
+      docker_image: "nvcr.io/nvidia/tensorrt:25.08-py3"
+      example: ${{ matrix.example }}
+      pip_install_extras: "[all,dev-test]"
+      runner: linux-amd64-gpu-l4-latest-1
+
+  onnx-non-pr:
+    if: ${{ !startsWith(github.ref, 'refs/heads/pull-request/') }}
+    strategy:
+      fail-fast: false
+      matrix:
+        example: [diffusers, onnx_ptq]
+    uses: ./.github/workflows/_example_tests_runner.yml
+    secrets: inherit
+    with:
+      docker_image: "nvcr.io/nvidia/tensorrt:25.08-py3"
+      example: ${{ matrix.example }}
+      pip_install_extras: "[all,dev-test]"
+      runner: linux-amd64-gpu-l4-latest-1
+
+  ##### Required Check for PR #####
   example-pr-required-check:
-    # Run even if example-tests-pr is skipped
+    # Run even if example tests are skipped
     if: ${{ startsWith(github.ref, 'refs/heads/pull-request/') && always() }}
-    needs: [check-file-changes, example-tests-pr]
+    needs: [check-file-changes, torch-pr, trtllm-pr, onnx-pr]
     runs-on: ubuntu-latest
     steps:
       - name: Required GPU tests did not succeed
-        if: ${{ needs.check-file-changes.result != 'success' || (needs.check-file-changes.outputs.any_changed == 'true' && needs.example-tests-pr.result != 'success') }}
+        if: |
+          needs.check-file-changes.result != 'success' ||
+          (needs.check-file-changes.outputs.any_changed == 'true' && (
+            needs.torch-pr.result != 'success' ||
+            needs.trtllm-pr.result != 'success' ||
+            needs.onnx-pr.result != 'success'
+          ))
         run: exit 1
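
The job-level `if:` conditions and the final required check can be read as two small predicates. The sketch below restates them in shell, assuming job results arrive as the strings GitHub reports for `needs.<job>.result` (`success`, `failure`, `cancelled`, `skipped`). `is_pr_ref` and `required_check_fails` are hypothetical names used only for illustration; they are not part of the workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the workflow expressions; not part of the PR.

# Mirrors: startsWith(github.ref, 'refs/heads/pull-request/')
is_pr_ref() {
  case "$1" in
    refs/heads/pull-request/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Mirrors the `if:` on the "Required GPU tests did not succeed" step.
# Args: <check-file-changes result> <any_changed output> <PR job results...>
# Returns 0 when the failing step would run, 1 when the required check passes.
required_check_fails() {
  changes_result="$1"
  any_changed="$2"
  shift 2
  # A failed or skipped change detection always fails the check.
  [ "$changes_result" = "success" ] || return 0
  # No watched files changed: the PR jobs are skipped and the check passes.
  [ "$any_changed" = "true" ] || return 1
  # Otherwise every PR job must have succeeded.
  for result in "$@"; do
    [ "$result" = "success" ] || return 0
  done
  return 1
}
```

Read this way, a `skipped` PR job counts as a failure only when watched files changed. The `always()` in the job's own `if:` matters because a job whose `needs` were skipped is itself skipped by default, which would leave the required check unreported.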

.gitlab/release.yml

Lines changed: 0 additions & 2 deletions
@@ -10,14 +10,12 @@ build-and-upload-wheels:
     - if: $JET_ONLY != null
       when: never
     - if: $CI_COMMIT_TAG =~ /^\d+\.\d+\.\d+$/
-      when: manual
       variables:
         RELEASE: "true"
         TWINE_USERNAME: svc-dl-algo-ammo
         TWINE_PASSWORD: $ARTIFACTORY_TOKEN # Configured in GitLab > Settings > CI/CD
         REPO_URL: https://urm.nvidia.com/artifactory/api/pypi/sw-dl-algo-ammo-pypi-local
     - if: $CI_PIPELINE_SOURCE == "schedule"
-      when: manual
       variables:
         RELEASE: "false"
         TWINE_USERNAME: gitlab-ci-token
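
With `when: manual` removed, a matching rule falls back to GitLab's default `when: on_success`, so the job should start on its own for release tags and scheduled pipelines instead of waiting for a click. To show which tags the release rule selects, the sketch below approximates the `$CI_COMMIT_TAG` pattern with `grep -E`, using `[0-9]` in place of `\d`. The helper name `is_release_tag` and the sample tags are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximates the GitLab rule
#   $CI_COMMIT_TAG =~ /^\d+\.\d+\.\d+$/
# with grep -E, where [0-9] stands in for \d.
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

# Sample tags are made up for illustration.
for tag in 1.2.3 1.2.3rc1 v1.2.3; do
  if is_release_tag "$tag"; then
    echo "$tag: matches the release rule"
  else
    echo "$tag: does not match"
  fi
done
```

Only plain three-part numeric tags match; pre-release suffixes and a leading `v` do not, so such tags would not take the `RELEASE: "true"` branch.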

.gitlab/tests.yml

Lines changed: 6 additions & 63 deletions
@@ -9,79 +9,22 @@
     - if: $CI_PIPELINE_SOURCE == "web" || $CI_MERGE_REQUEST_TARGET_BRANCH_PROTECTED == "true"
       when: manual
 
-##### Unit Tests #####
-unit:
-  extends: .tests-default
-  timeout: 30m
-  variables:
-    PYTHON: 12
-    TORCH: 29
-    TRANSFORMERS: latest
-  image: python:3.$PYTHON
-  before_script:
-    - pip install tox
-  script:
-    - tox -e py3$PYTHON-torch$TORCH-tf_$TRANSFORMERS-unit
-
-##### GPU Tests #####
-.multi-gpu-tests-default:
+##### Example Tests #####
+example-onnx-bash:
   extends: .tests-default
   timeout: 90m
-  image: nvcr.io/nvidia/pytorch:25.06-py3
-  variables:
-    GIT_DEPTH: 1000 # For correct version for tests/gpu/torch/quantization/plugins/test_megatron.py
-  tags: [docker, linux, 2-gpu]
-  before_script:
-    # Add libcudnn*.so and libnv*.so to path
-    - export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu"
-    # Install git-lfs for Daring-Anteater dataset
-    - apt-get update && apt-get install -y git-lfs
-    - git lfs install --system
-
-multi-gpu:
-  extends: .multi-gpu-tests-default
-  script:
-    # Use pre-installed packages without a new venv with tox-current-env
-    - pip install tox-current-env
-    - tox -e py312-cuda12-gpu --current-env
-
-##### Example Tests #####
-example-torch:
-  extends: .multi-gpu-tests-default
-  timeout: 30m
-  parallel:
-    matrix:
-      - EXAMPLE: [llm_distill, llm_qat, llm_sparsity, speculative_decoding]
-  script:
-    - pip install ".[hf,dev-test]"
-    - find examples/$EXAMPLE -name "requirements.txt" | while read req_file; do pip install -r "$req_file" || exit 1; done
-    - pytest -s tests/examples/$EXAMPLE
-
-example-trtllm:
-  extends: example-torch
-  timeout: 60m
-  image: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2
-  tags: [docker, linux, 2-gpu, sm>=89]
-  parallel:
-    matrix:
-      - EXAMPLE: [llm_autodeploy, llm_eval, llm_ptq, vlm_ptq]
-
-example-onnx:
-  extends: example-torch
   image: nvcr.io/nvidia/tensorrt:25.08-py3
   tags: [docker, linux, 2-gpu, sm>=89]
   parallel:
     matrix:
-      - EXAMPLE: [diffusers, onnx_ptq]
-        TEST_TYPE: pytest
       - EXAMPLE: [onnx_ptq]
-        TEST_TYPE: bash
+  before_script:
+    # Add libcudnn*.so and libnv*.so to path
+    - export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu"
   script:
-    # Uninstall apex since T5 Int8 (PixArt) + Apex is not supported as per https://github.com/huggingface/transformers/issues/21391
-    - if [ "$EXAMPLE" = "diffusers" ]; then pip uninstall -y apex; fi
     - pip install ".[all,dev-test]"
     - find examples/$EXAMPLE -name "requirements.txt" | while read req_file; do pip install -r "$req_file" || exit 1; done
-    - if [ "$TEST_TYPE" = "pytest" ]; then pytest -s tests/examples/$EXAMPLE; else bash tests/examples/test_$EXAMPLE.sh; fi
+    - bash tests/examples/test_$EXAMPLE.sh
 
 ##### Megatron / NeMo Integration Tests #####
 megatron-nemo-integration:
