Fix failing CICD nightly tests #445
Conversation
Walkthrough: Removes TensorRT paths from GPU test environments and workflows, updates dependency version constraints in example projects, modifies ONNX export parameters, and reorganizes test imports.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes. These changes consist primarily of repetitive path removals (TensorRT from multiple files), straightforward dependency constraint adjustments, import reorganization, and parameter additions. While spread across multiple files, the modifications follow consistent patterns with minimal logic density.
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed).
Force-pushed from e203106 to f00d2ca.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/_test_utils/torch_quantization/onnx_export.py (1)
60-70: Guard `dynamo` parameter for PyTorch versions before 2.5. The `dynamo` parameter for `torch.onnx.export` was introduced in PyTorch 2.5 and made the default (`dynamo=True`) in PyTorch 2.9. Hardcoding `dynamo=False` without a version check will raise a TypeError on PyTorch < 2.5. Add a guard check similar to the existing `enable_onnx_checker` pattern (lines 56–59):

```diff
 if "enable_onnx_checker" in inspect.signature(torch.onnx.export).parameters:
     kwargs = {"enable_onnx_checker": False}
 else:
     kwargs = {}
+if "dynamo" in inspect.signature(torch.onnx.export).parameters:
+    kwargs["dynamo"] = False  # torch 2.9 flips default to True
 torch.onnx.export(
     model,
     dummy_input,
     f=buffer,
     opset_version=OPSET,
     input_names=input_names,
     output_names=output_names,
     do_constant_folding=constant_folding,
-    dynamo=False,  # torch 2.9 flips default to True
     **kwargs,
 )
```

Also simplify the providers assignment at line 77 (inside the `device == "cpu"` block):

```diff
-providers = ["CUDAExecutionProvider"] if device != "cpu" else ["CPUExecutionProvider"]
+providers = ["CPUExecutionProvider"]
```

tox.ini (1)
5-5: Update envlist to reference the cuda13-gpu environment. Line 5 still references `py312-cuda12-gpu`, but the GPU test environment is now defined as `py312-cuda13-gpu` on line 66. The envlist will fail to locate the environment when running default tox targets. Apply this diff:

```diff
 envlist =
     pre-commit-all
     py312-torch28-tf_latest-unit
-    py312-cuda12-gpu
+    py312-cuda13-gpu
```
🧹 Nitpick comments (4)
docs/source/getting_started/_installation_for_Linux.rst (1)
21-21: Version table update looks good. TRT-LLM 1.2.0rc0.post1 aligns with examples/workflows. Consider cross-linking to the NGC tag list next to the table entry for quick discovery.
.github/workflows/example_tests.yml (1)
69-73: Container tag bump LGTM; add a precompile warm-up to fail fast on kernels. Add a short precompile step to surface Triton/quantization kernel build issues early and cache artifacts for the job.

```diff
       - name: Setup environment variables
         run: |
           echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu:/usr/local/tensorrt/targets/x86_64-linux-gnu/lib" >> $GITHUB_ENV
           echo "PATH=${PATH}:/usr/local/tensorrt/targets/x86_64-linux-gnu/bin" >> $GITHUB_ENV
+      - name: Precompile ModelOpt quantization kernels
+        run: |
+          python -c "import modelopt.torch.quantization.extensions as ext; ext.precompile()"
```

CHANGELOG.rst (1)
7-14: Changelog entries read well and match the PR. The breaking change note for defaulting to `cupy-cuda13x` and the TRT-LLM 1.2.0rc0.post1 upgrade align with the code/workflows. Consider linking to the install doc section that mentions Docker image selection for CUDA 13 to help users navigate.

setup.py (1)
32-53: Optional: add an extra, cheap guard before calling nvcc. A quick `shutil.which("nvcc")` check avoids spawning a subprocess when nvcc isn't present, and makes intent explicit.

```diff
+import shutil
@@
-    try:
-        result = subprocess.run(
+    try:
+        if shutil.which("nvcc") is None:
+            return None
+        result = subprocess.run(
             ["nvcc", "--version"],
             capture_output=True,
             text=True,
             timeout=5,
         )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- .github/workflows/example_tests.yml (1 hunks)
- .github/workflows/gpu_tests.yml (2 hunks)
- .gitlab/tests.yml (3 hunks)
- CHANGELOG.rst (1 hunks)
- docs/source/getting_started/_installation_for_Linux.rst (1 hunks)
- examples/llm_ptq/README.md (1 hunks)
- pyproject.toml (1 hunks)
- setup.py (2 hunks)
- tests/_test_utils/torch_quantization/onnx_export.py (1 hunks)
- tox.ini (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: code-quality
- GitHub Check: build-docs
🔇 Additional comments (9)
examples/llm_ptq/README.md (1)
30-31: Image tag bump LGTM; verify NeMo tag consistency. TRT-LLM tag 1.2.0rc0.post1 is correct. Please confirm if NeMo 25.07 is still the intended baseline under CUDA 13, or update to 25.08 for consistency across docs.
pyproject.toml (1)
157-157: Remove setup.py from Bandit exclusion to maintain security scanning of production code. The review comment is valid. setup.py contains subprocess usage in the `get_cuda_major_version()` function, which is production code and should not be excluded from security scanning alongside test and example directories. The subprocess call uses safe parameters: a list of arguments, capture_output=True, text=True, and timeout=5, meaning it does not require exclusion. Per the codebase policy (visible in pyproject.toml), global Bandit rule skips require special approval and should remain consistent with NVIDIA's CI/CD config, making inline documentation the appropriate approach if needed. Removing setup.py from exclude_dirs is the correct change.
setup.py (2)
16-21: Fix docstring mismatch on fallback CUDA version. Text says "fallback to CUDA 13 by default" but an earlier line implies CUDA 12. Make the message consistent to avoid confusion.

```diff
-If installing from a wheel, cupy for CUDA 13 is installed by default.
-If you have CUDA 12, you need to run `pip uninstall -y cupy-cuda13x` and `pip install cupy-cuda12x` separately.
+If installing from a wheel, cupy for CUDA 13 is installed by default.
+If you have CUDA 12, run: `pip uninstall -y cupy-cuda13x && pip install cupy-cuda12x`.
```

Likely an incorrect or invalid review comment.
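As an aside, here is a minimal hypothetical sketch (not the repository's actual setup.py code) of how a detected CUDA major version could drive the cupy requirement selection described above:

```python
# Hypothetical illustration: map a detected CUDA major version to a cupy wheel.
# Assumes a get_cuda_major_version()-style helper that returns an int or None.
def pick_cupy_requirement(cuda_major: int | None) -> str:
    if cuda_major == 12:
        return "cupy-cuda12x"
    # Detection failure or CUDA 13+ falls back to the CUDA 13 wheel by default.
    return "cupy-cuda13x"


print(pick_cupy_requirement(12))    # cupy-cuda12x
print(pick_cupy_requirement(None))  # cupy-cuda13x
```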
128-137: Review comment is based on incorrect assumptions about cupy-cuda Windows availability. CuPy provides native Windows support with prebuilt Windows pip wheels, with official Windows support added around CuPy v9 (2021). The review comment claims cupy-cuda is "not available/needed" on Windows, but Windows wheels (package names like cupy-cuda11x or cupy-cuda12x) are available for matching CUDA toolkit versions.
While the codebase does exclude certain GPU packages from Windows (onnxruntime-gpu, deepspeed), this does not mean cupy-cuda is unavailable or problematic on Windows. The presence of onnxruntime-directml for Windows serves a different purpose and doesn't contradict cupy-cuda's availability.
No evidence is provided that the current marker causes "resolution failures" on Windows. The proposed restriction to Linux x86_64 lacks justification given that cupy-cuda has official Windows support.
Likely an incorrect or invalid review comment.
tox.ini (2)
20-22: Consistent py312 onnxsim handling. Adding a py312-specific onnxsim build aligns with the same approach used in other environments (lines 45-46) and mirrors the workaround for upstream ONNX simplifier limitations.
75-76: Explicit triton pin for NGC PyTorch 25.08 compatibility. Pinning `triton==3.3.1` explicitly (replacing a previous constraint) aligns with the PyTorch Triton version in the NGC PyTorch 25.08 container and helps prevent mamba from installing an incompatible version. This is a reasonable safeguard for the GPU test environment.

.github/workflows/gpu_tests.yml (1)
66-66: Consistent container and test environment upgrade to CUDA 13. Both the NGC PyTorch container image (line 66: `25.08-py3`) and the tox environment (line 79: `py312-cuda13-gpu`) are aligned for CUDA 13. This consistency ensures the CI environment and test execution match the target PyTorch/CUDA ecosystem.

Also applies to: 79-79
.gitlab/tests.yml (2)
32-32: Consistent container and test environment upgrade to CUDA 13. Both the NGC PyTorch container image (line 32: `25.08-py3`) and the tox environment (line 50: `py312-cuda13-gpu`) are aligned for CUDA 13, matching the corresponding updates in `.github/workflows/gpu_tests.yml`.

Also applies to: 50-50
67-67: TRT-LLM image updated to target release. Updated the example-trtllm job container from `1.1.0rc2.post2` to `1.2.0rc0.post1`, aligning with the PR's primary objective to bump TensorRT-LLM to the new version with CUDA 13 support.
Codecov Report: ✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #445   +/-   ##
=======================================
  Coverage   73.38%   73.38%
=======================================
  Files         180      180
  Lines       17942    17942
=======================================
  Hits        13166    13166
  Misses       4776     4776
```

☔ View full report in Codecov by Sentry.
Force-pushed from f00d2ca to 01f08d6.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pyproject.toml (1)
157-157: Confirm that excluding setup.py is the appropriate security posture. The addition of `"setup.py"` to `exclude_dirs` will suppress all Bandit security analysis on that file. While this is reasonable for masking false positives in runtime CUDA detection logic, excluding an entire file should be a last resort. Consider whether using targeted `# nosec` comments in setup.py for specific lines that trigger Bandit warnings would be preferable, as it maintains security visibility over the file while allowing legitimate issues to be detected. Verify that this exclusion doesn't inadvertently mask real security concerns introduced by the new CUDA detection logic.
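For illustration, a hedged sketch of the line-level alternative; the specific Bandit rule ID (B603) is an assumption, not something confirmed by this PR:

```python
import subprocess

# Hypothetical: suppress only the flagged subprocess call instead of excluding setup.py.
result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True, timeout=5)  # nosec B603
```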
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- .github/workflows/example_tests.yml (1 hunks)
- .github/workflows/gpu_tests.yml (2 hunks)
- .gitlab/tests.yml (3 hunks)
- CHANGELOG.rst (1 hunks)
- docs/source/getting_started/_installation_for_Linux.rst (1 hunks)
- examples/llm_ptq/README.md (1 hunks)
- examples/llm_sparsity/launch_finetune.sh (1 hunks)
- examples/llm_sparsity/requirements.txt (0 hunks)
- pyproject.toml (1 hunks)
- setup.py (2 hunks)
- tests/_test_utils/torch_quantization/onnx_export.py (1 hunks)
- tox.ini (3 hunks)
💤 Files with no reviewable changes (1)
- examples/llm_sparsity/requirements.txt
🚧 Files skipped from review as they are similar to previous changes (7)
- .github/workflows/example_tests.yml
- setup.py
- docs/source/getting_started/_installation_for_Linux.rst
- tox.ini
- CHANGELOG.rst
- tests/_test_utils/torch_quantization/onnx_export.py
- .github/workflows/gpu_tests.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (3)
examples/llm_ptq/README.md (1)
30-30: Docker image version update is correct and consistent. The TensorRT-LLM image tag bump from 1.1.0rc2.post2 to 1.2.0rc0.post1 aligns with the PR objectives and is the appropriate update for this documentation.
.gitlab/tests.yml (2)
32-32: Version updates are consistent and aligned with PR objectives. The PyTorch image bump (25.06 → 25.08), CUDA target migration (12 → 13), and TensorRT-LLM version bump (1.1.0rc2.post2 → 1.2.0rc0.post1) are all coordinated and reflect the stated objectives of this PR.
Also applies to: 50-50, 67-67
1-1: Inconsistency found in example test container image; requires manual verification. Verification found that:

- ✓ setup.py CUDA detection: Confirmed `get_cuda_major_version()` implements runtime CUDA major-version detection with fallback to CUDA 13.
- ✓ GPU workflows consistency: `.github/workflows/gpu_tests.yml` uses `pytorch:25.08-py3` and `py312-cuda13-gpu`, matching `.gitlab/tests.yml`.
- ✓ Unit tests: `.github/workflows/unit_tests.yml` appropriately uses `ubuntu-latest` runners without GPU/CUDA (correct for CPU-only tests).
- ⚠ Example tests image mismatch: `.github/workflows/example_tests.yml` uses `nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc0.post1`, but `.gitlab/tests.yml` line 75 uses `nvcr.io/nvidia/tensorrt:25.08-py3`. Determine whether this difference is intentional (separate TensorRT vs TensorRT-LLM purposes) or a configuration drift that needs alignment.
Force-pushed from 01f08d6 to d3a6355.
Suggest we keep the current TRT LLM docker as 1.2.0RC0 has a bug on multi TP
Actionable comments posted: 1
🧹 Nitpick comments (2)
setup.py (2)
37-57: Consider improving CUDA detection robustness. The CUDA detection has several limitations:

- Detection failure defaults to CUDA 13 (line 153), which may cause issues for CUDA 12 users who don't have nvcc in PATH but have CUDA 12 installed.
- Single detection method: only checks `nvcc --version`. Consider adding fallback checks: `nvidia-smi` output, `CUDA_HOME` or `CUDA_PATH` environment variables, and PyTorch's `torch.version.cuda` if torch is already installed.
- Broad exception handling: `except Exception:` (line 54) silently catches all errors, making debugging difficult.

Consider this enhanced version:

```diff
 def get_cuda_major_version() -> int | None:
     """Return CUDA major version installed on the system or None if detection fails."""
+    # Try nvcc first
     try:
         result = subprocess.run(
             ["nvcc", "--version"],
             capture_output=True,
             text=True,
             timeout=5,
         )
         if result.returncode == 0:
-            # Parse output like "release 12.0, V12.0.140" or "release 13.0, V13.0.0"
             for line in result.stdout.split("\n"):
                 if "release" in line.lower():
                     match = re.search(r"release (\d+)\.", line)
                     if match:
                         return int(match.group(1))
-    except Exception:
-        pass
+    except (FileNotFoundError, subprocess.TimeoutExpired, subprocess.SubprocessError) as e:
+        print(f"Warning: nvcc detection failed: {e}", file=sys.stderr)
+
+    # Try nvidia-smi as fallback
+    try:
+        result = subprocess.run(
+            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
+            capture_output=True,
+            text=True,
+            timeout=5,
+        )
+        if result.returncode == 0:
+            # CUDA version can be inferred from driver version
+            # This is a rough heuristic
+            pass  # Implement if needed
+    except Exception as e:
+        print(f"Warning: nvidia-smi detection failed: {e}", file=sys.stderr)
     return None
```
160-166: Consider simplifying the conditional onnxsim logic. The conditional onnxsim inclusion logic is correct but could be clearer. The current logic:
- Python 3.12+ with cmake: install onnxsim (will build from source)
- Python 3.12+ without cmake: don't install onnxsim
- Python < 3.12: install onnxsim (from wheel)
Consider this more explicit version for better readability:
```diff
-# onnxsim for Python 3.12+ requires CMake to build from source
-if is_cmake_installed() and sys.version_info >= (3, 12):
-    optional_deps["onnx"].append("onnxsim ; platform_machine != 'aarch64'")
-else:
-    optional_deps["onnx"].append(
-        "onnxsim ; python_version < '3.12' and platform_machine != 'aarch64'"
-    )
+# onnxsim installation logic:
+# - Python 3.12+ requires CMake to build from source (wheel not available)
+# - Python < 3.12 can install from wheel (no CMake needed)
+# - Not available on aarch64
+if sys.version_info >= (3, 12):
+    if is_cmake_installed():
+        optional_deps["onnx"].append("onnxsim ; platform_machine != 'aarch64'")
+    else:
+        print(
+            "Warning: CMake not found. onnxsim will not be installed for Python 3.12+. "
+            "Run 'pip install cmake' if you need onnxsim.",
+            file=sys.stderr,
+        )
+else:
+    optional_deps["onnx"].append("onnxsim ; platform_machine != 'aarch64'")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- .github/workflows/example_tests.yml (1 hunks)
- .github/workflows/gpu_tests.yml (2 hunks)
- .gitlab/tests.yml (3 hunks)
- CHANGELOG.rst (1 hunks)
- docs/source/getting_started/_installation_for_Linux.rst (2 hunks)
- examples/llm_ptq/README.md (1 hunks)
- pyproject.toml (2 hunks)
- setup.py (3 hunks)
- tests/gpu/onnx/test_quantize_onnx_torch_int4_awq.py (1 hunks)
- tox.ini (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- .github/workflows/example_tests.yml
- examples/llm_ptq/README.md
- .gitlab/tests.yml
- docs/source/getting_started/_installation_for_Linux.rst
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: code-quality
- GitHub Check: build-docs
🔇 Additional comments (13)
tests/gpu/onnx/test_quantize_onnx_torch_int4_awq.py (1)
114-115: LGTM! Necessary fix for PyTorch 25.08 / CUDA 13 compatibility. Moving the torch tensors to CPU before comparison ensures compatibility with the upgraded PyTorch version. The change is applied consistently to both assertions and preserves the test's correctness.
.github/workflows/gpu_tests.yml (2)
66-66: LGTM: Container image update aligns with PR objectives. The PyTorch container update to 25.08 is consistent with the PR's goal to support CUDA 13 and newer dependencies.
79-79: No changes needed: tox environment is correctly defined. The `py312-gpu` environment is properly defined in tox.ini at line 62 as `[testenv:{py310,py311,py312}-gpu]`. The workflow change from `py312-cuda12-gpu` to `py312-gpu` is correct and consistent with the tox configuration, which supports the `--current-env` flag for flexible CUDA version handling.

pyproject.toml (2)
5-10: LGTM: CMake requirement properly scoped for Python 3.12+ onnxsim builds. The conditional cmake requirement aligns with the setup.py logic that conditionally includes onnxsim based on cmake availability and Python version.
162-162: LGTM: Bandit exclusion appropriate for build-time detection code. Excluding setup.py from bandit scans is reasonable since it now contains subprocess calls for CUDA and cmake detection during installation, which are expected build-time operations.
CHANGELOG.rst (3)
7-9: LGTM: Breaking change clearly documented with migration path. The change to `cupy-cuda13x` as the default is properly flagged as a backward breaking change with clear instructions for CUDA 12 users to migrate.
13-13: LGTM: TensorRT-LLM version update matches PR objectives. The upgrade to TensorRT-LLM 1.1.0rc5 aligns with the PR title and objectives.
17-17: LGTM: Dataset access documentation updated. The nemotron dataset update with gated access via `HF_TOKEN` is properly documented. While not directly related to CUDA 13 changes, it's a valid addition to the changelog.
59-62: LGTM: GPU environment naming generalized for multi-CUDA support. The environment name change from `py312-cuda12-gpu` to `py312-gpu` removes CUDA version specificity, aligning with the workflow changes and supporting the dynamic CUDA detection in setup.py.
71-74: LGTM: Efficient triton handling for NGC containers. Using `pip-mark-installed` to skip building triton is an appropriate optimization since NGC PyTorch containers already include `pytorch-triton`. The installation order is correct (pip-mark-installed installed first on line 72).
42-42: Verify cmake availability for Python 3.12 onnxsim builds. The Python 3.12 onnxsim installation here relies on cmake being available (as specified in pyproject.toml build requirements). Ensure that cmake is installed in the test environment or that the build system has cmake available when running these tests.
setup.py (2)
16-25: LGTM: Comprehensive docstring documents install-time behavior. The docstring clearly explains the CUDA detection logic, the default fallback to CUDA 13, and the cmake requirement for onnxsim with Python 3.12+. The remediation steps for CUDA 12 users are well-documented.
60-71: LGTM: CMake detection is simple and effective. The cmake detection logic is straightforward and appropriate. The timeout and exception handling are reasonable.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .github/workflows/gpu_tests.yml (2 hunks)
- .gitlab/tests.yml (3 hunks)
- docs/source/getting_started/_installation_for_Linux.rst (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/gpu_tests.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: code-quality
- GitHub Check: build-docs
🔇 Additional comments (5)
docs/source/getting_started/_installation_for_Linux.rst (3)
21-21: ✓ TensorRT-LLM version bump is correct. Aligns with PR objectives.
44-44: Verify that removing TensorRT library paths from LD_LIBRARY_PATH does not break TensorRT functionality. The change removes `/usr/local/tensorrt/lib` from `LD_LIBRARY_PATH`. Confirm this is intentional: either TensorRT library discovery is handled elsewhere in the Docker image setup, or this path is no longer needed for the updated environment.
129-132: ✓ Clear documentation of CUDA/Python-specific dependencies. The new section clearly documents:
- Default cupy-cuda13x behavior for INT4 ONNX quantization
- User-friendly instructions for CUDA 12 users
- CMake requirement for Python 3.12+ with onnxsim
Instructions are actionable and well-placed.
.gitlab/tests.yml (2)
1-1: Initial verification shows key image versions and tox targets are consistent; LD_LIBRARY_PATH settings in the CI files were also checked directly:
CI configuration consistency verified. All key configurations checked across `.gitlab/tests.yml` and GitHub workflow files show consistency:

- Line 32 (PyTorch image): Both `.gitlab/tests.yml` and `.github/workflows/gpu_tests.yml` use `pytorch:25.08-py3` ✓
- Line 48 (Tox target): Both use `tox -e py312-gpu --current-env` ✓
- Line 65 (TRT-LLM image): Both `.gitlab/tests.yml` and `.github/workflows/example_tests.yml` use `tensorrt-llm/release:1.1.0rc5` ✓

The specified lines are consistent across CI configurations. Docker images typically define `LD_LIBRARY_PATH` and other library paths internally (via `ENV` statements), which are identical since the same image tags are used.
38-38: LD_LIBRARY_PATH and tox target changes are valid and properly synchronized across CI configurations. Verification confirms the changes are intentional and consistent:

- LD_LIBRARY_PATH narrowing is valid: `.gitlab/tests.yml` and `.github/workflows/gpu_tests.yml` now have matching, narrowed paths. The example_tests.yml intentionally retains the TensorRT paths for its separate test suite (which needs the trtexec binary). TensorRT libraries are available in the standard paths now included (/usr/lib/x86_64-linux-gnu).
- Tox target `py312-gpu` is properly defined: The target exists in tox.ini line 62 as part of the `{py310,py311,py312}-gpu` parameterized environment. Both `.gitlab/tests.yml` and `.github/workflows/gpu_tests.yml` consistently reference `py312-gpu`.
- CI files are synchronized: No discrepancies or missing updates across configurations.
Force-pushed from e64f668 to 2d68aff.
Actionable comments posted: 0
♻️ Duplicate comments (1)
examples/llm_sparsity/launch_finetune.sh (1)
94-95: This change does not address the past review's critical issue: the deprecated `--fsdp` syntax remains. The quoting on Line 94 fixes shell tokenization but does NOT resolve the underlying problem flagged in the past review: the monolithic `--fsdp` flag is deprecated in modern Accelerate. The correct fix requires migrating to individual `--fsdp_*` flags or using a config YAML. Note that Line 95 already uses the modern syntax (`--fsdp_transformer_layer_cls_to_wrap`), which indicates the codebase is partially migrated but this line was skipped.

Recommended fix: Replace Lines 94–95 with one of the following.

Option 1: Individual modern flags

```diff
-    --fsdp 'full_shard auto_wrap' \
-    --fsdp_transformer_layer_cls_to_wrap LlamaDecoderLayer \
+    --fsdp_sharding_strategy full_shard \
+    --fsdp_auto_wrap_policy transformer_auto_wrap_policy \
+    --fsdp_transformer_layer_cls_to_wrap LlamaDecoderLayer \
```

Option 2: Use a config YAML (if your setup requires complex FSDP tuning). Create `accelerate_config.yaml` with your FSDP settings and add `--config_file accelerate_config.yaml` to the command.

Please verify against the Accelerate version shipped with TensorRT-LLM 1.1.0rc5 that the monolithic `--fsdp` flag is no longer supported.
🧹 Nitpick comments (1)
setup.py (1)
121-127: Clarify onnxsim installation behavior for wheel vs source installs. The conditional onnxsim installation logic checks CMake availability at setup.py runtime. This works correctly for source installations but has different implications for wheel distributions:
Behavior:
- Python < 3.12: onnxsim always included (wheels available from PyPI)
- Python >= 3.12 with CMake: onnxsim included (can build from source)
- Python >= 3.12 without CMake: onnxsim excluded (cannot build)
For source installs: This logic works as intended.
For wheel installs from PyPI: The dependencies are determined during wheel build, not during user installation. Users installing a pre-built wheel will get whatever dependencies were baked in during the build, regardless of their local CMake availability.
Consider documenting this behavior in the docstring or release notes to set proper expectations. Alternatively, you might want to always include onnxsim in the dependencies and let the installation fail with a clear error message prompting users to install CMake if needed.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- .github/workflows/example_tests.yml (1 hunks)
- .github/workflows/gpu_tests.yml (1 hunks)
- .gitlab/tests.yml (2 hunks)
- CHANGELOG.rst (1 hunks)
- docs/source/getting_started/_installation_for_Linux.rst (3 hunks)
- examples/diffusers/quantization/requirements.txt (1 hunks)
- examples/llm_ptq/README.md (1 hunks)
- examples/llm_sparsity/launch_finetune.sh (1 hunks)
- examples/llm_sparsity/requirements.txt (0 hunks)
- pyproject.toml (2 hunks)
- setup.py (3 hunks)
- tests/_test_utils/torch_quantization/onnx_export.py (1 hunks)
- tests/gpu/onnx/test_plugin.py (1 hunks)
- tests/gpu/onnx/test_quantize_onnx_torch_int4_awq.py (1 hunks)
- tox.ini (2 hunks)
💤 Files with no reviewable changes (1)
- examples/llm_sparsity/requirements.txt
✅ Files skipped from review due to trivial changes (2)
- tests/gpu/onnx/test_plugin.py
- examples/llm_ptq/README.md
🚧 Files skipped from review as they are similar to previous changes (6)
- .gitlab/tests.yml
- tests/gpu/onnx/test_quantize_onnx_torch_int4_awq.py
- pyproject.toml
- tox.ini
- .github/workflows/gpu_tests.yml
- examples/diffusers/quantization/requirements.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: code-quality
- GitHub Check: build-docs
🔇 Additional comments (9)
tests/_test_utils/torch_quantization/onnx_export.py (1)
68-68: The original review comment incorrectly assumes the `dynamo` parameter may not exist in all supported PyTorch versions. However, the `dynamo` parameter was added to torch.onnx.export in PyTorch 2.5, and since the minimum supported version is `torch>=2.6`, the parameter is guaranteed to exist in all supported versions. The code change is correct and safe as-is. PyTorch 2.9 changed the default from dynamo=False to dynamo=True, and explicitly setting `dynamo=False` appropriately preserves the prior behavior. No defensive check is needed because the parameter availability is not a concern.

Likely an incorrect or invalid review comment.
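For reference, a self-contained sketch (toy model and buffer, assuming torch >= 2.6 where the `dynamo` keyword exists) of pinning the legacy exporter behavior:

```python
import io

import torch

# Toy model/input; dynamo=False keeps the legacy TorchScript-based exporter
# even on torch 2.9+, where the default flips to dynamo=True.
model = torch.nn.Linear(4, 2)
dummy_input = torch.randn(1, 4)
buffer = io.BytesIO()
torch.onnx.export(model, (dummy_input,), f=buffer, opset_version=17, dynamo=False)
print(len(buffer.getvalue()) > 0)  # True once the export has produced a model
```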
.github/workflows/example_tests.yml (1)
69-69: LGTM! TensorRT-LLM upgrade is consistent. The container image upgrade to 1.1.0rc5 aligns with the PR objectives and CHANGELOG documentation.
CHANGELOG.rst (2)
4-4: Planned release date noted. The version date is set to 2025-11-07, which is a future date. This appears to be the planned release date for version 0.39.
9-13: LGTM! Clear documentation of new features. The changelog accurately documents the TensorRT-LLM upgrade and the dataset guidance changes for the nemotron example.
docs/source/getting_started/_installation_for_Linux.rst (3)
21-21: LGTM! Version requirement updated consistently. The TensorRT-LLM version requirement is updated to 1.1.0rc5, consistent with other changes in this PR.
44-44: Verify that simplified LD_LIBRARY_PATH is sufficient. The LD_LIBRARY_PATH has been simplified to only include base system paths. However, the GitHub workflow in `.github/workflows/example_tests.yml` (lines 78-79) still includes additional TensorRT-specific paths:

```
/usr/local/tensorrt/targets/x86_64-linux-gnu/lib
/usr/local/tensorrt/targets/x86_64-linux-gnu/bin
```

Please verify that the simplified path in the documentation is sufficient for users, or if the TensorRT-specific paths should remain in the documentation as well.
129-131: LGTM! Clear guidance for Python 3.12+ users. The documentation clearly explains the CMake requirement for onnxsim on Python 3.12+, which aligns with the conditional installation logic in setup.py.
setup.py (2)
16-21: LGTM! Clear documentation added. The module docstring now clearly explains the CMake requirement for onnxsim on Python 3.12+.
23-24: LGTM! CMake detection is well-implemented. The `is_cmake_installed()` function properly detects CMake availability using subprocess with an appropriate timeout and exception handling.

Also applies to: 32-43
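A minimal sketch of what such a check could look like; this is an illustration under the description above, not necessarily the repository's exact implementation:

```python
import subprocess


def is_cmake_installed() -> bool:
    """Return True if a working `cmake` binary responds to --version."""
    try:
        result = subprocess.run(
            ["cmake", "--version"],
            capture_output=True,
            text=True,
            timeout=5,
        )
        return result.returncode == 0
    except Exception:
        # Missing binary, timeout, or any other failure counts as "not installed".
        return False


print(is_cmake_installed())
```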
Force-pushed from 2d68aff to 5a6c995.
Actionable comments posted: 1
♻️ Duplicate comments (2)
.gitlab/tests.yml (1)
48-48: ⚠️ Tox environment name mismatch: `py312-cuda12-gpu` does not exist. Line 48 references the tox environment `py312-cuda12-gpu`, but based on the previous review, the only GPU testenv defined is the parameterized `[testenv:{py310,py311,py312}-gpu]`, which produces `py312-gpu`, not `py312-cuda12-gpu`. Update this line to use `py312-gpu` instead.

examples/llm_sparsity/launch_finetune.sh (1)
94-95: Deprecated `--fsdp` syntax: modernize to use individual flags or a config file. The addition of quotes to `--fsdp 'full_shard auto_wrap'` does not address the underlying issue flagged in the previous review: modern Accelerate (as expected with the CUDA 13 upgrade and updated dependencies) uses individual `--fsdp_*` CLI flags rather than a monolithic `--fsdp` flag. Update the script to use one of these approaches.

Option 1 (preferred): Use individual `--fsdp_*` flags. Replace lines 94–95 with:

```
--fsdp_sharding_strategy full_shard \
--fsdp_auto_wrap_policy transformer_auto_wrap_policy \
```

Option 2: Use an Accelerate config YAML. Create an accelerate config file with FSDP settings and use:

```
--config_file path/to/accelerate_fsdp_config.yaml \
```

Remove the current `--fsdp` entry and add the appropriate modern flags to ensure compatibility with the updated Accelerate version in the TensorRT-LLM 1.1.0rc5 environment.
🧹 Nitpick comments (1)
examples/diffusers/quantization/requirements.txt (1)
7-9: Reasonable temporary fix for diffusers compatibility. The torch<2.9 and torchvision<0.24.0 constraints appropriately address the known diffusers compatibility issue. The TODO comment is helpful for tracking when this can be re-evaluated.
That said, these are strict version pins that could limit users upgrading dependencies. Ensure this is documented in migration guides or release notes so users understand the constraint and timeline for resolution.
If this requires broader user-facing documentation (migration guide, FAQ, or similar), I can help draft that content. Would you like me to assist?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- .github/workflows/example_tests.yml (1 hunks)
- .github/workflows/gpu_tests.yml (1 hunks)
- .gitlab/tests.yml (2 hunks)
- CHANGELOG.rst (1 hunks)
- docs/source/getting_started/_installation_for_Linux.rst (2 hunks)
- examples/diffusers/quantization/requirements.txt (1 hunks)
- examples/llm_ptq/README.md (1 hunks)
- examples/llm_sparsity/launch_finetune.sh (1 hunks)
- examples/llm_sparsity/requirements.txt (0 hunks)
- tests/_test_utils/torch_quantization/onnx_export.py (1 hunks)
- tests/gpu/onnx/test_plugin.py (1 hunks)
- tests/gpu/onnx/test_quantize_onnx_torch_int4_awq.py (1 hunks)
- tox.ini (2 hunks)
💤 Files with no reviewable changes (1)
- examples/llm_sparsity/requirements.txt
✅ Files skipped from review due to trivial changes (1)
- examples/llm_ptq/README.md
🚧 Files skipped from review as they are similar to previous changes (4)
- tests/_test_utils/torch_quantization/onnx_export.py
- CHANGELOG.rst
- .github/workflows/gpu_tests.yml
- tox.ini
🧰 Additional context used
🧬 Code graph analysis (1)
tests/gpu/onnx/test_plugin.py (4)
- tests/_test_utils/onnx_autocast/utils.py (1): `_assert_tensors_are_fp16` (20-27)
- modelopt/onnx/autocast/convert.py (1): `convert_to_mixed_precision` (46-148)
- modelopt/onnx/autocast/graphsanitizer.py (1): `GraphSanitizer` (28-522)
- modelopt/onnx/quantization/quantize.py (1): `quantize` (209-564)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (6)
docs/source/getting_started/_installation_for_Linux.rst (2)
21-21: Version bump to 1.1.0rc5 is correct. Aligns with the PR objectives and downstream CI/example documentation updates.
41-44: LD_LIBRARY_PATH simplification: verify consistency across CI configurations. Line 44 simplifies LD_LIBRARY_PATH to include only base system paths (`/usr/include` and `/usr/lib/x86_64-linux-gnu`), removing TensorRT-specific paths. Verify this aligns with the runtime environment of the TensorRT-LLM 1.1.0rc5 container and matches the LD_LIBRARY_PATH setup in `.github/workflows/example_tests.yml` lines 78–79 and `.gitlab/tests.yml` line 38.

.gitlab/tests.yml (2)
38-38: LD_LIBRARY_PATH simplification aligns with documentation. Consistent with the documentation updates in `docs/source/getting_started/_installation_for_Linux.rst` and the downstream `.github/workflows/example_tests.yml` environment setup. The removal of TensorRT-specific paths appears intentional as part of the CUDA 13/TensorRT-LLM 1.1.0rc5 upgrade.
65-65: Version bump to 1.1.0rc5 is correct. Aligns with the PR objectives and documentation updates.
.github/workflows/example_tests.yml (1)
69-69: Container image version bump to 1.1.0rc5 is correct. Consistent with parallel updates across CI configurations (.gitlab/tests.yml, documentation, and examples).
tests/gpu/onnx/test_quantize_onnx_torch_int4_awq.py (1)
114-115: LGTM! Assertions correctly updated to handle tensor device placement. The addition of `.detach().cpu()` properly ensures the PyTorch tensors are detached from the computation graph and moved to CPU before comparison. This fix aligns with the TensorRT-LLM upgrade and resolves compatibility issues in the CI/CD pipeline.

Optional refinement: When `int4.has_cupy` is True, both tensors could remain on GPU for consistency by using `cupy.asarray(wq_torch_awq_lite.detach())` instead of `.cpu()`. However, since this is test code with small arrays, the current approach is perfectly acceptable.
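A small self-contained sketch of the pattern being praised (toy tensors, not the repository's test fixtures):

```python
import numpy as np
import torch

# Toy stand-ins for the reference and torch-computed quantized weights.
wq_ref = np.ones((4, 4), dtype=np.float32)
wq_torch = torch.ones(4, 4, device="cuda" if torch.cuda.is_available() else "cpu")

# Detach from the autograd graph and move to CPU before comparing against numpy.
np.testing.assert_allclose(wq_ref, wq_torch.detach().cpu().numpy(), rtol=1e-6)
```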
Signed-off-by: Keval Morabia <[email protected]>
Force-pushed from 5a6c995 to b0926ac.
Actionable comments posted: 0
♻️ Duplicate comments (1)
examples/llm_sparsity/launch_finetune.sh (1)
94-94: Verify that the deprecated `--fsdp` flag is still supported or upgrade to modern `--fsdp_*` syntax. A critical concern from a previous review remains unaddressed: the current change adds quoting but does not resolve the underlying issue. According to the past review, modern Accelerate (with CUDA 13 dependencies as mentioned in this PR) uses individual `--fsdp_*` CLI flags (e.g., `--fsdp_sharding_strategy full_shard`, `--fsdp_auto_wrap_policy transformer_auto_wrap_policy`), not a monolithic `--fsdp` flag. The quoting change affects shell tokenization but assumes Accelerate still supports the old combined syntax. If the internal CICD is passing, confirm whether:

- The Accelerate version being used still supports `--fsdp`, or
- The quoting change inadvertently fixed an issue, or
- A different mechanism is resolving the tests.

To confirm the current state, please check: does Hugging Face Accelerate version 0.30+ (CUDA 13 compatible) still support the `--fsdp` flag, or only the individual `--fsdp_*` flags? Alternatively, if the deprecated syntax is no longer supported, consider migrating to the modern approach:

```diff
-    --fsdp 'full_shard auto_wrap' \
-    --fsdp_transformer_layer_cls_to_wrap LlamaDecoderLayer \
+    --fsdp_sharding_strategy full_shard \
+    --fsdp_auto_wrap_policy transformer_auto_wrap_policy \
+    --fsdp_transformer_layer_cls_to_wrap LlamaDecoderLayer \
```

Or consider using a config file with `--config_file` to specify FSDP settings.
🧹 Nitpick comments (1)
tests/_test_utils/torch_quantization/onnx_export.py (1)
68-68: Consider adding a version compatibility check for the `dynamo` parameter. Similar to the `enable_onnx_checker` parameter check on lines 56-59, the `dynamo` parameter should be checked for existence before use to ensure compatibility with older PyTorch versions that may not support it. Apply this diff to add the version check:

```diff
     if "enable_onnx_checker" in inspect.signature(torch.onnx.export).parameters:
         kwargs = {"enable_onnx_checker": False}
     else:
         kwargs = {}
+    if "dynamo" in inspect.signature(torch.onnx.export).parameters:
+        kwargs["dynamo"] = False

     torch.onnx.export(
         model,
         dummy_input,
         f=buffer,
         opset_version=OPSET,
         input_names=input_names,
         output_names=output_names,
         do_constant_folding=constant_folding,
-        dynamo=False,
         **kwargs,
     )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- .github/workflows/gpu_tests.yml (1 hunks)
- .gitlab/tests.yml (1 hunks)
- CHANGELOG.rst (1 hunks)
- docs/source/getting_started/_installation_for_Linux.rst (1 hunks)
- examples/diffusers/quantization/requirements.txt (1 hunks)
- examples/llm_sparsity/launch_finetune.sh (1 hunks)
- examples/llm_sparsity/requirements.txt (0 hunks)
- tests/_test_utils/torch_quantization/onnx_export.py (1 hunks)
- tests/gpu/onnx/test_plugin.py (1 hunks)
- tests/gpu/onnx/test_quantize_onnx_torch_int4_awq.py (1 hunks)
- tox.ini (2 hunks)
💤 Files with no reviewable changes (1)
- examples/llm_sparsity/requirements.txt
🚧 Files skipped from review as they are similar to previous changes (4)
- .github/workflows/gpu_tests.yml
- tests/gpu/onnx/test_plugin.py
- CHANGELOG.rst
- examples/diffusers/quantization/requirements.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: code-quality
- GitHub Check: build-docs
🔇 Additional comments (7)
tests/gpu/onnx/test_quantize_onnx_torch_int4_awq.py (1)
114-115: LGTM! Correct device handling for tensor comparisons. The addition of `.detach().cpu()` properly handles the comparison between GPU torch tensors and CPU numpy/cupy arrays, preventing device mismatch errors while also detaching from the computation graph.

.gitlab/tests.yml (1)
36-41: LD_LIBRARY_PATH simplification aligns with PR objectives. Removal of TensorRT-specific library paths (`/usr/local/tensorrt/targets/x86_64-linux-gnu/lib`, `/usr/local/tensorrt/bin`) from LD_LIBRARY_PATH is consistent across CI configs and documentation. The simplified path setup supports the PR's goal of reducing TensorRT runtime dependencies in GPU test environments. Since internal CICD is passing, this change is validated.

docs/source/getting_started/_installation_for_Linux.rst (1)
39-45: Documentation environment setup aligns with CI changes. The LD_LIBRARY_PATH modification in the Docker environment setup section matches the CI changes (.gitlab/tests.yml:38). Documentation and implementation are now consistent regarding simplified library path configuration.
tox.ini (4)
62-64: Minor documentation improvement in GPU environment header. Changing "Can be used with" to "Should be used with" is a minor but appropriate clarification that the `--current-env` flag is the expected usage pattern for these GPU environments.
74-76: Triton installation approach uses the container-provided version. Switching from an explicit `triton==3.4.0` constraint to marking triton as already-installed via `pip-mark-installed` is a sound approach for NGC PyTorch containers, which provide `pytorch-triton` pre-installed. This avoids version conflicts and redundant builds. However, verify that `pip-mark-installed` is available in the container or will be installed from PyPI without issues before this command runs.
80-80: Verify Eagle-3 test dependency additions. The addition of `tiktoken`, `blobfile`, and `sentencepiece` appears to support Eagle-3 test dependencies. Confirm these packages:

- Are actually required for Eagle-3 tests and not redundant with other test dependencies
- Don't create version conflicts with other installed packages
- Should remain hardcoded in `tox.ini` or should be managed via `setup.py` optional dependency groups (e.g., `[all,dev-test]`)
tox.inior should be managed viasetup.pyoptional dependency groups (e.g.,[all,dev-test])Consider whether these belong in the packaging metadata rather than CI-specific environment setup.
83-83: Conditional onnxsim install for py312 mirrors the unit test environment. The py312-specific `onnxsim` installation in the GPU environment mirrors the same pattern in the unit test environment (line 21). This is consistent and appropriate for building onnxsim from source on Python 3.12.
Summary by CodeRabbit

- Documentation
- Chores