
Conversation

@kevalmorabia97
Collaborator

@kevalmorabia97 kevalmorabia97 commented Oct 17, 2025

What does this PR do?

  • Fix diffusers (pin torch<2.9) and sparsity tests

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: Yes

Summary by CodeRabbit

  • Documentation

    • Version 0.39 released (2025-11-07)
    • Clarified the dataset gating mechanism for example datasets, which now specifies the HF_TOKEN environment variable requirement
  • Chores

    • Updated torch and torchvision version constraints
    • Removed transformers dependency from example configurations

@coderabbitai
Contributor

coderabbitai bot commented Oct 17, 2025

Walkthrough

Removes TensorRT paths from GPU test environments and workflows, updates dependency version constraints in example projects, modifies ONNX export parameters, and reorganizes test imports.

Changes

  • TensorRT Path Removal (.github/workflows/gpu_tests.yml, .gitlab/tests.yml, docs/source/getting_started/_installation_for_Linux.rst): Removed TensorRT lib and bin directory entries from LD_LIBRARY_PATH and PATH environment variables across GPU test workflows and installation documentation.
  • Dependency Constraints (examples/llm_sparsity/requirements.txt): Removed the transformers>=4.57.0 dependency.
  • Dependency Constraints (examples/diffusers/quantization/requirements.txt): Added version constraints torch<2.9 and torchvision<0.24.0, with a TODO comment regarding torch 2.9 compatibility.
  • GPU Test Configuration (tox.ini): Updated GPU test environment setup: replaced the explicit Triton 3.4.0 constraint with a pip-mark-installed flow, added a conditional onnxsim installation for py312, and added dependencies for tiktoken, blobfile, and sentencepiece.
  • Script Arguments (examples/llm_sparsity/launch_finetune.sh): Reformatted the Accelerate FSDP argument from three separate tokens to a single quoted string: --fsdp 'full_shard auto_wrap'.
  • ONNX Export and Tests (tests/_test_utils/torch_quantization/onnx_export.py): Added the dynamo=False parameter to torch.onnx.export() to override Torch 2.9's new default behavior.
  • ONNX Tests (tests/gpu/onnx/test_quantize_onnx_torch_int4_awq.py): Added a CPU transfer in forward comparison assertions and removed the ONNX version guard skip.
  • Test Import Organization (tests/gpu/onnx/test_plugin.py): Moved imports (_assert_tensors_are_fp16, _assert_nodes_are_quantized, convert_to_mixed_precision, GraphSanitizer, quantize, load_onnx_model) from function scope to module scope.
  • Documentation (CHANGELOG.rst): Updated the version date to 2025-11-07 and clarified the dataset gating mechanism to explicitly reference the HF_TOKEN environment variable.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

These changes consist primarily of repetitive path removals (TensorRT from multiple files), straightforward dependency constraint adjustments, import reorganization, and parameter additions. While spread across multiple files, the modifications follow consistent patterns with minimal logic density.

Poem

🐰 hop hop Through paths we trim with care,
TensorRT gone, lighter air!
Constraints we add, dependencies align,
Tests now shine with torch divine,
Exports flow smooth, our code is fine! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The PR title "Fix failing CICD nightly tests" is related to actual changes in the changeset, specifically the updates to GPU test workflows, environment configuration, and test files aimed at resolving test failures. However, the title addresses only a secondary aspect of the work: the primary stated objective is "Upgrade TRT-LLM Docker image to 1.1.0rc5", followed by fixing CICD tests, yet the title omits the version upgrade entirely. While titles need not cover every detail, they should highlight the most important change from the developer's perspective, which here appears to be the TRT-LLM version bump.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/_test_utils/torch_quantization/onnx_export.py (1)

60-70: Guard dynamo parameter for PyTorch versions before 2.5.

The dynamo parameter for torch.onnx.export was introduced in PyTorch 2.5 and made the default (dynamo=True) in PyTorch 2.9. Hardcoding dynamo=False without a version check will raise a TypeError on PyTorch < 2.5. Add a guard check similar to the existing enable_onnx_checker pattern (lines 56–59):

if "enable_onnx_checker" in inspect.signature(torch.onnx.export).parameters:
    kwargs = {"enable_onnx_checker": False}
else:
    kwargs = {}
+if "dynamo" in inspect.signature(torch.onnx.export).parameters:
+    kwargs["dynamo"] = False  # torch 2.9 flips default to True
torch.onnx.export(
    model,
    dummy_input,
    f=buffer,
    opset_version=OPSET,
    input_names=input_names,
    output_names=output_names,
    do_constant_folding=constant_folding,
-    dynamo=False,  # torch 2.9 flips default to True
    **kwargs,
)

Also simplify the providers assignment at line 77 (inside the device == "cpu" block):

-providers = ["CUDAExecutionProvider"] if device != "cpu" else ["CPUExecutionProvider"]
+providers = ["CPUExecutionProvider"]
tox.ini (1)

5-5: Update envlist to reference cuda13-gpu environment.

Line 5 still references py312-cuda12-gpu, but the GPU test environment is now defined as py312-cuda13-gpu on line 66. The envlist will fail to locate the environment when running default tox targets.

Apply this diff:

 envlist=
     pre-commit-all
     py312-torch28-tf_latest-unit
-    py312-cuda12-gpu
+    py312-cuda13-gpu
🧹 Nitpick comments (4)
docs/source/getting_started/_installation_for_Linux.rst (1)

21-21: Version table update looks good.

TRT-LLM 1.2.0rc0.post1 aligns with examples/workflows. Consider cross-linking to the NGC tag list next to the table entry for quick discovery.

.github/workflows/example_tests.yml (1)

69-73: Container tag bump LGTM; add a precompile warm-up to fail fast on kernels.

Add a short precompile step to surface Triton/quantization kernel build issues early and cache artifacts for the job.

       - name: Setup environment variables
         run: |
           echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu:/usr/local/tensorrt/targets/x86_64-linux-gnu/lib" >> $GITHUB_ENV
           echo "PATH=${PATH}:/usr/local/tensorrt/targets/x86_64-linux-gnu/bin" >> $GITHUB_ENV
+      - name: Precompile ModelOpt quantization kernels
+        run: |
+          python -c "import modelopt.torch.quantization.extensions as ext; ext.precompile()"
CHANGELOG.rst (1)

7-14: Changelog entries read well and match the PR.

The breaking change note for defaulting to cupy-cuda13x and the TRT‑LLM 1.2.0rc0.post1 upgrade align with the code/workflows. Consider linking to the install doc section that mentions Docker image selection for CUDA 13 to help users navigate.

setup.py (1)

32-53: Optional: add an extra, cheap guard before calling nvcc.

A quick shutil.which("nvcc") check avoids spawning a subprocess when nvcc isn’t present, and makes intent explicit.

+import shutil
@@
-    try:
-        result = subprocess.run(
+    try:
+        if shutil.which("nvcc") is None:
+            return None
+        result = subprocess.run(
             ["nvcc", "--version"],
             capture_output=True,
             text=True,
             timeout=5,
         )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f8a9353 and f00d2ca.

📒 Files selected for processing (10)
  • .github/workflows/example_tests.yml (1 hunks)
  • .github/workflows/gpu_tests.yml (2 hunks)
  • .gitlab/tests.yml (3 hunks)
  • CHANGELOG.rst (1 hunks)
  • docs/source/getting_started/_installation_for_Linux.rst (1 hunks)
  • examples/llm_ptq/README.md (1 hunks)
  • pyproject.toml (1 hunks)
  • setup.py (2 hunks)
  • tests/_test_utils/torch_quantization/onnx_export.py (1 hunks)
  • tox.ini (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (9)
examples/llm_ptq/README.md (1)

30-31: Image tag bump LGTM; verify NeMo tag consistency.

TRT-LLM tag 1.2.0rc0.post1 is correct. Please confirm if NeMo 25.07 is still the intended baseline under CUDA 13, or update to 25.08 for consistency across docs.

pyproject.toml (1)

157-157: Remove setup.py from Bandit exclusion to maintain security scanning of production code.

The review comment is valid. setup.py contains subprocess usage in the get_cuda_major_version() function, which is production code and should not be excluded from security scanning alongside test and example directories. The subprocess call uses safe parameters: a list of arguments, capture_output=True, text=True, and timeout=5, meaning it does not require exclusion. Per the codebase policy (visible in pyproject.toml), global Bandit rule skips require special approval and should remain consistent with NVIDIA's CI/CD config, making inline documentation the appropriate approach if needed.

Removing setup.py from exclude_dirs is the correct change.

setup.py (2)

16-21: Fix docstring mismatch on fallback CUDA version.

Text says “fallback to CUDA 13 by default” but earlier line implies CUDA 12. Make the message consistent to avoid confusion.

-If installing from a wheel, cupy for CUDA 13 is installed by default.
-    If you have CUDA 12, you need to run `pip uninstall -y cupy-cuda13x` and `pip install cupy-cuda12x` separately.
+If installing from a wheel, cupy for CUDA 13 is installed by default.
+If you have CUDA 12, run: `pip uninstall -y cupy-cuda13x && pip install cupy-cuda12x`.

Likely an incorrect or invalid review comment.


128-137: Review comment is based on incorrect assumptions about cupy-cuda Windows availability.

CuPy provides native Windows support with prebuilt Windows pip wheels, with official Windows support added around CuPy v9 (2021). The review comment claims cupy-cuda is "not available/needed" on Windows, but Windows wheels (package names like cupy-cuda11x or cupy-cuda12x) are available for matching CUDA toolkit versions.

While the codebase does exclude certain GPU packages from Windows (onnxruntime-gpu, deepspeed), this does not mean cupy-cuda is unavailable or problematic on Windows. The presence of onnxruntime-directml for Windows serves a different purpose and doesn't contradict cupy-cuda's availability.

No evidence is provided that the current marker causes "resolution failures" on Windows. The proposed restriction to Linux x86_64 lacks justification given that cupy-cuda has official Windows support.

Likely an incorrect or invalid review comment.

tox.ini (2)

20-22: Consistent py312 onnxsim handling.

Adding py312-specific onnxsim build aligns with the same approach used in other environments (lines 45-46) and mirrors the workaround for upstream ONNX simplifier limitations.


75-76: Explicit triton pin for NGC PyTorch 25.08 compatibility.

Pinning triton==3.3.1 explicitly (replacing a previous constraint) aligns with the PyTorch Triton version in the NGC PyTorch 25.08 container and helps prevent mamba from installing an incompatible version. This is a reasonable safeguard for the GPU test environment.

.github/workflows/gpu_tests.yml (1)

66-66: Consistent container and test environment upgrade to CUDA 13.

Both the NGC PyTorch container image (line 66: 25.08-py3) and tox environment (line 79: py312-cuda13-gpu) are aligned for CUDA 13. This consistency ensures the CI environment and test execution match the target PyTorch/CUDA ecosystem.

Also applies to: 79-79

.gitlab/tests.yml (2)

32-32: Consistent container and test environment upgrade to CUDA 13.

Both the NGC PyTorch container image (line 32: 25.08-py3) and tox environment (line 50: py312-cuda13-gpu) are aligned for CUDA 13, matching the corresponding updates in .github/workflows/gpu_tests.yml.

Also applies to: 50-50


67-67: TRT-LLM image updated to target release.

Updated the example-trtllm job container from 1.1.0rc2.post2 to 1.2.0rc0.post1, aligning with the PR's primary objective to bump TensorRT-LLM to the new version with CUDA 13 support.

@codecov

codecov bot commented Oct 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.38%. Comparing base (6ef9954) to head (b0926ac).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #445   +/-   ##
=======================================
  Coverage   73.38%   73.38%           
=======================================
  Files         180      180           
  Lines       17942    17942           
=======================================
  Hits        13166    13166           
  Misses       4776     4776           


@kevalmorabia97 kevalmorabia97 requested a review from a team as a code owner October 17, 2025 18:42
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
pyproject.toml (1)

157-157: Confirm that excluding setup.py is the appropriate security posture.

The addition of "setup.py" to exclude_dirs will suppress all Bandit security analysis on that file. While this is reasonable for masking false positives in runtime CUDA detection logic, excluding an entire file should be a last resort.

Consider whether using targeted # nosec comments in setup.py for specific lines that trigger Bandit warnings would be preferable, as it maintains security visibility over the file while allowing legitimate issues to be detected.

Verify that this exclusion doesn't inadvertently mask real security concerns introduced by the new CUDA detection logic.
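
For illustration, a hedged sketch of what a targeted suppression could look like (the subprocess arguments mirror the nvcc call quoted elsewhere in this review; the specific Bandit rule ID B603 is an assumption about which check fires):

import subprocess

try:
    result = subprocess.run(  # nosec B603 - fixed argument list, no shell, bounded by a timeout
        ["nvcc", "--version"],
        capture_output=True,
        text=True,
        timeout=5,
    )
    print(result.stdout.splitlines()[0] if result.returncode == 0 else "nvcc returned an error")
except (FileNotFoundError, subprocess.TimeoutExpired):
    print("nvcc not found or timed out")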

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f00d2ca and 01f08d6.

📒 Files selected for processing (12)
  • .github/workflows/example_tests.yml (1 hunks)
  • .github/workflows/gpu_tests.yml (2 hunks)
  • .gitlab/tests.yml (3 hunks)
  • CHANGELOG.rst (1 hunks)
  • docs/source/getting_started/_installation_for_Linux.rst (1 hunks)
  • examples/llm_ptq/README.md (1 hunks)
  • examples/llm_sparsity/launch_finetune.sh (1 hunks)
  • examples/llm_sparsity/requirements.txt (0 hunks)
  • pyproject.toml (1 hunks)
  • setup.py (2 hunks)
  • tests/_test_utils/torch_quantization/onnx_export.py (1 hunks)
  • tox.ini (3 hunks)
💤 Files with no reviewable changes (1)
  • examples/llm_sparsity/requirements.txt
🚧 Files skipped from review as they are similar to previous changes (7)
  • .github/workflows/example_tests.yml
  • setup.py
  • docs/source/getting_started/_installation_for_Linux.rst
  • tox.ini
  • CHANGELOG.rst
  • tests/_test_utils/torch_quantization/onnx_export.py
  • .github/workflows/gpu_tests.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (3)
examples/llm_ptq/README.md (1)

30-30: Docker image version update is correct and consistent.

The TensorRT-LLM image tag bump from 1.1.0rc2.post2 to 1.2.0rc0.post1 aligns with the PR objectives and is the appropriate update for this documentation.

.gitlab/tests.yml (2)

32-32: Version updates are consistent and aligned with PR objectives.

The PyTorch image bump (25.06 → 25.08), CUDA target migration (12 → 13), and TensorRT-LLM version bump (1.1.0rc2.post2 → 1.2.0rc0.post1) are all coordinated and reflect the stated objectives of this PR.

Also applies to: 50-50, 67-67


1-1: Inconsistency found in example test container image—requires manual verification.

Verification found that:

setup.py CUDA detection: Confirmed get_cuda_major_version() function implements runtime CUDA major-version detection with fallback to CUDA 13.

GPU workflows consistency: .github/workflows/gpu_tests.yml uses pytorch:25.08-py3 and py312-cuda13-gpu, matching .gitlab/tests.yml.

Unit tests: .github/workflows/unit_tests.yml appropriately uses ubuntu-latest runners without GPU/CUDA (correct for CPU-only tests).

Example tests image mismatch: .github/workflows/example_tests.yml uses nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc0.post1, but .gitlab/tests.yml line 75 uses nvcr.io/nvidia/tensorrt:25.08-py3. Determine whether this difference is intentional (separate TensorRT vs TensorRT-LLM purposes) or a configuration drift that needs alignment.

Collaborator

@cjluo-nv cjluo-nv left a comment


Suggest we keep the current TRT-LLM docker, as 1.2.0rc0 has a bug with multi-TP

@kevalmorabia97 kevalmorabia97 changed the title Bump trtllm to 1.2.0rc0.post1 and pytorch to 25.08 for cuda 13 Bump trtllm to 1.1.0rc5 and pytorch to 25.08 with cuda 13 fixes Oct 18, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
setup.py (2)

37-57: Consider improving CUDA detection robustness.

The CUDA detection has several limitations:

  1. Detection failure defaults to CUDA 13 (line 153), which may cause issues for CUDA 12 users who don't have nvcc in PATH but have CUDA 12 installed.
  2. Single detection method: Only checks nvcc --version. Consider adding fallback checks:
    • nvidia-smi output
    • CUDA_HOME or CUDA_PATH environment variables
    • PyTorch's torch.version.cuda if torch is already installed
  3. Broad exception handling: except Exception: (line 54) silently catches all errors, making debugging difficult.

Consider this enhanced version:

 def get_cuda_major_version() -> int | None:
     """Return CUDA major version installed on the system or None if detection fails."""
+    # Try nvcc first
     try:
         result = subprocess.run(
             ["nvcc", "--version"],
             capture_output=True,
             text=True,
             timeout=5,
         )
         if result.returncode == 0:
-            # Parse output like "release 12.0, V12.0.140" or "release 13.0, V13.0.0"
             for line in result.stdout.split("\n"):
                 if "release" in line.lower():
                     match = re.search(r"release (\d+)\.", line)
                     if match:
                         return int(match.group(1))
-    except Exception:
-        pass
+    except (FileNotFoundError, subprocess.TimeoutExpired, subprocess.SubprocessError) as e:
+        print(f"Warning: nvcc detection failed: {e}", file=sys.stderr)
+
+    # Try nvidia-smi as fallback
+    try:
+        result = subprocess.run(
+            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
+            capture_output=True,
+            text=True,
+            timeout=5,
+        )
+        if result.returncode == 0:
+            # CUDA version can be inferred from driver version
+            # This is a rough heuristic
+            pass  # Implement if needed
+    except Exception as e:
+        print(f"Warning: nvidia-smi detection failed: {e}", file=sys.stderr)

     return None

160-166: Consider simplifying the conditional onnxsim logic.

The conditional onnxsim inclusion logic is correct but could be clearer. The current logic:

  • Python 3.12+ with cmake: install onnxsim (will build from source)
  • Python 3.12+ without cmake: don't install onnxsim
  • Python < 3.12: install onnxsim (from wheel)

Consider this more explicit version for better readability:

-# onnxsim for Python 3.12+ requires CMake to build from source
-if is_cmake_installed() and sys.version_info >= (3, 12):
-    optional_deps["onnx"].append("onnxsim ; platform_machine != 'aarch64'")
-else:
-    optional_deps["onnx"].append(
-        "onnxsim ; python_version < '3.12' and platform_machine != 'aarch64'"
-    )
+# onnxsim installation logic:
+# - Python 3.12+ requires CMake to build from source (wheel not available)
+# - Python < 3.12 can install from wheel (no CMake needed)
+# - Not available on aarch64
+if sys.version_info >= (3, 12):
+    if is_cmake_installed():
+        optional_deps["onnx"].append("onnxsim ; platform_machine != 'aarch64'")
+    else:
+        print(
+            "Warning: CMake not found. onnxsim will not be installed for Python 3.12+. "
+            "Run 'pip install cmake' if you need onnxsim.",
+            file=sys.stderr
+        )
+else:
+    optional_deps["onnx"].append("onnxsim ; platform_machine != 'aarch64'")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8cba8c0 and 5ccb9c2.

📒 Files selected for processing (10)
  • .github/workflows/example_tests.yml (1 hunks)
  • .github/workflows/gpu_tests.yml (2 hunks)
  • .gitlab/tests.yml (3 hunks)
  • CHANGELOG.rst (1 hunks)
  • docs/source/getting_started/_installation_for_Linux.rst (2 hunks)
  • examples/llm_ptq/README.md (1 hunks)
  • pyproject.toml (2 hunks)
  • setup.py (3 hunks)
  • tests/gpu/onnx/test_quantize_onnx_torch_int4_awq.py (1 hunks)
  • tox.ini (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • .github/workflows/example_tests.yml
  • examples/llm_ptq/README.md
  • .gitlab/tests.yml
  • docs/source/getting_started/_installation_for_Linux.rst
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (13)
tests/gpu/onnx/test_quantize_onnx_torch_int4_awq.py (1)

114-115: LGTM! Necessary fix for PyTorch 25.08 / CUDA 13 compatibility.

Moving the torch tensors to CPU before comparison ensures compatibility with the upgraded PyTorch version. The change is applied consistently to both assertions and preserves the test's correctness.

.github/workflows/gpu_tests.yml (2)

66-66: LGTM: Container image update aligns with PR objectives.

The PyTorch container update to 25.08 is consistent with the PR's goal to support CUDA 13 and newer dependencies.


79-79: No changes needed—tox environment is correctly defined.

The py312-gpu environment is properly defined in tox.ini at line 62 as [testenv:{py310,py311,py312}-gpu]. The workflow change from py312-cuda12-gpu to py312-gpu is correct and consistent with the tox configuration, which supports the --current-env flag for flexible CUDA version handling.

pyproject.toml (2)

5-10: LGTM: CMake requirement properly scoped for Python 3.12+ onnxsim builds.

The conditional cmake requirement aligns with the setup.py logic that conditionally includes onnxsim based on cmake availability and Python version.


162-162: LGTM: Bandit exclusion appropriate for build-time detection code.

Excluding setup.py from bandit scans is reasonable since it now contains subprocess calls for CUDA and cmake detection during installation, which are expected build-time operations.

CHANGELOG.rst (3)

7-9: LGTM: Breaking change clearly documented with migration path.

The change to cupy-cuda13x as the default is properly flagged as a backward breaking change with clear instructions for CUDA 12 users to migrate.


13-13: LGTM: TensorRT-LLM version update matches PR objectives.

The upgrade to TensorRT-LLM 1.1.0rc5 aligns with the PR title and objectives.


17-17: LGTM: Dataset access documentation updated.

The nemotron dataset update with gated access via HF_TOKEN is properly documented. While not directly related to CUDA 13 changes, it's a valid addition to the changelog.
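
For context, a minimal sketch of the gated-dataset access flow being documented (assumes huggingface_hub and datasets are installed, HF_TOKEN is exported, and access has been granted; the dataset ID below is a placeholder, not the actual nemotron dataset name):

import os

from datasets import load_dataset
from huggingface_hub import login

# Recent huggingface_hub versions also pick up HF_TOKEN automatically; the explicit login
# just makes the dependency on the environment variable visible.
login(token=os.environ["HF_TOKEN"])

# Placeholder dataset ID standing in for the gated dataset referenced in the example docs.
ds = load_dataset("org/gated-dataset-id", split="train")
print(ds)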

tox.ini (3)

59-62: LGTM: GPU environment naming generalized for multi-CUDA support.

The environment name change from py312-cuda12-gpu to py312-gpu removes CUDA version specificity, aligning with the workflow changes and supporting the dynamic CUDA detection in setup.py.


71-74: LGTM: Efficient triton handling for NGC containers.

Using pip-mark-installed to skip building triton is an appropriate optimization since NGC PyTorch containers already include pytorch-triton. The installation order is correct (pip-mark-installed installed first on line 72).


42-42: Verify cmake availability for Python 3.12 onnxsim builds.

The Python 3.12 onnxsim installation here relies on cmake being available (as specified in pyproject.toml build requirements). Ensure that cmake is installed in the test environment or that the build system has cmake available when running these tests.

setup.py (2)

16-25: LGTM: Comprehensive docstring documents install-time behavior.

The docstring clearly explains the CUDA detection logic, the default fallback to CUDA 13, and the cmake requirement for onnxsim with Python 3.12+. The remediation steps for CUDA 12 users are well-documented.


60-71: LGTM: CMake detection is simple and effective.

The cmake detection logic is straightforward and appropriate. The timeout and exception handling are reasonable.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ccb9c2 and e64f668.

📒 Files selected for processing (3)
  • .github/workflows/gpu_tests.yml (2 hunks)
  • .gitlab/tests.yml (3 hunks)
  • docs/source/getting_started/_installation_for_Linux.rst (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/gpu_tests.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (5)
docs/source/getting_started/_installation_for_Linux.rst (3)

21-21: ✓ TensorRT-LLM version bump is correct. Aligns with PR objectives.


44-44: Verify that removing TensorRT library paths from LD_LIBRARY_PATH does not break TensorRT functionality.

The change removes /usr/local/tensorrt/lib from LD_LIBRARY_PATH. Confirm this is intentional—whether TensorRT library discovery is handled elsewhere in the Docker image setup or if this path is no longer needed for the updated environment.


129-132: ✓ Clear documentation of CUDA/Python-specific dependencies.

The new section clearly documents:

  • Default cupy-cuda13x behavior for INT4 ONNX quantization
  • User-friendly instructions for CUDA 12 users
  • CMake requirement for Python 3.12+ with onnxsim

Instructions are actionable and well-placed.

.gitlab/tests.yml (2)

1-1: Initial verification shows key image versions and tox targets are consistent; LD_LIBRARY_PATH settings were then checked directly in the CI files:

CI configuration consistency verified.

All key configurations checked across .gitlab/tests.yml and GitHub workflow files show consistency:

  • Line 32 (PyTorch image): Both .gitlab/tests.yml and .github/workflows/gpu_tests.yml use pytorch:25.08-py3
  • Line 48 (Tox target): Both use tox -e py312-gpu --current-env
  • Line 65 (TRT-LLM image): Both .gitlab/tests.yml and .github/workflows/example_tests.yml use tensorrt-llm/release:1.1.0rc5

The specified lines are consistent across CI configurations. Docker images typically define LD_LIBRARY_PATH and other library paths internally (via ENV statements), which are identical since the same image tags are used.


38-38: LD_LIBRARY_PATH and tox target changes are valid and properly synchronized across CI configurations.

Verification confirms the changes are intentional and consistent:

  1. LD_LIBRARY_PATH narrowing is valid: .gitlab/tests.yml and .github/workflows/gpu_tests.yml now have matching, narrowed paths. The example_tests.yml intentionally retains the TensorRT paths for its separate test suite (which needs the trtexec binary). TensorRT libraries are available in the standard paths now included (/usr/lib/x86_64-linux-gnu).

  2. Tox target py312-gpu is properly defined: The target exists in tox.ini line 62 as part of the {py310,py311,py312}-gpu parameterized environment. Both .gitlab/tests.yml and .github/workflows/gpu_tests.yml consistently reference py312-gpu.

  3. CI files are synchronized: No discrepancies or missing updates across configurations.

@kevalmorabia97 kevalmorabia97 changed the title Bump trtllm to 1.1.0rc5 and pytorch to 25.08 with cuda 13 fixes Bump TRT-LLM to 1.1.0rc5 + fix failing CICD tests Oct 18, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
examples/llm_sparsity/launch_finetune.sh (1)

94-95: This change does not address the past review's critical issue—deprecated --fsdp syntax remains.

The quoting on Line 94 fixes shell tokenization but does NOT resolve the underlying problem flagged in the past review: the monolithic --fsdp flag is deprecated in modern Accelerate. The correct fix requires migrating to individual --fsdp_* flags or using a config YAML.

Note that Line 95 already uses the modern syntax (--fsdp_transformer_layer_cls_to_wrap), which indicates the codebase is partially migrated but this line was skipped.

Recommended fix: Replace Lines 94–95 with one of the following:

Option 1: Individual modern flags

-    --fsdp 'full_shard auto_wrap' \
-    --fsdp_transformer_layer_cls_to_wrap LlamaDecoderLayer \
+    --fsdp_sharding_strategy full_shard \
+    --fsdp_auto_wrap_policy transformer_auto_wrap_policy \
+    --fsdp_transformer_layer_cls_to_wrap LlamaDecoderLayer \

Option 2: Use a config YAML (if your setup requires complex FSDP tuning)
Create accelerate_config.yaml with your FSDP settings and add --config_file accelerate_config.yaml to the command.

Please verify against the Accelerate version shipped with TensorRT-LLM 1.1.0rc5 that the monolithic --fsdp flag is no longer supported.

🧹 Nitpick comments (1)
setup.py (1)

121-127: Clarify onnxsim installation behavior for wheel vs source installs.

The conditional onnxsim installation logic checks CMake availability at setup.py runtime. This works correctly for source installations but has different implications for wheel distributions:

Behavior:

  • Python < 3.12: onnxsim always included (wheels available from PyPI)
  • Python >= 3.12 with CMake: onnxsim included (can build from source)
  • Python >= 3.12 without CMake: onnxsim excluded (cannot build)

For source installs: This logic works as intended.

For wheel installs from PyPI: The dependencies are determined during wheel build, not during user installation. Users installing a pre-built wheel will get whatever dependencies were baked in during the build, regardless of their local CMake availability.

Consider documenting this behavior in the docstring or release notes to set proper expectations. Alternatively, you might want to always include onnxsim in the dependencies and let the installation fail with a clear error message prompting users to install CMake if needed.
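
A minimal sketch of the behavior described above, reusing the helper and extras names quoted earlier in this review (the surrounding setup.py structure is assumed):

import shutil
import sys

def is_cmake_installed() -> bool:
    # Stand-in for the detection helper in setup.py; the real one shells out to cmake.
    return shutil.which("cmake") is not None

optional_deps: dict[str, list[str]] = {"onnx": []}

# This runs when the package metadata is built: on a source install it executes on the
# user's machine, but for a PyPI wheel it executes once on the build machine and the
# resulting requirement string is baked into the wheel.
if is_cmake_installed() and sys.version_info >= (3, 12):
    optional_deps["onnx"].append("onnxsim ; platform_machine != 'aarch64'")
else:
    optional_deps["onnx"].append(
        "onnxsim ; python_version < '3.12' and platform_machine != 'aarch64'"
    )

print(optional_deps["onnx"])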

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e64f668 and 2d68aff.

📒 Files selected for processing (15)
  • .github/workflows/example_tests.yml (1 hunks)
  • .github/workflows/gpu_tests.yml (1 hunks)
  • .gitlab/tests.yml (2 hunks)
  • CHANGELOG.rst (1 hunks)
  • docs/source/getting_started/_installation_for_Linux.rst (3 hunks)
  • examples/diffusers/quantization/requirements.txt (1 hunks)
  • examples/llm_ptq/README.md (1 hunks)
  • examples/llm_sparsity/launch_finetune.sh (1 hunks)
  • examples/llm_sparsity/requirements.txt (0 hunks)
  • pyproject.toml (2 hunks)
  • setup.py (3 hunks)
  • tests/_test_utils/torch_quantization/onnx_export.py (1 hunks)
  • tests/gpu/onnx/test_plugin.py (1 hunks)
  • tests/gpu/onnx/test_quantize_onnx_torch_int4_awq.py (1 hunks)
  • tox.ini (2 hunks)
💤 Files with no reviewable changes (1)
  • examples/llm_sparsity/requirements.txt
✅ Files skipped from review due to trivial changes (2)
  • tests/gpu/onnx/test_plugin.py
  • examples/llm_ptq/README.md
🚧 Files skipped from review as they are similar to previous changes (6)
  • .gitlab/tests.yml
  • tests/gpu/onnx/test_quantize_onnx_torch_int4_awq.py
  • pyproject.toml
  • tox.ini
  • .github/workflows/gpu_tests.yml
  • examples/diffusers/quantization/requirements.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (9)
tests/_test_utils/torch_quantization/onnx_export.py (1)

68-68: No version guard needed for the dynamo parameter.

The original review comment incorrectly assumes the dynamo parameter may not exist in all supported PyTorch versions. However, the dynamo parameter was added to torch.onnx.export in PyTorch 2.5, and since the minimum supported version is torch>=2.6, the parameter is guaranteed to exist in all supported versions.

The code change is correct and safe as-is. PyTorch 2.9 changed the default from dynamo=False to dynamo=True, and explicitly setting dynamo=False appropriately preserves the prior behavior. No defensive check is needed because the parameter availability is not a concern.

Likely an incorrect or invalid review comment.

.github/workflows/example_tests.yml (1)

69-69: LGTM! TensorRT-LLM upgrade is consistent.

The container image upgrade to 1.1.0rc5 aligns with the PR objectives and CHANGELOG documentation.

CHANGELOG.rst (2)

4-4: Planned release date noted.

The version date is set to 2025-11-07, which is a future date. This appears to be the planned release date for version 0.39.


9-13: LGTM! Clear documentation of new features.

The changelog accurately documents the TensorRT-LLM upgrade and the dataset guidance changes for the nemotron example.

docs/source/getting_started/_installation_for_Linux.rst (3)

21-21: LGTM! Version requirement updated consistently.

The TensorRT-LLM version requirement is updated to 1.1.0rc5, consistent with other changes in this PR.


44-44: Verify that simplified LD_LIBRARY_PATH is sufficient.

The LD_LIBRARY_PATH has been simplified to only include base system paths. However, the GitHub workflow in .github/workflows/example_tests.yml (lines 78-79) still includes additional TensorRT-specific paths:

/usr/local/tensorrt/targets/x86_64-linux-gnu/lib
/usr/local/tensorrt/targets/x86_64-linux-gnu/bin

Please verify that the simplified path in the documentation is sufficient for users, or if the TensorRT-specific paths should remain in the documentation as well.


129-131: LGTM! Clear guidance for Python 3.12+ users.

The documentation clearly explains the CMake requirement for onnxsim on Python 3.12+, which aligns with the conditional installation logic in setup.py.

setup.py (2)

16-21: LGTM! Clear documentation added.

The module docstring now clearly explains the CMake requirement for onnxsim on Python 3.12+.


23-24: LGTM! CMake detection is well-implemented.

The is_cmake_installed() function properly detects CMake availability using subprocess with appropriate timeout and exception handling.

Also applies to: 32-43

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
.gitlab/tests.yml (1)

48-48: ⚠️ Tox environment name mismatch: py312-cuda12-gpu does not exist.

Line 48 references the tox environment py312-cuda12-gpu, but based on the previous review, the only GPU testenv defined is the parameterized [testenv:{py310,py311,py312}-gpu], which produces py312-gpu, not py312-cuda12-gpu. Update this line to use py312-gpu instead.

examples/llm_sparsity/launch_finetune.sh (1)

94-95: Deprecated --fsdp syntax: modernize to use individual flags or config file.

The addition of quotes to --fsdp 'full_shard auto_wrap' does not address the underlying issue flagged in the previous review: modern Accelerate (as expected with the CUDA 13 upgrade and updated dependencies) uses individual --fsdp_* CLI flags rather than a monolithic --fsdp flag.

Update the script to use one of these approaches:

Option 1 (preferred): Use individual --fsdp_* flags
Replace line 94–95 with:

    --fsdp_sharding_strategy full_shard \
    --fsdp_auto_wrap_policy transformer_auto_wrap_policy \

Option 2: Use Accelerate config YAML
Create an accelerate config file with FSDP settings and use:

    --config_file path/to/accelerate_fsdp_config.yaml \

Remove the current --fsdp entry and add the appropriate modern flags to ensure compatibility with the updated Accelerate version in the TensorRT-LLM 1.1.0rc5 environment.

🧹 Nitpick comments (1)
examples/diffusers/quantization/requirements.txt (1)

7-9: Reasonable temporary fix for diffusers compatibility.

The torch<2.9 and torchvision<0.24.0 constraints appropriately address the known diffusers compatibility issue. The TODO comment is helpful for tracking when this can be re-evaluated.

That said, these are strict version pins that could limit users upgrading dependencies. Ensure this is documented in migration guides or release notes so users understand the constraint and timeline for resolution.

If this requires broader user-facing documentation (migration guide, FAQ, or similar), I can help draft that content. Would you like me to assist?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2d68aff and 5a6c995.

📒 Files selected for processing (13)
  • .github/workflows/example_tests.yml (1 hunks)
  • .github/workflows/gpu_tests.yml (1 hunks)
  • .gitlab/tests.yml (2 hunks)
  • CHANGELOG.rst (1 hunks)
  • docs/source/getting_started/_installation_for_Linux.rst (2 hunks)
  • examples/diffusers/quantization/requirements.txt (1 hunks)
  • examples/llm_ptq/README.md (1 hunks)
  • examples/llm_sparsity/launch_finetune.sh (1 hunks)
  • examples/llm_sparsity/requirements.txt (0 hunks)
  • tests/_test_utils/torch_quantization/onnx_export.py (1 hunks)
  • tests/gpu/onnx/test_plugin.py (1 hunks)
  • tests/gpu/onnx/test_quantize_onnx_torch_int4_awq.py (1 hunks)
  • tox.ini (2 hunks)
💤 Files with no reviewable changes (1)
  • examples/llm_sparsity/requirements.txt
✅ Files skipped from review due to trivial changes (1)
  • examples/llm_ptq/README.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/_test_utils/torch_quantization/onnx_export.py
  • CHANGELOG.rst
  • .github/workflows/gpu_tests.yml
  • tox.ini
🧰 Additional context used
🧬 Code graph analysis (1)
tests/gpu/onnx/test_plugin.py (4)
tests/_test_utils/onnx_autocast/utils.py (1)
  • _assert_tensors_are_fp16 (20-27)
modelopt/onnx/autocast/convert.py (1)
  • convert_to_mixed_precision (46-148)
modelopt/onnx/autocast/graphsanitizer.py (1)
  • GraphSanitizer (28-522)
modelopt/onnx/quantization/quantize.py (1)
  • quantize (209-564)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (6)
docs/source/getting_started/_installation_for_Linux.rst (2)

21-21: Version bump to 1.1.0rc5 is correct.

Aligns with the PR objectives and downstream CI/example documentation updates.


41-44: LD_LIBRARY_PATH simplification: verify consistency across CI configurations.

Line 44 simplifies LD_LIBRARY_PATH to include only base system paths (/usr/include and /usr/lib/x86_64-linux-gnu), removing TensorRT-specific paths. Verify this aligns with the runtime environment of the TensorRT-LLM 1.1.0rc5 container and matches the LD_LIBRARY_PATH setup in .github/workflows/example_tests.yml lines 78–79 and .gitlab/tests.yml line 38.

.gitlab/tests.yml (2)

38-38: LD_LIBRARY_PATH simplification aligns with documentation.

Consistent with the documentation updates in docs/source/getting_started/_installation_for_Linux.rst and the downstream .github/workflows/example_tests.yml environment setup. The removal of TensorRT-specific paths appears intentional as part of the CUDA 13/TensorRT-LLM 1.1.0rc5 upgrade.


65-65: Version bump to 1.1.0rc5 is correct.

Aligns with the PR objectives and documentation updates.

.github/workflows/example_tests.yml (1)

69-69: Container image version bump to 1.1.0rc5 is correct.

Consistent with parallel updates across CI configurations (.gitlab/tests.yml, documentation, and examples).

tests/gpu/onnx/test_quantize_onnx_torch_int4_awq.py (1)

114-115: LGTM! Assertions correctly updated to handle tensor device placement.

The addition of .detach().cpu() properly ensures the PyTorch tensors are detached from the computation graph and moved to CPU before comparison. This fix aligns with the TensorRT-LLM upgrade and resolves compatibility issues in the CI/CD pipeline.

Optional refinement: When int4.has_cupy is True, both tensors could remain on GPU for consistency by using cupy.asarray(wq_torch_awq_lite.detach()) instead of .cpu(). However, since this is test code with small arrays, the current approach is perfectly acceptable.
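
A minimal sketch of that GPU-side variant (small random data stands in for the test's quantized weights; assumes cupy and a CUDA-capable torch build):

import cupy as cp
import torch

# Placeholder tensors standing in for the arrays compared in the test.
wq_torch_awq_lite = torch.randn(8, 8, device="cuda")
wq_cupy = cp.asarray(wq_torch_awq_lite.detach())  # shares GPU memory via __cuda_array_interface__

# Compare on the GPU instead of copying both operands to the host.
assert cp.allclose(wq_cupy, cp.asarray(wq_torch_awq_lite.detach()))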

Signed-off-by: Keval Morabia <[email protected]>
@kevalmorabia97 kevalmorabia97 changed the title Bump TRT-LLM to 1.1.0rc5 + fix failing CICD tests Fix failing CICD nightly tests Oct 20, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
examples/llm_sparsity/launch_finetune.sh (1)

94-94: Verify that the deprecated --fsdp flag is still supported or upgrade to modern --fsdp_* syntax.

A critical concern from a previous review remains unaddressed: the current change adds quoting but does not resolve the underlying issue. According to the past review, modern Accelerate (with CUDA 13 dependencies as mentioned in this PR) uses individual --fsdp_* CLI flags (e.g., --fsdp_sharding_strategy full_shard, --fsdp_auto_wrap_policy transformer_auto_wrap_policy), not a monolithic --fsdp flag.

The quoting change affects shell tokenization but assumes Accelerate still supports the old combined syntax. If the internal CICD is passing, confirm whether:

  1. The Accelerate version being used still supports --fsdp, or
  2. The quoting change inadvertently fixed an issue, or
  3. A different mechanism is resolving the tests.

To confirm the current state, please verify:

Does Hugging Face Accelerate version 0.30+ (CUDA 13 compatible) still support the `--fsdp` flag or only `--fsdp_*` individual flags?

Alternatively, if the deprecated syntax is no longer supported, consider migrating to the modern approach:

-    --fsdp 'full_shard auto_wrap' \
-    --fsdp_transformer_layer_cls_to_wrap LlamaDecoderLayer \
+    --fsdp_sharding_strategy full_shard \
+    --fsdp_auto_wrap_policy transformer_auto_wrap_policy \
+    --fsdp_transformer_layer_cls_to_wrap LlamaDecoderLayer \

Or consider using a config file with --config_file to specify FSDP settings.

🧹 Nitpick comments (1)
tests/_test_utils/torch_quantization/onnx_export.py (1)

68-68: Consider adding a version compatibility check for the dynamo parameter.

Similar to the enable_onnx_checker parameter check on lines 56-59, the dynamo parameter should be checked for existence before use to ensure compatibility with older PyTorch versions that may not support it.

Apply this diff to add the version check:

     if "enable_onnx_checker" in inspect.signature(torch.onnx.export).parameters:
         kwargs = {"enable_onnx_checker": False}
     else:
         kwargs = {}
+    if "dynamo" in inspect.signature(torch.onnx.export).parameters:
+        kwargs["dynamo"] = False
     torch.onnx.export(
         model,
         dummy_input,
         f=buffer,
         opset_version=OPSET,
         input_names=input_names,
         output_names=output_names,
         do_constant_folding=constant_folding,
-        dynamo=False,
         **kwargs,
     )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a6c995 and b0926ac.

📒 Files selected for processing (11)
  • .github/workflows/gpu_tests.yml (1 hunks)
  • .gitlab/tests.yml (1 hunks)
  • CHANGELOG.rst (1 hunks)
  • docs/source/getting_started/_installation_for_Linux.rst (1 hunks)
  • examples/diffusers/quantization/requirements.txt (1 hunks)
  • examples/llm_sparsity/launch_finetune.sh (1 hunks)
  • examples/llm_sparsity/requirements.txt (0 hunks)
  • tests/_test_utils/torch_quantization/onnx_export.py (1 hunks)
  • tests/gpu/onnx/test_plugin.py (1 hunks)
  • tests/gpu/onnx/test_quantize_onnx_torch_int4_awq.py (1 hunks)
  • tox.ini (2 hunks)
💤 Files with no reviewable changes (1)
  • examples/llm_sparsity/requirements.txt
🚧 Files skipped from review as they are similar to previous changes (4)
  • .github/workflows/gpu_tests.yml
  • tests/gpu/onnx/test_plugin.py
  • CHANGELOG.rst
  • examples/diffusers/quantization/requirements.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (7)
tests/gpu/onnx/test_quantize_onnx_torch_int4_awq.py (1)

114-115: LGTM! Correct device handling for tensor comparisons.

The addition of .detach().cpu() properly handles the comparison between GPU torch tensors and CPU numpy/cupy arrays, preventing device mismatch errors while also detaching from the computation graph.

.gitlab/tests.yml (1)

36-41: LD_LIBRARY_PATH simplification aligns with PR objectives.

Removal of TensorRT-specific library paths (/usr/local/tensorrt/targets/x86_64-linux-gnu/lib, /usr/local/tensorrt/bin) from LD_LIBRARY_PATH is consistent across CI configs and documentation. The simplified path setup supports the PR's goal of reducing TensorRT runtime dependencies in GPU test environments. Since internal CICD is passing, this change is validated.

docs/source/getting_started/_installation_for_Linux.rst (1)

39-45: Documentation environment setup aligns with CI changes.

The LD_LIBRARY_PATH modification in the Docker environment setup section matches the CI changes (.gitlab/tests.yml:38). Documentation and implementation are now consistent regarding simplified library path configuration.

tox.ini (4)

62-64: Minor documentation improvement in GPU environment header.

Changing "Can be used with" to "Should be used with" is a minor but appropriate clarification that the --current-env flag is the expected usage pattern for these GPU environments.


74-76: Triton installation approach uses container-provided version.

Switching from an explicit triton==3.4.0 constraint to marking triton as already-installed via pip-mark-installed is a sound approach for NGC PyTorch containers, which provide pytorch-triton pre-installed. This avoids version conflicts and redundant builds.

However, verify that pip-mark-installed is available in the container or will be installed from PyPI without issues before this command runs.


80-80: Verify Eagle-3 test dependency additions.

The addition of tiktoken, blobfile, and sentencepiece appears to support Eagle-3 test dependencies. Confirm these packages:

  1. Are actually required for Eagle-3 tests and not redundant with other test dependencies
  2. Don't create version conflicts with other installed packages
  3. Should remain hardcoded in tox.ini or should be managed via setup.py optional dependency groups (e.g., [all,dev-test])

Consider whether these belong in the packaging metadata rather than CI-specific environment setup.
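
If they do belong in the packaging metadata, a hedged sketch of that alternative (the "dev-test" group name comes from the comment above; the project name and surrounding structure are placeholders, not the real setup.py):

from setuptools import setup

setup(
    name="example-package",  # placeholder, not the real project name
    version="0.0.0",
    extras_require={
        # CI-only tokenizer/IO dependencies moved out of tox.ini into an optional group.
        "dev-test": ["tiktoken", "blobfile", "sentencepiece"],
    },
)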


83-83: Conditional onnxsim install for py312 mirrors unit test environment.

The py312-specific onnxsim installation in the GPU environment mirrors the same pattern in the unit test environment (line 21). This is consistent and appropriate for building onnxsim from source on Python 3.12.

@kevalmorabia97 kevalmorabia97 enabled auto-merge (squash) October 20, 2025 17:01
@kevalmorabia97 kevalmorabia97 merged commit e6e0d2c into main Oct 20, 2025
30 of 32 checks passed
@kevalmorabia97 kevalmorabia97 deleted the kmorabia/cuda13-cicd branch October 20, 2025 19:16
