[None][fix] AutoDeploy: Use tmp folder for the load_moe_align #9101

Merged
nvchenghaoz merged 1 commit into NVIDIA:main from nv-auto-deploy:chenghao/moe_tmp_1112
Nov 12, 2025
Conversation

@nvchenghaoz (Collaborator) commented Nov 12, 2025

Summary by CodeRabbit

  • Chores
    • Updated build directory resolution to use system temporary directories, improving portability and reducing environment variable dependencies for the MOE custom operations build process.

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
@coderabbitai coderabbitai bot (Contributor) commented Nov 12, 2025

📝 Walkthrough

Walkthrough

Modified build directory resolution in the MOE alignment module loader to use system temporary directory instead of environment-dependent cache paths. Replaced CACHE_ROOT/build logic with tempfile.gettempdir(). Removed PermissionError fallback and updated module loader to pass the new BUILD_DIR.

Changes

Cohort / File(s): Build directory resolution refactoring — tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/load_moe_align.py
Summary: Added tempfile import; replaced the environment-variable-dependent CACHE_ROOT/build logic with tempfile.gettempdir() for BUILD_DIR construction; removed the PermissionError fallback; the directory is now created unconditionally and BUILD_DIR is passed as the build_directory parameter to the module loader.
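The change described above can be sketched roughly as follows. This is an illustration based on the summary, not the PR diff itself; the exact subpath names ("auto_deploy", "fused_moe", "moe_align") are assumptions.

```python
import os
import tempfile

# Before (roughly): an environment-dependent cache root, e.g. something like
#   cache_root = os.environ.get("AD_CACHE_DIR", os.path.expanduser("~/.cache"))
#   build_dir = os.path.join(cache_root, "build", ...)
# with a PermissionError fallback if the cache root was not writable.

# After: resolve the build directory under the system temp directory.
BUILD_DIR = os.path.join(tempfile.gettempdir(), "auto_deploy", "fused_moe", "moe_align")
os.makedirs(BUILD_DIR, exist_ok=True)  # created unconditionally; no PermissionError fallback
```

BUILD_DIR would then be passed as the build_directory argument to the module loader (e.g. torch.utils.cpp_extension.load), so compiled artifacts land under the system temp directory rather than a per-user cache path.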

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Verify implications of switching from cache directory to system temp directory (cleanup, persistence, availability)
  • Confirm tempfile.gettempdir() behavior is consistent across supported platforms
  • Ensure unconditional directory creation doesn't introduce permission or collision issues
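As a quick illustration of the second and third bullets (hypothetical demo, not code from the PR): tempfile.gettempdir() resolves per-platform (honoring TMPDIR/TEMP/TMP), and unconditional creation via os.makedirs(..., exist_ok=True) is idempotent, though it does not guard against a pre-existing regular file of the same name or a directory owned by another user.

```python
import os
import tempfile

tmp = tempfile.gettempdir()  # e.g. /tmp on Linux, %TEMP% on Windows
build_dir = os.path.join(tmp, "moe_align_demo")  # hypothetical subpath for illustration

# Idempotent: repeating the call raises no error when the directory exists.
os.makedirs(build_dir, exist_ok=True)
os.makedirs(build_dir, exist_ok=True)
```

If a file (not a directory) already occupies build_dir, os.makedirs raises FileExistsError even with exist_ok=True, which is one concrete collision mode the reviewer's bullet points at.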

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Description check — ⚠️ Warning. The PR description is incomplete and contains only a template directive (@coderabbitai summary) with no actual description, test coverage information, or checklist completion. Resolution: complete the PR description by filling in the Description and Test Coverage sections, and confirm the PR checklist items have been reviewed.
✅ Passed checks (2 passed)
Title check — ✅ Passed. The title clearly describes the main change: using a temporary folder for the load_moe_align module in AutoDeploy, directly matching the code changes shown in the summary.
Docstring Coverage — ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8a751a0 and f5eceba.

📒 Files selected for processing (1)
  • tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/load_moe_align.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/load_moe_align.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/load_moe_align.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/load_moe_align.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: djns99
Repo: NVIDIA/TensorRT-LLM PR: 6915
File: cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_kernels.cu:4010-4012
Timestamp: 2025-08-14T23:23:27.449Z
Learning: For MOE (Mixture of Experts) code reviews in TensorRT-LLM, avoid repeatedly suggesting finalize fusion validation checks and safety assertions. The user djns99 has indicated these suggestions are repetitive and unwanted across multiple MOE-related changes.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (2)
tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/load_moe_align.py (2)

7-7: LGTM!

The tempfile import is necessary for the new system temp directory approach.


17-19: Verify security and isolation for multi-user/multi-process scenarios.

Using tempfile.gettempdir() improves portability but introduces potential concerns:

  1. Security: System temp directories are typically world-readable on multi-user systems, allowing other users to see or potentially tamper with compiled extensions.
  2. Isolation: No user-specific or version-specific isolation in the path. Multiple users or processes building different versions concurrently could conflict.
  3. Build collisions: PyTorch's load function may handle concurrent builds, but this should be confirmed.

The previous CACHE_ROOT approach likely provided better user isolation. Please verify that:

  • The deployment environment is single-user or properly isolated
  • Concurrent builds from multiple processes/users won't cause issues
  • Security implications of world-readable compiled extensions are acceptable

Consider adding user or process isolation to the path:

import getpass
import os
import tempfile

BUILD_DIR = os.path.join(
    tempfile.gettempdir(),
    f"ad_cache_{getpass.getuser()}",
    "auto_deploy",
    "fused_moe",
    "moe_align",
)


@nvchenghaoz (Collaborator, Author) commented:

/bot run

@tensorrt-cicd (Collaborator) commented:

PR_Github #24332 [ run ] triggered by Bot. Commit: f5eceba

@yibinl-nvidia (Collaborator) commented:

Thank you @nvchenghaoz for making this change to strengthen the TRTLLM security posture.

@Fridah-nv Fridah-nv left a comment:

LGTM, closes #9017

@github-project-automation github-project-automation bot moved this from Backlog to In review in AutoDeploy Board Nov 12, 2025
@nvchenghaoz nvchenghaoz enabled auto-merge (squash) November 12, 2025 22:56
@tensorrt-cicd (Collaborator) commented:

PR_Github #24332 [ run ] completed with state SUCCESS. Commit: f5eceba
/LLM/main/L0_MergeRequest_PR pipeline #18361 completed with status: 'SUCCESS'
Pipeline passed with automatically retried tests. Check the rerun report for details.

@nvchenghaoz nvchenghaoz merged commit f1d637e into NVIDIA:main Nov 12, 2025
7 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in AutoDeploy Board Nov 12, 2025
zheyuf pushed a commit to zheyuf/TensorRT-LLM that referenced this pull request Nov 19, 2025
…#9101)

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

[Bug]: AutoDeploy: avoid aot compilation of auto_deploy kernels unless kernel is needed

4 participants