
[https://nvbugs/5644632][fix] Use correct memory pool #9196

Open
HuiGao-NV wants to merge 4 commits into NVIDIA:main from HuiGao-NV:get_mem_pool

Conversation

@HuiGao-NV
Collaborator

@HuiGao-NV HuiGao-NV commented Nov 16, 2025

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved CUDA memory pool management and buffer reservation logic for more efficient resource allocation.
  • Tests

    • Extended test coverage by removing previously skipped test configurations, improving validation of model compilation scenarios.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
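
For reference, a few typical invocations posted as PR comments (the stage and GPU names are the same illustrative placeholders used in the option descriptions above):

```
/bot run
/bot run --disable-fail-fast --gpu-type "A30, H100_PCIe"
/bot run --stage-list "A10-PyTorch-1" --disable-fail-fast
/bot reuse-pipeline
/bot kill
```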

kill

kill

Kill all running builds associated with this pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@HuiGao-NV HuiGao-NV requested review from a team as code owners November 16, 2025 15:18
@coderabbitai
Contributor

coderabbitai bot commented Nov 16, 2025

📝 Walkthrough

Walkthrough

This PR refactors CUDA memory pool handling across the TensorRT-LLM PyTorch integration by introducing explicit torch.cuda.MemPool instances and dedicated memory pool handles, replacing indirect pool management with direct, type-safe pool management.

Changes

  • Memory pool management refactoring — tensorrt_llm/_torch/compilation/backend.py, tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py, tensorrt_llm/_torch/pyexecutor/model_engine.py
    Type changes from Any to torch.cuda.MemPool, initialization of dedicated memory pool instances and handles, routing of CUDA graph operations through memory pool handles instead of pools directly, and removal of indirect pool assignments from graph capture results.
  • Buffer reservation logic — tensorrt_llm/_torch/memory_buffer_utils.py
    Conditional wrapping of the best-fit buffer search inside a reserve_buffer flag, addition of guards to skip candidates based on reservation status, and immediate return of buffers marked as reserved.
  • Test skip configuration — tests/integration/test_lists/waives.txt
    Removal of two skip entries for TestGPTOSS::test_auto_dtype configurations.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

  • Memory pool management changes: Review how torch.cuda.MemPool instances are initialized and how handles are derived; verify consistency across backend, runner config, and model engine components
  • Buffer reservation guard logic: Carefully examine the conditional logic in get_buffer() to ensure reservation status checks correctly prevent superseding or selecting non-reserved blocks
  • Handle vs. pool routing: Confirm that CUDA graph operations are correctly routed through memory pool handles and that the removal of indirect pool assignments doesn't lose necessary pool state
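
The pool-versus-handle distinction above can be sketched roughly as follows. This is a hedged illustration, not the PR's actual code: torch.cuda.MemPool is a prototype PyTorch API, and the names memory_pool / memory_pool_handle simply mirror the attributes the walkthrough mentions; the real call sites in TRT-LLM may differ.

```python
import torch

def capture_with_dedicated_pool():
    """Capture a tiny CUDA graph whose allocations come from a dedicated pool."""
    if not torch.cuda.is_available():
        return None  # no GPU in this environment; nothing to demo
    memory_pool = torch.cuda.MemPool()    # dedicated pool instance (the "pool")
    memory_pool_handle = memory_pool.id   # id tuple passed to graph capture (the "handle")
    g = torch.cuda.CUDAGraph()
    static_in = torch.ones(4, device="cuda")
    # Route the capture through the handle rather than an implicit global pool.
    with torch.cuda.graph(g, pool=memory_pool_handle):
        static_out = static_in * 2
    g.replay()
    return static_out.sum().item()

result = capture_with_dedicated_pool()
```

Keeping the MemPool object alive on the owning component (rather than holding only a handle) ties the pool's lifetime to that component, which is the cleanup behavior the reviewed changes appear to rely on.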

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 42.86%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
  • Description check ⚠️ Warning — The PR description contains only the template structure with placeholder sections and no actual content. Fill in the Description section explaining what issue is being fixed and why, add a Test Coverage section listing relevant tests, and complete the PR Checklist items as appropriate.

✅ Passed checks (1 passed)
  • Title check ✅ Passed — The title clearly describes the main fix: using the correct memory pool. It is specific, concise, and directly relates to the primary changes across multiple files.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tensorrt_llm/_torch/memory_buffer_utils.py (1)

60-81: Consider adding a comment to clarify the reserved block preference logic.

The conditional at lines 72-73 correctly ensures that a reserved block is not superseded by an unreserved one during best-fit search. However, the nested logic could be clearer for future maintainers.

Consider adding a comment:

                # Find the smallest buffer that is still large enough (best-fit).
                if block.buffer.numel() < smallest_sufficient_size:
+                   # Prefer reserved blocks: don't replace a reserved best-fit with an unreserved candidate
                    if best_fit_block is not None and best_fit_block.is_reserved and not block.is_reserved:
                        continue
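
The reserved-block preference can be sketched in isolation. This is a simplified stand-in, not the module's real API: Block, find_best_fit, and the field names below are hypothetical, with size standing in for block.buffer.numel().

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Block:
    size: int                 # stands in for block.buffer.numel()
    is_reserved: bool = False

def find_best_fit(blocks: List[Block], required_size: int,
                  reserve_buffer: bool = True) -> Optional[Block]:
    """Return the smallest block >= required_size, preferring reserved blocks."""
    best_fit_block: Optional[Block] = None
    smallest_sufficient_size = float("inf")
    for block in blocks:
        if block.size < required_size:
            continue  # too small to hold the request
        if block.size < smallest_sufficient_size:
            # Prefer reserved blocks: never replace a reserved best-fit
            # with an unreserved candidate, even if the candidate is smaller.
            if (best_fit_block is not None and best_fit_block.is_reserved
                    and not block.is_reserved):
                continue
            best_fit_block = block
            smallest_sufficient_size = block.size
    if best_fit_block is not None and reserve_buffer:
        best_fit_block.is_reserved = True  # mark the chosen buffer as taken
    return best_fit_block
```

For example, with blocks of sizes 100, 64 (reserved), and 48, a request for 40 selects the reserved 64-block: the unreserved 48-block is a tighter fit but is skipped by the guard.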
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2854f0c and 840ae4b.

📒 Files selected for processing (5)
  • tensorrt_llm/_torch/compilation/backend.py (2 hunks)
  • tensorrt_llm/_torch/memory_buffer_utils.py (1 hunks)
  • tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py (4 hunks)
  • tensorrt_llm/_torch/pyexecutor/model_engine.py (1 hunks)
  • tests/integration/test_lists/waives.txt (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/integration/test_lists/waives.txt
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: nv-lschneider
Repo: NVIDIA/TensorRT-LLM PR: 7910
File: cpp/tensorrt_llm/kernels/nccl_device/multimem.h:20-30
Timestamp: 2025-09-23T15:13:48.819Z
Learning: TRT-LLM targets modern CUDA toolkits that support FP8 datatypes, so cuda_fp8.h can be included unconditionally without version guards in TRT-LLM code.
📚 Learning: 2025-08-26T09:37:10.463Z
Learnt from: jiaganc
Repo: NVIDIA/TensorRT-LLM PR: 7031
File: tensorrt_llm/bench/dataclasses/configuration.py:90-104
Timestamp: 2025-08-26T09:37:10.463Z
Learning: In TensorRT-LLM's bench configuration, the `get_pytorch_perf_config()` method returns `self.pytorch_config` which is a Dict[str, Any] that can contain default values including `cuda_graph_config`, making the fallback `llm_args["cuda_graph_config"]` safe to use.

Applied to files:

  • tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py
  • tensorrt_llm/_torch/pyexecutor/model_engine.py
📚 Learning: 2025-08-19T12:45:11.997Z
Learnt from: amitz-nv
Repo: NVIDIA/TensorRT-LLM PR: 7033
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:0-0
Timestamp: 2025-08-19T12:45:11.997Z
Learning: In tensorrt_llm/_torch/pyexecutor/model_engine.py, DoRA (Delta Orthogonal Rank Adaptation) functionality was removed from the PyTorch flow to eliminate issues with inverted DoRA detection logic. The original is_dora condition was checking if scaling_vec_pointer == 0, which was potentially incorrect.

Applied to files:

  • tensorrt_llm/_torch/pyexecutor/model_engine.py
📚 Learning: 2025-08-26T09:37:10.463Z
Learnt from: jiaganc
Repo: NVIDIA/TensorRT-LLM PR: 7031
File: tensorrt_llm/bench/dataclasses/configuration.py:90-104
Timestamp: 2025-08-26T09:37:10.463Z
Learning: In TensorRT-LLM, the `get_pytorch_perf_config()` method returns `self.pytorch_config` which can contain default `cuda_graph_config` values, so `llm_args` may already have this config before the extra options processing.

Applied to files:

  • tensorrt_llm/_torch/pyexecutor/model_engine.py
🧬 Code graph analysis (1)
tensorrt_llm/_torch/memory_buffer_utils.py (1)
tensorrt_llm/_utils.py (1)
  • numel (1009-1010)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (6)
tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py (4)

64-64: LGTM: Type annotation improved.

Changing the type from Any to torch.cuda.MemPool makes the expected type explicit and improves type safety.


298-298: LGTM: Correct use of memory pool handle.

The graph capture correctly uses memory_pool_handle for the pool parameter, aligning with the new handle-based approach.


431-432: LGTM: Proper cleanup of memory pool handle.

The cleanup correctly deletes and nullifies the memory_pool_handle, ensuring resources are properly released.


101-103: Fix syntax error: stray closing parenthesis.

Line 102 contains a stray closing parenthesis that will cause a syntax error. Line 101 is a complete statement and doesn't require this.

Apply this diff:

 self.memory_pool = config.cuda_graph_mem_pool if config.cuda_graph_mem_pool else torch.cuda.MemPool()
-)
 self.memory_pool_handle = self.memory_pool.id

Likely an incorrect or invalid review comment.

tensorrt_llm/_torch/compilation/backend.py (2)

26-27: LGTM: Explicit MemPool lifecycle management.

Adding _graph_pool as a class attribute enables explicit management of the CUDA memory pool lifecycle, which is cleaner than relying on implicit pool handles alone.


62-63: LGTM: Correct MemPool initialization.

The initialization correctly creates a MemPool instance and stores its id as the handle, replacing the previous direct graph_pool_handle() call with explicit pool management.

@HuiGao-NV
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #24683 [ run ] triggered by Bot. Commit: 840ae4b

@tensorrt-cicd
Collaborator

PR_Github #24683 [ run ] completed with state SUCCESS. Commit: 840ae4b
/LLM/main/L0_MergeRequest_PR pipeline #18638 completed with status: 'FAILURE'

@HuiGao-NV
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #24696 [ run ] triggered by Bot. Commit: 729bf8f

@HuiGao-NV
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #24707 [ run ] triggered by Bot. Commit: 64c6aea

@tensorrt-cicd
Collaborator

PR_Github #24696 [ run ] completed with state ABORTED. Commit: 729bf8f
LLM/main/L0_MergeRequest_PR #18651 (Blue Ocean) completed with status: ABORTED

@tensorrt-cicd
Collaborator

PR_Github #24707 [ run ] completed with state FAILURE. Commit: 64c6aea
/LLM/main/L0_MergeRequest_PR pipeline #18658 completed with status: 'FAILURE'

@HuiGao-NV
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #24717 [ run ] triggered by Bot. Commit: 64c6aea

@tensorrt-cicd
Collaborator

PR_Github #24717 [ run ] completed with state FAILURE. Commit: 64c6aea
/LLM/main/L0_MergeRequest_PR pipeline #18660 completed with status: 'FAILURE'

@HuiGao-NV
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #24749 [ run ] triggered by Bot. Commit: 843a759

@tensorrt-cicd
Collaborator

PR_Github #24749 [ run ] completed with state FAILURE. Commit: 843a759
/LLM/main/L0_MergeRequest_PR pipeline #18668 completed with status: 'FAILURE'

@HuiGao-NV
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #24815 [ run ] triggered by Bot. Commit: 843a759

@tensorrt-cicd
Collaborator

PR_Github #24815 [ run ] completed with state SUCCESS. Commit: 843a759
/LLM/main/L0_MergeRequest_PR pipeline #18728 completed with status: 'FAILURE'

@HuiGao-NV
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #31370 [ run ] triggered by Bot. Commit: 5f1748d

@tensorrt-cicd
Collaborator

PR_Github #31370 [ run ] completed with state SUCCESS. Commit: 5f1748d
/LLM/main/L0_MergeRequest_PR pipeline #24259 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@HuiGao-NV
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #32268 [ run ] triggered by Bot. Commit: 5f1748d

@tensorrt-cicd
Collaborator

PR_Github #32268 [ run ] completed with state FAILURE. Commit: 5f1748d
/LLM/main/L0_MergeRequest_PR pipeline #25011 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Signed-off-by: Hui Gao <huig@nvidia.com>
@HuiGao-NV
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #36943 [ run ] triggered by Bot. Commit: 876dd68 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #36943 [ run ] completed with state SUCCESS. Commit: 876dd68
/LLM/main/L0_MergeRequest_PR pipeline #28605 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@HuiGao-NV
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #36974 [ run ] triggered by Bot. Commit: 876dd68 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #36974 [ run ] completed with state SUCCESS. Commit: 876dd68
/LLM/main/L0_MergeRequest_PR pipeline #28629 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@HuiGao-NV
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37040 [ run ] triggered by Bot. Commit: 876dd68 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #37040 [ run ] completed with state SUCCESS. Commit: 876dd68
/LLM/main/L0_MergeRequest_PR pipeline #28681 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Hui Gao <huig@nvidia.com>
@HuiGao-NV
Collaborator Author

/bot run --stage-list "DGX_H100-4_GPUs-PyTorch-Ray-1, DGX_H100-2_GPUs-PyTorch-Others-1" --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37085 [ run ] triggered by Bot. Commit: 868789c Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #37085 [ run ] completed with state SUCCESS. Commit: 868789c
/LLM/main/L0_MergeRequest_PR pipeline #28713 (Partly Tested) completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Hui Gao <huig@nvidia.com>
@HuiGao-NV
Collaborator Author

/bot run --stage-list "DGX_H100-4_GPUs-PyTorch-Ray-1" --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37134 [ run ] triggered by Bot. Commit: cac9a89 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #37134 [ run ] completed with state SUCCESS. Commit: cac9a89
/LLM/main/L0_MergeRequest_PR pipeline #28750 (Partly Tested) completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Hui Gao <huig@nvidia.com>
@HuiGao-NV
Collaborator Author

/bot run --stage-list "DGX_H100-4_GPUs-PyTorch-Ray-1" --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37152 [ run ] triggered by Bot. Commit: d8620de Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #37152 [ run ] completed with state SUCCESS. Commit: d8620de
/LLM/main/L0_MergeRequest_PR pipeline #28763 (Partly Tested) completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

best_fit_block.is_reserved = True
# A suitable buffer was found, so reuse it.
return self._view_as(best_fit_block.buffer, tensor_shape, dtype)
# else:
Collaborator


Just directly remove the commented out code?

