
[https://nvbugs/6018172][fix] Add synchronization calls to warmup when host cache offloading is active#12539

Draft
longlee0622 wants to merge 1 commit into NVIDIA:main from longlee0622:dev-jonasl-bug6018172


longlee0622 (Collaborator) commented Mar 25, 2026

Summary by CodeRabbit

  • Bug Fixes
    • Added KV cache synchronization before forward execution in the general warmup, autotuner warmup, and CUDA graph generation phases, so that KV cache transfer state and block metadata are consistent when host cache offloading is active.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com>
@longlee0622 longlee0622 requested a review from a team as a code owner March 25, 2026 11:04
@longlee0622 longlee0622 requested a review from lancelly March 25, 2026 11:04
@longlee0622 longlee0622 marked this pull request as draft March 25, 2026 11:05
coderabbitai bot (Contributor) commented Mar 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5e655230-2eb7-4bfa-8dfe-43f151a77db9

📥 Commits

Reviewing files that changed from the base of the PR and between 2b5c434 and 9722864.

📒 Files selected for processing (1)
  • tensorrt_llm/_torch/pyexecutor/model_engine.py

Walkthrough

A new private helper method, _sync_kv_cache_for_warmup(), synchronizes KV cache transfer state when host cache offloading is active. It is called before forward execution in four warmup paths (general warmup, autotuner warmup, and two CUDA graph generation paths) so that KV cache metadata is consistent when the forward pass runs.
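The pattern described in this walkthrough can be sketched roughly as follows. This is a toy illustration, not the code from the PR: only the names _sync_kv_cache_for_warmup, _general_warmup, and forward come from the review summary, while ToyKVCacheManager, ToyModelEngine, host_offload_enabled, sync_transfers, and refresh_block_metadata are hypothetical stand-ins invented for this example and are not TensorRT-LLM APIs.

```python
class ToyKVCacheManager:
    """Hypothetical stand-in for a KV cache manager (not the TensorRT-LLM class)."""

    def __init__(self, host_offload_enabled, log):
        self.host_offload_enabled = host_offload_enabled  # assumed flag name
        self.log = log

    def sync_transfers(self):
        # Assumed method: stands in for waiting on pending host/device block copies.
        self.log.append("sync_transfers")

    def refresh_block_metadata(self):
        # Assumed method: stands in for re-aligning block metadata after transfers.
        self.log.append("refresh_block_metadata")


class ToyModelEngine:
    """Hypothetical engine showing where the sync call sits relative to forward()."""

    def __init__(self, kv_cache_manager, log):
        self.kv_cache_manager = kv_cache_manager
        self.log = log

    def _sync_kv_cache_for_warmup(self):
        # Conditional: do nothing unless host cache offloading is active.
        mgr = self.kv_cache_manager
        if mgr is None or not mgr.host_offload_enabled:
            return
        mgr.sync_transfers()
        mgr.refresh_block_metadata()

    def forward(self, batch):
        # Placeholder for the real forward pass.
        self.log.append(f"forward:{batch}")

    def _general_warmup(self, batches):
        for batch in batches:
            # The sync call is placed immediately before each forward invocation.
            self._sync_kv_cache_for_warmup()
            self.forward(batch)


if __name__ == "__main__":
    log = []
    engine = ToyModelEngine(ToyKVCacheManager(True, log), log)
    engine._general_warmup(["warmup_0", "warmup_1"])
    print(log)
```

With offloading disabled in this sketch, the helper returns early and only the forward calls are recorded, which mirrors the "conditionally synchronizes" wording in the summary. The same call placement would apply to the other three warmup paths named in the Changes table.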

Changes

| Cohort / File(s) | Summary |
|---|---|
| KV Cache Synchronization Helper: tensorrt_llm/_torch/pyexecutor/model_engine.py | Added new _sync_kv_cache_for_warmup() method that conditionally synchronizes KV cache transfer and block metadata when host cache offloading is active. Integrated into four warmup phases: _general_warmup, _run_autotuner_warmup, _capture_generation_cuda_graphs, and _capture_piecewise_cuda_graphs by inserting the sync call before each self.forward() invocation. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is essentially empty, containing only the template with unfilled sections (Description and Test Coverage are blank), and the checklist is incomplete. | Fill in the Description section explaining the issue and solution, and the Test Coverage section listing relevant test cases that validate the changes. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title is specific and directly related to the main change: adding synchronization calls to the warmup path when host cache offloading is active. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

