[https://nvbugs/6018172][fix] Add synchronization calls to warmup when host cache offloading is active by longlee0622 · Pull Request #12539 · NVIDIA/TensorRT-LLM

longlee0622 · 2026-03-25T11:04:39Z

Summary by CodeRabbit

Bug Fixes
- Enhanced KV cache synchronization across multiple warmup phases (general warmup, autotuner warmup, and CUDA graph generation). Ensures proper cache state and metadata alignment before forward execution.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com>

coderabbitai · 2026-03-25T11:12:04Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5e655230-2eb7-4bfa-8dfe-43f151a77db9

📥 Commits

Reviewing files that changed from the base of the PR and between 2b5c434 and 9722864.

📒 Files selected for processing (1)

tensorrt_llm/_torch/pyexecutor/model_engine.py

📝 Walkthrough

A new private helper method, _sync_kv_cache_for_warmup(), synchronizes KV cache transfer state when host cache offloading is active. It is called in four warmup paths (general warmup, autotuner warmup, and two CUDA graph paths) immediately before forward execution, so that KV cache metadata is consistent when the forward pass runs.

Changes

| Cohort / File(s) | Summary |
|---|---|
| KV Cache Synchronization Helper `tensorrt_llm/_torch/pyexecutor/model_engine.py` | Added new `_sync_kv_cache_for_warmup()` method that conditionally synchronizes KV cache transfer and block metadata when host cache offloading is active. Integrated into four warmup phases: `_general_warmup`, `_run_autotuner_warmup`, `_capture_generation_cuda_graphs`, and `_capture_piecewise_cuda_graphs` by inserting the sync call before each `self.forward()` invocation. |
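
The diff itself is not reproduced on this page, so the following is only a minimal sketch of the pattern the summary describes: a guarded sync helper that runs before each forward call during warmup. The names `_sync_kv_cache_for_warmup`, `_general_warmup`, and `forward` come from the walkthrough; `FakeKVCacheManager`, `SketchEngine`, `host_offloading_enabled`, `sync_transfers`, and `refresh_block_metadata` are hypothetical stand-ins and are not TensorRT-LLM APIs.

```python
class FakeKVCacheManager:
    """Hypothetical stand-in for a KV cache manager (not the real API)."""

    def __init__(self, host_offloading_enabled):
        self.host_offloading_enabled = host_offloading_enabled
        self.calls = []  # records which sync steps ran, in order

    def sync_transfers(self):
        # Assumed step: wait for pending host<->device block transfers.
        self.calls.append("sync_transfers")

    def refresh_block_metadata(self):
        # Assumed step: bring block metadata in line with the cache state.
        self.calls.append("refresh_block_metadata")


class SketchEngine:
    """Toy engine showing where the sync call sits relative to forward()."""

    def __init__(self, kv_cache_manager):
        self.kv_cache_manager = kv_cache_manager
        self.events = []  # ordered trace of sync and forward calls

    def _sync_kv_cache_for_warmup(self):
        mgr = self.kv_cache_manager
        # No-op unless host cache offloading is active.
        if mgr is None or not mgr.host_offloading_enabled:
            return False
        mgr.sync_transfers()
        mgr.refresh_block_metadata()
        self.events.append("sync")
        return True

    def forward(self, batch):
        self.events.append(f"forward:{batch}")

    def _general_warmup(self, batches):
        for batch in batches:
            # Sync first, so forward() sees consistent KV cache state.
            self._sync_kv_cache_for_warmup()
            self.forward(batch)
```

With offloading enabled, `SketchEngine(FakeKVCacheManager(True))._general_warmup(["a", "b"])` records a sync before every forward call; with it disabled, the helper returns early and only the forward calls are recorded. Per the summary, the same call placement applies to the other three warmup paths.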

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is essentially empty, containing only the template with unfilled sections (Description and Test Coverage are blank), and the checklist is incomplete. | Fill in the Description section explaining the issue and solution, and the Test Coverage section listing relevant test cases that validate the changes. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title is specific and directly related to the main change: adding synchronization calls to the warmup path when host cache offloading is active. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

add synchronization calls to warmup when host cache offloading is active

9722864

Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com>

longlee0622 requested a review from a team as a code owner March 25, 2026 11:04

longlee0622 requested a review from lancelly March 25, 2026 11:04

github-actions bot assigned longlee0622 Mar 25, 2026

longlee0622 marked this pull request as draft March 25, 2026 11:05

longlee0622 wants to merge 1 commit into NVIDIA:main from longlee0622:dev-jonasl-bug6018172
