[TRTLLM-11146][feat] Extend python cache transceiver to support nemotron#12150
bo-nv wants to merge 1 commit into NVIDIA:main from
Conversation
/bot run --disable-fail-fast

PR_Github #38698 [ run ] triggered by Bot. Commit:
📝 Walkthrough

This change introduces Mamba state transfer support to the disaggregated serving architecture. It adds a new

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant App as Application
    participant Executor as PyExecutor
    participant CacheMgr as MambaCacheManager
    participant Transceiver as PyNativeCacheTransceiver
    participant Policy as MambaPolicy
    participant Transfer as KVSendTask/KVRecvTask
    App->>Executor: init with disaggregated config
    Executor->>CacheMgr: create MambaHybridCacheManager
    CacheMgr->>CacheMgr: partition conv/ssm states across TP ranks
    App->>Transceiver: initialize for context rank
    Transceiver->>CacheMgr: get layer groups with mamba_state_index
    Transceiver->>Transceiver: build page tables with Mamba layers
    App->>Executor: run context phase
    Executor->>CacheMgr: allocate & write Mamba states (conv/ssm)
    App->>Transfer: prepare send to generation rank
    Transfer->>Policy: collect_frags for Mamba states
    Policy->>Policy: _mamba_tp (compute effective TP mapping)
    Policy->>Policy: _select_mapper (choose conv/ssm mapper)
    Policy->>Policy: build_mamba_frags (compute source/dest pointers)
    Policy-->>Transfer: return fragment pointers & sizes
    Transfer->>Transfer: aggregate KV + Mamba fragments
    Transfer->>Transceiver: transfer state via transceiver
    Transceiver->>Transceiver: recv & reconstruct states
    Transceiver->>CacheMgr: write received states to Mamba cache
    App->>Executor: run generation phase
    Executor->>CacheMgr: read Mamba states from cache
```
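The `build_mamba_frags` step in the diagram can be illustrated with a minimal sketch: given a per-rank section layout, compute contiguous `(offset, nbytes)` fragments for one state slot. The function name, the fragment representation, and the `[x | B | C]` layout here are assumptions for illustration, not the PR's actual API.

```python
def build_frags(section_dims, elem_size, slot_base):
    """Compute contiguous (byte offset, byte length) fragments for one slot.

    section_dims: per-section element counts on this rank, e.g. [x | B | C].
    elem_size:    bytes per element (e.g. 2 for fp16).
    slot_base:    byte offset of this slot in the state pool.
    """
    frags, offset = [], 0
    for dim in section_dims:
        nbytes = dim * elem_size
        frags.append((slot_base + offset, nbytes))
        offset += nbytes
    return frags

# Example: conv-state sections [128, 12, 12] in fp16, slot at byte 0
print(build_frags([128, 12, 12], 2, 0))  # -> [(0, 256), (256, 24), (280, 24)]
```

A real implementation would additionally remap these fragments between source and destination TP layouts; this sketch only shows the per-rank slicing step.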
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py (1)
1-1: ⚠️ Potential issue | 🟡 Minor: Update the copyright year to 2026.
The copyright header shows "2022-2024" but this file has meaningful modifications in 2026. As per coding guidelines, the year should reflect the latest modification.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py` at line 1, Update the file header copyright year range from "2022-2024" to include 2026 (e.g., "2022-2026") at the top of the file; modify the SPDX/FileCopyrightText comment line in tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py so the year range reflects the latest modification year.
🧹 Nitpick comments (2)
tests/unittest/disaggregated/test_mamba_transfer.py (1)
329-455: Run the shutdown path from a `finally` block. Any exception or assertion failure before line 450 skips both the cache-manager shutdowns and the transfer-worker shutdowns, which can leak GPU memory and background threads into later tests in the same pytest worker.
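The suggested pattern, as a minimal sketch. The helper names and factory callables here are stand-ins for the test's real managers and transceivers, not the actual code:

```python
def run_transfer_test(make_mgrs, make_tcs, body):
    # Initialize before the try so the finally block can always see these names.
    ctx_mgrs, gen_mgrs, ctx_tcs, gen_tcs = [], [], [], []
    try:
        ctx_mgrs, gen_mgrs = make_mgrs()
        ctx_tcs, gen_tcs = make_tcs()
        body()  # any exception or assertion failure still reaches the finally
    finally:
        # Always release cache managers and transfer workers, even on failure.
        for mgr in ctx_mgrs + gen_mgrs:
            mgr.shutdown()
        for tc in ctx_tcs + gen_tcs:
            worker = getattr(tc, "transfer_worker", None)
            if worker is not None:
                worker.shutdown()
```

Because the lists are bound before the `try`, a failure inside `make_mgrs`/`make_tcs` still leaves the `finally` clause with valid (possibly empty) lists to iterate.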
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unittest/disaggregated/test_mamba_transfer.py` around lines 329 - 455, The test shutdown steps are not executed on exceptions; wrap the teardown in a finally block so resources are always released: move the cleanup that calls mgr.shutdown() for all ctx_mgrs and gen_mgrs and the transfer_worker.shutdown() for ctx_tcs/gen_tcs into a finally clause at the end of run_mamba_transfer_test, ensuring ctx_mgrs, gen_mgrs, ctx_tcs and gen_tcs are visible to the finally block (initialize them before try) and guard transfer_worker shutdown with hasattr/None checks as currently done.

tensorrt_llm/_torch/disaggregation/resource/page.py (1)
143-144: Consider extracting exception messages to constants (optional). Static analysis (TRY003) flags the inline exception messages. While functional, extracting these to a class-level constant or using a dedicated exception subclass would be cleaner.
Example refactor
```python
class InvalidLayerGroupError(ValueError):
    """Raised when a LayerGroup has neither kv nor mamba configuration."""
    pass

# Then use:
raise InvalidLayerGroupError(
    "LayerGroup must have either kv_head_num_per_rank or mamba_layer_offsets")
```

Also applies to: 173-174
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/disaggregation/resource/page.py` around lines 143 - 144, Replace the inline ValueError("Invalid layer group") raises in the LayerGroup handling code with either a dedicated exception class (e.g., InvalidLayerGroupError subclassing ValueError) or a class/module-level constant message (e.g., INVALID_LAYER_GROUP_MSG) and use that constant when raising; update both occurrences where LayerGroup validation currently raises ValueError (the else branch in the layer-group selection logic and the other similar raise later) to raise InvalidLayerGroupError("LayerGroup must have either kv_head_num_per_rank or mamba_layer_offsets") or raise ValueError(INVALID_LAYER_GROUP_MSG) so all messages are centralized and reusable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/_torch/disaggregation/native/py_cache_transceiver.py`:
- Around line 156-164: The code currently creates a PyNativeCacheTransceiver
even when the MambaCacheManager is the C++ implementation which makes accessing
MambaHybridCacheManager.mamba_cache_index (used in _create_kv_slice) trigger its
internal `not self._use_cpp` assertion; fix by adding a guard: in
get_cache_transceiver() detect if the chosen kv_cache_manager is using C++
(check the manager instance or its `_use_cpp` flag / TRTLLM_USE_CPP_MAMBA env)
and do not instantiate PyNativeCacheTransceiver when C++ Mamba is active, or
alternatively add a validation in PyNativeCacheTransceiver.__init__ that
inspects the passed kv_cache_manager and raises/returns early if `_use_cpp` is
True so _create_kv_slice will never access mamba_cache_index on a C++-backed
manager.
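The `__init__`-validation alternative suggested above can be sketched as follows; the `_use_cpp` flag and the error wording follow the review comment and are not verified against the codebase:

```python
class PyNativeCacheTransceiverSketch:
    """Illustrative stand-in for PyNativeCacheTransceiver's constructor check."""

    def __init__(self, kv_cache_manager):
        # Reject C++-backed Mamba cache managers up front: _create_kv_slice
        # would later read mamba_cache_index, which the C++ path asserts on.
        if getattr(kv_cache_manager, "_use_cpp", False):
            raise ValueError(
                "Python cache transceiver requires the Python Mamba cache "
                "manager; C++-backed managers do not expose mamba_cache_index")
        self.kv_cache_manager = kv_cache_manager
```

Failing fast here turns a confusing assertion deep inside `_create_kv_slice` into a clear configuration error at construction time.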
In `@tensorrt_llm/_torch/disaggregation/resource/page.py`:
- Around line 132-142: The branch handling mamba mode calls
self.conv_states.to_dict() and self.ssm_states.to_dict() without guarding for
None even though those attributes are Optional; update the to_dict() branch
where mamba_layer_offsets is not None to either validate/invariant in
__post_init__ that conv_states and ssm_states are non-None or (preferred) add
defensive None checks: replace direct .to_dict() calls with conditional
expressions that return None or empty dict if conv_states or ssm_states is None
(e.g., use conv_states.to_dict() if conv_states is not None else None),
referencing the mamba_layer_offsets branch and the conv_states/ssm_states
attributes to locate the change.
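The defensive variant described above might look like the sketch below. The field names follow the review comment and the PR's `MambaLayerGroup` naming, but the exact dataclass shape is an assumption:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class MambaLayerGroup:
    """Illustrative stand-in; fields follow the review comment."""
    mamba_layer_offsets: Optional[list] = None
    conv_states: Optional[Any] = None   # objects exposing .to_dict()
    ssm_states: Optional[Any] = None

    def to_dict(self):
        if self.mamba_layer_offsets is not None:
            return {
                "mamba_layer_offsets": self.mamba_layer_offsets,
                # Defensive: these fields are Optional, so guard before
                # calling .to_dict() instead of assuming they are set.
                "conv_states": (self.conv_states.to_dict()
                                if self.conv_states is not None else None),
                "ssm_states": (self.ssm_states.to_dict()
                               if self.ssm_states is not None else None),
            }
        return {}
```

The alternative mentioned in the comment, a `__post_init__` invariant that both states are non-None whenever `mamba_layer_offsets` is set, trades the silent `None` output for an early, explicit failure.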
In `@tensorrt_llm/_torch/pyexecutor/mamba_cache_manager.py`:
- Around line 396-439: The bug is that _prepare_mamba_cache_blocks (triggered by
add_dummy_requests) can rewrite self.state_indices but
get_state_indices(request_ids, is_padding) only returns a computed mapping
without persisting it, while update_mamba_states still reads self.state_indices
causing mis-synced slot assignments. Fix by making get_state_indices persist the
computed mapping back into self.state_indices when request_ids and is_padding
are provided: compute result as currently done, then assign self.state_indices =
torch.tensor(result, dtype=self.state_indices.dtype,
device=self.state_indices.device) (or keep list form only if callers expect a
list), and then return the value; ensure you use self.mamba_cache_index for
lookups and preserve existing tensor dtype/device to avoid device/type
mismatches.
- Around line 221-227: Before publishing conv_section_dims, assert that each
section is exactly divisible by tp_size instead of only checking the totals:
verify d_inner, n_groups * d_state, conv_dim, and nheads are divisible by
tp_size; if any are not, raise a clear error (or assert) indicating which value
and its expected divisibility. Update the block that computes d_inner_local,
ng_ds_local, conv_dim, and nheads (which then sets self.conv_section_dims) to
perform these divisibility checks prior to integer division so conv_section_dims
accurately reflects the local slot layout.
---
Outside diff comments:
In `@tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py`:
- Line 1: Update the file header copyright year range from "2022-2024" to
include 2026 (e.g., "2022-2026") at the top of the file; modify the
SPDX/FileCopyrightText comment line in
tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py so the year range reflects the
latest modification year.
---
Nitpick comments:
In `@tensorrt_llm/_torch/disaggregation/resource/page.py`:
- Around line 143-144: Replace the inline ValueError("Invalid layer group")
raises in the LayerGroup handling code with either a dedicated exception class
(e.g., InvalidLayerGroupError subclassing ValueError) or a class/module-level
constant message (e.g., INVALID_LAYER_GROUP_MSG) and use that constant when
raising; update both occurrences where LayerGroup validation currently raises
ValueError (the else branch in the layer-group selection logic and the other
similar raise later) to raise InvalidLayerGroupError("LayerGroup must have
either kv_head_num_per_rank or mamba_layer_offsets") or raise
ValueError(INVALID_LAYER_GROUP_MSG) so all messages are centralized and
reusable.
In `@tests/unittest/disaggregated/test_mamba_transfer.py`:
- Around line 329-455: The test shutdown steps are not executed on exceptions;
wrap the teardown in a finally block so resources are always released: move the
cleanup that calls mgr.shutdown() for all ctx_mgrs and gen_mgrs and the
transfer_worker.shutdown() for ctx_tcs/gen_tcs into a finally clause at the end
of run_mamba_transfer_test, ensuring ctx_mgrs, gen_mgrs, ctx_tcs and gen_tcs are
visible to the finally block (initialize them before try) and guard
transfer_worker shutdown with hasattr/None checks as currently done.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0c8dd4c9-4b86-4b04-b8ba-c22058c737d6
📒 Files selected for processing (14)
- tensorrt_llm/_torch/disaggregation/base/transfer.py
- tensorrt_llm/_torch/disaggregation/native/mixers/ssm/peer.py
- tensorrt_llm/_torch/disaggregation/native/py_cache_transceiver.py
- tensorrt_llm/_torch/disaggregation/native/transfer.py
- tensorrt_llm/_torch/disaggregation/resource/kv_extractor.py
- tensorrt_llm/_torch/disaggregation/resource/page.py
- tensorrt_llm/_torch/modules/mamba/mamba2_metadata.py
- tensorrt_llm/_torch/modules/mamba/mamba2_mixer.py
- tensorrt_llm/_torch/pyexecutor/mamba_cache_manager.py
- tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
- tests/integration/defs/accuracy/test_disaggregated_serving.py
- tests/integration/test_lists/test-db/l0_a10.yml
- tests/integration/test_lists/test-db/l0_dgx_b200.yml
- tests/unittest/disaggregated/test_mamba_transfer.py
```python
d_inner_local = d_inner // tp_size
ng_ds_local = n_groups * d_state // tp_size
conv_dim = conv_dim // tp_size
nheads = nheads // tp_size

# Per-section dims for conv_state: [x | B | C]
self.conv_section_dims = [d_inner_local, ng_ds_local, ng_ds_local]
```
Assert per-section TP divisibility before publishing conv_section_dims.
`conv_dim % tp_size == 0` is weaker than what this new metadata needs. If only the total is divisible, `ng_ds_local = n_groups * d_state // tp_size` silently truncates and the section list no longer matches the local slot layout, which will mis-fragment conv-state transfers on TP-mismatched runs.
Suggested guard

```diff
 assert nheads % tp_size == 0, "nheads must be divisible by tp_size"
 assert conv_dim % tp_size == 0, "conv_dim must be divisible by tp_size"
+assert (n_groups * d_state) % tp_size == 0, (
+    "n_groups * d_state must be divisible by tp_size"
+)
 # partition conv_dim and nheads
 d_inner_local = d_inner // tp_size
 ng_ds_local = n_groups * d_state // tp_size
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tensorrt_llm/_torch/pyexecutor/mamba_cache_manager.py` around lines 221 -
227, Before publishing conv_section_dims, assert that each section is exactly
divisible by tp_size instead of only checking the totals: verify d_inner,
n_groups * d_state, conv_dim, and nheads are divisible by tp_size; if any are
not, raise a clear error (or assert) indicating which value and its expected
divisibility. Update the block that computes d_inner_local, ng_ds_local,
conv_dim, and nheads (which then sets self.conv_section_dims) to perform these
divisibility checks prior to integer division so conv_section_dims accurately
reflects the local slot layout.
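A concrete numeric illustration of the silent truncation, assuming `conv_dim = d_inner + 2 * n_groups * d_state` for the `[x | B | C]` layout (the shape values are invented for illustration, not taken from any real model config):

```python
d_inner, n_groups, d_state, tp_size = 1024, 1, 100, 8

conv_dim = d_inner + 2 * n_groups * d_state   # 1224
assert conv_dim % tp_size == 0                # the total check passes: 1224 % 8 == 0

conv_dim_local = conv_dim // tp_size          # 153 elements per rank
d_inner_local = d_inner // tp_size            # 128
ng_ds_local = n_groups * d_state // tp_size   # 100 // 8 == 12, silently truncated

sections = [d_inner_local, ng_ds_local, ng_ds_local]
# 128 + 12 + 12 == 152 != 153: the section list no longer tiles the local slot
print(sum(sections), conv_dim_local)          # -> 152 153
```

This is exactly the case the suggested per-section assert would reject up front instead of letting the mismatch surface as a mis-fragmented transfer.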
PR_Github #38698 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #38820 [ run ] triggered by Bot. Commit:

PR_Github #38820 [ run ] completed with state

/bot run --disable-fail-fast

937e336 to e20c425

/bot kill

/bot run --disable-fail-fast

PR_Github #38986 [ kill ] triggered by Bot. Commit:

PR_Github #38986 [ kill ] completed with state

/bot run

PR_Github #40184 [ run ] triggered by Bot. Commit:

/bot run

PR_Github #40210 [ run ] triggered by Bot. Commit:

PR_Github #40210 [ run ] completed with state

/bot run

PR_Github #40281 [ run ] triggered by Bot. Commit:

PR_Github #40281 [ run ] completed with state

/bot run

PR_Github #40336 [ run ] triggered by Bot. Commit:

4a4c54d to ed03704

/bot run --disable-fail-fast --add-multi-gpu-test

PR_Github #40345 [ run ] triggered by Bot. Commit:

PR_Github #40336 [ run ] completed with state

PR_Github #40345 [ run ] completed with state

/bot kill
…ron-super v3

Add SSM/Mamba state transfer support for disaggregated serving of hybrid Nemotron models.

Key changes:

- Add MambaLayerGroup to resource/page.py for SSM state layout description
- Add MambaPolicy and region mappers in native/mixers/ssm/peer.py for TP-aware Mamba state fragment building
- Extend RecvReqInfo and KVSlice with mamba_state_index for slot identification
- Update Sender/Receiver in native/transfer.py to handle mamba fragments
- Update KvCacheTransceiverV2 (transceiver.py) with mamba layer count exchange and MambaLayerGroup-aware KV slice creation
- Update get_unique_pool_memory_descs in resource/utils.py for mamba pools
- Adapt mamba_cache_manager, kv_extractor, py_executor_creator for disaggregated hybrid model support
- Add comprehensive unit tests in test_mamba_transfer.py

Signed-off-by: Bo Deng <deemod@nvidia.com>
ed03704 to 3774a6a
/bot run --disable-fail-fast --add-multi-gpu-test

PR_Github #40533 [ run ] triggered by Bot. Commit:

PR_Github #40533 [ run ] completed with state

/bot run

PR_Github #40537 [ run ] triggered by Bot. Commit:

PR_Github #40537 [ run ] completed with state

/bot run

PR_Github #40538 [ run ] triggered by Bot. Commit:

PR_Github #40538 [ run ] completed with state

/bot run

PR_Github #40542 [ run ] triggered by Bot. Commit:

PR_Github #40542 [ run ] completed with state
Summary by CodeRabbit
Release Notes
New Features
Tests
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
- [ ] PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- [ ] PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- [ ] Test cases are provided for new code paths (see test instructions).
- [ ] Any new dependencies have been scanned for license and vulnerabilities.
- [ ] CODEOWNERS updated if ownership changes.
- [ ] Documentation updated as needed.
- [ ] Update tava architecture diagram if there is a significant design change in PR.
- [ ] The reviewers assigned automatically/manually are appropriate for the PR.
- [ ] Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help

To see a list of available CI bot commands, please comment `/bot help`.