[TRTLLM-9527][feat] change context params and disagg params (step3) #10495

chuangz0 · 2026-01-07T09:29:33Z

Description

For the Python-based cache transceiver, we need a unique ID that identifies a request on both the context and generation sides, for compatibility with the pre-registration flow and other purposes.
We add three new fields to ContextPhaseParams and DisaggregatedParams: disagg_id acts as the unique ID, while ctx_dp_rank and ctx_info_endpoint together take the place of opaque_state (dataTransceiverState), which can't be used in the Python-based cache transceiver.
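
As an illustration of the description above, here is a minimal sketch of how the three fields could travel from the context side to the generation side. `DisaggParamsSketch` and `new_disagg_id` are hypothetical stand-ins, not the real `DisaggregatedParams`; the field types, the `request_type` values, and the endpoint string are assumptions.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class DisaggParamsSketch:
    """Hypothetical stand-in for illustration; not the real DisaggregatedParams."""

    request_type: Optional[str] = None
    # The three fields named in the PR description (types are assumed):
    disagg_id: Optional[int] = None          # unique id shared by both sides
    ctx_dp_rank: Optional[int] = None        # replaces part of opaque_state
    ctx_info_endpoint: Optional[str] = None  # replaces part of opaque_state


def new_disagg_id() -> int:
    # Assumption: any unique integer works; here a 63-bit slice of a UUID4.
    return uuid.uuid4().int >> 65


# Context side: the request is tagged with a fresh unique id.
ctx_params = DisaggParamsSketch(request_type="context_only",
                                disagg_id=new_disagg_id())

# Generation side: the same id is reused, plus the two context-side fields.
gen_params = DisaggParamsSketch(
    request_type="generation_only",
    disagg_id=ctx_params.disagg_id,
    ctx_dp_rank=0,
    ctx_info_endpoint="tcp://127.0.0.1:5555",  # made-up endpoint
)
```

The point of the sketch is only that both sides can look up the same request by `disagg_id` without exchanging an opaque serialized state.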

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

Signed-off-by: Chuang Zhu <[email protected]>

coderabbitai · 2026-01-07T09:37:23Z

📝 Walkthrough

This PR extends the ContextPhaseParams API to support three new optional disaggregation-related fields (disaggId, ctxDpRank, disaggInfoEndpoint) across C++ core, serialization, Python bindings, and runtime execution layers, enabling tracking and propagation of disaggregated context data throughout the system.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| C++ Core API `cpp/include/tensorrt_llm/executor/executor.h`, `cpp/tensorrt_llm/executor/contextPhaseParams.cpp` | Extended `ContextPhaseParams` constructor overloads to accept three new optional trailing parameters (`disaggId`, `ctxDpRank`, `disaggInfoEndpoint`). Added public getter/setter methods for new disaggregation fields and for existing fields (`firstGenTokens`, `draftTokens`, `reqId`). Introduced private member variables and updated equality operator to include new fields. |
| Serialization `cpp/tensorrt_llm/executor/serialization.cpp` | Extended serialization/deserialization paths to handle three new optional fields in `ContextPhaseParams`. Updated size calculation and constructor invocations to propagate disaggregation metadata in both state-based and non-state code paths. |
| Batch Manager `cpp/tensorrt_llm/batch_manager/llmRequest.cpp`, `cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp`, `cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp` | Updated `ContextPhaseParams` initialization to pass three new disaggregation parameters. Changed Python binding for `context_phase_params` property from read-only to read-write by exposing setter method. |
| Python Executor Bindings `cpp/tensorrt_llm/nanobind/executor/request.cpp`, `cpp/tensorrt_llm/pybind/executor/request.cpp` | Extended pickle serialization state from 4 to 7 fields in `__getstate__`/`__setstate__`. Added new public properties (`disagg_id`, `ctx_dp_rank`, `disagg_info_endpoint`) with read/write accessors. Updated constructor binding to accept three additional optional parameters with proper defaults. |
| PyTorch Executor Runtime `tensorrt_llm/_torch/pyexecutor/llm_request.py`, `tensorrt_llm/_torch/pyexecutor/executor_request_queue.py`, `tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py`, `tensorrt_llm/executor/base_worker.py` | Added `py_disaggregated_params` attribute to `LlmRequest` class. Extended request broadcasting to include disaggregated params. Added propagation of `disagg_id` from `py_disaggregated_params` to `context_phase_params`. Added conditional attachment of disaggregated params for PyTorch backend. |
| DisaggregatedParams API `tensorrt_llm/disaggregated_params.py`, `tensorrt_llm/openai_protocol.py`, `tensorrt_llm/serve/openai_disagg_service.py`, `tensorrt_llm/executor/result.py` | Added three new optional fields to `DisaggregatedParams`: `disagg_id`, `ctx_dp_rank`, `ctx_info_endpoint`. Updated `get_context_phase_params` to propagate new fields when constructing `ContextPhaseParams`. Updated conversion functions and result handling to include new fields in disaggregated params reconstruction. |
| Tests `tests/unittest/bindings/test_executor_bindings.py` | Updated `ContextPhaseParams` construction from 4-argument to 7-argument form. Extended pickle round-trip assertions to verify new fields (`disagg_id`, `ctx_dp_rank`, `disagg_info_endpoint`) on both original and copied instances. |
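
The pickle change summarized in the table (state growing from 4 to 7 fields) can be pictured with a pure-Python stand-in. This is a sketch only: `ContextPhaseParamsSketch` is hypothetical, the real class is a C++ binding, and the argument order and types shown here are assumptions.

```python
import pickle


class ContextPhaseParamsSketch:
    """Hypothetical pure-Python stand-in for the bound C++ class."""

    def __init__(self, first_gen_tokens, req_id, opaque_state=None,
                 draft_tokens=None, disagg_id=None, ctx_dp_rank=None,
                 disagg_info_endpoint=None):
        self.first_gen_tokens = first_gen_tokens
        self.req_id = req_id
        self.opaque_state = opaque_state
        self.draft_tokens = draft_tokens
        # The three trailing fields are optional and default to None.
        self.disagg_id = disagg_id
        self.ctx_dp_rank = ctx_dp_rank
        self.disagg_info_endpoint = disagg_info_endpoint

    def __getstate__(self):
        # Seven-element state: four pre-existing fields plus three new ones.
        return (self.first_gen_tokens, self.req_id, self.opaque_state,
                self.draft_tokens, self.disagg_id, self.ctx_dp_rank,
                self.disagg_info_endpoint)

    def __setstate__(self, state):
        if len(state) != 7:
            raise ValueError("expected a 7-field state")
        (self.first_gen_tokens, self.req_id, self.opaque_state,
         self.draft_tokens, self.disagg_id, self.ctx_dp_rank,
         self.disagg_info_endpoint) = state


original = ContextPhaseParamsSketch([1, 2], 123, b"state", [3, 4],
                                    42, 0, "tcp://127.0.0.1:5555")
copied = pickle.loads(pickle.dumps(original))
```

A round-trip test in this style would compare each of the seven fields on `original` and `copied`, which is the kind of assertion the Tests row describes.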

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ❌ 3

❌ Failed checks (1 warning, 2 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.67% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |
| Title check | ❓ Inconclusive | The title '[TRTLLM-9527][feat] change context params and disagg params (step3)' is vague and generic. While it mentions the feature type and includes a ticket reference, 'change context params and disagg params' is overly broad and doesn't convey the specific nature of the changes (adding new fields, exposing setters, etc.). | Clarify the title to describe the specific change, such as '[TRTLLM-9527][feat] Add disaggregation fields to ContextPhaseParams' or similar to better convey the main change. |
| Description check | ❓ Inconclusive | PR description is incomplete, using only template with placeholder text. Author-provided description lacks explicit rationale section, detailed test coverage, and documentation updates. | Add detailed description of why these three fields (disagg_id, ctx_dp_rank, ctx_info_endpoint) are necessary, explain the pre-registration flow compatibility requirement, and document which test cases validate these changes. |

chuangz0 · 2026-01-07T09:37:27Z

/bot run --add-multi-gpu-tes

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

tensorrt_llm/serve/openai_disagg_service.py (1)
314-317: Consider extracting the error message to improve maintainability.

The validation logic is correct, but per the static analysis hint (TRY003), the error message could be extracted to a constant or custom exception class for better maintainability.
♻️ Optional refactoring to address TRY003

Define a constant at module level:
DISAGG_ID_MISSING_ERROR = (
    "Invalid disaggregated params in context phase response. disagg_id is None"
)
Then use it:
             if ctx_response.choices[0].disaggregated_params.disagg_id is None:
-                raise ValueError(
-                    "Invalid disaggregated params in context phase response. disagg_id is None"
-                )
+                raise ValueError(DISAGG_ID_MISSING_ERROR)
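
The other option the comment mentions, a custom exception class, might look like the sketch below. `MissingDisaggIdError` and `require_disagg_id` are hypothetical names introduced for illustration; they are not part of the PR.

```python
class MissingDisaggIdError(ValueError):
    """Hypothetical exception; keeps the message in one place (TRY003)."""

    def __init__(self) -> None:
        super().__init__(
            "Invalid disaggregated params in context phase response. "
            "disagg_id is None"
        )


def require_disagg_id(disagg_id):
    # Hypothetical helper: fail loudly if the context response lacks an id.
    if disagg_id is None:
        raise MissingDisaggIdError()
    return disagg_id
```

Because the class derives from `ValueError`, existing `except ValueError` handlers would keep working under this assumption.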

tensorrt_llm/executor/result.py (1)

430-432: Consider naming consistency for endpoint field.

The field name changes from disagg_info_endpoint (in context_phase_params) to ctx_info_endpoint (in DisaggregatedParams). While this follows the pattern of other fields like ctx_dp_rank and ctx_request_id, it could cause confusion since it's the only field with a name transformation.

Consider either:

Keeping consistent naming (both as disagg_info_endpoint), or

Documenting this mapping clearly if the distinction is intentional.
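
If the second option is taken, one way to document the mapping is to keep it in a single table in code. The sketch below is an assumption about how that could look, covering only the three new fields; `CTX_TO_DISAGG_FIELD` and `to_disagg_fields` are hypothetical names, and `SimpleNamespace` stands in for the real `ContextPhaseParams` object.

```python
from types import SimpleNamespace

# ContextPhaseParams attribute -> DisaggregatedParams field (new fields only).
CTX_TO_DISAGG_FIELD = {
    "disagg_id": "disagg_id",
    "ctx_dp_rank": "ctx_dp_rank",
    "disagg_info_endpoint": "ctx_info_endpoint",  # renamed on the Python side
}


def to_disagg_fields(context_phase_params) -> dict:
    """Hypothetical helper: copy the new fields under their target names."""
    return {
        dst: getattr(context_phase_params, src)
        for src, dst in CTX_TO_DISAGG_FIELD.items()
    }


# Stand-in object with made-up values, used only to exercise the helper.
fake_ctx_params = SimpleNamespace(
    disagg_id=42, ctx_dp_rank=1, disagg_info_endpoint="tcp://127.0.0.1:5555"
)
fields = to_disagg_fields(fake_ctx_params)
```

Keeping the rename in one dictionary makes the intentional difference visible to readers without changing either public name.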

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b130d58 and ead4fc3.

📒 Files selected for processing (17)

cpp/include/tensorrt_llm/executor/executor.h
cpp/tensorrt_llm/batch_manager/llmRequest.cpp
cpp/tensorrt_llm/executor/contextPhaseParams.cpp
cpp/tensorrt_llm/executor/serialization.cpp
cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp
cpp/tensorrt_llm/nanobind/executor/request.cpp
cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp
cpp/tensorrt_llm/pybind/executor/request.cpp
tensorrt_llm/_torch/pyexecutor/executor_request_queue.py
tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py
tensorrt_llm/_torch/pyexecutor/llm_request.py
tensorrt_llm/disaggregated_params.py
tensorrt_llm/executor/base_worker.py
tensorrt_llm/executor/result.py
tensorrt_llm/serve/openai_disagg_service.py
tensorrt_llm/serve/openai_protocol.py
tests/unittest/bindings/test_executor_bindings.py

🧰 Additional context used

📓 Path-based instructions (6)

**/*.{cpp,cc,cxx,h,hpp,hxx,cu,cuh}