[TRTLLM-8263][feat] Add Disagg Perf Tests #10912
base: main
/bot run --disable-fail-fast --stage-list "GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-4" |
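The stage names passed to --stage-list follow a regular pattern: GPU count, node count, CTX and GEN counts, then a shard index. Below is a minimal sketch of generating the list instead of typing it by hand. It assumes that pattern holds for every stage in this PR. The helpers stage_names and bot_run_command are hypothetical and are not part of the bot or the repository.

```python
def stage_names(gpus: int, nodes: int, ctx: int, gen: int, shards: int) -> list[str]:
    """Return one perf-sanity disagg stage name per shard (shards are 1-based)."""
    # Hypothetical helper: the naming pattern is inferred from the stage names in this thread.
    base = (f"GB200-{gpus}_GPUs-{nodes}_Nodes-PyTorch-PerfSanity-Disagg-"
            f"{ctx}_CTX-{gen}_GEN-Post-Merge")
    return [f"{base}-{i}" for i in range(1, shards + 1)]


def bot_run_command(*groups: list[str]) -> str:
    """Join stage-name groups into a single /bot run command (hypothetical helper)."""
    stages = ",".join(name for group in groups for name in group)
    return f'/bot run --disable-fail-fast --stage-list "{stages}"'


print(bot_run_command(stage_names(8, 2, 1, 1, 4)))
```

The later runs in this thread, which add the 12-GPU, 3-node stages, correspond to bot_run_command(stage_names(8, 2, 1, 1, 3), stage_names(12, 3, 1, 2, 8)).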
PR_Github #33100 [ run ] triggered by Bot. Commit:
📝 Walkthrough
Introduces a programmatic helper function
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 9
🤖 Fix all issues with AI agents
In `@jenkins/L0_Test.groovy`:
- Around lines 3150-3161: The values tuple constructed in buildStageConfigs does
not match the function parameter order: it currently builds values as [platform,
testlist, k, testCount, gpuCount, nodeCount] while the signature is (stageName,
platform, testlist, testCount, nodeCount, gpuCount, ...); fix by reordering the
values array to [platform, testlist, k, testCount, nodeCount, gpuCount] so
positional unpacking later matches the signature (alter values in the loop where
configs["${stageName}-${k}"] is assigned), or alternatively add a clear comment
next to buildStageConfigs and the values assignment documenting an intentional
reversal if you must keep the current order.
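The hazard here is positional. If the values tuple stores gpuCount before nodeCount while the consumer unpacks nodeCount first, the two counts silently swap. Below is a minimal Python sketch of the alternative: keying the fields by name. It assumes the field list from the signature quoted above. StageConfig, build_stage_configs and split_index (standing in for k) are hypothetical names, and the real code is Groovy in jenkins/L0_Test.groovy.

```python
from typing import NamedTuple


class StageConfig(NamedTuple):
    """Hypothetical per-stage record; fields follow the order given in the review."""
    platform: str
    testlist: str
    split_index: int
    test_count: int
    node_count: int
    gpu_count: int


def build_stage_configs(stage_name: str, platform: str, testlist: str,
                        test_count: int, node_count: int,
                        gpu_count: int) -> dict[str, StageConfig]:
    """Create one config per split, keyed "<stage_name>-<k>"."""
    configs = {}
    # Assumes 1-based split indices, matching stage names such as ...-Post-Merge-1.
    for k in range(1, test_count + 1):
        # Keyword arguments bind by name, so field order in StageConfig cannot swap counts.
        configs[f"{stage_name}-{k}"] = StageConfig(
            platform=platform, testlist=testlist, split_index=k,
            test_count=test_count, node_count=node_count, gpu_count=gpu_count)
    return configs
```

Consumers would then read c.node_count and c.gpu_count instead of unpacking positionally. By contrast, StageConfig("gb200", "l0_list", 1, 4, 8, 2), built positionally in the order [platform, testlist, k, testCount, gpuCount, nodeCount], reports node_count == 8 and gpu_count == 2 for an 8-GPU, 2-node stage. This is the mix-up the review flags.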
In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/b200-deepseek-r1-fp4_8k1k_ctx1_dep4_bs2_gen1_dep8_bs192_eplb0_mtp1_ccb-UCX.yaml`:
- Around lines 11-37: This config contains raw placeholder tokens (<partition>,
<account>, <dataset_file>, <container_mount>, <container_image>, <model_path>,
<full_path_to_work_dir>) which must be substituted before CI; locate the
template rendering path that processes perf configs (search the
harness/template/render functions or scripts that load this YAML) and either (a)
ensure those placeholders are passed into the renderer and replaced at submit
time or (b) replace them in this YAML with concrete values or supported template
variables used by the harness; validate by running the provided grep checks (rg
for the placeholder strings and for render/template/substitute keywords in
.py/.sh) and update the renderer or this file accordingly so no raw <...> tokens
remain at job submission.
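The review does not show the harness code that renders these perf configs, so the following is a hedged illustration only. It is a minimal sketch of a pre-submission step that substitutes <name> tokens from a mapping and refuses to return a config with any token left unresolved. render_config and the example values are hypothetical.

```python
import re

# Matches tokens such as <partition>, <container_image> or <full_path_to_work_dir>.
PLACEHOLDER = re.compile(r"<([A-Za-z_][A-Za-z0-9_]*)>")


def render_config(text: str, values: dict[str, str]) -> str:
    """Replace every <name> token with values[name]; fail if any token is unmapped."""
    found = {m.group(1) for m in PLACEHOLDER.finditer(text)}
    missing = sorted(found - set(values))
    if missing:
        raise ValueError(f"unresolved placeholders: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], text)
```

The function fails loudly rather than leaving a raw <...> token in place. This matches the requirement that no placeholder reach job submission.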
In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/b200-deepseek-r1-fp4_8k1k_ctx1_dep4_bs2_gen1_dep8_bs32_eplb0_mtp1_ccb-UCX.yaml`:
- Around lines 38-39: The multiline value for the YAML key worker_env_var is
invalid because the second line (TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes) is
indented as a continuation without a block scalar; fix by converting
worker_env_var to a proper YAML multiline scalar (e.g., use a pipe '|' after
worker_env_var) or join the env vars on a single line separated by spaces so the
sequence "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1
TRTLLM_WORKER_DISABLE_GC=1 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes" is a valid
scalar value for worker_env_var.
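How the harness consumes worker_env_var is not shown here. This sketch assumes it treats the value as whitespace-separated KEY=VALUE assignments, as the quoted value suggests. Under that assumption, a small validator like the hypothetical parse_env_string below can confirm that the scalar yields the expected five variables. shlex.split treats newlines as whitespace, so a literal block (|) that keeps the line break parses to the same assignments as the single-line form. A consumer that handles the raw string differently may not behave the same way.

```python
import shlex


def parse_env_string(env: str) -> dict[str, str]:
    """Split a whitespace-separated "KEY=VALUE ..." string into a dict."""
    result = {}
    for token in shlex.split(env):  # newlines count as whitespace here
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"not a KEY=VALUE assignment: {token!r}")
        result[key] = value
    return result
```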
In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/b200-deepseek-r1-fp4_8k1k_ctx1_dep4_bs2_gen1_tep8_bs1_eplb0_mtp3_ccb-UCX.yaml`:
- Around lines 7-10: The YAML defines script_file twice (in the metadata block
and inside the slurm mapping); remove the redundant key to avoid
divergence—prefer keeping script_file under the slurm mapping (the canonical job
config), so delete the metadata-level script_file entry and leave
slurm.script_file intact; ensure any code that reads metadata for script_file is
updated to read slurm.script_file if needed and run a quick lint to confirm no
duplicate keys remain.
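If both copies are kept during a transition, a reader can at least detect drift. Below is a minimal sketch, assuming the YAML has already been loaded into nested dicts with the metadata and slurm sections described above. resolve_script_file is hypothetical and prefers slurm.script_file, as the comment recommends.

```python
def resolve_script_file(config: dict) -> str:
    """Return slurm.script_file, rejecting a metadata.script_file that disagrees."""
    slurm_value = config.get("slurm", {}).get("script_file")
    if slurm_value is None:
        raise KeyError("slurm.script_file is missing")
    legacy = config.get("metadata", {}).get("script_file")
    # A leftover metadata copy is tolerated only while it matches the canonical value.
    if legacy is not None and legacy != slurm_value:
        raise ValueError(
            f"metadata.script_file={legacy!r} differs from slurm.script_file={slurm_value!r}")
    return slurm_value
```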
In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/gb200-deepseek-v32-fp4_8k1k_ctx1_dep4_bs2_gen1_tep8_bs1_eplb0_mtp3_ccb-UCX.yaml`:
- Around lines 7-10: Remove the duplicate YAML key by keeping a single source of
truth for script_file: delete either the top-level metadata/script_file or the
slurm/script_file entry so only one script_file remains; update any consumers to
reference the retained location (keys: script_file, metadata, slurm) to avoid
ambiguity and maintenance drift.
In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/gb200-kimi-k2-thinking-fp4_8k1k_ctx1_dep4_bs2_gen1_tep8_bs4_eplb0_mtp3_ccb-UCX.yaml`:
- Around lines 71-76: The YAML sets eagle3_one_model under speculative_config as
the quoted string 'true', which YAML treats as a string; change it to an unquoted
boolean true so code reading eagle3_one_model receives a boolean. Locate the
speculative_config block (identifier &id001) and replace the value for
eagle3_one_model from the quoted 'true' to the literal boolean true, keeping the
rest of the keys (decoding_type, max_draft_len, speculative_model_dir,
trust_remote_code) unchanged.
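The quoting matters in Python consumers because any non-empty string is truthy. For example, bool('false') is True, so a quoted 'false' would still enable the feature under a plain truthiness check. Below is a minimal sketch of a strict conversion that a config loader could apply. as_bool is hypothetical, and how the real code reads eagle3_one_model is not shown here.

```python
def as_bool(value: object) -> bool:
    """Convert a config value to bool, accepting only real booleans or 'true'/'false'."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.strip().lower() in ("true", "false"):
        return value.strip().lower() == "true"
    # Reject ints, None, and other strings instead of guessing via truthiness.
    raise TypeError(f"expected a boolean, got {value!r}")
```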
In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/gb200-qwen3-235b-fp4_8k1k_ctx1_tp1_bs4_gen1_dep8_bs128_eplb0_mtp0_ccb-UCX.yaml`:
- Around lines 15-55: The gen parallelism (worker_config.gen.tensor_parallel_size
and worker_config.gen.moe_expert_parallel_size) exceeds the allocated GPUs
(extra_args --gres=gpu:4 and hardware.gpus_per_node: 4); update either to
request 8 GPUs by changing extra_args and hardware.gpus_per_node to 8, or reduce
tensor_parallel_size and moe_expert_parallel_size to values that fit within 4
GPUs (e.g., 4 or less) so the gen server's parallelism matches the allocated
hardware.
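Before changing either side, it may be worth checking the arithmetic against the whole allocation rather than a single node. The GB200 stage names in this PR (e.g., GB200-8_GPUs-2_Nodes) imply 4 GPUs per node. An 8-way gen server, which the filename's gen1_dep8 suggests, therefore fits if the job spans two nodes. The excerpt above does not show how many nodes the gen server gets. The sketch below makes two assumptions: one gen server needs tensor_parallel_size * pipeline_parallel_size GPUs, and moe_expert_parallel_size must divide tensor_parallel_size. check_gen_fits and num_nodes are hypothetical names.

```python
def check_gen_fits(tensor_parallel_size: int, moe_expert_parallel_size: int,
                   gpus_per_node: int, num_nodes: int,
                   pipeline_parallel_size: int = 1) -> int:
    """Return the GPUs one gen server needs; raise if the allocation cannot hold it."""
    # Assumption: expert parallelism is carved out of the tensor-parallel group.
    if tensor_parallel_size % moe_expert_parallel_size != 0:
        raise ValueError("moe_expert_parallel_size must divide tensor_parallel_size")
    required = tensor_parallel_size * pipeline_parallel_size
    allocated = gpus_per_node * num_nodes  # total across nodes, not per node
    if required > allocated:
        raise ValueError(f"gen server needs {required} GPUs, allocation has {allocated}")
    return required
```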
In
`@tests/integration/test_lists/test-db/l0_gb200_multi_nodes_disagg_perf_sanity_2ctx_2gen.yml`:
- Around lines 1-16: The YAML filename
l0_gb200_multi_nodes_disagg_perf_sanity_2ctx_2gen.yml contradicts the embedded
test id which uses _ctx1_..._gen1_...
(perf/test_perf_sanity.py::test_e2e[disagg_upload-gb200-deepseek-r1-fp4_128k8k_ctx1_pp8_bs1_gen1_...]);
either rename the file to l0_gb200_multi_nodes_disagg_perf_sanity_1ctx_1gen.yml
to match the existing test, or update the test entry to reference a
2-context/2-generator configuration (replace _ctx1_ with _ctx2_ and _gen1_ with
_gen2_ in the test id) so filename and test config are consistent.
In
`@tests/integration/test_lists/test-db/l0_gb200_multi_nodes_disagg_perf_sanity_2ctx_8gen.yml`:
- Around lines 1-16: The YAML filename
l0_gb200_multi_nodes_disagg_perf_sanity_2ctx_8gen.yml does not match the test
content (the only test uses _ctx1_..._gen1_), so either rename the file to
reflect the actual parameters (e.g.,
l0_gb200_multi_nodes_disagg_perf_sanity_1ctx_1gen.yml) or update the test list
to include the missing 2ctx/8gen variants; locate the test entry with the
identifier
perf/test_perf_sanity.py::test_e2e[disagg_upload-gb200-deepseek-r1-fp4_128k8k_ctx1_pp8_bs1_gen1_dep32_bs2_eplb0_mtp3_ccb-UCX]
and either duplicate/modify it to create the ctx2/gen8 cases or rename the file
accordingly so filename and contained test parameters match.
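Before renaming, it may help to check which quantity the NctxMgen suffix is meant to encode. The review reads it as server counts. Another possible reading is node counts. With 4 GPUs per GB200 node (as stage names such as GB200-8_GPUs-2_Nodes imply), ctx1_pp8 needs 2 nodes and gen1_dep32 needs 8, which would match 2ctx_8gen. Below is a minimal sketch that computes both readings from a test id. The regexes assume the naming seen in this PR: parallelism prefixes tp, pp, dep or tep, followed by the per-server GPU count. The helper names are hypothetical.

```python
import math
import re
from pathlib import PurePath

# Assumed grammar: "_ctx<N>_<par><G>_..._gen<M>_<par><G>_", <G> = GPUs per server.
_TEST_ID = re.compile(
    r"_ctx(\d+)_(?:tp|pp|dep|tep)(\d+)_.*?_gen(\d+)_(?:tp|pp|dep|tep)(\d+)_")
_FILENAME = re.compile(r"_(\d+)ctx_(\d+)gen$")


def counts_from_filename(path: str) -> tuple[int, int]:
    """Return the (ctx, gen) numbers encoded in a test-db filename."""
    m = _FILENAME.search(PurePath(path).stem)
    if m is None:
        raise ValueError(f"no <N>ctx_<M>gen suffix in {path!r}")
    return int(m.group(1)), int(m.group(2))


def counts_from_test_id(test_id: str, gpus_per_node: int) -> dict[str, tuple[int, int]]:
    """Return (ctx, gen) both as server counts and as node counts for a test id."""
    m = _TEST_ID.search(test_id)
    if m is None:
        raise ValueError(f"unrecognized test id: {test_id!r}")
    ctx_n, ctx_g, gen_n, gen_g = map(int, m.groups())
    return {
        "servers": (ctx_n, gen_n),
        "nodes": (ctx_n * math.ceil(ctx_g / gpus_per_node),
                  gen_n * math.ceil(gen_g / gpus_per_node)),
    }
```

Comparing counts_from_filename(...) with both entries shows whether a rename or a test-list change is the right fix.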
🧹 Nitpick comments (3)
tests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/b200-deepseek-r1-fp4_8k1k_ctx1_dep4_bs2_gen1_tep8_bs1_eplb0_mtp3_ccb-UCX.yaml (1)
71-72: Consider using a descriptive YAML anchor name. The anchor &id001 is functional but auto-generated in style. A more descriptive name improves readability and maintainability.
Suggested improvement
- speculative_config: &id001
+ speculative_config: &speculative_mtp_config
    decoding_type: MTP
    num_nextn_predict_layers: 3
And update the reference:
- speculative_config: *id001
+ speculative_config: *speculative_mtp_config
Also applies to: 97-97
tests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/gb200-deepseek-v32-fp4_8k1k_ctx1_dep4_bs2_gen1_tep8_bs1_eplb0_mtp3_ccb-UCX.yaml (1)
71-74: Consider using a descriptive YAML anchor name. The anchor &id001 is generic. Using a semantic name like &speculative_config_mtp3 would improve readability and make the reference at line 104 self-documenting.
- speculative_config: &id001
+ speculative_config: &speculative_config_mtp3
    decoding_type: MTP
    num_nextn_predict_layers: 3
And at line 104:
- speculative_config: *id001
+ speculative_config: *speculative_config_mtp3
tests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/gb200-deepseek-r1-fp4_1k1k_ctx1_dep4_bs16_gen1_dep32_bs32_eplb0_mtp3_ccb-UCX.yaml (1)
7-10: Redundant script_file declaration.
script_file is defined in both the metadata (line 7) and slurm (line 10) sections with the same value. Consider removing the duplicate to maintain a single source of truth.
♻️ Suggested fix
  metadata:
    model_name: deepseek_r1_0528_fp4_v2
    precision: fp4
    model_dir_name: DeepSeek-R1-0528-FP4-v2
    supported_gpus:
    - GB200
-   script_file: disaggr_torch.slurm
    benchmark_type: 1k1k
  slurm:
    script_file: disaggr_torch.slurm
.../perf-sanity/b200-deepseek-r1-fp4_8k1k_ctx1_dep4_bs2_gen1_dep8_bs192_eplb0_mtp1_ccb-UCX.yaml
...g/perf-sanity/b200-deepseek-r1-fp4_8k1k_ctx1_dep4_bs2_gen1_dep8_bs32_eplb0_mtp1_ccb-UCX.yaml
Outdated
...gg/perf-sanity/b200-deepseek-r1-fp4_8k1k_ctx1_dep4_bs2_gen1_tep8_bs1_eplb0_mtp3_ccb-UCX.yaml
.../perf-sanity/gb200-deepseek-v32-fp4_8k1k_ctx1_dep4_bs2_gen1_tep8_bs1_eplb0_mtp3_ccb-UCX.yaml
...f-sanity/gb200-kimi-k2-thinking-fp4_8k1k_ctx1_dep4_bs2_gen1_tep8_bs4_eplb0_mtp3_ccb-UCX.yaml
...g/perf-sanity/gb200-qwen3-235b-fp4_8k1k_ctx1_tp1_bs4_gen1_dep8_bs128_eplb0_mtp0_ccb-UCX.yaml
tests/integration/test_lists/test-db/l0_gb200_multi_nodes_disagg_perf_sanity_2ctx_2gen.yml
tests/integration/test_lists/test-db/l0_gb200_multi_nodes_disagg_perf_sanity_2ctx_8gen.yml
PR_Github #33100 [ run ] completed with state
/bot run --disable-fail-fast --stage-list "GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-4" |
PR_Github #33113 [ run ] triggered by Bot. Commit: |
PR_Github #33113 [ run ] completed with state
/bot run --disable-fail-fast --stage-list "GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-3,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-1,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-2,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-3,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-4,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-5,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-6,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-7,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-8" |
PR_Github #33140 [ run ] triggered by Bot. Commit: |
PR_Github #33140 [ run ] completed with state
/bot run --disable-fail-fast --stage-list "GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-3,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-1,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-2,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-3,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-4,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-5,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-6,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-7,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-8" |
PR_Github #33185 [ run ] triggered by Bot. Commit: |
PR_Github #33185 [ run ] completed with state
/bot run --disable-fail-fast --stage-list "GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-3,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-1,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-2,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-3,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-4,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-5,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-6,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-7,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-8" |
PR_Github #33255 [ run ] triggered by Bot. Commit: |
PR_Github #33255 [ run ] completed with state
/bot run --disable-fail-fast --stage-list "GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-1_GEN-Post-Merge-3,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-1,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-2,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-3,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-4,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-5,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-6,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-7,GB200-12_GPUs-3_Nodes-PyTorch-PerfSanity-Disagg-1_CTX-2_GEN-Post-Merge-8" |
PR_Github #33286 [ run ] triggered by Bot. Commit: |
Summary by CodeRabbit
New Features
Tests
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update the tava architecture diagram if there is a significant design change in the PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user-friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message.
See details below for each supported subcommand.
Details
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
--reuse-test (optional)pipeline-id (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is given. If the Git commit ID has changed, this option is always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
--disable-reuse-test (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.
--disable-fail-fast (OPTIONAL): Disable fail fast on build/test/infra failures.
--skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
--stage-list "A10-PyTorch-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
--gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
--test-backend "pytorch, cpp" (OPTIONAL): Skip test stages that don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
--only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
--disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
--add-multi-gpu-test (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
--post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
--detailed-log (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
--debug (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.
For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.
kill
kill: Kill all running builds associated with the pull request.
skip
skip --comment COMMENT: Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause top of tree to break.
reuse-pipeline
reuse-pipeline: Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause top of tree to break.