[None][chore] Revert "[TRTLLM-8263][feat] Add ctx-only and gen-only Disagg Perf Tes… #11602
pcastonguay wants to merge 1 commit into NVIDIA:main
Conversation
…ts (NVIDIA#11361)" This reverts commit 80dbb64. Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
/bot run --post-merge

PR_Github #36348 [ run ] triggered by Bot. Commit:
📝 Walkthrough

This PR restructures performance sanity test infrastructure by renaming test configurations with aggr/disagg prefixes, updating the MOE backend from CUTEDSL to TRTLLM across 60+ disaggregated test configs, refactoring performance submission scripts with new environment variable naming conventions, removing legacy local SLURM orchestration code, and significantly reworking the test harness for improved command execution and result collection.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
tests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/gb200-deepseek-r1-fp4_128k8k_ctx1_pp8_gen1_dep32_eplb0_mtp3_ccb-UCX.yaml (1)
65-68: ⚠️ Potential issue | 🟠 Major

The gen worker's moe_config.backend should also be reverted to TRTLLM.

Commit 80dbb64 changed both workers' MOE backends:

- worker_config.ctx.moe_config.backend: CUTEDSL → CUTEDSL (unchanged by 80dbb64)
- worker_config.gen.moe_config.backend: TRTLLM → CUTEDSL

This PR reverts only the ctx backend (CUTEDSL → TRTLLM), leaving gen at CUTEDSL. For a complete revert, gen should also be restored to TRTLLM, eliminating the asymmetry between the two worker roles.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/gb200-deepseek-r1-fp4_128k8k_ctx1_pp8_gen1_dep32_eplb0_mtp3_ccb-UCX.yaml` around lines 65 - 68, The gen worker's MOE backend was left as CUTEDSL while ctx was reverted to TRTLLM; update the worker_config.gen.moe_config.backend entry (the gen worker's moe_config.backend) to TRTLLM to match worker_config.ctx and complete the revert so both workers use TRTLLM.

tests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/gb200-deepseek-r1-fp4_8k1k_ctx1_dep4_gen1_dep16_eplb0_mtp1_ccb-UCX.yaml (1)
64-92: ⚠️ Potential issue | 🟡 Minor

Confirm whether the asymmetrical moe_config backends (gen: CUTEDSL, ctx: TRTLLM) in the dep16 variant are intentional.

The asymmetry is confirmed. This config uses CUTEDSL in gen but TRTLLM in ctx, while comparable configs (e.g., the tep8 variants of the same 8k1k config and similar 1k1k patterns) use TRTLLM in both sections. If the dep16 variant requires CUTEDSL specifically for generation on this hardware, this is correct — but given the divergence from the pattern across other comparable test configs, it merits explicit clarification.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/gb200-deepseek-r1-fp4_8k1k_ctx1_dep4_gen1_dep16_eplb0_mtp1_ccb-UCX.yaml` around lines 64 - 92, The review points out asymmetrical moe_config backends: the generation section uses moe_config.backend = CUTEDSL while the ctx section uses moe_config.backend = TRTLLM; confirm whether this is intentional for the dep16 variant and, if not, make them consistent (either set ctx.moe_config.backend to CUTEDSL to match gen.moe_config.backend or change gen.moe_config.backend to TRTLLM to match ctx), updating the YAML entries for the moe_config under the gen and ctx sections accordingly and adding a brief inline comment indicating why the chosen backend is required for dep16 if the asymmetry must remain.

tests/integration/defs/perf/test_perf_sanity.py (2)
1-2: ⚠️ Potential issue | 🟡 Minor

Update the copyright year to 2026.
This file was modified in 2026 but the header still ends at 2025.
As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. … update year on modified files."

✏️ Suggested header update
```diff
-# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/perf/test_perf_sanity.py` around lines 1 - 2, Update the SPDX header years to include 2026 by changing the trailing year in both header lines (the copyright line and the SPDX-License-Identifier line) so the header reflects the latest meaningful modification year; locate the top-of-file header in test_perf_sanity.py and replace "2022-2025" with "2022-2026" (or otherwise ensure the final year is 2026).
17-47: ⚠️ Potential issue | 🟡 Minor

Use namespace-preserving module imports throughout the import block.
Multiple symbol-level imports violate the namespace guidelines. All of the following should import the module and qualify usages instead:
- from test_common.http_utils import wait_for_endpoint_ready
- from defs.trt_test_alternative import print_error, print_info
- from tensorrt_llm._utils import get_free_port
- from ..conftest import get_llm_root, llm_models_root
- from .open_search_db_utils import (...) with all named symbols
- from .utils import collect_and_clean_myelin_time

As per coding guidelines: "Always maintain the namespace when importing. Use from package.subpackage import foo instead of from package.subpackage.foo import SomeClass" and "Python imports must use from package.subpackage import module style; never use from module import Class."
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/perf/test_perf_sanity.py` around lines 17 - 47, The import block uses symbol-level imports that violate the namespace rule; change them to module-level imports and update all call sites to qualify names. Replace "from test_common.http_utils import wait_for_endpoint_ready" with "import test_common.http_utils as http_utils" and use http_utils.wait_for_endpoint_ready; replace "from defs.trt_test_alternative import print_error, print_info" with "import defs.trt_test_alternative as trt_test_alternative" and use trt_test_alternative.print_error/print_info; replace "from tensorrt_llm._utils import get_free_port" with "import tensorrt_llm._utils as trt_utils" and use trt_utils.get_free_port; replace "from ..conftest import get_llm_root, llm_models_root" with "import ..conftest as conftest" and use conftest.get_llm_root / conftest.llm_models_root; replace the multi-symbol "from .open_search_db_utils import (...)" with "import .open_search_db_utils as open_search_db_utils" and qualify SCENARIO_MATCH_FIELDS, add_id, check_perf_regression, get_common_values, get_history_data, get_job_info, post_new_perf_data, prepare_baseline_data, prepare_regressive_test_cases as open_search_db_utils.SYMBOL; and replace "from .utils import collect_and_clean_myelin_time" with "import .utils as utils" and use utils.collect_and_clean_myelin_time throughout.

jenkins/scripts/perf/disaggregated/submit.py (3)
1-5: ⚠️ Potential issue | 🟡 Minor

Add the NVIDIA Apache 2.0 header to this Python file.
This file is modified but lacks the required NVIDIA copyright header with the latest year of modification (2026).
As per coding guidelines: “All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format. … update year on modified files.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jenkins/scripts/perf/disaggregated/submit.py` around lines 1 - 5, Insert the NVIDIA Apache 2.0 license header with the latest modification year 2026 at the top of the file (keeping the existing shebang and imports intact); specifically add the standard multi-line Apache 2.0 comment block that includes "Copyright (c) 2026 NVIDIA CORPORATION" and the Apache License, Version 2.0 notice before or immediately after the "#!/usr/bin/env python3" line so the header is present for the module containing the current imports (argparse, os, yaml).
223-258: ⚠️ Potential issue | 🟡 Minor

Add a Google-style docstring for get_config_yaml.

This public helper lacks a docstring. Per coding guidelines, all functions must include Google-style docstrings that can be parsed by Sphinx.
📄 Example docstring
```diff
 def get_config_yaml(test_list_path, llm_src):
+    """Resolve the perf-sanity config YAML path from a test list entry.
+
+    Args:
+        test_list_path: Path to the rendered test list file.
+        llm_src: Path to the TensorRT-LLM source root.
+
+    Returns:
+        Absolute path to the config YAML file.
+
+    Raises:
+        ValueError: If the test list format is invalid.
+        FileNotFoundError: If the config YAML does not exist.
+    """
     with open(test_list_path, "r") as f:
         first_line = f.readline().strip()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jenkins/scripts/perf/disaggregated/submit.py` around lines 223 - 258, Add a Google-style docstring to the public helper function get_config_yaml describing its purpose and behavior; include an initial one-line summary, a short description, and sections for Args (test_list_path: str, llm_src: str), Returns (str path to the resolved YAML), and Raises (ValueError for malformed test names, FileNotFoundError when the config file is missing), and optionally an example usage. Place this docstring immediately under the def get_config_yaml(...) signature so Sphinx can parse it.
223-258: ⚠️ Potential issue | 🟠 Major

Local mode requires --test-list due to an unconditional get_config_yaml() call even when --config-yaml is supplied.

Line 296 unconditionally calls get_config_yaml(args.test_list, args.llm_src) even in local mode. With both arguments defaulting to empty strings, this fails when a user provides only --config-yaml for local execution.

Proposed fix
```diff
-    config_yaml = get_config_yaml(args.test_list, args.llm_src)
+    if args.run_ci:
+        config_yaml = get_config_yaml(args.test_list, args.llm_src)
+    elif args.config_yaml:
+        config_yaml = args.config_yaml
+    else:
+        parser.error(
+            "Provide --run-ci with --test-list/--llm-src, or supply --config-yaml for local mode."
+        )
```

Also applies to: 296
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jenkins/scripts/perf/disaggregated/submit.py` around lines 223 - 258, Unconditionally calling get_config_yaml(args.test_list, args.llm_src) forces --test-list even when the user supplied --config-yaml (especially in local mode); change the call site so that you use args.config_yaml if provided, and only call get_config_yaml(...) when args.config_yaml is empty and args.test_list is present. Update the logic where get_config_yaml is invoked (the code that sets config_yaml from args) to: if args.config_yaml: config_yaml = args.config_yaml elif args.test_list: config_yaml = get_config_yaml(args.test_list, args.llm_src) else: raise a clear error asking for one of --config-yaml or --test-list; keep using the existing get_config_yaml function unchanged.
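The precedence logic described in this prompt can be sketched as a minimal, self-contained example. Note that resolve_config_yaml, the stubbed get_config_yaml, and the flag defaults are illustrative assumptions mirroring the prompt, not the actual submit.py code:

```python
import argparse


def resolve_config_yaml(args, get_config_yaml):
    """Prefer an explicit --config-yaml; otherwise derive it from the test list."""
    if args.config_yaml:
        return args.config_yaml
    if args.test_list:
        return get_config_yaml(args.test_list, args.llm_src)
    raise SystemExit(
        "Provide --config-yaml for local mode, or --test-list/--llm-src for CI.")


parser = argparse.ArgumentParser()
parser.add_argument("--config-yaml", default="")
parser.add_argument("--test-list", default="")
parser.add_argument("--llm-src", default="")

# Local mode: --config-yaml alone is now sufficient.
args = parser.parse_args(["--config-yaml", "local.yaml"])
print(resolve_config_yaml(args, lambda tl, src: f"{src}/{tl}.yaml"))  # local.yaml
```

With this shape, CI invocations that pass --test-list/--llm-src still reach get_config_yaml, while a missing argument produces an explicit error instead of failing inside the helper on empty paths.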
🧹 Nitpick comments (1)
tests/test_common/http_utils.py (1)
40-41: except Exception is too broad; narrow to expected I/O exceptions.

Per the coding guidelines, exceptions should be limited to the smallest applicable set. For file reading, OSError (covers IOError, FileNotFoundError) and UnicodeDecodeError are the concrete failure modes; catching bare Exception also suppresses unrelated programming errors.

🛠️ Proposed fix
```diff
-    except Exception as err:
+    except (OSError, UnicodeDecodeError) as err:
         print(f"Failed to read server file {check_file}: {err}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_common/http_utils.py` around lines 40 - 41, Replace the broad "except Exception as err" in tests/test_common/http_utils.py (the block that prints "Failed to read server file {check_file}: {err}") with a narrower catch for the I/O-related errors expected when reading files—specifically catch OSError and UnicodeDecodeError (e.g., "except (OSError, UnicodeDecodeError) as err") so unrelated exceptions aren't swallowed; keep the same error message using check_file and err.
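A minimal runnable sketch of the narrowed handler; read_server_file and the file paths are illustrative, not the harness's actual function:

```python
def read_server_file(check_file):
    """Read a server log file, tolerating only expected I/O failures."""
    try:
        with open(check_file, "r") as f:
            return f.read()
    except (OSError, UnicodeDecodeError) as err:
        # FileNotFoundError and IOError are both subclasses of OSError,
        # so missing or unreadable files are still handled gracefully.
        print(f"Failed to read server file {check_file}: {err}")
        return None


print(read_server_file("/nonexistent/server.log"))  # prints the error, then None
```

A genuine bug such as a NameError inside the try block now propagates instead of being silently reported as a file-read failure.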
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@jenkins/scripts/perf/perf_sanity_triage.py`:
- Line 241: The regression_date currently takes the raw millisecond epoch from
data.get("ts_created", "N/A") and will show an unreadable integer in Slack;
replace that assignment so it parses the value (if present and numeric), divides
by 1000 to convert ms → seconds, converts to a human-readable date/time string
(e.g., via datetime.fromtimestamp(...).strftime("%Y-%m-%d %H:%M:%S") or
similar), and falls back to "N/A" on missing/invalid values; update the place
where regression_date is set (the line assigning regression_date from data.get)
so downstream Slack text uses the formatted string.
In
`@tests/integration/test_lists/test-db/l0_dgx_b200_multi_nodes_disagg_perf_sanity_ctx1_node1_gpu4_gen1_node1_gpu8.yml`:
- Around line 5-8: The comment currently describes total hardware ("2 nodes with
each node has 8 GPUs") but should describe per-worker allocations to match the
filename and other GB200 files; update the comment above the system_gpu_count
block to state the per-worker allocation (e.g., "ctx worker: 4 GPUs, gen worker:
8 GPUs; total system_gpu_count: 16") so that it aligns with the filename
ctx1_node1_gpu4_gen1_node1_gpu8 and the system_gpu_count key.
In
`@tests/integration/test_lists/test-db/l0_gb200_multi_nodes_disagg_perf_sanity_ctx1_node1_gpu1_gen1_node1_gpu4.yml`:
- Around line 5-9: The system_gpu_count is fixed to 8 but the config comments
and group name imply 1 ctx node with 1 GPU + 1 gen node with 4 GPUs (5 active
GPUs); clarify that the 8 GPU requirement is intentional to force whole-node
allocation on GB200 NVL4 (4 GPUs/node) — add an inline comment next to
system_gpu_count explaining that the scheduler must allocate whole NVL4 nodes (2
nodes × 4 GPUs = 8) so the ctx node will have unused GPUs, or adjust the range
if whole-node allocation is not intended; reference the system_gpu_count key and
the ctx/gen worker comments when making the change.
In `@tests/scripts/perf-sanity/gpt_oss_120b_fp4_grace_blackwell.yaml`:
- Around line 3-5: The YAML filename incorrectly uses the "_grace_blackwell"
suffix while declaring standalone Blackwell GPUs (supported_gpus: [B200, B300]);
rename the file to gpt_oss_120b_fp4_blackwell.yaml and update all references to
it (notably the five test entries in the
tests/integration/test_lists/test-db/l0_gb200_multi_gpus_perf_sanity.yml) so
they point to the new filename; ensure the file content still lists
supported_gpus: [B200, B300] and run a quick grep across the repo to catch any
other references to the old gpt_oss_120b_fp4_grace_blackwell.yaml name.
In `@tests/scripts/perf-sanity/README.md`:
- Line 31: Replace the four fenced code blocks that currently start with plain
``` (the blocks containing lines like
"perf/test_perf_sanity.py::test_e2e[aggr_upload-{config yaml file base name}]"
and the two variants with "-{server_config_name}") so they begin with ```text
instead of ``` to satisfy MD040; update all four occurrences (the three
additional blocks that contain the other test identifier variants) so each code
fence explicitly uses the text language specifier.
In `@tests/test_common/http_utils.py`:
- Line 14: The type annotation for the parameter/variable server_proc in
tests/test_common/http_utils.py currently uses subprocess.Popen without
explicitly allowing None, triggering Ruff RUF013; update the annotation to use
the Python 3.10 union form subprocess.Popen | None (i.e., change server_proc:
subprocess.Popen to server_proc: subprocess.Popen | None) so the variable is
explicitly optional and conforms to the project's typing style.
- Around line 26-41: The periodic file-scan using "if check_files and iteration
% 300 == 0" is unreliable; change it to a time-based check using a timestamp
(e.g., start_time = time.time()) and a small interval (e.g., 1–5s) derived from
the existing timeout so the scan runs repeatedly during long request timeouts.
Update the loop that references check_files, ERROR_KEYWORDS and iteration to
instead compare time.time() - last_check_time >= check_interval, update
last_check_time when scanning, and keep the existing behavior of reading files
and raising RuntimeError when keywords are found.
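The time-based scan described above can be sketched as follows; scan_files_for_errors, the ERROR_KEYWORDS list, and the intervals are illustrative assumptions, not the real harness code:

```python
import time

# Assumed keywords; the real harness defines its own ERROR_KEYWORDS.
ERROR_KEYWORDS = ("Traceback", "CUDA error")


def scan_files_for_errors(check_files, timeout=5.0, check_interval=1.0):
    """Poll log files on a wall-clock interval, raising if an error keyword appears."""
    start_time = time.time()
    last_check_time = 0.0
    while time.time() - start_time < timeout:
        # Time-based check: runs every check_interval seconds regardless of
        # how long individual loop iterations take.
        if time.time() - last_check_time >= check_interval:
            last_check_time = time.time()
            for check_file in check_files:
                try:
                    with open(check_file, "r") as f:
                        content = f.read()
                except (OSError, UnicodeDecodeError) as err:
                    print(f"Failed to read server file {check_file}: {err}")
                    continue
                for keyword in ERROR_KEYWORDS:
                    if keyword in content:
                        raise RuntimeError(f"Found '{keyword}' in {check_file}")
        time.sleep(0.05)
    return None  # timed out without detecting an error
```

Unlike an iteration-count check, this fires at predictable wall-clock intervals even when each iteration blocks on a long request timeout.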
---
Duplicate comments:
In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/b200-deepseek-r1-fp4_1k1k_ctx1_dep4_gen1_dep8_eplb0_mtp1_ccb-UCX.yaml`:
- Around line 91-93: The config shows asymmetric backends between generation and
context (moe_config.backend: TRTLLM while cache_transceiver_config is
different/unspecified), duplicating the earlier issue; update the YAML so
generation and context use the same backend—e.g., explicitly set
cache_transceiver_config to use TRTLLM or change moe_config.backend to match the
cache transceiver—so both components (moe_config.backend and
cache_transceiver_config) are consistent with the matching config used in the
referenced gb200 example.
In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/gb200-deepseek-v32-fp4_8k1k_ctx1_dep4_gen1_dep32_eplb256_mtp0_ccb-UCX.yaml`:
- Around line 99-101: The config shows asymmetric backends between ctx and gen:
moe_config.backend is TRTLLM while gen still uses CUTEDSL (see gen's
load_balancer.num_slots: 256 and nvfp4_gemm_config.allowed_backends containing
cutedsl). Make the backends consistent by updating the gen or ctx section so
both use the same backend (adjust moe_config.backend in the gen/ctx block
accordingly) and ensure associated fields (e.g., load_balancer.num_slots and
nvfp4_gemm_config.allowed_backends) are updated to match the chosen backend.
In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/gb200-qwen3-235b-fp4_8k1k_ctx1_tp1_gen1_dep8_eplb0_mtp0_ccb-UCX.yaml`:
- Around line 87-89: The YAML shows asymmetric backends between generation and
context paths (moe_config.backend: TRTLLM vs cache_transceiver_config
unspecified), duplicate of earlier flag; make the backends consistent by
explicitly setting the same backend for both paths (e.g., set
cache_transceiver_config.backend to TRTLLM or change moe_config.backend to match
the intended cache transceiver backend), ensuring the keys referenced are
moe_config.backend and cache_transceiver_config.backend so both gen and ctx use
the same backend implementation.
```diff
 # Get regression branch and commit from data
-regression_date = _timestamp_to_date(data.get("ts_created", "N/A"))
+regression_date = data.get("ts_created", "N/A")
```
ts_created will display a raw epoch-millisecond integer in the Slack message.
The query at lines 354–362 builds the ts_created range filter in epoch milliseconds (… * 1000), meaning OpenSearch stores this field as a numeric ms-epoch value. After the removal of the previous _timestamp_to_date() conversion, data.get("ts_created", "N/A") will return a value like 1708444800000, which will appear verbatim in the Slack notification under "Regression date, branch and commit:", making it unreadable.
🛠️ Proposed fix
```diff
-regression_date = data.get("ts_created", "N/A")
+ts_created = data.get("ts_created")
+if isinstance(ts_created, (int, float)):
+    import datetime
+    regression_date = datetime.datetime.utcfromtimestamp(ts_created / 1000).strftime(
+        "%Y-%m-%d"
+    )
+else:
+    regression_date = ts_created if ts_created is not None else "N/A"
```

🤖 Prompt for AI Agents
+ regression_date = ts_created if ts_created is not None else "N/A"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@jenkins/scripts/perf/perf_sanity_triage.py` at line 241, The regression_date
currently takes the raw millisecond epoch from data.get("ts_created", "N/A") and
will show an unreadable integer in Slack; replace that assignment so it parses
the value (if present and numeric), divides by 1000 to convert ms → seconds,
converts to a human-readable date/time string (e.g., via
datetime.fromtimestamp(...).strftime("%Y-%m-%d %H:%M:%S") or similar), and falls
back to "N/A" on missing/invalid values; update the place where regression_date
is set (the line assigning regression_date from data.get) so downstream Slack
text uses the formatted string.
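A self-contained sketch of that conversion; format_ts_created is a hypothetical helper name, and the real script may structure this differently:

```python
import datetime


def format_ts_created(value):
    """Format an epoch-millisecond timestamp for display.

    Args:
        value: Raw value from data.get("ts_created"); may be an int/float in
            milliseconds, a string, or None.

    Returns:
        A "YYYY-MM-DD HH:MM:SS" UTC string, or "N/A" when unavailable.
    """
    if isinstance(value, (int, float)):
        # OpenSearch stores ts_created in epoch milliseconds; divide by 1000
        # to get seconds before converting.
        return datetime.datetime.fromtimestamp(
            value / 1000, tz=datetime.timezone.utc
        ).strftime("%Y-%m-%d %H:%M:%S")
    return str(value) if value is not None else "N/A"


print(format_ts_created(1708444800000))  # 2024-02-20 16:00:00
print(format_ts_created(None))           # N/A
```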
```yaml
# 2 nodes with each node has 8 GPUs
system_gpu_count:
  gte: 16
  lte: 16
```
Nit: Comment describes system hardware, not per-worker allocation — inconsistent with other files in this PR.
The comment "2 nodes with each node has 8 GPUs" describes the total hardware (2 × 8 = 16 GPUs), which is consistent with system_gpu_count: 16. However, the filename convention (ctx1_node1_gpu4_gen1_node1_gpu8) and the per-worker descriptions in related GB200 files (e.g., "1 ctx worker with each 2 nodes and 8 GPUs") suggest the comment should instead describe the per-worker allocation (ctx: 4 GPUs, gen: 8 GPUs).
📝 Suggested comment update
```diff
-# 2 nodes with each node has 8 GPUs
-system_gpu_count:
+# 1 ctx worker with 1 node and 4 GPUs
+# 1 gen worker with 1 node and 8 GPUs
+system_gpu_count:
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
# 1 ctx worker with 1 node and 4 GPUs
# 1 gen worker with 1 node and 8 GPUs
system_gpu_count:
  gte: 16
  lte: 16
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@tests/integration/test_lists/test-db/l0_dgx_b200_multi_nodes_disagg_perf_sanity_ctx1_node1_gpu4_gen1_node1_gpu8.yml`
around lines 5 - 8, The comment currently describes total hardware ("2 nodes
with each node has 8 GPUs") but should describe per-worker allocations to match
the filename and other GB200 files; update the comment above the
system_gpu_count block to state the per-worker allocation (e.g., "ctx worker: 4
GPUs, gen worker: 8 GPUs; total system_gpu_count: 16") so that it aligns with
the filename ctx1_node1_gpu4_gen1_node1_gpu8 and the system_gpu_count key.
```yaml
# 1 ctx worker with each 1 node and 1 GPU
# 1 gen worker with each 1 node and 4 GPUs
system_gpu_count:
  gte: 8
  lte: 8
```
Confirm system_gpu_count: 8 accounts for whole-node allocation on GB200 NVL4.
The group name and comments describe 1+4 = 5 GPUs in active use, but the range is pinned to exactly 8. This is likely intentional if GB200 nodes are NVL4 (4 GPUs/node) and the scheduler must allocate whole nodes (2 nodes × 4 GPUs = 8), leaving 3 GPUs on the ctx node idle. If that assumption is correct, a short inline comment would make the intent self-documenting; if it is wrong, the range may exclude valid systems or cause unexpected scheduling behavior.
📝 Suggested clarifying comment
```diff
 system_gpu_count:
-  gte: 8
-  lte: 8
+  # 2 nodes × 4 GPUs/node (GB200 NVL4); ctx uses 1 of 4 GPUs on its node
+  gte: 8
+  lte: 8
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
# 1 ctx worker with each 1 node and 1 GPU
# 1 gen worker with each 1 node and 4 GPUs
system_gpu_count:
  # 2 nodes × 4 GPUs/node (GB200 NVL4); ctx uses 1 of 4 GPUs on its node
  gte: 8
  lte: 8
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@tests/integration/test_lists/test-db/l0_gb200_multi_nodes_disagg_perf_sanity_ctx1_node1_gpu1_gen1_node1_gpu4.yml`
around lines 5 - 9, The system_gpu_count is fixed to 8 but the config comments
and group name imply 1 ctx node with 1 GPU + 1 gen node with 4 GPUs (5 active
GPUs); clarify that the 8 GPU requirement is intentional to force whole-node
allocation on GB200 NVL4 (4 GPUs/node) — add an inline comment next to
system_gpu_count explaining that the scheduler must allocate whole NVL4 nodes (2
nodes × 4 GPUs = 8) so the ctx node will have unused GPUs, or adjust the range
if whole-node allocation is not intended; reference the system_gpu_count key and
the ctx/gen worker comments when making the change.
```diff
 supported_gpus:
-  - GB200
-hardware:
-  gpus_per_node: 4
+  - B200
+  - B300
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Find all references to this filename and similar patterns in the codebase
echo "=== Direct references to gpt_oss_120b_fp4_grace_blackwell ==="
rg -n "gpt_oss_120b_fp4_grace_blackwell" --include="*.yml" --include="*.yaml" --include="*.groovy" --include="*.py" --include="*.sh" --include="*.md"
echo ""
echo "=== Check file exists and list nearby test configs ==="
fd "gpt_oss_120b_fp4" --type f
echo ""
echo "=== Look for GB200 vs B200/B300 references in test configs ==="
rg -n "GB200|B200|B300" tests/scripts/perf-sanity/ --include="*.yaml" -A 2 -B 2
```

Repository: NVIDIA/TensorRT-LLM
Length of output: 1209
🏁 Script executed:
```bash
#!/bin/bash
# Fix rg syntax - use -t instead of --include
echo "=== Direct references to gpt_oss_120b_fp4_grace_blackwell ==="
rg "gpt_oss_120b_fp4_grace_blackwell" -t yaml -t py -t groovy -t sh -t md
echo ""
echo "=== Search for references in test lists and configs ==="
rg "grace_blackwell" -t yaml -t py -t groovy
echo ""
echo "=== Read the actual file ==="
cat tests/scripts/perf-sanity/gpt_oss_120b_fp4_grace_blackwell.yaml | head -40
echo ""
echo "=== List all gpt_oss files in perf-sanity ==="
fd "gpt_oss" tests/scripts/perf-sanity/
echo ""
echo "=== Check if B200/B300 references in test infrastructure ==="
rg "supported_gpus" tests/scripts/perf-sanity/ -A 3 | head -50
```

Repository: NVIDIA/TensorRT-LLM
Length of output: 11321
File should be renamed to match hardware designation: gpt_oss_120b_fp4_blackwell.yaml (and references updated).
The filename gpt_oss_120b_fp4_grace_blackwell.yaml violates the project's naming convention. Files named *_grace_blackwell.yaml consistently declare supported_gpus: [GB200, GB300] (Grace CPU + Blackwell GPU), while this file specifies [B200, B300] (standalone Blackwell GPUs). All similar standalone Blackwell configs use the *_blackwell.yaml naming pattern instead.
Update required in tests/integration/test_lists/test-db/l0_gb200_multi_gpus_perf_sanity.yml where this config is referenced (5 test entries).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/scripts/perf-sanity/gpt_oss_120b_fp4_grace_blackwell.yaml` around lines
3 - 5, The YAML filename incorrectly uses the "_grace_blackwell" suffix while
declaring standalone Blackwell GPUs (supported_gpus: [B200, B300]); rename the
file to gpt_oss_120b_fp4_blackwell.yaml and update all references to it (notably
the five test entries in the
tests/integration/test_lists/test-db/l0_gb200_multi_gpus_perf_sanity.yml) so
they point to the new filename; ensure the file content still lists
supported_gpus: [B200, B300] and run a quick grep across the repo to catch any
other references to the old gpt_oss_120b_fp4_grace_blackwell.yaml name.
````diff
-**Example**:
+**Test Case Names**:
 ```
````
Add language specifiers to fenced code blocks to satisfy MD040.
All four blocks contain plain-text templates or identifiers. Using ```text resolves the markdownlint warning without implying executable content.
📝 Proposed fix (all four blocks)

````diff
-```
+```text
 perf/test_perf_sanity.py::test_e2e[aggr_upload-{config yaml file base name}]
 perf/test_perf_sanity.py::test_e2e[aggr_upload-{config yaml file base name}-{server_config_name}]
````

Apply the same `` ```text `` opener at lines 40, 57, and 62.
</details>
Also applies to: 40-40, 57-57, 62-62
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.21.0)</summary>
[warning] 31-31: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.
In `@tests/scripts/perf-sanity/README.md` at line 31: replace the four fenced code blocks that currently open with a bare ``` (the blocks containing lines like `perf/test_perf_sanity.py::test_e2e[aggr_upload-{config yaml file base name}]` and the variants with `-{server_config_name}`) so they begin with `` ```text `` instead, satisfying MD040; update all four occurrences (the three additional blocks that contain the other test identifier variants) so each code fence explicitly uses the `text` language specifier.
```diff
     timeout: int = 300,
     check_files: list[str] | None = None,
-    server_proc: subprocess.Popen | None = None,
+    server_proc: subprocess.Popen = None,
```
`server_proc` annotation is missing `| None`.
With Python 3.10+ (per project requirements), the idiomatic form for an optional parameter is `T | None`. Ruff RUF013 flags this as an implicit Optional.
🛠️ Proposed fix
```diff
-    server_proc: subprocess.Popen = None,
+    server_proc: subprocess.Popen | None = None,
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-    server_proc: subprocess.Popen = None,
+    server_proc: subprocess.Popen | None = None,
```
🧰 Tools
🪛 Ruff (0.15.1)
[warning] 14-14: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_common/http_utils.py` at line 14, The type annotation for the
parameter/variable server_proc in tests/test_common/http_utils.py currently uses
subprocess.Popen without explicitly allowing None, triggering Ruff RUF013;
update the annotation to use the Python 3.10 union form subprocess.Popen | None
(i.e., change server_proc: subprocess.Popen to server_proc: subprocess.Popen |
None) so the variable is explicitly optional and conforms to the project's
typing style.
```diff
         if check_files and iteration % 300 == 0:
             for check_file in check_files:
-                error_lines = check_error(check_file)
-                if error_lines:
-                    error_lines_str = ", ".join(
-                        [f"line {line_idx}: {line_str}" for line_idx, line_str in error_lines]
-                    )
-                    raise RuntimeError(
-                        f"Found error in server file {check_file}: {error_lines_str}"
-                    )
+                if os.path.exists(check_file):
+                    try:
+                        with open(check_file, "r") as f:
+                            content = f.read()
+                        for line in content.splitlines():
+                            for keyword in ERROR_KEYWORDS:
+                                if keyword in line:
+                                    raise RuntimeError(
+                                        f"Found '{keyword}' in server file {check_file}: {line}"
+                                    )
+                    except RuntimeError:
+                        raise
+                    except Exception as err:
+                        print(f"Failed to read server file {check_file}: {err}")
```
File-check interval is effectively unreachable in realistic failure scenarios.
`check_files` is scanned only when `iteration % 300 == 0`. With the default `timeout=300`, each iteration takes:

- Best case (server rejects connections immediately): ~1 s/iteration → 300 iterations → the first check fires at the very end of the timeout window (once, if at all).
- Worst case (requests hit the 5 s timeout): ~6 s/iteration → only ~50 iterations before `timeout` expires → iteration 300 is never reached → error keywords in `check_files` are never detected.

The failure scenario where `check_files` is most needed (server is unresponsive/timing out) is exactly the scenario where the check never fires.
Replace the iteration counter with a time-based threshold:
🛠️ Proposed fix
```diff
     start = time.monotonic()
-    iteration = 0
+    last_file_check = start
     while time.monotonic() - start < timeout:
         if server_proc is not None:
             exit_code = server_proc.poll()
             if exit_code is not None:
                 raise RuntimeError(
                     f"Server process exited with code {exit_code} before becoming ready."
                 )
-        iteration += 1
-        if check_files and iteration % 300 == 0:
+        if check_files and time.monotonic() - last_file_check >= 30:
+            last_file_check = time.monotonic()
             for check_file in check_files:
```
🪛 Ruff (0.15.1)
[warning] 35-37: Abstract raise to an inner function
(TRY301)
[warning] 35-37: Avoid specifying long messages outside the exception class
(TRY003)
[warning] 40-40: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_common/http_utils.py` around lines 26 - 41, The periodic file-scan
using "if check_files and iteration % 300 == 0" is unreliable; change it to a
time-based check using a timestamp (e.g., start_time = time.time()) and a small
interval (e.g., 1–5s) derived from the existing timeout so the scan runs
repeatedly during long request timeouts. Update the loop that references
check_files, ERROR_KEYWORDS and iteration to instead compare time.time() -
last_check_time >= check_interval, update last_check_time when scanning, and
keep the existing behavior of reading files and raising RuntimeError when
keywords are found.
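Standalone, the time-based pattern from the proposed fix can be sketched like this (an illustration only, with a short interval so it is easy to exercise; `scan_files` stands in for the real keyword scan over `check_files`):

```python
import time


def wait_with_periodic_checks(timeout: float, check_interval: float, scan_files) -> int:
    """Poll until timeout, invoking scan_files at most once per check_interval.

    Returns the number of scans performed, regardless of how long each
    loop iteration takes -- unlike an iteration-counter threshold.
    """
    start = time.monotonic()
    last_check = start
    scans = 0
    while time.monotonic() - start < timeout:
        # ... the readiness probe (HTTP request, proc.poll()) would go here ...
        now = time.monotonic()
        if now - last_check >= check_interval:
            last_check = now
            scan_files()  # e.g. grep log files for ERROR_KEYWORDS
            scans += 1
        time.sleep(0.01)  # simulate per-iteration work
    return scans
```

Because the threshold is wall-clock based, the scan still fires several times within the timeout window even when individual iterations are slow, which is the failure mode the review describes.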
PR_Github #36348 [ run ] completed with state
Closed in favor of e4277e4. |
…ts (#11361)"
This reverts commit 80dbb64.
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
`/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...`

Provide a user friendly way for developers to interact with a Jenkins server.

Run `/bot [-h|--help]` to print this help message.

See details below for each supported subcommand.

Details

`run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]`

Launch build/test pipelines. All previously running jobs will be killed.

- `--reuse-test (optional)pipeline-id` (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- `--disable-reuse-test` (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- `--disable-fail-fast` (OPTIONAL) : Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- `--stage-list "A10-PyTorch-1, xxx"` (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- `--test-backend "pytorch, cpp"` (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- `--only-multi-gpu-test` (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"` (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- `--detailed-log` (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- `--debug` (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see `docs/source/reference/ci-overview.md` and the `scripts/test_to_stage_mapping.py` helper.

kill

`kill`

Kill all running builds associated with pull request.

skip

`skip --comment COMMENT`

Skip testing for latest commit on pull request. `--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

`reuse-pipeline`

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.