chore: update gb200 alltoall & sanity check (#624)
Conversation
Signed-off-by: wenpengw-nv <wenpengw@nvidia.com>
Sanity Check Chart Generation Report
New perf data files were detected in this PR. Below is a report of whether chart generation was successful for each op.
Chart Generation Report for system: gb200, backend: trtllm, backend_version: 1.2.0rc6
Walkthrough: Adjusts the TRT‑LLM All2All SOL communication byte-count model and removes p2p_latency_ms from SOL timing; updates the Git LFS pointer for the alltoall perf data; adds GB200 NVLinkOneSided perf data and sanity-check notebook cells.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/aiconfigurator/sdk/perf_database.py`:
- Around line 5716-5728: remote_ranks is incorrectly computed as min(topk,
moe_ep_size) - 1 and ignores num_experts and whether any selected expert is
local; update the calculation to use num_experts and moe_ep_size and make local
selection explicit: compute remote_ranks = min(topk, num_experts, moe_ep_size)
and do NOT unconditionally subtract 1, and/or add an explicit parameter (e.g.,
local_selected or num_local_experts) to the enclosing helper so callers can
subtract local selections when appropriate; then use that remote_ranks value in
the data_bytes branches that reference remote_ranks (where op_name, num_tokens,
hidden_size are used).
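The suggested fix can be sketched as follows. This is a minimal illustration of the corrected rank-count logic, not the actual helper in `perf_database.py`; the function name `estimate_remote_ranks` and the `local_selected` flag are hypothetical:

```python
def estimate_remote_ranks(topk: int, num_experts: int, moe_ep_size: int,
                          local_selected: bool = False) -> int:
    """Upper bound on distinct remote EP ranks reached by a token's experts.

    A token can touch at most min(topk, num_experts, moe_ep_size) distinct
    ranks; one is subtracted only when a selected expert is known to be
    local, rather than unconditionally as in min(topk, moe_ep_size) - 1.
    """
    remote_ranks = min(topk, num_experts, moe_ep_size)
    if local_selected:
        remote_ranks -= 1
    return max(remote_ranks, 0)
```

For example, with topk=8 but only 4 experts spread over 16 EP ranks, at most 4 remote ranks can be reached, which the old `min(topk, moe_ep_size) - 1` formula would miss.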
- Around line 5720-5722: The combine branch treats all "*combine*" ops as BF16
(2 bytes/element); detect the low-precision path by checking op_name (e.g.,
op_name == "alltoall_combine_low_precision" or "alltoall_combine_low_precision"
in op_name) and set bytes_per_element = 0.5 for that case, otherwise keep 2 for
standard BF16, then compute data_bytes = num_tokens * remote_ranks * hidden_size
* bytes_per_element (update the assignment to data_bytes in the block containing
op_name and num_tokens/remote_ranks/hidden_size).
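A sketch of the suggested precision-aware byte count (the 0.5 bytes/element figure comes from the comment above; `combine_data_bytes` is an illustrative name, not the real function):

```python
def combine_data_bytes(op_name: str, num_tokens: int, remote_ranks: int,
                       hidden_size: int) -> float:
    """Bytes moved by an alltoall combine op under the SOL model above.

    The low-precision combine path uses 0.5 bytes/element; the standard
    path stays BF16 at 2 bytes/element.
    """
    if "alltoall_combine_low_precision" in op_name:
        bytes_per_element = 0.5
    else:
        bytes_per_element = 2
    return num_tokens * remote_ranks * hidden_size * bytes_per_element
```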
In `@tools/sanity_check/validate_database.ipynb`:
- Around line 1113-1123: The code unions token counts across all (node_num,
hidden_size, topk, num_experts) tuples into a single token_set while only using
the last tuple's (hs, tk, ne_val, nn_val) when querying kernel_data, which mixes
incompatible shapes; change the logic in the kernel_data traversal (the block
using qm, nn, h, t, ne, target_ep, token_set, hs, tk, ne_val, nn_val) to collect
token sets keyed by the full shape tuple (nn, h, t, ne) — e.g., build a mapping
shape_key -> token_set and record the corresponding (hs, tk, ne_val, nn_val) per
shape — then iterate over that mapping when building/plotting series so each
plotted line uses its own exact kernel_data lookup; alternatively, if you do not
want multiple shapes per (op_name, qm, target_ep), detect when more than one
distinct shape_key exists and raise an error (fail fast) rather than merging
them.
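The shape-keyed collection could look like the sketch below. The nesting order `kernel_data[nn][h][t][ne][num_tokens]` is an assumption about the notebook's dict layout, not the verified structure:

```python
from collections import defaultdict

def collect_token_sets(kernel_data):
    """Group token counts by the full (nn, h, t, ne) shape tuple.

    Instead of unioning all token counts into one set, key each token
    set by its exact shape so every plotted series maps back to a valid
    kernel_data lookup (callers can also fail fast if more than one
    shape appears where only one is expected).
    """
    shape_tokens = defaultdict(set)
    for nn, by_h in kernel_data.items():             # node_num
        for h, by_t in by_h.items():                 # hidden_size
            for t, by_ne in by_t.items():            # topk
                for ne, by_tokens in by_ne.items():  # num_experts
                    shape_tokens[(nn, h, t, ne)].update(by_tokens)
    return dict(shape_tokens)
```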
- Line 1174: The notebook call visualize_trtllm_alltoall(database) is using the
global `database` initialized earlier as h200_sxm + 1.2.0rc5, so update the
sanity-check invocation to point at the GB200 dataset version introduced in this
PR (GB200 / 1.2.0rc6) instead of reusing `database`; locate the visualization
call `visualize_trtllm_alltoall(database)` and replace its argument with the
GB200/1.2.0rc6 dataset reference (or construct a local variable for that
dataset) so a full "Run All" exercises the new NVLinkOneSided data.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a37e89a5-dd6f-4bab-883e-4ceb26391451
📒 Files selected for processing (3)
- src/aiconfigurator/sdk/perf_database.py
- src/aiconfigurator/systems/data/gb200/trtllm/1.2.0rc6/trtllm_alltoall_perf.txt
- tools/sanity_check/validate_database.ipynb
Signed-off-by: wenpengw-nv <wenpengw@nvidia.com>
@coderabbitai review the recent changes and see whether previous issues are resolved
The sanity check charts attached by the author also visually confirm that the SOL model produces sensible results against the GB200 NVLinkOneSided silicon measurements. The PR looks good to merge.
✅ Actions performed: Full review triggered.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tools/sanity_check/validate_database.ipynb`:
- Around line 857-865: The subplot title and axis labels use values from the
wrong loop scope and mislabeled axes: update the set_title call in the MoE
plotting block (where ax[workload_distribution_id * 2, i].set_title is called)
to use the current loop's quant_mode-specific values (current topk and
num_experts variables) rather than the last iterated ones, and fix the xlabel
calls so the top row uses the correct x-axis variable name (use "m" when
plotting over m_list and "s" when plotting over s_list) instead of always
setting "s"; ensure any other axis label (e.g., ylabel for memory) remains
appropriate for the corresponding subplot.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 509e2f19-a023-4d9b-9084-f8023e0cf793
📒 Files selected for processing (3)
- src/aiconfigurator/sdk/perf_database.py
- src/aiconfigurator/systems/data/gb200/trtllm/1.2.0rc6/trtllm_alltoall_perf.txt
- tools/sanity_check/validate_database.ipynb
| " ax[workload_distribution_id * 2, i].set_title(\n", | ||
| " f\"{workload_distribution_title} \\ntopk={topk} e={num_experts} tp={tp} ep={ep}\"\n", | ||
| " )\n", | ||
| " ax[workload_distribution_id * 2, i].set_xlabel(\"s\")\n", | ||
| " ax[workload_distribution_id * 2, i].set_ylabel(\"math sol %\")\n", | ||
| " # ax[0,i].set_ylim(0,1)\n", | ||
| " ax[workload_distribution_id*2, i].legend()\n", | ||
| " ax[workload_distribution_id*2+1, i].set_xlabel(\"s\")\n", | ||
| " ax[workload_distribution_id*2+1, i].set_ylabel(\"mem sol %\")\n", | ||
| " ax[workload_distribution_id * 2, i].legend()\n", | ||
| " ax[workload_distribution_id * 2 + 1, i].set_xlabel(\"s\")\n", | ||
| " ax[workload_distribution_id * 2 + 1, i].set_ylabel(\"mem sol %\")\n", |
Fix the MoE subplot metadata.
The title on Line 857 uses topk / num_experts from the last quant_mode iterated, so it can misdescribe mixed-series subplots, and Lines 860 and 864 label m_list as s.
💡 Suggested fix
- ax[workload_distribution_id * 2, i].set_title(
- f"{workload_distribution_title} \ntopk={topk} e={num_experts} tp={tp} ep={ep}"
- )
- ax[workload_distribution_id * 2, i].set_xlabel("s")
+ ax[workload_distribution_id * 2, i].set_title(
+ f"{workload_distribution_title}\ntp={tp} ep={ep}"
+ )
+ ax[workload_distribution_id * 2, i].set_xlabel("num_tokens")
ax[workload_distribution_id * 2, i].set_ylabel("math sol %")
# ax[0,i].set_ylim(0,1)
ax[workload_distribution_id * 2, i].legend()
- ax[workload_distribution_id * 2 + 1, i].set_xlabel("s")
+ ax[workload_distribution_id * 2 + 1, i].set_xlabel("num_tokens")
  ax[workload_distribution_id * 2 + 1, i].set_ylabel("mem sol %")
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!


Overview:
Fix the AlltoAll SOL model for TRT-LLM to correctly account for per-rank deduplication in dispatch and combine operations, add NVLinkOneSided collected data for GB200 on trtllm 1.2.0rc6 (ep8/ep16/ep32), and add a sanity check notebook for AlltoAll data validation.
Details:
Added NVLinkOneSided silicon measurements for GB200 on trtllm 1.2.0rc6, covering ep8, ep16, and ep32 configurations.
Added visualization and validation cells in the sanity check notebook to compare AlltoAll SOL predictions against silicon measurements.