chore: Add data from auto-collector pipeline 46210829 (h100_sxm_sglang_0.5.9)#597

Open
dynamo-ops wants to merge 1 commit into main from auto-data-collection-46210829-h100_sxm_sglang_0.5.9

Conversation

@dynamo-ops
Contributor

@dynamo-ops dynamo-ops commented Mar 16, 2026

Error Summary for Auto-Collector Run

Collection summary for h100_sxm sglang:0.5.9

Error summary

{
    "backend": "sglang",
    "version": "0.5.9",
    "timestamp": "2026-03-16T06:36:57.049936",
    "total_errors": 487,
    "errors_by_module": {
        "sglang.gemm": 486,
        "sglang.wideep_moe": 1
    },
    "errors_by_type": {
        "RuntimeError": 484,
        "OutOfMemoryError": 3
    }
}
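A run summary in this shape can be sanity-checked programmatically before the collected artifacts are committed. A minimal sketch in Python (the field names follow the JSON above; the 50% dominance threshold is an illustrative assumption, not part of the actual pipeline):

```python
import json

# Error summary as reported by the auto-collector run (fields as above).
summary_json = """
{
    "backend": "sglang",
    "version": "0.5.9",
    "total_errors": 487,
    "errors_by_module": {
        "sglang.gemm": 486,
        "sglang.wideep_moe": 1
    },
    "errors_by_type": {
        "RuntimeError": 484,
        "OutOfMemoryError": 3
    }
}
"""

summary = json.loads(summary_json)

# Sanity check: per-module counts should add up to total_errors.
module_total = sum(summary["errors_by_module"].values())
assert module_total == summary["total_errors"]

# Flag modules whose error count dominates the run (threshold is an assumption).
dominant = {m: n for m, n in summary["errors_by_module"].items()
            if n / summary["total_errors"] > 0.5}
print(dominant)  # {'sglang.gemm': 486}
```

For this run the check would flag `sglang.gemm`, which accounts for 486 of the 487 reported errors.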

Summary by CodeRabbit

  • Chores
    • Added seven new performance benchmark data files for H100 SXM with sglang v0.5.9. Datasets cover context attention mechanisms, matrix multiplication operations, generation-phase attention processing, MLA-based operations, and mixture-of-experts configurations. These files enable comprehensive AI system performance evaluation, detailed technical analysis, and ongoing optimization tracking for improved system efficiency and resource utilization.

Signed-off-by: dynamo-ops <170655669+dynamo-ops@users.noreply.github.com>
@copy-pr-bot

copy-pr-bot bot commented Mar 16, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the chore label Mar 16, 2026
@coderabbitai

coderabbitai bot commented Mar 16, 2026

Walkthrough

Seven new Git LFS pointer files are added to track large binary performance metric assets in the H100 SXM SGLang 0.5.9 directory. Each pointer contains standard Git LFS metadata (version, oid, size) without introducing code or logic changes.
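Each pointer file follows the Git LFS pointer format: a handful of space-separated key/value lines (version, oid, size). A minimal parser sketch in Python (the oid and size values are taken from the gemm_perf.txt pointer quoted later in this review):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:5ef3d903bda0116a2dbb99c3394e7a5763301e6197dcfd6f5d3af33c55f64517
size 9096207
"""

fields = parse_lfs_pointer(pointer)
assert fields["version"] == "https://git-lfs.github.com/spec/v1"
assert fields["oid"].startswith("sha256:")
assert int(fields["size"]) == 9096207  # size of the real binary, in bytes
```

The pointer is all that lives in the Git history; the 9 MB binary itself is fetched from LFS storage on checkout.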

Changes

Cohort: Git LFS Performance Data Pointers
File(s): src/aiconfigurator/systems/data/h100_sxm/sglang/0.5.9/context_attention_perf.txt, context_mla_perf.txt, gemm_perf.txt, generation_attention_perf.txt, generation_mla_perf.txt, mla_bmm_perf.txt, moe_perf.txt
Summary: Added seven Git LFS pointer files for tracking large binary performance metric assets. Each file contains standard LFS metadata (version, oid, size) with no code or logic modifications.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐰 Seven scrolls of data we now shall keep,
Performance metrics, stashed so deep,
Git LFS pointers, neat and small,
Tracking benchmarks for H100's call! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Description check — ⚠️ Warning. The description lacks the required template sections (Overview, Details, Where should reviewer start, Related Issues) and primarily contains error summary data instead of explaining the changes being merged. Resolution: restructure the description to follow the template: add an Overview section, explain what data is being added and why, specify which files to review, and note any related issues.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed. The title accurately describes the main change: adding performance data files from an automated collection pipeline run for H100 SXM with sglang 0.5.9.
  • Docstring Coverage — ✅ Passed. No functions were found in the changed files, so the docstring coverage check was skipped.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can customize the high-level summary generated by CodeRabbit.

Configure the reviews.high_level_summary_instructions setting to provide custom instructions for generating the high-level summary.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/aiconfigurator/systems/data/h100_sxm/sglang/0.5.9/gemm_perf.txt`:
- Around line 1-3: Block ingestion of the gemm_perf.txt artifact until collector
errors are resolved: stop persisting
src/aiconfigurator/systems/data/h100_sxm/sglang/0.5.9/gemm_perf.txt when the run
metadata reports errors (currently 486 GEMM module errors out of 487); require
either a successful rerun with zero collector/GEMM errors or attach a validation
report that proves coverage and data integrity for gemm_perf.txt before allowing
ingestion; update the ingestion gating logic (the collector/GEMM validation
step) to check the run metadata error count and reject artifacts with non-zero
GEMM/collector errors, and surface a clear error message referencing the
offending gemm_perf.txt artifact when rejecting.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 730bf8a2-96d7-4519-9807-a0b6c976df22

📥 Commits

Reviewing files that changed from the base of the PR and between ae86e39 and 2d9299b.

📒 Files selected for processing (7)
  • src/aiconfigurator/systems/data/h100_sxm/sglang/0.5.9/context_attention_perf.txt
  • src/aiconfigurator/systems/data/h100_sxm/sglang/0.5.9/context_mla_perf.txt
  • src/aiconfigurator/systems/data/h100_sxm/sglang/0.5.9/gemm_perf.txt
  • src/aiconfigurator/systems/data/h100_sxm/sglang/0.5.9/generation_attention_perf.txt
  • src/aiconfigurator/systems/data/h100_sxm/sglang/0.5.9/generation_mla_perf.txt
  • src/aiconfigurator/systems/data/h100_sxm/sglang/0.5.9/mla_bmm_perf.txt
  • src/aiconfigurator/systems/data/h100_sxm/sglang/0.5.9/moe_perf.txt

Comment on lines +1 to +3
version https://git-lfs.github.com/spec/v1
oid sha256:5ef3d903bda0116a2dbb99c3394e7a5763301e6197dcfd6f5d3af33c55f64517
size 9096207

⚠️ Potential issue | 🟠 Major

Block ingest of this GEMM artifact until collector errors are resolved.

The pointer itself is valid, but this PR’s run metadata reports 486 GEMM module errors (out of 487 total). Shipping gemm_perf.txt from that run risks persisting incomplete/corrupted performance data. Please gate this update on a successful rerun (or attach a validation report proving coverage/quality for this artifact).
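The gating the reviewer describes could be sketched roughly as follows. This is illustrative only: the function name, the artifact-to-module mapping, and the metadata layout are assumptions, not the real collector's API.

```python
def gate_artifact(artifact_path: str, run_metadata: dict) -> None:
    """Refuse to ingest an artifact whose collector module reported errors.

    Illustrative sketch: the module mapping and metadata schema below
    are assumptions, not the actual pipeline's.
    """
    # Hypothetical mapping from artifact file name to collector module.
    module_for_artifact = {"gemm_perf.txt": "sglang.gemm"}
    filename = artifact_path.rsplit("/", 1)[-1]
    module = module_for_artifact.get(filename)
    errors = run_metadata.get("errors_by_module", {}).get(module, 0)
    if errors > 0:
        raise ValueError(
            f"refusing to ingest {artifact_path}: "
            f"{errors} collector errors in {module}"
        )

run_metadata = {"errors_by_module": {"sglang.gemm": 486, "sglang.wideep_moe": 1}}
try:
    gate_artifact(
        "src/aiconfigurator/systems/data/h100_sxm/sglang/0.5.9/gemm_perf.txt",
        run_metadata,
    )
except ValueError as e:
    print(e)
```

With this run's metadata, the gemm_perf.txt artifact would be rejected with a message naming the offending file and its 486 collector errors.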


@Arsene12358
Contributor

Do we actually expect most of the GEMM test cases to fail?
