Conversation
WalkthroughThree files updated to enhance performance data fallback handling and validation robustness: a fallback mechanism in perf_database.py loads TensorRT-LLM data for SGLang when unavailable, test_sanity_check.py switches from fixture-based to subprocess-parametrized validation, and validate_database.ipynb adds environment-driven configuration with comprehensive error handling. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. 📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip You can disable poems in the walkthrough.Disable the |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/e2e/tools/test_sanity_check.py (1)
48-54: Bound the notebook subprocess with a timeout.A notebook regression can hang this child process forever and block the whole e2e shard until the outer CI timeout. Adding a local timeout makes failures deterministic and much easier to diagnose.
Suggested fix
result = sp.run( [sys.executable, "-c", "import import_ipynb; import validate_database"], cwd=SANITY_CHECK_DIR, env=env, capture_output=True, text=True, + timeout=600, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/tools/test_sanity_check.py` around lines 48 - 54, The subprocess call that runs the notebook using sp.run (assigning to result) needs a bounded timeout to avoid hangs; add a timeout argument (e.g., timeout=60 or an appropriate constant) to the sp.run invocation when calling [sys.executable, "-c", "import import_ipynb; import validate_database"] with cwd=SANITY_CHECK_DIR and env=env, and wrap the call in a try/except for subprocess.TimeoutExpired to fail the test with a clear error message and capture/print any partial stdout/stderr for diagnostics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/aiconfigurator/sdk/perf_database.py`:
- Around line 2052-2068: The code builds trtllm_custom_allreduce_path under the
current systems_root even though get_latest_database_version(self.system,
"trtllm") may have discovered the version in a different configured root; update
the logic to use the actual database root where that TRTLLM version was found
(rather than always using systems_root). Concretely, modify
get_latest_database_version or add a helper (e.g.,
get_latest_database_root_and_version or find_database_root_for_version) so you
can obtain both the root and version, then construct
trtllm_custom_allreduce_path using that discovered root together with
self.system_spec["data_dir"], "trtllm", the returned version and
PerfDataFilename.custom_allreduce; then call load_custom_allreduce_data and wrap
into LoadedOpData as before so query_custom_allreduce() can succeed when TRTLLM
lives in a different configured systems path.
In `@tools/sanity_check/validate_database.ipynb`:
- Around line 857-888: The try/except around database.query_moe in the
validate_database notebook swallows all exceptions and treats failures as
successes; change it to only catch the specific PerfDataNotAvailableError (or
whatever sentinel the database uses) and re-raise any other Exception, or better
yet pre-filter invalid (moe_tp_size=tp, moe_ep_size=ep, quant_mode) combinations
before calling database.query_moe to avoid calling it for unsupported configs;
update the exception handler around the database.query_moe call to catch
PerfDataNotAvailableError and continue, but let other exceptions propagate (or
explicitly raise/log and fail the notebook) so real regressions are not masked.
---
Nitpick comments:
In `@tests/e2e/tools/test_sanity_check.py`:
- Around line 48-54: The subprocess call that runs the notebook using sp.run
(assigning to result) needs a bounded timeout to avoid hangs; add a timeout
argument (e.g., timeout=60 or an appropriate constant) to the sp.run invocation
when calling [sys.executable, "-c", "import import_ipynb; import
validate_database"] with cwd=SANITY_CHECK_DIR and env=env, and wrap the call in
a try/except for subprocess.TimeoutExpired to fail the test with a clear error
message and capture/print any partial stdout/stderr for diagnostics.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 15356501-4712-4e9d-8403-0948a0befb68
📒 Files selected for processing (3)
src/aiconfigurator/sdk/perf_database.pytests/e2e/tools/test_sanity_check.pytools/sanity_check/validate_database.ipynb
| " try:\n", | ||
| " db_time = database.query_moe(\n", | ||
| " num_tokens=m,\n", | ||
| " hidden_size=hidden_size,\n", | ||
| " inter_size=inter_size,\n", | ||
| " topk=topk,\n", | ||
| " num_experts=num_experts,\n", | ||
| " moe_tp_size=tp,\n", | ||
| " moe_ep_size=ep,\n", | ||
| " quant_mode=quant_mode,\n", | ||
| " workload_distribution=workload_distribution,\n", | ||
| " database_mode=DatabaseMode.SILICON,\n", | ||
| " )\n", | ||
| " except Exception as e:\n", | ||
| " print(f\"Error querying moe: {e}\")\n", | ||
| " break\n", | ||
| " percentage_of_math = sol_math / db_time\n", | ||
| " percentage_of_mem = sol_mem / db_time\n", | ||
| " sol_math_list.append(percentage_of_math)\n", | ||
| " sol_mem_list.append(percentage_of_mem)\n", | ||
| "\n", | ||
| " ax[workload_distribution_id*2, i].plot(\n", | ||
| " m_list, sol_math_list, color=color_list[color_id], label=f\"{quant_mode} math\"\n", | ||
| " )\n", | ||
| " ax[workload_distribution_id*2+1, i].plot(\n", | ||
| " m_list,\n", | ||
| " sol_mem_list,\n", | ||
| " color=color_list[color_id],\n", | ||
| " linestyle=\"--\",\n", | ||
| " label=f\"{quant_mode} mem\",\n", | ||
| " )\n", | ||
| " if len(m_list) == len(sol_math_list) and len(m_list) == len(sol_mem_list):\n", | ||
| " ax[workload_distribution_id*2, i].plot(\n", | ||
| " m_list, sol_math_list, color=color_list[color_id], label=f\"{quant_mode} math\"\n", | ||
| " )\n", | ||
| " ax[workload_distribution_id*2+1, i].plot(\n", | ||
| " m_list,\n", | ||
| " sol_mem_list,\n", | ||
| " color=color_list[color_id],\n", | ||
| " linestyle=\"--\",\n", | ||
| " label=f\"{quant_mode} mem\",\n", | ||
| " )\n", |
There was a problem hiding this comment.
Don’t turn arbitrary query_moe() failures into a passing sanity check.
This notebook is now part of the e2e gate, but the bare except Exception only prints and skips plotting. That masks real regressions in query_moe() and still lets the notebook pass. If the intent is to skip unsupported cases, catch PerfDataNotAvailableError specifically or pre-filter invalid (tp, ep, quant_mode) combinations before the query.
Suggested fix
- except Exception as e:
- print(f"Error querying moe: {e}")
+ except PerfDataNotAvailableError as e:
+ print(f"Skipping unsupported MoE case: {e}")
break📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| " try:\n", | |
| " db_time = database.query_moe(\n", | |
| " num_tokens=m,\n", | |
| " hidden_size=hidden_size,\n", | |
| " inter_size=inter_size,\n", | |
| " topk=topk,\n", | |
| " num_experts=num_experts,\n", | |
| " moe_tp_size=tp,\n", | |
| " moe_ep_size=ep,\n", | |
| " quant_mode=quant_mode,\n", | |
| " workload_distribution=workload_distribution,\n", | |
| " database_mode=DatabaseMode.SILICON,\n", | |
| " )\n", | |
| " except Exception as e:\n", | |
| " print(f\"Error querying moe: {e}\")\n", | |
| " break\n", | |
| " percentage_of_math = sol_math / db_time\n", | |
| " percentage_of_mem = sol_mem / db_time\n", | |
| " sol_math_list.append(percentage_of_math)\n", | |
| " sol_mem_list.append(percentage_of_mem)\n", | |
| "\n", | |
| " ax[workload_distribution_id*2, i].plot(\n", | |
| " m_list, sol_math_list, color=color_list[color_id], label=f\"{quant_mode} math\"\n", | |
| " )\n", | |
| " ax[workload_distribution_id*2+1, i].plot(\n", | |
| " m_list,\n", | |
| " sol_mem_list,\n", | |
| " color=color_list[color_id],\n", | |
| " linestyle=\"--\",\n", | |
| " label=f\"{quant_mode} mem\",\n", | |
| " )\n", | |
| " if len(m_list) == len(sol_math_list) and len(m_list) == len(sol_mem_list):\n", | |
| " ax[workload_distribution_id*2, i].plot(\n", | |
| " m_list, sol_math_list, color=color_list[color_id], label=f\"{quant_mode} math\"\n", | |
| " )\n", | |
| " ax[workload_distribution_id*2+1, i].plot(\n", | |
| " m_list,\n", | |
| " sol_mem_list,\n", | |
| " color=color_list[color_id],\n", | |
| " linestyle=\"--\",\n", | |
| " label=f\"{quant_mode} mem\",\n", | |
| " )\n", | |
| " try:\n", | |
| " db_time = database.query_moe(\n", | |
| " num_tokens=m,\n", | |
| " hidden_size=hidden_size,\n", | |
| " inter_size=inter_size,\n", | |
| " topk=topk,\n", | |
| " num_experts=num_experts,\n", | |
| " moe_tp_size=tp,\n", | |
| " moe_ep_size=ep,\n", | |
| " quant_mode=quant_mode,\n", | |
| " workload_distribution=workload_distribution,\n", | |
| " database_mode=DatabaseMode.SILICON,\n", | |
| " )\n", | |
| " except PerfDataNotAvailableError as e:\n", | |
| " print(f\"Skipping unsupported MoE case: {e}\")\n", | |
| " break\n", | |
| " percentage_of_math = sol_math / db_time\n", | |
| " percentage_of_mem = sol_mem / db_time\n", | |
| " sol_math_list.append(percentage_of_math)\n", | |
| " sol_mem_list.append(percentage_of_mem)\n", | |
| "\n", | |
| " if len(m_list) == len(sol_math_list) and len(m_list) == len(sol_mem_list):\n", | |
| " ax[workload_distribution_id*2, i].plot(\n", | |
| " m_list, sol_math_list, color=color_list[color_id], label=f\"{quant_mode} math\"\n", | |
| " )\n", | |
| " ax[workload_distribution_id*2+1, i].plot(\n", | |
| " m_list,\n", | |
| " sol_mem_list,\n", | |
| " color=color_list[color_id],\n", | |
| " linestyle=\"--\",\n", | |
| " label=f\"{quant_mode} mem\",\n", | |
| " )\n", |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/sanity_check/validate_database.ipynb` around lines 857 - 888, The
try/except around database.query_moe in the validate_database notebook swallows
all exceptions and treats failures as successes; change it to only catch the
specific PerfDataNotAvailableError (or whatever sentinel the database uses) and
re-raise any other Exception, or better yet pre-filter invalid (moe_tp_size=tp,
moe_ep_size=ep, quant_mode) combinations before calling database.query_moe to
avoid calling it for unsupported configs; update the exception handler around
the database.query_moe call to catch PerfDataNotAvailableError and continue, but
let other exceptions propagate (or explicitly raise/log and fail the notebook)
so real regressions are not masked.
There was a problem hiding this comment.
That's true but I can't fix query_moe right now. I think it's better to still generate the rest of the charts even if some combinations fail for moe.
There was a problem hiding this comment.
@ilyasher, understood — that's a reasonable trade-off for a sanity-check notebook where partial output is better than no output.
One lightweight improvement that doesn't require fixing query_moe: you could log the exception type alongside the message so unexpected errors (e.g. AttributeError, KeyError) are still clearly visible in CI output and distinguishable from expected data-availability gaps:
except Exception as e:
print(f"Skipping MoE case (tp={tp}, ep={ep}, quant_mode={quant_mode}): [{type(e).__name__}] {e}")
breakThat way, if a real regression surfaces, it won't be silently swallowed — the error class will appear in the logs and someone can act on it. The existing # TODO: fix query_moe for all combos and remove try/except comment already signals that this is temporary, which is good.
( ˶ˆᗜˆ˵ )
7bf61d9 to
e4654a8
Compare
Signed-off-by: Ilya Sherstyuk <isherstyuk@nvidia.com>
Signed-off-by: Ilya Sherstyuk <isherstyuk@nvidia.com>
c413194 to
4cd27d3
Compare
Overview:
validate_database.ipynbsanity check notebook works with every system+backend combinationvalidate_database.ipynbso that it works with every system+backend comboa. Add
unsqueeze=Falseto fix matplotlib plot IndexError when axis size is 1b. Small fixes for moe sanity check
c. Wrap MLA in try/except since it's not needed for many models
INCOMPLETEis in the database dir for a certain backend version, that version will be ignored.Summary by CodeRabbit
New Features
Improvements