
[llmd] Keep working #902

Merged
kpouget merged 18 commits into openshift-psap:main from kpouget:llmd
Feb 27, 2026

Conversation

@kpouget
Contributor

@kpouget kpouget commented Feb 25, 2026

Summary by CodeRabbit

  • New Features

    • Prometheus reports: Resource Usage, GPU Performance, System Health
    • VLLM Metrics Analysis dashboard and registration
    • Per-flavor LLM test flows with isolated artifacts, reshaped deployments, dedicated cleanup
    • Pull-secret readiness validation (optional registry access test)
    • Benchmarks now accept endpoint URLs directly
  • Improvements

    • Extended pod readiness timeouts
    • Safer cluster-info handling and clearer "no data" messages
  • Chores

    • Collected logs moved to artifacts subdirectory
    • Configuration and docs updated to use endpoint_url parameters

@openshift-ci

openshift-ci bot commented Feb 25, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign ashishkamra for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai bot commented Feb 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds Prometheus and VLLM visualizations, validates pull-secret readiness during RHOAI preparation, introduces multi‑flavor per‑flavor LLM inference test flows and cleanup, redirects cluster preload artifacts into an artifacts subdirectory, and applies defensive/string fixes across plotting and store code.

Changes

Cohort / File(s) Summary
Cluster preload tasks
projects/cluster/toolbox/cluster_preload_image/tasks/main.yml
Create {{ artifact_extra_logs_dir }}/artifacts, redirect describe/status/yaml outputs into it, and increase pod readiness retries 90 → 135.
Pull-secret validation / test prep
projects/llm-d/testing/prepare_llmd.py
Add wait_for_pull_secret_ready(registry, test_image, timeout) to verify pull-secret JSON/auth, MCP rollout, and optional image pull; invoked from prepare_rhoai(); rename parallel label.
LLM‑D tests (flavors & cleanup)
projects/llm-d/testing/test_llmd.py
Add multi‑flavor orchestration with per‑flavor artifact dirs, cleanup_llm_inference_resources(), reshape_isvc(), endpoint resolution, per‑flavor benchmarks, simple test, and per‑flavor state capture; update function signatures to accept flavor/endpoint/namespace.
Visualization registration & config
projects/llm-d/visualizations/llmd_inference/data/plots.yaml, .../plotting/__init__.py, .../plotting/report.py
Append four reports to plots.yaml and register new prometheus and vllm_metrics modules/reports during plotting init.
Prometheus plotting & reports
projects/llm-d/visualizations/llmd_inference/plotting/prometheus.py, .../prometheus_reports.py
Add Prometheus plot registration and three report classes (Resource Usage, GPU Performance, System Health) and register them with stats registry.
VLLM metrics plotting & report
projects/llm-d/visualizations/llmd_inference/plotting/vllm_metrics.py
Add VLLM plot registrations, VLLMMetricsReport and register()/register_report() to build multi‑section HTML reports from Prometheus metrics.
Store / parsers changes
projects/llm-d/visualizations/llmd_inference/store/parsers.py
Rename main metrics key "main" → "sutest", add SUTEST_CONTAINER_LABELS and extra metric names, and make get_llmd_main_metrics(register=False) delegate to helper_prom_store.get_cluster_metrics(..., cluster_role="sutest").
Toolbox endpoint param rename & tasks
projects/llm-d/toolbox/llmd.py, projects/llm-d/toolbox/llmd_run_*_benchmark/..., projects/llm-d/toolbox/llmd_deploy_llm_inference_service/tasks/main.yml
Rename benchmark APIs/config vars from llmisvc_name → endpoint_url; remove oc-based dynamic URL lookup in benchmark tasks; add task to set ISVC metadata name/namespace before deployment.
Docs update
docs/toolbox.generated/Llmd.run_*_benchmark.rst
Update parameter docs to use endpoint_url instead of llmisvc_name.
Matrix benchmarking small fixes
projects/matrix_benchmarking/visualizations/helpers/plotting/prom/__init__.py, .../cpu_memory.py, .../store/prom.py
Improve no-data messages to include metrics context and guard entry.results.cluster_info access with getattr.
Config & presets
projects/llm-d/testing/config.yaml
Add multi‑flavor CI preset, flavors list, set cluster nodes count to 2, and enable simple test flag for inference_service.
ISVC resource metadata
projects/llm-d/testing/llmisvcs/llama-3-1-8b-instruct-fp8.yaml
Set metadata.name/namespace to SET_AT_RUNTIME and remove two annotations.
Submodule pointer
projects/matrix_benchmarking/subproject
Update submodule commit reference to a4954a2f46a0914a80a2eb3f6382aae678f50b58.

Sequence Diagram(s)

sequenceDiagram
    rect rgba(200,255,200,0.5)
    actor PrepUser as Test Preparation
    participant PrepFlow as prepare_rhoai()
    participant Secret as K8s Secret
    participant MCP as MachineConfigPool
    participant Worker as Worker Node
    participant Registry as Container Registry
    end

    PrepUser->>PrepFlow: start prepare_rhoai()
    PrepFlow->>Secret: apply pull-secret
    PrepFlow->>Secret: wait_for_pull_secret_ready(registry, test_image)
    Secret->>Secret: verify registry entry & auth JSON
    Secret->>MCP: verify MCP propagation/rollout
    alt test_image provided
        PrepFlow->>Worker: run test container to pull image
        Worker->>Registry: attempt authenticated pull
        Registry-->>Worker: allow/deny
    end
    Secret-->>PrepFlow: checks passed
    PrepFlow->>PrepUser: continue test preparation
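The "verify registry entry & auth JSON" step in the diagram boils down to a small check on the pull-secret's dockerconfig JSON. A minimal sketch (the helper name is illustrative, not the actual code in prepare_llmd.py):

```python
import json

def pull_secret_has_registry(dockerconfig_json: str, registry: str) -> bool:
    """Return True when the pull-secret JSON parses and carries a
    non-empty auth entry for the given registry."""
    try:
        data = json.loads(dockerconfig_json)
    except json.JSONDecodeError:
        return False
    auth = data.get("auths", {}).get(registry)
    return bool(auth and auth.get("auth"))
```

A falsy result covers both "registry missing" and "secret not yet valid JSON", so a caller can simply retry on False.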
sequenceDiagram
    rect rgba(200,200,255,0.5)
    actor Tester as User
    participant TestFlow as test_llmd.py
    participant K8sAPI as Kubernetes API
    participant Prom as Prometheus
    participant Parser as parsers.get_llmd_main_metrics
    participant PlotReg as Plotting Registry
    end

    Tester->>TestFlow: start per-flavor tests()
    TestFlow->>K8sAPI: cleanup_llm_inference_resources()
    K8sAPI->>K8sAPI: delete llminferenceservice resources
    loop Poll (retries)
        TestFlow->>K8sAPI: check for remaining resources
    end
    TestFlow->>Prom: reset/collect metrics
    TestFlow->>Parser: get_llmd_main_metrics(register=True)
    Parser->>Prom: query metrics (cluster_role="sutest")
    Prom-->>Parser: metrics data
    Parser->>PlotReg: register plots & reports
    PlotReg->>Prom: render reports (Prometheus/VLLM)
    Prom-->>Tester: deliver HTML visualization

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

lgtm

Poem

🐰
I hopped through logs and metrics bright,
I nudged pull-secrets into the light,
I reshaped ISVCs and chased stale pods,
Plotted VLLM peaks and data clods —
A rabbit’s little testing bite! 🥕📈

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 71.11%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Title check — ❓ Inconclusive: the title "[llmd] Keep working" is vague and does not convey meaningful information about the substantial changes made across multiple modules and test orchestration systems. Resolution: provide a more descriptive title, such as "[llmd] Add multi-flavor LLM inference test orchestration with Prometheus metrics visualization".
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: check skipped, CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@kpouget
Contributor Author

kpouget commented Feb 25, 2026

/test jump-ci llm-d scale_up
/cluster kpouget-ibm


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (2)
projects/matrix_benchmarking/visualizations/helpers/store/prom.py (1)

315-323: Guard control_plane as well to fully avoid attribute crashes.

The new check protects missing cluster_info, but both lines still dereference .control_plane unguarded. If cluster_info is present but incomplete, this still raises AttributeError.

Proposed hardening + dedup
+    def _control_plane_nodes(entry):
+        cluster_info = getattr(entry.results, "cluster_info", None)
+        control_plane = getattr(cluster_info, "control_plane", []) if cluster_info else []
+        return [node.name for node in control_plane]
+
     def only_control_plane_nodes(entry, metrics):
-
-        control_plane_nodes = [node.name for node in (entry.results.cluster_info.control_plane if getattr(entry.results, "cluster_info", []) else [])]
+        control_plane_nodes = _control_plane_nodes(entry)
@@
     def no_control_plane_nodes(entry, metrics):
-        control_plane_nodes = [node.name for node in (entry.results.cluster_info.control_plane if getattr(entry.results, "cluster_info", []) else [])]
+        control_plane_nodes = _control_plane_nodes(entry)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/matrix_benchmarking/visualizations/helpers/store/prom.py` around
lines 315 - 323, The code assumes entry.results.cluster_info.control_plane
exists; change both places (the list comprehensions inside the generator and
inside no_control_plane_nodes) to first fetch cluster_info =
getattr(entry.results, "cluster_info", None) and then control_plane =
getattr(cluster_info, "control_plane", []) (or use hasattr(cluster_info,
"control_plane") ? []), and build control_plane_nodes = [node.name for node in
control_plane]; optionally factor this into a small helper function like
get_control_plane_nodes(entry) and call it from both the generator and
no_control_plane_nodes to avoid duplication.
projects/llm-d/visualizations/llmd_inference/plotting/prometheus_reports.py (1)

189-199: Dead code behind if False: — consider using a TODO or removing.

This API Server Request Latency section is permanently unreachable. If it's intentionally deferred, a # TODO: comment explaining why (e.g., metrics not yet available) would improve clarity and make it discoverable for future cleanup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/visualizations/llmd_inference/plotting/prometheus_reports.py`
around lines 189 - 199, The API Server Request Latency block is dead code
because it's guarded by if False; either remove this unreachable block entirely
or make it intentionally conditional (e.g., driven by a feature flag/config or a
constant) and add a clear TODO explaining why it's disabled; locate the block
that builds header using html.H3/html.P and calls report.Plot_and_Text (the
lines inside the if False) and replace the hard false guard with a named flag or
delete the block and add a // TODO: comment mentioning why the plots (Prom:
Sutest GET/PUT/LIST/PATCH Requests duration) are omitted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@projects/llm-d/testing/prepare_llmd.py`:
- Around line 125-132: The MCP check currently assumes result.stdout is valid;
update the logic in the loop that calls run.run (the variable result) so you
first verify the command succeeded and result.stdout is non-empty (e.g., check
result.exit_code/returncode or result.stdout truthiness) before interpreting
values; if the command failed or stdout is empty, log a warning/error and
sleep/retry instead of setting machine_config_updated, and only set
machine_config_updated = True when the command succeeded and the parsed status
indicates update completion (no "False" and/or contains "True").
- Around line 89-104: The loop currently only fetches secret_content once
(guarded by the locals() check), causing stale data to be re-parsed across
retries; update the block around secret_json_valid/secret_content so that you
always re-run the oc get secret command each iteration before attempting
json.loads (i.e., remove the if not 'secret_content' in locals() guard), parse
into secret_data, set secret_json_valid when parsing succeeds, and keep checking
registry in secret_data["auths"] so each retry uses fresh secret_content and can
observe updates.
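The re-fetch-every-iteration pattern can be sketched with the oc call abstracted behind a callable; the names below are illustrative, not the actual prepare_llmd.py code:

```python
import base64
import json

def wait_for_registry_auth(fetch_secret, registry, retries=10):
    """Re-fetch the (base64-encoded) pull-secret on every iteration,
    so an update applied mid-wait is observed; return True once the
    registry has an auth entry.  `fetch_secret` stands in for the
    `oc get secret` call."""
    for _ in range(retries):
        encoded = fetch_secret()  # fresh fetch each attempt, never cached
        if not encoded:
            continue
        try:
            data = json.loads(base64.b64decode(encoded))
        except (ValueError, json.JSONDecodeError):
            continue  # secret not (yet) valid JSON; retry
        if registry in data.get("auths", {}):
            return True
    return False
```

Caching the fetched content across iterations, as the flagged code did, would make every retry re-parse the same stale bytes.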

In `@projects/llm-d/testing/test_llmd.py`:
- Around line 304-321: The loop currently treats a non-zero result.returncode as
a successful cleanup; change the logic so non-zero returncode is considered a
failed oc call and does not break the wait loop. Specifically, in the block
using run.run(...) (variable result) and the loop that computes
remaining_services/remaining_count, only break when result.returncode == 0 AND
result.stdout.strip() is empty; if result.returncode != 0, log the error
(include result.returncode and result.stderr/stdout) and continue retrying (and
on the last iteration raise a RuntimeError indicating the oc command failed),
and keep the existing behavior for the case where result.returncode == 0 but
resources still exist.
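The suggested success condition (returncode == 0 AND empty output) can be isolated into a pure decision function. A hypothetical sketch:

```python
def cleanup_poll_decision(returncode: int, stdout: str):
    """Interpret one `oc get llminferenceservice -o name` poll: a
    non-zero returncode is a failed call (keep retrying, never treat
    as success), and the wait only completes when the call succeeded
    AND no resources remain."""
    if returncode != 0:
        return ("oc-failed", None)
    remaining = [line for line in stdout.splitlines() if line.strip()]
    if not remaining:
        return ("done", 0)
    return ("waiting", len(remaining))
```

Counting non-empty lines of `-o name` output also addresses the reliability concern with `--no-headers` raised in the inline comment on test_llmd.py.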

In `@projects/llm-d/visualizations/llmd_inference/plotting/vllm_metrics.py`:
- Around line 25-42: The Plot definitions in vllm_metrics.py currently embed
PromQL expressions (e.g., the dict value
"rate(vllm_e2e_latency_seconds_count[5m])" for the "vllm_e2e_request_rate" plot)
which are not used by the parser pipeline and conflict with the canonical
queries in parsers.py; update the plotting_prom.Plot calls (the two Plot
invocations that reference "vllm_e2e_latency_average" and
"vllm_e2e_request_rate") to either remove the dict values and pass plain metric
names only, or replace the dict values with the exact canonical query strings
from parsers.py (e.g., the kserve_vllm:e2e_request_latency_seconds_count rate
form) so they match get_uwm_metrics/parsers.py; ensure no PromQL expression
remains in the Plot dict unless it exactly matches the parser canonical query.

In `@projects/llm-d/visualizations/llmd_inference/store/parsers.py`:
- Around line 294-317: SUTEST_CONTAINER_EXTRA_METRICS_NAMES currently doesn't
match the keys used in SUTEST_CONTAINER_LABELS which _get_network_usage_metrics
uses to filter network metrics; update SUTEST_CONTAINER_EXTRA_METRICS_NAMES to
include the same container label keys (e.g., "LLM Inference Service" and "LLM
Inference Gateway") so that helper_prom_store.get_cluster_metrics called from
get_llmd_main_metrics will not skip network metrics when
_get_network_usage_metrics checks if plot_name is in network_metrics_names.

---

Nitpick comments:
In `@projects/llm-d/visualizations/llmd_inference/plotting/prometheus_reports.py`:
- Around line 189-199: The API Server Request Latency block is dead code because
it's guarded by if False; either remove this unreachable block entirely or make
it intentionally conditional (e.g., driven by a feature flag/config or a
constant) and add a clear TODO explaining why it's disabled; locate the block
that builds header using html.H3/html.P and calls report.Plot_and_Text (the
lines inside the if False) and replace the hard false guard with a named flag or
delete the block and add a # TODO: comment mentioning why the plots (Prom:
Sutest GET/PUT/LIST/PATCH Requests duration) are omitted.

In `@projects/matrix_benchmarking/visualizations/helpers/store/prom.py`:
- Around line 315-323: The code assumes entry.results.cluster_info.control_plane
exists; change both places (the list comprehensions inside the generator and
inside no_control_plane_nodes) to first fetch cluster_info =
getattr(entry.results, "cluster_info", None) and then control_plane =
getattr(cluster_info, "control_plane", []) (or use hasattr(cluster_info,
"control_plane") ? []), and build control_plane_nodes = [node.name for node in
control_plane]; optionally factor this into a small helper function like
get_control_plane_nodes(entry) and call it from both the generator and
no_control_plane_nodes to avoid duplication.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 790e86b and 3b366ba.

📒 Files selected for processing (14)
  • projects/cluster/toolbox/cluster_preload_image/tasks/main.yml
  • projects/llm-d/testing/prepare_llmd.py
  • projects/llm-d/testing/test_llmd.py
  • projects/llm-d/visualizations/llmd_inference/data/plots.yaml
  • projects/llm-d/visualizations/llmd_inference/plotting/__init__.py
  • projects/llm-d/visualizations/llmd_inference/plotting/prometheus.py
  • projects/llm-d/visualizations/llmd_inference/plotting/prometheus_reports.py
  • projects/llm-d/visualizations/llmd_inference/plotting/report.py
  • projects/llm-d/visualizations/llmd_inference/plotting/vllm_metrics.py
  • projects/llm-d/visualizations/llmd_inference/store/parsers.py
  • projects/matrix_benchmarking/subproject
  • projects/matrix_benchmarking/visualizations/helpers/plotting/prom/__init__.py
  • projects/matrix_benchmarking/visualizations/helpers/plotting/prom/cpu_memory.py
  • projects/matrix_benchmarking/visualizations/helpers/store/prom.py

@topsail-bot

topsail-bot bot commented Feb 25, 2026

🟢 Test of 'llm-d test test_ci' succeeded after 00 hours 22 minutes 20 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_0: jump-ci
PR_POSITIONAL_ARG_1: scale_up


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
projects/llm-d/testing/prepare_llmd.py (1)

124-131: ⚠️ Potential issue | 🟠 Major

Don’t mark MCP updated on empty/ambiguous status output

Current logic sets machine_config_updated = True whenever "False" is absent. Empty or malformed output can incorrectly pass this gate.

🔧 Proposed fix
             result = run.run("oc get mcp -o jsonpath='{.items[*].status.conditions[?(@.type==\"Updated\")].status}'", capture_stdout=True)
-            if "False" in result.stdout:
+            statuses = result.stdout.strip().split()
+            if not statuses:
+                logging.warning("MCP Updated status output is empty, waiting...")
+                time.sleep(15)
+                continue
+            if "False" in statuses:
                 logging.info("Machine config pools still updating, waiting...")
                 time.sleep(30)
                 continue
-            else:
+            elif "True" in statuses:
                 logging.info("✓ Machine config pools updated")
                 machine_config_updated = True
+            else:
+                logging.warning(f"Unexpected MCP Updated status output: {result.stdout!r}, waiting...")
+                time.sleep(15)
+                continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/testing/prepare_llmd.py` around lines 124 - 131, The current
check treats any non-"False" output as success which allows empty or malformed
output to pass; change the logic that reads result = run.run(...) to validate
result.stdout more robustly by stripping and parsing the output tokens
(result.stdout.strip()), ensure it's non-empty and contains at least one
explicit "True" (or that every token is a known status like "True"/"False" and
none are "False") before setting machine_config_updated = True, and otherwise
log the ambiguous output and continue sleeping/retrying; keep references to the
same run.run call, result.stdout, and the machine_config_updated assignment so
the change is localized.
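The proposed token-based validation reduces to a pure classifier; a sketch under the assumption that the jsonpath output is whitespace-separated True/False condition values:

```python
def mcp_update_state(mcp_status_stdout: str) -> str:
    """Classify `oc get mcp` Updated-condition output: only an
    explicit all-True status counts as updated; empty or mixed
    output is ambiguous and should trigger another retry rather
    than pass the gate."""
    statuses = mcp_status_stdout.strip().split()
    if not statuses:
        return "empty"
    if "False" in statuses:
        return "updating"
    if all(s == "True" for s in statuses):
        return "updated"
    return "ambiguous"
```

The caller would then set machine_config_updated only on "updated" and sleep/retry on every other state.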
🧹 Nitpick comments (4)
projects/llm-d/visualizations/llmd_inference/plotting/vllm_metrics.py (2)

266-267: Consider prefixing unused variables with underscore.

The unpacked variables ordered_vars, variables, and cfg are not used in this method. While this follows the interface contract, prefixing with underscores signals intent and silences linter warnings.

♻️ Proposed fix
-        ordered_vars, settings, setting_lists, variables, cfg = args
+        _ordered_vars, settings, setting_lists, _variables, _cfg = args
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/visualizations/llmd_inference/plotting/vllm_metrics.py` around
lines 266 - 267, The unpacked variables ordered_vars, variables, and cfg in the
function where you call "entries = list(common.Matrix.all_records(settings,
setting_lists))" are unused; rename them to _ordered_vars, _variables, and _cfg
when unpacking args to indicate intentional unused variables and silence linters
(leave settings and setting_lists as-is so the function contract and behavior
remain unchanged).

34-42: Dict format in Plot() is inconsistent with other calls.

Based on learnings, Plot() expects a list of metric names, and dict values are ignored during metric extraction. Line 36 uses [{"vllm_e2e_request_rate": "rate(vllm_e2e_latency_seconds_count[5m])"}] while all other Plot() calls in this file use plain string lists like ["vllm_e2e_latency_average"].

For consistency and to avoid confusion, use the plain string format:

♻️ Proposed fix
     # E2E Request Rate - Requests per second
     plotting_prom.Plot(
-        [{"vllm_e2e_request_rate": "rate(vllm_e2e_latency_seconds_count[5m])"}],
+        ["vllm_e2e_request_rate"],
         "Prom: VLLM E2E Request Rate",
         title="VLLM End-to-End Request Rate",
         y_title="Requests/sec",
         get_metrics=get_uwm_metrics,
         as_timestamp=True
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/visualizations/llmd_inference/plotting/vllm_metrics.py` around
lines 34 - 42, The Plot call is using a dict instead of the expected list of
metric name/expression strings; replace the list [{"vllm_e2e_request_rate":
"rate(vllm_e2e_latency_seconds_count[5m])"}] with a plain string list containing
the expression, e.g. ["rate(vllm_e2e_latency_seconds_count[5m])"], so the
Plot(...) invocation (referenced by Plot, vllm_e2e_request_rate and
get_uwm_metrics) matches the other calls and the metric extractor will pick up
the expression correctly.
projects/llm-d/visualizations/llmd_inference/plotting/prometheus_reports.py (2)

30-30: Consider prefixing unused variables with underscore.

The unpacked variables ordered_vars, variables, and cfg are unused in all three do_plot methods (lines 30, 88, 158). This is the same pattern as in vllm_metrics.py. Prefixing with underscores signals intent and silences linter warnings.

♻️ Proposed fix for all three methods
# Line 30:
-        ordered_vars, settings, setting_lists, variables, cfg = args
+        _ordered_vars, settings, setting_lists, _variables, _cfg = args

# Line 88:
-        ordered_vars, settings, setting_lists, variables, cfg = args
+        _ordered_vars, settings, setting_lists, _variables, _cfg = args

# Line 158:
-        ordered_vars, settings, setting_lists, variables, cfg = args
+        _ordered_vars, settings, setting_lists, _variables, _cfg = args
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/visualizations/llmd_inference/plotting/prometheus_reports.py`
at line 30, In the three do_plot methods in prometheus_reports.py, the unpacked
variables ordered_vars, variables, and cfg are unused; rename them to
_ordered_vars, _variables, and _cfg (or prefix with underscores) in the
unpacking line (where args are unpacked) to signal they are intentionally unused
and silence linters—apply the same change for each do_plot variant referenced
(match the pattern used in vllm_metrics.py) so only settings and setting_lists
remain as the used names.

189-199: Dead code block with if False:.

This section is unreachable and appears to be placeholder code for API Server Request Latency plots. Consider either removing it or converting to a TODO comment if this is planned future work.

♻️ Option 1: Remove dead code
         header.append(html.Br())

-        if False:
-            # API Server Request Latency Section
-            header.append(html.H3("⏱️ API Server Request Latency"))
-            header.append(html.P("API request duration analysis by operation type"))
-
-            header += report.Plot_and_Text("Prom: Sutest GET Requests duration", args)
-            header += report.Plot_and_Text("Prom: Sutest PUT Requests duration", args)
-            header += report.Plot_and_Text("Prom: Sutest LIST Requests duration", args)
-            header += report.Plot_and_Text("Prom: Sutest PATCH Requests duration", args)
-
-            header.append(html.Br())
-
         # ETCD Performance Section
♻️ Option 2: Convert to TODO if planned
         header.append(html.Br())

-        if False:
-            # API Server Request Latency Section
-            header.append(html.H3("⏱️ API Server Request Latency"))
-            header.append(html.P("API request duration analysis by operation type"))
-
-            header += report.Plot_and_Text("Prom: Sutest GET Requests duration", args)
-            header += report.Plot_and_Text("Prom: Sutest PUT Requests duration", args)
-            header += report.Plot_and_Text("Prom: Sutest LIST Requests duration", args)
-            header += report.Plot_and_Text("Prom: Sutest PATCH Requests duration", args)
-
-            header.append(html.Br())
+        # TODO: Add API Server Request Latency section once duration metrics are available
+        # Plots: GET/PUT/LIST/PATCH Requests duration

         # ETCD Performance Section
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/visualizations/llmd_inference/plotting/prometheus_reports.py`
around lines 189 - 199, Remove the unreachable dead code guarded by "if False:"
that appends the API Server Request Latency section (the block that references
header, html.H3, html.P, report.Plot_and_Text and html.Br()); either delete the
entire block or replace it with a single TODO comment referencing the intended
plots (e.g., "TODO: add API Server Request Latency plots using
report.Plot_and_Text for Sutest GET/PUT/LIST/PATCH") so the codebase has no dead
branches and the intended future work is documented.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@projects/llm-d/testing/prepare_llmd.py`:
- Line 63: There's a duplicate shadowing import of time in prepare_llmd.py
(module already imports time at top); remove the redundant "import time" that
triggers Ruff F811 or, if a different module was intended, replace it with the
correct module name or an alias. Locate the second import (the one at line
referenced in the review) and delete it so only the module-level import remains,
or change it to a unique identifier if truly needed.

In `@projects/llm-d/testing/test_llmd.py`:
- Around line 305-314: The cleanup verification uses "oc get llminferenceservice
--no-headers" and checks result.stdout, which can be unreliable; update the
run.run invocation in the test (the call that assigns to result) to use a
resource-stable output format (e.g., add "-o name") and then compute
remaining_services and remaining_count by splitting non-empty lines of
result.stdout so the test accurately counts resources (adjust the variables
result, remaining_services, remaining_count and the logging message
accordingly).

---

Duplicate comments:
In `@projects/llm-d/testing/prepare_llmd.py`:
- Around line 124-131: The current check treats any non-"False" output as
success which allows empty or malformed output to pass; change the logic that
reads result = run.run(...) to validate result.stdout more robustly by stripping
and parsing the output tokens (result.stdout.strip()), ensure it's non-empty and
contains at least one explicit "True" (or that every token is a known status
like "True"/"False" and none are "False") before setting machine_config_updated
= True, and otherwise log the ambiguous output and continue sleeping/retrying;
keep references to the same run.run call, result.stdout, and the
machine_config_updated assignment so the change is localized.

---

Nitpick comments:
In `@projects/llm-d/visualizations/llmd_inference/plotting/prometheus_reports.py`:
- Line 30: In the three do_plot methods in prometheus_reports.py, the unpacked
variables ordered_vars, variables, and cfg are unused; rename them to
_ordered_vars, _variables, and _cfg (or prefix with underscores) in the
unpacking line (where args are unpacked) to signal they are intentionally unused
and silence linters—apply the same change for each do_plot variant referenced
(match the pattern used in vllm_metrics.py) so only settings and setting_lists
remain as the used names.
- Around line 189-199: Remove the unreachable dead code guarded by "if False:"
that appends the API Server Request Latency section (the block that references
header, html.H3, html.P, report.Plot_and_Text and html.Br()); either delete the
entire block or replace it with a single TODO comment referencing the intended
plots (e.g., "TODO: add API Server Request Latency plots using
report.Plot_and_Text for Sutest GET/PUT/LIST/PATCH") so the codebase has no dead
branches and the intended future work is documented.

In `@projects/llm-d/visualizations/llmd_inference/plotting/vllm_metrics.py`:
- Around line 266-267: The unpacked variables ordered_vars, variables, and cfg
in the function where you call "entries =
list(common.Matrix.all_records(settings, setting_lists))" are unused; rename
them to _ordered_vars, _variables, and _cfg when unpacking args to indicate
intentional unused variables and silence linters (leave settings and
setting_lists as-is so the function contract and behavior remain unchanged).
- Around line 34-42: The Plot call is using a dict instead of the expected list
of metric name/expression strings; replace the list [{"vllm_e2e_request_rate":
"rate(vllm_e2e_latency_seconds_count[5m])"}] with a plain string list containing
the expression, e.g. ["rate(vllm_e2e_latency_seconds_count[5m])"], so the
Plot(...) invocation (referenced by Plot, vllm_e2e_request_rate and
get_uwm_metrics) matches the other calls and the metric extractor will pick up
the expression correctly.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b366ba and b6d67f5.

📒 Files selected for processing (13)
  • projects/llm-d/testing/prepare_llmd.py
  • projects/llm-d/testing/test_llmd.py
  • projects/llm-d/visualizations/llmd_inference/data/plots.yaml
  • projects/llm-d/visualizations/llmd_inference/plotting/__init__.py
  • projects/llm-d/visualizations/llmd_inference/plotting/prometheus.py
  • projects/llm-d/visualizations/llmd_inference/plotting/prometheus_reports.py
  • projects/llm-d/visualizations/llmd_inference/plotting/report.py
  • projects/llm-d/visualizations/llmd_inference/plotting/vllm_metrics.py
  • projects/llm-d/visualizations/llmd_inference/store/parsers.py
  • projects/matrix_benchmarking/subproject
  • projects/matrix_benchmarking/visualizations/helpers/plotting/prom/__init__.py
  • projects/matrix_benchmarking/visualizations/helpers/plotting/prom/cpu_memory.py
  • projects/matrix_benchmarking/visualizations/helpers/store/prom.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • projects/llm-d/visualizations/llmd_inference/plotting/report.py
  • projects/matrix_benchmarking/visualizations/helpers/store/prom.py
  • projects/matrix_benchmarking/visualizations/helpers/plotting/prom/cpu_memory.py
  • projects/llm-d/visualizations/llmd_inference/data/plots.yaml
  • projects/llm-d/visualizations/llmd_inference/plotting/prometheus.py
  • projects/matrix_benchmarking/subproject

@kpouget
Contributor Author

kpouget commented Feb 26, 2026

/test jump-ci llm-d scale_up
/cluster kpouget-ibm
/only prepare_ci

@kpouget kpouget force-pushed the llmd branch 2 times, most recently from 5ca3417 to 124907d on February 26, 2026 at 17:12

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
projects/llm-d/testing/test_llmd.py (1)

423-433: ⚠️ Potential issue | 🟠 Major

oc get --no-headers is not a stable emptiness check for cleanup verification.

When no resources exist, output can still be non-empty (“No resources found ...”), which can falsely fail cleanup.

🔧 Proposed fix
-        result = run.run(f"oc get llminferenceservice -n {namespace} --no-headers",
+        result = run.run(f"oc get llminferenceservice -n {namespace} -o name",
                        capture_stdout=True)

-        if not result.stdout.strip():
+        remaining_services = [s for s in result.stdout.splitlines() if s.strip()]
+        if not remaining_services:
             logging.info("No llminferenceservice resources found - cleanup successful")
             break
         else:
-            remaining_services = result.stdout.strip().split('\n')
-            remaining_count = len([s for s in remaining_services if s.strip()])
+            remaining_count = len(remaining_services)
             logging.info(f"Still found {remaining_count} llminferenceservice resources, waiting...")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/testing/test_llmd.py` around lines 423 - 433, The emptiness
check using result.stdout.strip() is unreliable because "oc get ..." can print
"No resources found" even when there are none; update the loop that runs the
command ("oc get llminferenceservice -n {namespace} --no-headers") to treat that
message as empty by checking result.stdout.lower() for "no resources" (or
similar phrasing) in addition to an empty string, e.g. replace the if condition
with one that succeeds when result.stdout.strip() == "" or "no resources" in
result.stdout.lower(), and keep the existing remaining_services/remaining_count
logic for the non-empty case.
🧹 Nitpick comments (2)
projects/llm-d/visualizations/llmd_inference/plotting/prometheus_reports.py (1)

189-199: Consider removing or documenting the disabled API latency section.

The if False: block contains dead code that will never execute. If this functionality is intentionally disabled pending future work, add a comment explaining why. Otherwise, consider removing it to improve code clarity.

♻️ Option 1: Add explanatory comment if keeping for future use
-        if False:
+        if False:  # TODO: Disabled pending API latency metrics implementation
             # API Server Request Latency Section
♻️ Option 2: Remove if not planned for use
         header.append(html.Br())

-        if False:
-            # API Server Request Latency Section
-            header.append(html.H3("⏱️ API Server Request Latency"))
-            header.append(html.P("API request duration analysis by operation type"))
-
-            header += report.Plot_and_Text("Prom: Sutest GET Requests duration", args)
-            header += report.Plot_and_Text("Prom: Sutest PUT Requests duration", args)
-            header += report.Plot_and_Text("Prom: Sutest LIST Requests duration", args)
-            header += report.Plot_and_Text("Prom: Sutest PATCH Requests duration", args)
-
-            header.append(html.Br())
-
         # ETCD Performance Section
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/visualizations/llmd_inference/plotting/prometheus_reports.py`
around lines 189 - 199, The block guarded by "if False:" (which builds the API
latency UI using header.append(html.H3(...)), header.append(html.P(...)),
report.Plot_and_Text(...) and header.append(html.Br())) is dead code; either
remove this whole conditional and its contents to eliminate unused code, or keep
it but replace "if False:" with an explanatory comment above (e.g., "Disabled:
API latency plots pending backend metrics" ) so future readers know it’s
intentionally disabled; locate the block around the use of header and
report.Plot_and_Text in prometheus_reports.py and apply one of these two
changes.
projects/llm-d/toolbox/llmd_deploy_llm_inference_service/tasks/main.yml (1)

54-60: Combine yq calls into a single operation for efficiency.

The two yq invocations can be merged into one using jq's pipe syntax, reducing file read/write overhead. The suggested combined form uses valid jq syntax supported by the Python yq version used throughout this project:

♻️ Suggested improvement: combine yq calls
 - name: Set the name and namespace in the LLM inference service YAML
   shell: |
-    yq -Yi '.metadata.name = "{{ llmd_deploy_llm_inference_service_name }}"' \
-       "{{ artifact_extra_logs_dir }}/src/llm_inference_service.yaml"
-    yq -Yi '.metadata.namespace = "{{ llmd_deploy_llm_inference_service_namespace }}"' \
+    yq -Yi '.metadata.name = "{{ llmd_deploy_llm_inference_service_name }}" | .metadata.namespace = "{{ llmd_deploy_llm_inference_service_namespace }}"' \
        "{{ artifact_extra_logs_dir }}/src/llm_inference_service.yaml"

Note: The -Yi flags are Python yq specific. If environments migrate to Go yq (mikefarah/yq), the flag syntax would need updating (Go yq uses -i only, without -Y). Consider documenting the yq version requirement if not already specified.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/toolbox/llmd_deploy_llm_inference_service/tasks/main.yml`
around lines 54 - 60, Replace the two separate yq invocations that set
.metadata.name and .metadata.namespace with a single yq invocation that applies
both updates in one expression (use the pipe/chain syntax to set .metadata.name
= "{{ llmd_deploy_llm_inference_service_name }}" and .metadata.namespace = "{{
llmd_deploy_llm_inference_service_namespace }}" in the same call), keep the
Python yq flags (-Yi) and target the same YAML file referenced in the task; this
reduces file I/O while preserving the existing variables
llmd_deploy_llm_inference_service_name and
llmd_deploy_llm_inference_service_namespace.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@projects/llm-d/testing/test_llmd.py`:
- Around line 203-205: Guard the deletion of spec.router to avoid KeyError by
checking for its existence before deleting: replace direct del
isvc_data['spec']['router'] with a conditional check like if 'router' in
isvc_data.get('spec', {}): and then delete (or pop) it; apply the same fix for
the second occurrence (the other deletion at lines 214-216) so both code paths
that handle the 'simple' and 'simple-x2' flavors safely remove spec.router only
when present.
- Around line 71-74: The code raises an undefined exception class
RuntimeException when the simple inference check in the conditional using
config.project.get_config("tests.llmd.inference_service.do_simple_test") fails;
update the raise statement to use the built-in RuntimeError instead (i.e.,
replace RuntimeException with RuntimeError) in the block that calls
test_llm_inference_simple(endpoint_url, llmisvc_name, namespace).

In `@projects/llm-d/visualizations/llmd_inference/plotting/prometheus.py`:
- Around line 3-10: The module-level getattr(projects, "llm-d") causes
import-time AttributeError; move dynamic project resolution into register():
inside register() import importlib and call
importlib.import_module("projects.llm-d") (or the correct package path), then
obtain get_llmd_main_metrics from the imported module (e.g.,
imported_module.visualizations.llmd_inference.store.parsers.get_llmd_main_metrics)
and call it with register=True; add try/except to catch
ImportError/AttributeError and raise or log a clear error message indicating the
hyphenated package cannot be resolved at import time and that register() failed
to locate get_llmd_main_metrics.

---

Duplicate comments:
In `@projects/llm-d/testing/test_llmd.py`:
- Around line 423-433: The emptiness check using result.stdout.strip() is
unreliable because "oc get ..." can print "No resources found" even when there
are none; update the loop that runs the command ("oc get llminferenceservice -n
{namespace} --no-headers") to treat that message as empty by checking
result.stdout.lower() for "no resources" (or similar phrasing) in addition to an
empty string, e.g. replace the if condition with one that succeeds when
result.stdout.strip() == "" or "no resources" in result.stdout.lower(), and keep
the existing remaining_services/remaining_count logic for the non-empty case.

---

Nitpick comments:
In `@projects/llm-d/toolbox/llmd_deploy_llm_inference_service/tasks/main.yml`:
- Around line 54-60: Replace the two separate yq invocations that set
.metadata.name and .metadata.namespace with a single yq invocation that applies
both updates in one expression (use the pipe/chain syntax to set .metadata.name
= "{{ llmd_deploy_llm_inference_service_name }}" and .metadata.namespace = "{{
llmd_deploy_llm_inference_service_namespace }}" in the same call), keep the
Python yq flags (-Yi) and target the same YAML file referenced in the task; this
reduces file I/O while preserving the existing variables
llmd_deploy_llm_inference_service_name and
llmd_deploy_llm_inference_service_namespace.

In `@projects/llm-d/visualizations/llmd_inference/plotting/prometheus_reports.py`:
- Around line 189-199: The block guarded by "if False:" (which builds the API
latency UI using header.append(html.H3(...)), header.append(html.P(...)),
report.Plot_and_Text(...) and header.append(html.Br())) is dead code; either
remove this whole conditional and its contents to eliminate unused code, or keep
it but replace "if False:" with an explanatory comment above (e.g., "Disabled:
API latency plots pending backend metrics" ) so future readers know it’s
intentionally disabled; locate the block around the use of header and
report.Plot_and_Text in prometheus_reports.py and apply one of these two
changes.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6d67f5 and 124907d.

📒 Files selected for processing (19)
  • docs/toolbox.generated/Llmd.run_guidellm_benchmark.rst
  • docs/toolbox.generated/Llmd.run_multiturn_benchmark.rst
  • projects/llm-d/testing/config.yaml
  • projects/llm-d/testing/llmisvcs/llama-3-1-8b-instruct-fp8.yaml
  • projects/llm-d/testing/prepare_llmd.py
  • projects/llm-d/testing/test_llmd.py
  • projects/llm-d/toolbox/llmd.py
  • projects/llm-d/toolbox/llmd_deploy_llm_inference_service/tasks/main.yml
  • projects/llm-d/toolbox/llmd_run_guidellm_benchmark/defaults/main/config.yml
  • projects/llm-d/toolbox/llmd_run_guidellm_benchmark/tasks/main.yml
  • projects/llm-d/toolbox/llmd_run_multiturn_benchmark/defaults/main/config.yml
  • projects/llm-d/toolbox/llmd_run_multiturn_benchmark/tasks/main.yml
  • projects/llm-d/visualizations/llmd_inference/data/plots.yaml
  • projects/llm-d/visualizations/llmd_inference/plotting/__init__.py
  • projects/llm-d/visualizations/llmd_inference/plotting/prometheus.py
  • projects/llm-d/visualizations/llmd_inference/plotting/prometheus_reports.py
  • projects/llm-d/visualizations/llmd_inference/plotting/vllm_metrics.py
  • projects/llm-d/visualizations/llmd_inference/store/parsers.py
  • projects/matrix_benchmarking/subproject
✅ Files skipped from review due to trivial changes (2)
  • docs/toolbox.generated/Llmd.run_guidellm_benchmark.rst
  • projects/llm-d/testing/llmisvcs/llama-3-1-8b-instruct-fp8.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
  • projects/matrix_benchmarking/subproject
  • projects/llm-d/visualizations/llmd_inference/plotting/__init__.py
  • projects/llm-d/visualizations/llmd_inference/store/parsers.py

Comment on lines +71 to 74
if config.project.get_config("tests.llmd.inference_service.do_simple_test"):
    if not test_llm_inference_simple(endpoint_url, llmisvc_name, namespace):
        raise RuntimeException("Simple inference test failed :/")


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Verify RuntimeException usage and definition in Python sources
rg -nP '\bRuntimeException\b' --type=py
rg -nP '^\s*class\s+RuntimeException\b|^\s*def\s+RuntimeException\b' --type=py

Repository: openshift-psap/topsail

Length of output: 190


🏁 Script executed:

# Check imports and custom definitions in test_llmd.py
head -50 projects/llm-d/testing/test_llmd.py | grep -E '^import|^from|class RuntimeException'

Repository: openshift-psap/topsail

Length of output: 341


Use RuntimeError instead of RuntimeException; the latter is undefined in Python.

At line 73, RuntimeException is not a built-in Python exception and is not defined in this codebase. This will raise NameError at runtime when the code path is executed. Use RuntimeError instead.

🔧 Proposed fix
-                        if not test_llm_inference_simple(endpoint_url, llmisvc_name, namespace):
-                            raise RuntimeException("Simple inference test failed :/")
+                        if not test_llm_inference_simple(endpoint_url, llmisvc_name, namespace):
+                            raise RuntimeError("Simple inference test failed :/")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 if config.project.get_config("tests.llmd.inference_service.do_simple_test"):
     if not test_llm_inference_simple(endpoint_url, llmisvc_name, namespace):
-        raise RuntimeException("Simple inference test failed :/")
+        raise RuntimeError("Simple inference test failed :/")
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 73-73: Abstract raise to an inner function

(TRY301)


[warning] 73-73: Avoid specifying long messages outside the exception class

(TRY003)


[error] 73-73: Undefined name RuntimeException

(F821)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/testing/test_llmd.py` around lines 71 - 74, The code raises an
undefined exception class RuntimeException when the simple inference check in
the conditional using
config.project.get_config("tests.llmd.inference_service.do_simple_test") fails;
update the raise statement to use the built-in RuntimeError instead (i.e.,
replace RuntimeException with RuntimeError) in the block that calls
test_llm_inference_simple(endpoint_url, llmisvc_name, namespace).

Comment on lines +203 to +205
# Remove spec.router if it exists
del isvc_data['spec']['router']
logging.info("Removed spec.router for 'simple' flavor")

⚠️ Potential issue | 🟠 Major

Guard spec.router removal to avoid KeyError.

Both simple and simple-x2 paths unconditionally delete spec.router, but comments say “if it exists”.

🔧 Proposed fix
-        del isvc_data['spec']['router']
+        isvc_data.get('spec', {}).pop('router', None)
@@
-        del isvc_data['spec']['router']
+        isvc_data.get('spec', {}).pop('router', None)

Also applies to: 214-216

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/testing/test_llmd.py` around lines 203 - 205, Guard the
deletion of spec.router to avoid KeyError by checking for its existence before
deleting: replace direct del isvc_data['spec']['router'] with a conditional
check like if 'router' in isvc_data.get('spec', {}): and then delete (or pop)
it; apply the same fix for the second occurrence (the other deletion at lines
214-216) so both code paths that handle the 'simple' and 'simple-x2' flavors
safely remove spec.router only when present.
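For reference, a minimal sketch of the guarded removal (the function name and sample dicts are illustrative, not from the repository):

```python
def strip_router(isvc_data: dict, flavor: str) -> dict:
    """Remove spec.router for the 'simple' flavors, tolerating its absence.

    dict.get(...).pop(..., None) never raises, unlike
    `del isvc_data['spec']['router']`, which fails with KeyError
    when the key is missing.
    """
    if "simple" in flavor:
        isvc_data.get("spec", {}).pop("router", None)
    return isvc_data


print(strip_router({"spec": {"router": {}, "model": {}}}, "simple"))  # {'spec': {'model': {}}}
print(strip_router({"spec": {"model": {}}}, "simple-x2"))             # {'spec': {'model': {}}} (no KeyError)
```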

Comment on lines +3 to +10
llmd = getattr(projects, "llm-d")

def register():
    """
    Register Prometheus metrics plots for LLM-D inference visualization
    """
    get_llmd_main_metrics = llmd.visualizations.llmd_inference.store.parsers.get_llmd_main_metrics
    get_llmd_main_metrics(register=True)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Find all dynamic access sites for "llm-d"
echo "=== Step 1: Dynamic access sites for 'llm-d' ==="
rg -n -C3 'getattr\s*\(\s*projects\s*,\s*"llm-d"\s*\)' || echo "No matches found"

# 2) Inspect how the projects package exposes dynamic attributes
echo -e "\n=== Step 2: Projects package __init__.py files ==="
fd "__init__.py" projects --exec rg -n -C3 '__getattr__|setattr|llm-d' {} + || echo "No matches found"

# 3) Confirm where get_llmd_main_metrics is defined and accessed
echo -e "\n=== Step 3: get_llmd_main_metrics definition and usage ==="
rg -n -C3 '\bget_llmd_main_metrics\b' || echo "No matches found"

# 4) Check the actual file structure
echo -e "\n=== Step 4: File structure check ==="
ls -la projects/llm-d/visualizations/llmd_inference/plotting/prometheus.py 2>/dev/null || echo "File not found or path incorrect"

# 5) Check if projects is a namespace package or regular package
echo -e "\n=== Step 5: Projects package structure ==="
find projects -name "__init__.py" -type f | head -20

Repository: openshift-psap/topsail

Length of output: 7458


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check if projects/__init__.py exists and its content
echo "=== Projects __init__.py ==="
if [ -f projects/__init__.py ]; then
    wc -l projects/__init__.py
    echo "--- Content ---"
    cat -n projects/__init__.py
else
    echo "No projects/__init__.py found"
fi

# Check for namespace package indicators
echo -e "\n=== Directory structure of projects ==="
ls -la projects/ | head -20

# Check if there's a setup.py or pyproject.toml that might define namespace packages
echo -e "\n=== Setup files ==="
find . -maxdepth 2 -name "setup.py" -o -name "pyproject.toml" | head -10

# Try to understand the import: check if llm-d can be imported directly
echo -e "\n=== Check if llm-d is a valid python identifier in projects ==="
python3 << 'PYEOF'
import sys
import os
sys.path.insert(0, '/tmp/repo')

# Check if the import works
try:
    import projects
    print("projects imported successfully")
    print(f"projects type: {type(projects)}")
    print(f"projects.__file__: {getattr(projects, '__file__', 'N/A')}")
    print(f"projects.__path__: {getattr(projects, '__path__', 'N/A')}")
    
    # Try to get llm-d
    try:
        llmd = getattr(projects, "llm-d")
        print(f"✓ getattr(projects, 'llm-d') works: {llmd}")
    except AttributeError as e:
        print(f"✗ getattr(projects, 'llm-d') failed: {e}")
        
except Exception as e:
    print(f"Error: {e}")
PYEOF

Repository: openshift-psap/topsail

Length of output: 1622


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check if llm-d has __init__.py
echo "=== llm-d structure ==="
ls -la projects/llm-d/

# Check the problematic prometheus.py file content
echo -e "\n=== Full content of prometheus.py ==="
cat -n projects/llm-d/visualizations/llmd_inference/plotting/prometheus.py

# Search for where this module is imported
echo -e "\n=== Where is prometheus.py imported? ==="
rg -n "from.*prometheus import|import.*prometheus" || echo "No imports found"

# Search for usage of register() function from this module
echo -e "\n=== Uses of register() ==="
rg -n "prometheus.*register\(\)" || echo "No calls found"

# Check if importlib is used to load llm-d
echo -e "\n=== Importlib usage for dynamic import ==="
rg -n "importlib.*llm" || echo "No importlib llm imports found"

# Check for any __getattr__ in projects that might make this work
echo -e "\n=== __getattr__ definitions ==="
rg -n "__getattr__" projects/ || echo "No __getattr__ found"

Repository: openshift-psap/topsail

Length of output: 1615


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== projects/llm-d/visualizations/llmd_inference/plotting/__init__.py ==="
cat -n projects/llm-d/visualizations/llmd_inference/plotting/__init__.py

echo -e "\n=== Check if this __init__.py is ever actually executed ==="
rg -n "plotting" projects/ --type py -A2 | head -50

Repository: openshift-psap/topsail

Length of output: 5468


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Test if importlib can access llm-d ==="
python3 << 'PYEOF'
import sys
sys.path.insert(0, '/tmp/repo')

import importlib

try:
    llmd = importlib.import_module("projects.llm-d")
    print(f"✓ importlib.import_module('projects.llm-d') works: {llmd}")
except Exception as e:
    print(f"✗ importlib.import_module('projects.llm-d') failed: {type(e).__name__}: {e}")

# Check what the proper way to access it would be
try:
    import projects
    # Check what attributes are available
    print(f"\nAvailable attributes in projects: {dir(projects)[:5]}...")
except Exception as e:
    print(f"Error: {e}")
PYEOF

echo -e "\n=== Check if directory name should be llm_d instead ==="
ls -la projects/ | grep -E "llm|LLM"

Repository: openshift-psap/topsail

Length of output: 537


Move dynamic project resolution into register() with proper guards.

getattr(projects, "llm-d") on Line 3 executes at module import time and will fail with AttributeError because namespace packages don't expose hyphenated subdirectories as attributes—only the import system can resolve them. This breaks the entire visualizations module when register() is called. Move the resolution into register() using importlib.import_module() with clear error messaging instead.

Proposed fix
 import projects
-llmd = getattr(projects, "llm-d")
 
 def register():
     """
     Register Prometheus metrics plots for LLM-D inference visualization
     """
+    import importlib
+    try:
+        llmd = importlib.import_module("projects.llm-d")
+    except (ImportError, ModuleNotFoundError) as exc:
+        raise RuntimeError(
+            "Cannot register LLM-D Prometheus plots: project 'llm-d' is unavailable."
+        ) from exc
+
     try:
         get_llmd_main_metrics = llmd.visualizations.llmd_inference.store.parsers.get_llmd_main_metrics
     except AttributeError as exc:
         raise RuntimeError(
             "Cannot register LLM-D Prometheus plots: "
             "missing visualizations.llmd_inference.store.parsers.get_llmd_main_metrics."
         ) from exc
+
     get_llmd_main_metrics(register=True)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-llmd = getattr(projects, "llm-d")
-
 def register():
     """
     Register Prometheus metrics plots for LLM-D inference visualization
     """
-    get_llmd_main_metrics = llmd.visualizations.llmd_inference.store.parsers.get_llmd_main_metrics
-    get_llmd_main_metrics(register=True)
+    import importlib
+    try:
+        llmd = importlib.import_module("projects.llm-d")
+    except (ImportError, ModuleNotFoundError) as exc:
+        raise RuntimeError(
+            "Cannot register LLM-D Prometheus plots: project 'llm-d' is unavailable."
+        ) from exc
+
+    try:
+        get_llmd_main_metrics = llmd.visualizations.llmd_inference.store.parsers.get_llmd_main_metrics
+    except AttributeError as exc:
+        raise RuntimeError(
+            "Cannot register LLM-D Prometheus plots: "
+            "missing visualizations.llmd_inference.store.parsers.get_llmd_main_metrics."
+        ) from exc
+
+    get_llmd_main_metrics(register=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/visualizations/llmd_inference/plotting/prometheus.py` around
lines 3 - 10, The module-level getattr(projects, "llm-d") causes import-time
AttributeError; move dynamic project resolution into register(): inside
register() import importlib and call importlib.import_module("projects.llm-d")
(or the correct package path), then obtain get_llmd_main_metrics from the
imported module (e.g.,
imported_module.visualizations.llmd_inference.store.parsers.get_llmd_main_metrics)
and call it with register=True; add try/except to catch
ImportError/AttributeError and raise or log a clear error message indicating the
hyphenated package cannot be resolved at import time and that register() failed
to locate get_llmd_main_metrics.
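A self-contained demonstration of the failure mode and the importlib workaround (the temporary package layout and the MARKER attribute are fabricated for the demo; only the mechanism mirrors projects/llm-d):

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway namespace-package layout with a hyphenated subdirectory.
root = Path(tempfile.mkdtemp())
portion = root / "projects" / "llm-d"
portion.mkdir(parents=True)
(portion / "__init__.py").write_text("MARKER = 'llm-d loaded'\n")

sys.modules.pop("projects", None)
sys.modules.pop("projects.llm-d", None)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

projects = importlib.import_module("projects")  # namespace package: no __init__.py

# Before the submodule is imported, the hyphenated name is not an attribute:
print(hasattr(projects, "llm-d"))  # False

# The import machinery can still resolve it, since finders search
# projects.__path__ for a directory literally named "llm-d":
llmd = importlib.import_module("projects.llm-d")
print(llmd.MARKER)  # llm-d loaded

# import_module also binds the child onto the parent, so getattr works *after*:
print(getattr(projects, "llm-d") is llmd)  # True
```

This is why deferring the lookup into register() (after an explicit importlib.import_module) is safe, while the module-level getattr fails at import time.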

@kpouget
Contributor Author

kpouget commented Feb 26, 2026

/test jump-ci llm-d scale_up
/cluster kpouget-ibm
/only test_ci

@topsail-bot

topsail-bot bot commented Feb 26, 2026

🔴 Test of 'llm-d test test_ci' failed after 01 hours 15 minutes 39 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_0: jump-ci
PR_POSITIONAL_ARG_1: scale_up

Failure indicator:

/tmp/topsail_202602261772128740/002__llm_d_testing/001__flavor_simple-x2/001__llmd__deploy_llm_inference_service/FAILURE | [001__llmd__deploy_llm_inference_service] ./run_toolbox.py llmd deploy_llm_inference_service --name=llama-llm-d-simple-x2 --namespace=llm-d-project --yaml_file=/tmp/topsail_202602261772128740/002__llm_d_testing/001__flavor_simple-x2/llama-3-1-8b-instruct-fp8.yaml --> 2


@kpouget
Contributor Author

kpouget commented Feb 26, 2026

/test jump-ci llm-d scale_up multi-flavor
/cluster kpouget-ibm
/only test_ci

@topsail-bot

topsail-bot bot commented Feb 26, 2026

🔴 Test of 'llm-d test test_ci' failed after 00 hours 00 minutes 01 seconds. 🔴

• Link to the test results.

• No reports index generated...

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_0: jump-ci
PR_POSITIONAL_ARG_1: scale_up
PR_POSITIONAL_ARG_2: multi-flavor

Failure indicator: Empty. (See run.log)

@kpouget
Contributor Author

kpouget commented Feb 26, 2026

/test jump-ci llm-d scale_up multi-flavor
/cluster kpouget-ibm
/only test_ci


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
projects/llm-d/toolbox/llmd_run_multiturn_benchmark/tasks/main.yml (1)

56-63: ⚠️ Potential issue | 🟠 Major

Avoid parsing tabular oc get output for terminal job state.

Line 59 depends on field positions from a human-readable table, which is fragile. Use structured JSONPath from job status to emit Complete/Failed deterministically.

Proposed patch
   - name: Wait for the multiturn benchmark job to complete or fail
     shell: |
       set -o pipefail;
-      oc get job "{{ llmd_run_multiturn_benchmark_name }}" -n "{{ target_namespace }}" --no-headers | awk '{print $2; print "Job duration:", $4 > "/dev/stderr"}'
+      succeeded=$(oc get job "{{ llmd_run_multiturn_benchmark_name }}" -n "{{ target_namespace }}" -o jsonpath='{.status.succeeded}')
+      failed=$(oc get job "{{ llmd_run_multiturn_benchmark_name }}" -n "{{ target_namespace }}" -o jsonpath='{.status.failed}')
+      if [ "${succeeded:-0}" -gt 0 ]; then
+        echo Complete
+      elif [ "${failed:-0}" -gt 0 ]; then
+        echo Failed
+      else
+        echo Running
+      fi
     register: job_status_result
     until: job_status_result.stdout in ['Complete', 'Failed']

Run this in a cluster context to confirm the current parser behavior:

#!/bin/bash
set -euo pipefail
job="${1:?job name}"
ns="${2:?namespace}"

echo "Raw table output:"
oc get job "$job" -n "$ns" --no-headers

echo "Status conditions:"
oc get job "$job" -n "$ns" -o jsonpath='{range .status.conditions[*]}{.type}{" "}{end}{"\n"}'

echo "Succeeded/Failed counters:"
oc get job "$job" -n "$ns" -o jsonpath='succeeded={.status.succeeded} failed={.status.failed}{"\n"}'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/toolbox/llmd_run_multiturn_benchmark/tasks/main.yml` around
lines 56 - 63, The current task reads human-readable oc get job table output (in
the shell that registers job_status_result) which is fragile; change the shell
invocation that references llmd_run_multiturn_benchmark_name and
target_namespace to use structured JSON/jsonpath (e.g. check
.status.conditions[*].type and .status.succeeded/.status.failed) and emit a
deterministic string such as "Complete" or "Failed" (and keep any duration to
stderr if needed), then keep the until condition on job_status_result.stdout
checking for 'Complete' or 'Failed' as before; update the task that registers
job_status_result so the deterministic JSON/jsonpath output is produced instead
of parsing table columns.
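The counter-to-state mapping is easy to unit test in isolation; a hypothetical pure-Python equivalent of the shell logic (function name is illustrative):

```python
from typing import Optional


def job_state(succeeded: Optional[str], failed: Optional[str]) -> str:
    """Map a Job's .status.succeeded / .status.failed counters to a
    deterministic state string.

    jsonpath prints an empty string when the counter is unset,
    so empty/None is treated as 0, matching `${var:-0}` in the shell.
    """
    def as_int(value: Optional[str]) -> int:
        return int(value) if value not in (None, "") else 0

    if as_int(succeeded) > 0:
        return "Complete"
    if as_int(failed) > 0:
        return "Failed"
    return "Running"


print(job_state("1", ""))     # Complete
print(job_state("", "2"))     # Failed
print(job_state(None, None))  # Running
```

Unlike column positions in the human-readable table, these status fields are part of the Job API contract, so the mapping stays stable across oc versions.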
♻️ Duplicate comments (2)
projects/llm-d/testing/test_llmd.py (2)

71-74: ⚠️ Potential issue | 🔴 Critical

Use RuntimeError instead of RuntimeException; the latter is undefined in Python.

RuntimeException is not a built-in Python exception and will raise NameError at runtime. This was flagged in a previous review but remains unfixed.

🐛 Proposed fix
                     if config.project.get_config("tests.llmd.inference_service.do_simple_test"):
                         if not test_llm_inference_simple(endpoint_url, llmisvc_name, namespace):
-                            raise RuntimeException("Simple inference test failed :/")
+                            raise RuntimeError("Simple inference test failed :/")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/testing/test_llmd.py` around lines 71 - 74, The code raises an
undefined RuntimeException in the conditional that calls
test_llm_inference_simple; change the exception class to the built-in
RuntimeError so the failure path works correctly (replace RuntimeException with
RuntimeError in the block that checks
config.project.get_config("tests.llmd.inference_service.do_simple_test") and the
call to test_llm_inference_simple(endpoint_url, llmisvc_name, namespace)).
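A minimal reproduction of the failure mode (pure Python, no cluster needed): referencing an undefined exception class fails at the `raise` site with NameError, masking the intended error.

```python
def buggy_raise():
    # RuntimeException is not a Python builtin; the name lookup itself fails.
    raise RuntimeException("Simple inference test failed")  # noqa: F821

def fixed_raise():
    raise RuntimeError("Simple inference test failed")
```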

200-206: ⚠️ Potential issue | 🟠 Major

Guard spec.router deletion to avoid KeyError.

Line 204 unconditionally deletes spec.router, but the comment says "if it exists". If the YAML doesn't have this key, a KeyError will be raised. This was flagged in a previous review but remains unfixed.

🔧 Proposed fix
     if "simple" in flavor:
         logging.info("Applying 'simple' flavor modifications")
 
         # Remove spec.router if it exists
-        del isvc_data['spec']['router']
+        isvc_data.get('spec', {}).pop('router', None)
         logging.info("Removed spec.router for 'simple' flavor")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/testing/test_llmd.py` around lines 200 - 206, The code
unconditionally deletes isvc_data['spec']['router'] which raises KeyError when
absent; update the "simple" flavor branch to check for the existence of the
'spec' and 'router' keys before deleting (e.g., verify 'spec' in isvc_data and
'router' in isvc_data['spec'] or use dict.pop with a default) so that removal is
safe—locate the deletion around the conditional handling of the "simple" flavor
where isvc_data is modified and replace it with a guarded check or safe removal
operation.
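The behavioral difference between `del` and `pop` is easy to check on a plain dict (illustrative data only, not the real ISVC YAML):

```python
isvc_data = {"spec": {"model": {"name": "demo"}}}  # no 'router' key present

# del raises KeyError when the key is missing...
try:
    del isvc_data["spec"]["router"]
    del_raised = False
except KeyError:
    del_raised = True

# ...while pop with a default is a safe no-op.
removed = isvc_data.get("spec", {}).pop("router", None)
```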
🧹 Nitpick comments (3)
projects/llm-d/testing/test_llmd.py (2)

497-499: Dead code: returncode check is unreachable.

Similar to the earlier observation, if run.run() raises on failure, line 497's check if result.returncode != 0 will never be true. Consider removing this branch.

Based on learnings: run.run() raises an exception on failure, so this check is unnecessary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/testing/test_llmd.py` around lines 497 - 499, The return-code
check after calling run.run() is dead/unreachable because run.run() raises on
failure; remove the unreachable branch that tests result.returncode != 0 in
test_llmd.py and instead rely on the existing exception handling around
run.run() (or catch the exception where appropriate). Locate the block that
calls run.run() (the variable/result named result and the if result.returncode
!= 0 branch) and delete that conditional and its return False, keeping any
meaningful logging inside the exception handler if needed.

263-270: Redundant returncode check after run.run().

Based on the learning that run.run() raises an exception on failure, the check at line 266 (if url_result.returncode == 0) is always true when reached. Consider removing this check or using the alternate approach with check=False if you need to handle non-zero exit codes gracefully.

♻️ Suggested simplification
     url_result = run.run(f"oc get llminferenceservice {llmisvc_name} -n {namespace} "
                         f"-ojsonpath='{{.status.url}}'", capture_stdout=True)
 
-    endpoint_url = url_result.stdout.strip() if url_result.returncode == 0 else ""
+    endpoint_url = url_result.stdout.strip()

Based on learnings: run.run() raises an exception on failure, so checking result.returncode != 0 after a run.run() call is unnecessary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/testing/test_llmd.py` around lines 263 - 270, The code
redundantly checks url_result.returncode after calling run.run(), which raises
on non-zero exits; update the snippet to either (A) remove the returncode check
and set endpoint_url = url_result.stdout.strip() and return it if non-empty, or
(B) call run.run(..., check=False) if you intend to handle failures manually and
then inspect url_result.returncode; adjust the block around run.run, url_result
and endpoint_url accordingly so there is no unreachable/always-true returncode
check.
projects/llm-d/toolbox/llmd_run_guidellm_benchmark/tasks/main.yml (1)

27-30: Fail fast if endpoint_url is empty or invalid.

This path now depends entirely on user input; an upfront assert fails fast with an actionable error instead of letting the benchmark job fail later with an opaque one.

Proposed patch
+- name: Validate the benchmark endpoint URL
+  assert:
+    that:
+      - llmd_run_guidellm_benchmark_endpoint_url is string
+      - (llmd_run_guidellm_benchmark_endpoint_url | trim | length) > 0
+      - llmd_run_guidellm_benchmark_endpoint_url is match('^https?://')
+    fail_msg: "llmd_run_guidellm_benchmark_endpoint_url must be a non-empty http(s) URL"
+
 - name: Set the benchmark endpoint URL
   set_fact:
     benchmark_endpoint_url: "{{ llmd_run_guidellm_benchmark_endpoint_url }}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/toolbox/llmd_run_guidellm_benchmark/tasks/main.yml` around
lines 27 - 30, Add a fail-fast validation before the "Set the benchmark endpoint
URL" task: verify that the input variable
llmd_run_guidellm_benchmark_endpoint_url is present and a non-empty valid URL,
and if not fail the play with a clear message; update the task sequence around
the set_fact that sets benchmark_endpoint_url to first run an assert/fail check
(using Ansible's assert or fail) referencing
llmd_run_guidellm_benchmark_endpoint_url and include a descriptive error message
so the run aborts immediately when the endpoint is empty or invalid.
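The same three conditions can be mirrored in Python to sanity-check candidate values before wiring them into the playbook (the helper name is made up):

```python
import re

def is_valid_endpoint_url(value):
    """Mirror the Ansible assert: a string, non-empty after trimming, http(s) scheme."""
    if not isinstance(value, str):
        return False
    trimmed = value.strip()
    return len(trimmed) > 0 and re.match(r"^https?://", trimmed) is not None
```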
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@projects/llm-d/testing/test_llmd.py`:
- Around line 214-216: The incomplete assignment to
isvc_data['spec']['replicas'] causes a SyntaxError; fix by assigning an integer
value (e.g., 2) when the "x2" flavor is detected and keep the logging
call—update the block that checks if "x2" in flavor to set
isvc_data['spec']['replicas'] = 2 and then call logging.info("Setting
spec.replicas = 2") so the test builds correctly.


ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 124907d and f41bdd5.

📒 Files selected for processing (11)
  • docs/toolbox.generated/Llmd.run_guidellm_benchmark.rst
  • docs/toolbox.generated/Llmd.run_multiturn_benchmark.rst
  • projects/llm-d/testing/config.yaml
  • projects/llm-d/testing/llmisvcs/llama-3-1-8b-instruct-fp8.yaml
  • projects/llm-d/testing/test_llmd.py
  • projects/llm-d/toolbox/llmd.py
  • projects/llm-d/toolbox/llmd_deploy_llm_inference_service/tasks/main.yml
  • projects/llm-d/toolbox/llmd_run_guidellm_benchmark/defaults/main/config.yml
  • projects/llm-d/toolbox/llmd_run_guidellm_benchmark/tasks/main.yml
  • projects/llm-d/toolbox/llmd_run_multiturn_benchmark/defaults/main/config.yml
  • projects/llm-d/toolbox/llmd_run_multiturn_benchmark/tasks/main.yml
🚧 Files skipped from review as they are similar to previous changes (4)
  • projects/llm-d/testing/llmisvcs/llama-3-1-8b-instruct-fp8.yaml
  • projects/llm-d/toolbox/llmd_run_multiturn_benchmark/defaults/main/config.yml
  • projects/llm-d/toolbox/llmd.py
  • projects/llm-d/toolbox/llmd_deploy_llm_inference_service/tasks/main.yml

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
projects/llm-d/toolbox/llmd.py (1)

51-63: ⚠️ Potential issue | 🟠 Major

Avoid hard-breaking callers on renamed public kwargs.

Changing the public method kwargs from llmisvc_name to endpoint_url (Line 53, Line 77) is a backward-incompatible API break for existing keyword callers. Please add a temporary compatibility shim (accept old kwarg and map it) or explicitly ship this as a documented breaking change with migration guidance.

Suggested compatibility pattern
-    def run_multiturn_benchmark(
-            self,
-            endpoint_url,
+    def run_multiturn_benchmark(
+            self,
+            endpoint_url=None, llmisvc_name=None,
             name="multi-turn-benchmark", namespace="",
             image="quay.io/hayesphilip/multi-turn-benchmark", version="0.0.1",
             timeout=900, parallel=9
     ):
+        endpoint_url = endpoint_url or llmisvc_name
+        if not endpoint_url:
+            raise ValueError("endpoint_url is required")
         return RunAnsibleRole(locals())
-    def run_guidellm_benchmark(
-            self,
-            endpoint_url,
+    def run_guidellm_benchmark(
+            self,
+            endpoint_url=None, llmisvc_name=None,
             name="guidellm-benchmark", namespace="",
             image="ghcr.io/vllm-project/guidellm", version="v0.5.3",
             timeout=900, profile="sweep", max_seconds=30,
             processor="RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic",
             data="prompt_tokens=256,output_tokens=128"
     ):
+        endpoint_url = endpoint_url or llmisvc_name
+        if not endpoint_url:
+            raise ValueError("endpoint_url is required")
         return RunAnsibleRole(locals())

Also applies to: 75-89

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/toolbox/llmd.py` around lines 51 - 63, The public method
run_multiturn_benchmark was renamed from llmisvc_name to endpoint_url and this
breaks existing keyword callers; add a compatibility shim that accepts the old
keyword (llmisvc_name) and maps it to endpoint_url when provided (e.g., add
llmisvc_name=None to the signature or capture **kwargs and set endpoint_url =
endpoint_url or kwargs.get("llmisvc_name")), ensure that if both are provided
endpoint_url takes precedence, and add a brief inline comment referencing
run_multiturn_benchmark to remove the shim after a deprecation period.
♻️ Duplicate comments (2)
projects/llm-d/testing/test_llmd.py (2)

203-205: ⚠️ Potential issue | 🟠 Major

Guard spec.router removal to avoid KeyError.

The del statement will raise KeyError if spec.router doesn't exist in the YAML. This was flagged in a previous review.

🔧 Proposed fix
     if "simple" in flavor:
         logging.info("Applying 'simple' flavor modifications")
 
         # Remove spec.router if it exists
-        del isvc_data['spec']['router']
+        isvc_data.get('spec', {}).pop('router', None)
         logging.info("Removed spec.router for 'simple' flavor")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/testing/test_llmd.py` around lines 203 - 205, The code
unconditionally deletes isvc_data['spec']['router'], which can raise KeyError;
update the test to safely remove the key by first ensuring 'spec' exists on
isvc_data and then using a safe removal (e.g., dict.pop with a default) for
'router' so no exception is raised if it's absent; reference the isvc_data dict
and the spec.router key in the test_llmd.py snippet when making this change.

71-74: ⚠️ Potential issue | 🔴 Critical

Use RuntimeError instead of RuntimeException; the latter is undefined in Python.

RuntimeException is not a built-in Python exception and will raise NameError at runtime. This was flagged in a previous review and is confirmed by static analysis (F821).

🐛 Proposed fix
                     if config.project.get_config("tests.llmd.inference_service.do_simple_test"):
                         if not test_llm_inference_simple(endpoint_url, llmisvc_name, namespace):
-                            raise RuntimeException("Simple inference test failed :/")
+                            raise RuntimeError("Simple inference test failed")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/testing/test_llmd.py` around lines 71 - 74, Replace the
undefined Python exception RuntimeException with the built-in RuntimeError where
the simple inference test failure is raised (inside the conditional that calls
test_llm_inference_simple with endpoint_url, llmisvc_name, namespace); update
the raise statement to use RuntimeError("Simple inference test failed :/") so it
doesn't trigger a NameError/F821 at runtime.
🧹 Nitpick comments (3)
projects/llm-d/testing/test_llmd.py (3)

223-224: Chain the exception to preserve traceback.

Using raise ... from e preserves the original exception context for debugging.

♻️ Suggested improvement
     except Exception as e:
-        raise ValueError(f"Failed to write reshaped ISVC file to {output_path}: {e}")
+        raise ValueError(f"Failed to write reshaped ISVC file to {output_path}") from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/testing/test_llmd.py` around lines 223 - 224, The except block
that currently re-raises a ValueError for failed writes to the reshaped ISVC
file should chain the original exception to preserve the traceback; update the
exception re-raise in the except Exception as e handler to use "raise
ValueError(f'Failed to write reshaped ISVC file to {output_path}: {e}') from e"
so the original exception context (e) is preserved when raising the new
ValueError.
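What `from e` buys can be verified through `__cause__`. A self-contained sketch, where the OSError stands in for the real file write failure:

```python
def write_reshaped_isvc(output_path):
    try:
        raise OSError("disk full")  # stand-in for the failing write
    except Exception as e:
        # 'from e' records the original error as __cause__, preserving the chain.
        raise ValueError(f"Failed to write reshaped ISVC file to {output_path}") from e
```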

336-372: Unused parameter llmisvc_name (same as run_multiturn_benchmark).

Same issue as above. Consider removing or documenting the retained parameter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/testing/test_llmd.py` around lines 336 - 372, The function
run_guidellm_benchmark currently accepts an unused parameter llmisvc_name (same
situation as run_multiturn_benchmark); either remove llmisvc_name from the
function signature and any callers, or if it must be retained for API
compatibility, add a brief docstring note and use a no-op reference (e.g., mark
as unused with a comment) to avoid linter complaints; update
run_guidellm_benchmark's signature and any tests/call sites or add the
documentation change accordingly to reflect the chosen approach.

301-301: Unused parameter llmisvc_name.

The llmisvc_name parameter is not used in this function. Consider removing it or documenting why it's retained (e.g., for API consistency).

♻️ Option: Remove unused parameter
-def run_multiturn_benchmark(endpoint_url, llmisvc_name, namespace):
+def run_multiturn_benchmark(endpoint_url, namespace):

Note: This would require updating the call site at line 77.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/testing/test_llmd.py` at line 301, The function
run_multiturn_benchmark has an unused parameter llmisvc_name — either remove
llmisvc_name from the function signature and all call sites that pass it, or if
it must be kept for API consistency, document it in the function docstring and
use it (e.g., include it in logging or as a nop-flag) so linters/test coverage
stop flagging it; update all callers of run_multiturn_benchmark accordingly to
match the new signature or to pass a meaningful value.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f41bdd5 and 084e9ae.

📒 Files selected for processing (11)
  • docs/toolbox.generated/Llmd.run_guidellm_benchmark.rst
  • docs/toolbox.generated/Llmd.run_multiturn_benchmark.rst
  • projects/llm-d/testing/config.yaml
  • projects/llm-d/testing/llmisvcs/llama-3-1-8b-instruct-fp8.yaml
  • projects/llm-d/testing/test_llmd.py
  • projects/llm-d/toolbox/llmd.py
  • projects/llm-d/toolbox/llmd_deploy_llm_inference_service/tasks/main.yml
  • projects/llm-d/toolbox/llmd_run_guidellm_benchmark/defaults/main/config.yml
  • projects/llm-d/toolbox/llmd_run_guidellm_benchmark/tasks/main.yml
  • projects/llm-d/toolbox/llmd_run_multiturn_benchmark/defaults/main/config.yml
  • projects/llm-d/toolbox/llmd_run_multiturn_benchmark/tasks/main.yml
🚧 Files skipped from review as they are similar to previous changes (5)
  • projects/llm-d/toolbox/llmd_run_multiturn_benchmark/tasks/main.yml
  • projects/llm-d/testing/llmisvcs/llama-3-1-8b-instruct-fp8.yaml
  • projects/llm-d/toolbox/llmd_run_guidellm_benchmark/tasks/main.yml
  • docs/toolbox.generated/Llmd.run_guidellm_benchmark.rst
  • projects/llm-d/toolbox/llmd_deploy_llm_inference_service/tasks/main.yml

Comment on lines +99 to +101
if flavor_failed:
    overall_failed = True
    logging.error(f"Test failed for flavor: {flavor}")

⚠️ Potential issue | 🟠 Major

Exception object is lost; the re-raise logic at lines 120-121 will never trigger.

At line 100, overall_failed is set to True (boolean), but line 87 stores the actual exception in flavor_failed. Later at lines 120-121, the code checks isinstance(overall_failed, Exception), which will always be False since overall_failed is a boolean.

If the intent is to re-raise the first exception encountered, preserve it:

🔧 Proposed fix
                 if flavor_failed:
-                    overall_failed = True
+                    if not overall_failed:
+                        overall_failed = flavor_failed if isinstance(flavor_failed, Exception) else True
                     logging.error(f"Test failed for flavor: {flavor}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/testing/test_llmd.py` around lines 99 - 101, The code
currently overwrites the exception object by setting overall_failed = True when
a flavor fails, so the later isinstance(overall_failed, Exception) check never
passes; change the logic to preserve the first exception (e.g., set
overall_failed = flavor_failed or introduce first_exception and assign
first_exception = flavor_failed when it's None) and keep a separate boolean if
needed (e.g., had_failure = True) so the existing re-raise path (isinstance(...,
Exception)) will receive the original Exception object and can re-raise it.
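The intent of the fix, keeping the first Exception object so it can be re-raised after all flavors have run, can be sketched outside the harness (the flavor names and runner callback are hypothetical):

```python
def run_all_flavors(flavors, run_one):
    """Run every flavor; remember the first failure and re-raise it at the end."""
    overall_failed = False
    for flavor in flavors:
        try:
            run_one(flavor)
            flavor_failed = None
        except Exception as e:
            flavor_failed = e
        if flavor_failed:
            if not overall_failed:
                # Preserve the Exception object instead of collapsing it to True.
                overall_failed = flavor_failed
    if isinstance(overall_failed, Exception):
        raise overall_failed
```

Note that later flavors still run after a failure; only the first exception is re-raised.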

Comment on lines 263 to +279
 url_result = run.run(f"oc get llminferenceservice {llmisvc_name} -n {namespace} "
-                     f"-o jsonpath='{{.status.url}}'", capture_stdout=True)
+                     f"-ojsonpath='{{.status.url}}'", capture_stdout=True)
 
-if url_result.returncode == 0:
-    logging.info(f"LLM inference service URL: {url_result.stdout.strip()}")
+endpoint_url = url_result.stdout.strip() if url_result.returncode == 0 else ""
 
-return llmisvc_name, namespace
+if endpoint_url:
+    logging.info(f"LLM inference service URL: {endpoint_url}")
+    return endpoint_url
+
+# If that didn't work or is empty, try .status.addresses[0].url
+logging.info("Trying alternate URL location at .status.addresses[0].url")
+url_result = run.run(f"oc get llminferenceservice {llmisvc_name} -n {namespace} "
+                     f"-ojsonpath='{{.status.addresses[0].url}}'", capture_stdout=True)
+
+if url_result.returncode != 0:
+    logging.error("Failed to get LLM inference service URL from both locations")
+    raise RuntimeError(f"Failed to get LLM inference service URL: {url_result.stderr}")

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find and examine the run.run() function implementation
find . -name "run.py" -path "*/core/library/*" -type f

Repository: openshift-psap/topsail

Length of output: 96


🏁 Script executed:

# Search for the run.run() function definition
rg -n "def run\(" --type=py | grep -E "(core|library)" | head -20

Repository: openshift-psap/topsail

Length of output: 396


🏁 Script executed:

# Read the run.run() function implementation
sed -n '105,200p' projects/core/library/run.py

Repository: openshift-psap/topsail

Length of output: 3292


Returncode checks are unreachable dead code.

The run.run() function uses check=True by default, which causes subprocess.run() to raise a CalledProcessError exception if the command fails. The returncode checks at lines 266 and 277 will never be reached on failure—the exception will be raised first.

Either remove these checks or pass check=False to run.run() if you want to inspect the return code instead.

🧰 Tools
🪛 Ruff (0.15.2)

[warning] 279-279: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@projects/llm-d/testing/test_llmd.py` around lines 263 - 279, The subprocess
call handling is wrong because run.run() defaults to check=True so returncode
checks in the url_result handling are unreachable; update both calls to
run.run(...) that populate url_result (the two calls fetching .status.url and
.status.addresses[0].url) to pass check=False so you can inspect
url_result.returncode and url_result.stderr, and keep the existing logic that
sets endpoint_url from url_result.stdout and raises RuntimeError using
url_result.stderr when returncode != 0; alternatively, if you prefer exceptions,
remove the returncode checks and wrap the run.run calls in a try/except for
subprocess.CalledProcessError around the block that uses llmisvc_name and
namespace.
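The underlying subprocess semantics are easy to confirm with the standard library. This uses plain `subprocess.run`, not the project's `run.run` wrapper, which the reviewers report behaves the same way; it assumes a POSIX `false` binary:

```python
import subprocess

# check=True: a non-zero exit raises before any returncode check can run.
try:
    subprocess.run(["false"], check=True)
    returncode_check_reached = True
except subprocess.CalledProcessError:
    returncode_check_reached = False

# check=False: the non-zero exit is returned for inspection instead.
result = subprocess.run(["false"], check=False)
```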

Comment on lines 490 to 495
 result = run.run(f"""
-curl -s "{url}/v1/completions" \\
+oc rsh -n {namespace} -c main deploy/{deployment_name} \\
+  curl -k -sSf "{endpoint_url}/v1/completions" \\
   -H "Content-Type: application/json" \\
-  -d @{payload_file}
+  -d '{payload_json}'
 """, capture_stdout=True)

⚠️ Potential issue | 🟡 Minor

Shell quoting fragility with inline JSON payload.

The JSON payload is embedded directly in a shell command using single quotes. If the payload ever contains single quotes (e.g., in prompts), the command will break. Current payload is safe but this is fragile for future modifications.

♻️ Safer alternative using heredoc or base64
# Option 1: Use base64 encoding
import base64
payload_b64 = base64.b64encode(payload_json.encode()).decode()
result = run.run(f"""
oc rsh -n {namespace} -c main deploy/{deployment_name} \\
  sh -c 'echo {payload_b64} | base64 -d | curl -k -sSf "{endpoint_url}/v1/completions" \\
  -H "Content-Type: application/json" -d @-'
""", capture_stdout=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `projects/llm-d/testing/test_llmd.py` around lines 490 - 495, the inline
shell command building the curl request in the run.run call is fragile because
payload_json may contain single quotes; change the invocation so the JSON is
passed safely via stdin, a heredoc, or base64 instead of being embedded inside
single quotes. Locate the run.run(...) that references payload_json, namespace,
deployment_name and endpoint_url and replace the inline `-d '{payload_json}'`
usage with a safe transfer (e.g., base64-encode payload_json and decode it in
the container, piped to `curl -d @-`, or use a quoted heredoc like `<<'EOF'` to
feed curl), ensuring the curl command reads the body from stdin or a file to
avoid shell-quoting issues.
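The base64 transfer sketched in the suggestion can be verified locally with a round trip through a shell; the payload and names here are illustrative, not the test's actual model or config:

```python
import base64
import json
import subprocess

# A payload containing a single quote, which would break naive
# -d '{payload_json}' embedding inside a single-quoted shell string.
payload = {"model": "demo-model", "prompt": "what's the capital of France?"}
payload_json = json.dumps(payload)

# base64-encode on the driver side; the remote shell decodes and pipes
# the bytes to the consumer (curl -d @- in the original suggestion,
# plain `base64 -d` here so the sketch runs anywhere). Base64 output
# contains no shell metacharacters, so no quoting is needed at all.
payload_b64 = base64.b64encode(payload_json.encode()).decode()
result = subprocess.run(["sh", "-c", f"echo {payload_b64} | base64 -d"],
                        capture_output=True, text=True, check=True)

# The JSON survives the shell boundary byte-for-byte.
roundtripped = json.loads(result.stdout)
```

The quoted-heredoc variant (`<<'EOF'`) achieves the same thing without encoding, at the cost of breaking if the payload ever contains the delimiter line itself.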

@topsail-bot

topsail-bot bot commented Feb 26, 2026

🔴 Test of 'llm-d test test_ci' failed after 01 hours 22 minutes 09 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_0: jump-ci
PR_POSITIONAL_ARG_1: scale_up
PR_POSITIONAL_ARG_2: multi-flavor

Failure indicator:

/tmp/topsail_202602261772136668/004__plots/FAILURE | An error happened during the visualization post-processing ... (0_matbench_parse.log in /tmp/topsail_202602261772136668/004__plots). The test that was processed SUCCEEDED.


@kpouget
Contributor Author

kpouget commented Feb 27, 2026

/test ?

@topsail-bot

topsail-bot bot commented Feb 27, 2026

🔴 Test of 'llm-d test generate_plots_from_pr_args' failed after 00 hours 00 minutes 00 seconds. 🔴

• Link to the test results.

• No reports index generated...

Test configuration:

PR_POSITIONAL_ARGS: jump-ci-plot
PR_POSITIONAL_ARG_0: jump-ci-plot
PR_POSITIONAL_ARG_1: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift-psap_topsail/902/pull-ci-openshift-psap-topsail-main-jump-ci/2027113255020793856/artifacts/ci/005-test/artifacts/test-artifacts/002__llm_d_testing/

Failure indicator: Empty. (See run.log)

@topsail-bot

topsail-bot bot commented Feb 27, 2026

🔴 Test of 'llm-d test generate_plots_from_pr_args' failed after 00 hours 00 minutes 01 seconds. 🔴

• Link to the test results.

• No reports index generated...

Test configuration:

PR_POSITIONAL_ARGS: jump-ci-plot
PR_POSITIONAL_ARG_0: jump-ci-plot
PR_POSITIONAL_ARG_1: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift-psap_topsail/902/pull-ci-openshift-psap-topsail-main-jump-ci/2027113255020793856/artifacts/ci/005-test/artifacts/test-artifacts/002__llm_d_testing/

Failure indicator: Empty. (See run.log)

@openshift-ci

openshift-ci bot commented Feb 27, 2026

@kpouget: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/jump-ci | 084e9ae | link | true | /test jump-ci |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@topsail-bot

topsail-bot bot commented Feb 27, 2026

🟢 Test of 'llm-d test generate_plots_from_pr_args' succeeded after 00 hours 00 minutes 32 seconds. 🟢

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci-plot
PR_POSITIONAL_ARG_0: jump-ci-plot
PR_POSITIONAL_ARG_1: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift-psap_topsail/902/pull-ci-openshift-psap-topsail-main-jump-ci/2027113255020793856/artifacts/ci/005-test/artifacts/test-artifacts/002__llm_d_testing/

@kpouget kpouget merged commit 0e164eb into openshift-psap:main Feb 27, 2026
9 of 10 checks passed
@kpouget kpouget deleted the llmd branch February 27, 2026 10:57
@coderabbitai coderabbitai bot mentioned this pull request Mar 28, 2026