Skip to content

Evals CI/CD UX overview improves pt. 1#1602

Merged
prathmeshpatel merged 4 commits intomainfrom
evals-ux-improves-1
Mar 16, 2026
Merged

Evals CI/CD UX overview improves pt. 1#1602
prathmeshpatel merged 4 commits intomainfrom
evals-ux-improves-1

Conversation

@prathmeshpatel
Copy link
Copy Markdown
Collaborator

@prathmeshpatel prathmeshpatel commented Mar 13, 2026

Summary

Enhanced Sidebar

  • Added status labels below status dots in the suite list sidebar
  • Added suite count display in the sidebar header
  • Added failure count badge on the Overview button when suites have failed runs
  • Improved status information display with color-coded labels

Overview Panel Enhancements

  • Renamed TagAggregationPanel to OverviewPanel and updated all references
  • Added color-coded card borders based on pass rates (green for ≥95%, amber for ≥75%, red for <75%)
  • Enhanced "No data" state handling with alert icons and appropriate messaging
  • Improved trend indicators with color coding (green for positive, red for negative trends)
  • Added search functionality for filtering suites in the breakdown section
  • Enhanced progress bars with color coding matching pass rate thresholds

Screenshot 2026-03-13 at 12.38.57 AM.png

@chelojimenez
Copy link
Copy Markdown
Contributor

chelojimenez commented Mar 13, 2026

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Code Security 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@prathmeshpatel prathmeshpatel changed the title basic improves, status labels, suite count, progress bars, no data Evals CI/CD UX overview improves Mar 13, 2026
@prathmeshpatel prathmeshpatel changed the title Evals CI/CD UX overview improves Evals CI/CD UX overview improves pt. 1 Mar 13, 2026
@railway-app railway-app bot temporarily deployed to triumphant-alignment / staging March 13, 2026 21:46 Inactive
@railway-app railway-app bot temporarily deployed to triumphant-alignment / staging March 14, 2026 07:18 Inactive
@railway-app railway-app bot temporarily deployed to triumphant-alignment / staging-prathmesh March 14, 2026 07:21 Inactive
@prathmeshpatel prathmeshpatel marked this pull request as ready for review March 16, 2026 08:47
@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Mar 16, 2026
Base automatically changed from sdk-group-charts to main March 16, 2026 08:48
@dosubot dosubot bot added the enhancement New feature or request label Mar 16, 2026
@graphite-app
Copy link
Copy Markdown

graphite-app bot commented Mar 16, 2026

Merge activity

  • Mar 16, 8:50 AM UTC: Graphite rebased this pull request, because this pull request is set to merge when ready.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 16, 2026

Walkthrough

This pull request restructures the evaluation suite display interface by introducing a new OverviewPanel component that consolidates suite data visualization. The changes replace the previous TagAggregationPanel with the new component, enhance status indicator functionality in the sidebar, and update the tag aggregation panel with improved data visibility logic and search filtering capabilities. The refactoring extends the props and styling system to support labelClass fields and adds UI elements for filtering, searching, and displaying suite health metrics.

📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx (1)

206-206: ⚠️ Potential issue | 🔴 Critical

Critical: useState called after early return violates Rules of Hooks.

The useState declaration on line 220 occurs after the early return on line 206. React requires hooks to be called unconditionally and in the same order on every render. When tagGroups.length transitions from 0 to non-zero, this will cause runtime errors.

🐛 Required fix: Move state declaration before early return
  const multiLineChartConfig = useMemo(() => {
    const config: Record<string, { label: string; color: string }> = {};
    visibleGroups.forEach((g, i) => {
      config[g.tag] = {
        label: g.tag,
        color: GROUP_COLORS[i % GROUP_COLORS.length],
      };
    });
    return config;
  }, [visibleGroups]);

  // Single-tag trend data (for AccuracyChart)
  const singleTagTrendData = useMemo(() => {
    if (visibleGroups.length !== 1) return [];
    const trend = groupTrends.get(visibleGroups[0].tag) ?? [];
    return trend.map((value, i) => ({
      runIdDisplay: `#${i + 1}`,
      passRate: toPercent(value),
    }));
  }, [visibleGroups, groupTrends]);

  // Bar chart fallback data — exclude groups with no data
  const passRateBarData = useMemo(
    () =>
      visibleGroups
        .filter(
          (g) => g.totals.passed + g.totals.failed > 0 || g.totals.runs > 0,
        )
        .map((g) => ({
          tag: g.tag,
          passRate: g.passRate,
          suiteCount: g.suiteCount,
          passed: g.totals.passed,
          failed: g.totals.failed,
        })),
    [visibleGroups],
  );

+ // Search state for suite breakdown
+ const [suiteSearch, setSuiteSearch] = useState("");
+
  if (tagGroups.length === 0) return null;

  const toggleTag = (tag: string) => {
    // ...
  };

  const isMultiGroup = visibleGroups.length > 1;

- // Search state for suite breakdown
- const [suiteSearch, setSuiteSearch] = useState("");

Also applies to: 219-220

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx` at
line 206, The early return "if (tagGroups.length === 0) return null;" causes
hooks declared later (the useState on/around lines referencing tagGroups state)
to be conditionally invoked; move the useState hook(s) (the hook that declares
state for selectedTagGroup / any setSelectedTagGroup or similar used below) to
before this early return so all React hooks (useState in this component) are
called unconditionally and in the same order on every render; ensure any derived
values that depend on tagGroups remain computed after the hook declarations and
adjust the early return to occur only after hooks are declared.
🧹 Nitpick comments (4)
mcpjam-inspector/client/src/components/CiEvalsTab.tsx (1)

264-267: Hardcoding hasTags={true} may mask edge cases.

The isOverviewSelected now solely depends on !selectedSuiteId, and hasTags is hardcoded to true. This ensures the Overview button always renders, but the actual hasTags variable (line 90) still exists and is computed. If the intent is to always display Overview, consider removing the prop or documenting this decision.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/CiEvalsTab.tsx` around lines 264 -
267, The Overview button prop hasTags is hardcoded to true which hides the
computed hasTags variable defined earlier; update the usage in CiEvalsTab to
either pass the computed hasTags (replace hasTags={true} with hasTags={hasTags})
or remove the hasTags prop entirely if the Overview should always render, and
ensure isOverviewSelected still uses !selectedSuiteId and other props
(isLoading=queries.isOverviewLoading, filterTag) remain unchanged.
mcpjam-inspector/client/src/components/evals/overview-panel.tsx (2)

56-77: Sparkline height calculation can be simplified.

The expression (toPercent(value) / 100) * 100 is mathematically equivalent to toPercent(value). Since toPercent already returns a value in the 0-100 range, this simplifies readability.

♻️ Suggested simplification
  <div
    key={idx}
    className="w-1.5 rounded-sm bg-primary/70"
    style={{
-     height: `${Math.max(3, (toPercent(value) / 100) * 100)}%`,
+     height: `${Math.max(3, toPercent(value))}%`,
    }}
  />
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/evals/overview-panel.tsx` around lines
56 - 77, The Sparkline component's bar height uses an unnecessary calculation:
replace the style height expression in Sparkline (currently Math.max(3,
(toPercent(value) / 100) * 100)) with Math.max(3, toPercent(value)) to simplify
and make intent clearer; update the inline style in the mapped div inside
function Sparkline to use toPercent(value) directly.

602-611: Table column header "St" is cryptic.

Consider using "Status" or a status icon for the column header instead of the abbreviation "St", which may confuse users.

♻️ Suggested change
- <div>St</div>
+ <div className="sr-only">Status</div>

Or simply use a small icon or tooltip to convey meaning.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/evals/overview-panel.tsx` around lines
602 - 611, Replace the cryptic header label "St" in the header row (the JSX div
that currently contains the text "St" inside the grid header block) with a
clearer affordance: use "Status" as the visible label or swap it for a small
status icon with an accessible tooltip/title attribute ("Status") so screen
readers and hover users understand the column purpose; update the corresponding
JSX element in the overview panel header row accordingly and ensure any
CSS/className (the grid header div) still lays out correctly with the new text
or icon.
mcpjam-inspector/client/src/components/evals/ci-suite-list-sidebar.tsx (1)

134-143: Consider extracting the IIFE into a useMemo for clarity.

While the immediately-invoked function expression works, extracting the failure count computation into a memoized value would improve readability and align with React's declarative patterns.

♻️ Suggested refactor
+ const failCount = useMemo(
+   () => suites.filter((e) => e.latestRun?.result === "failed").length,
+   [suites]
+ );

  // Then in JSX:
- {(() => {
-   const failCount = suites.filter(
-     (e) => e.latestRun?.result === "failed",
-   ).length;
-   return failCount > 0 ? (
-     <span className="shrink-0 flex h-5 min-w-[20px] items-center justify-center rounded-full bg-destructive px-1.5 text-[10px] font-bold text-destructive-foreground">
-       {failCount}
-     </span>
-   ) : null;
- })()}
+ {failCount > 0 && (
+   <span className="shrink-0 flex h-5 min-w-[20px] items-center justify-center rounded-full bg-destructive px-1.5 text-[10px] font-bold text-destructive-foreground">
+     {failCount}
+   </span>
+ )}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/evals/ci-suite-list-sidebar.tsx`
around lines 134 - 143, Extract the inline IIFE that computes failCount into a
memoized value using React's useMemo: create const failCount = useMemo(() =>
suites.filter(e => e.latestRun?.result === "failed").length, [suites]) (ensure
useMemo is imported), then render the badge conditionally with {failCount > 0 &&
<span ...>{failCount}</span>} instead of the IIFE; this improves readability and
prevents re-computation on every render.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx`:
- Line 206: The early return "if (tagGroups.length === 0) return null;" causes
hooks declared later (the useState on/around lines referencing tagGroups state)
to be conditionally invoked; move the useState hook(s) (the hook that declares
state for selectedTagGroup / any setSelectedTagGroup or similar used below) to
before this early return so all React hooks (useState in this component) are
called unconditionally and in the same order on every render; ensure any derived
values that depend on tagGroups remain computed after the hook declarations and
adjust the early return to occur only after hooks are declared.

---

Nitpick comments:
In `@mcpjam-inspector/client/src/components/CiEvalsTab.tsx`:
- Around line 264-267: The Overview button prop hasTags is hardcoded to true
which hides the computed hasTags variable defined earlier; update the usage in
CiEvalsTab to either pass the computed hasTags (replace hasTags={true} with
hasTags={hasTags}) or remove the hasTags prop entirely if the Overview should
always render, and ensure isOverviewSelected still uses !selectedSuiteId and
other props (isLoading=queries.isOverviewLoading, filterTag) remain unchanged.

In `@mcpjam-inspector/client/src/components/evals/ci-suite-list-sidebar.tsx`:
- Around line 134-143: Extract the inline IIFE that computes failCount into a
memoized value using React's useMemo: create const failCount = useMemo(() =>
suites.filter(e => e.latestRun?.result === "failed").length, [suites]) (ensure
useMemo is imported), then render the badge conditionally with {failCount > 0 &&
<span ...>{failCount}</span>} instead of the IIFE; this improves readability and
prevents re-computation on every render.

In `@mcpjam-inspector/client/src/components/evals/overview-panel.tsx`:
- Around line 56-77: The Sparkline component's bar height uses an unnecessary
calculation: replace the style height expression in Sparkline (currently
Math.max(3, (toPercent(value) / 100) * 100)) with Math.max(3, toPercent(value))
to simplify and make intent clearer; update the inline style in the mapped div
inside function Sparkline to use toPercent(value) directly.
- Around line 602-611: Replace the cryptic header label "St" in the header row
(the JSX div that currently contains the text "St" inside the grid header block)
with a clearer affordance: use "Status" as the visible label or swap it for a
small status icon with an accessible tooltip/title attribute ("Status") so
screen readers and hover users understand the column purpose; update the
corresponding JSX element in the overview panel header row accordingly and
ensure any CSS/className (the grid header div) still lays out correctly with the
new text or icon.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2ec55c23-77ba-4aed-a250-9da6c97533b6

📥 Commits

Reviewing files that changed from the base of the PR and between f7c6a0b and eed35b3.

📒 Files selected for processing (4)
  • mcpjam-inspector/client/src/components/CiEvalsTab.tsx
  • mcpjam-inspector/client/src/components/evals/ci-suite-list-sidebar.tsx
  • mcpjam-inspector/client/src/components/evals/overview-panel.tsx
  • mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx

@prathmeshpatel prathmeshpatel merged commit 778405b into main Mar 16, 2026
5 checks passed
@prathmeshpatel prathmeshpatel deleted the evals-ux-improves-1 branch March 16, 2026 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants