Skip to content

[SDK] Add group filtering#1595

Merged
prathmeshpatel merged 2 commits intomainfrom
sdk-group-charts
Mar 16, 2026
Merged

[SDK] Add group filtering#1595
prathmeshpatel merged 2 commits intomainfrom
sdk-group-charts

Conversation

@chelojimenez
Copy link
Copy Markdown
Contributor

@chelojimenez chelojimenez commented Mar 12, 2026

Note

Medium Risk
Adds a new tags field to SDK eval reporting and the inspector UI, plus a new tag-aggregation overview with filtering logic; risk is mainly around API/backfill compatibility and UI state/routing changes when tags are missing or not persisted as expected.

Overview
Adds tagging + tag-based grouping for CI eval suites. SDK reporting now accepts/sends tags, and inspector suite types include tags with a new inline TagEditor/TagBadges in the suite header.

Introduces a tag aggregation “Overview” mode. CI eval list view can now show a TagAggregationPanel that groups suites by tag (via groupSuitesByTag), compares pass-rate trends, and lets users filter by tag; the sidebar gains an Overview entry, displays suite tags, and filters suite list by the selected tag.

Written by Cursor Bugbot for commit bf74ac8. This will update automatically on new commits. Configure here.

@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Mar 12, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 12, 2026

Walkthrough

This pull request introduces tag support to the evaluation system. New types define tag storage on suites and aggregation structures for grouping. The groupSuitesByTag utility buckets suites by their tags and computes per-tag metrics. New components—TagAggregationPanel, TagEditor, and TagBadges—provide UI for viewing tag-grouped suites with trend charts, filtering, and tag management. The sidebar and main tab integrate these features with filtering state and overview navigation. Reporting types extend to carry tags through the SDK pipeline. A new verification script validates local SDK linkage.

📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx (2)

7-7: Unused import: Minus.

The Minus icon from lucide-react is imported but never referenced in the component.

🧹 Remove unused import
 import {
   ChevronDown,
   ChevronRight,
   TrendingUp,
   TrendingDown,
-  Minus,
 } from "lucide-react";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx` at
line 7, The import for the lucide-react icon named "Minus" in the
TagAggregationPanel component is unused; remove the "Minus" import from the
import list in tag-aggregation-panel.tsx (where icons are imported) so the
component no longer includes an unreferenced symbol.

49-62: getStatusDot is duplicated from ci-suite-list-sidebar.tsx.

This helper mirrors the implementation at lines 28-46 of ci-suite-list-sidebar.tsx. Consider extracting it to helpers.ts to maintain a single source of truth.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx`
around lines 49 - 62, Extract the duplicated getStatusDot function into a shared
helper module (e.g., create helpers.ts), export it, and update both callers to
import and use that single exported getStatusDot; ensure the helper file exports
or imports the EvalSuiteOverviewEntry type (or a compatible type alias) so
signatures match, and remove the duplicate implementations from
tag-aggregation-panel and the other file (ci-suite-list-sidebar) so there is one
authoritative getStatusDot used by both.
mcpjam-inspector/client/src/components/evals/ci-suite-list-sidebar.tsx (1)

23-25: Unused prop: onFilterTagChange.

The onFilterTagChange callback is declared in props but never invoked within this component. If filtering is intended to occur here, wire it up; otherwise, remove the dead declaration to reduce interface clutter.

🧹 Remove unused prop if not needed
 interface CiSuiteListSidebarProps {
   suites: EvalSuiteOverviewEntry[];
   selectedSuiteId: string | null;
   onSelectSuite: (suiteId: string) => void;
   onSelectOverview: () => void;
   isOverviewSelected: boolean;
   isLoading?: boolean;
   filterTag?: string | null;
-  onFilterTagChange?: (tag: string | null) => void;
   hasTags: boolean;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/evals/ci-suite-list-sidebar.tsx`
around lines 23 - 25, The prop onFilterTagChange is declared on the
CiSuiteListSidebar component but never used; either remove it from the component
props to eliminate dead API surface, or wire it up where tag filtering happens
(e.g., in the tag click/selection handler or the input change handler that
currently updates filterTag/hasTags) by calling onFilterTagChange(selectedTag)
when the user changes the filter; update any local state updates (filterTag) to
also invoke onFilterTagChange if you keep the prop, and add a null-check before
calling (onFilterTagChange && onFilterTagChange(tag)).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mcpjam-inspector/scripts/verify-local-sdk-link.mjs`:
- Around line 1-61: Add execution wiring so the verifier runs automatically: add
an npm script entry named e.g. "verify-local-sdk-link" in
mcpjam-inspector/package.json that invokes node
scripts/verify-local-sdk-link.mjs, and update the CI workflow
(.github/workflows/test.yml) to run this npm script as a step early in the test
job (or add a dedicated job) so the check executes in CI; reference the script
file verify-local-sdk-link.mjs and the package.json "scripts" section when
making the changes.

---

Nitpick comments:
In `@mcpjam-inspector/client/src/components/evals/ci-suite-list-sidebar.tsx`:
- Around line 23-25: The prop onFilterTagChange is declared on the
CiSuiteListSidebar component but never used; either remove it from the component
props to eliminate dead API surface, or wire it up where tag filtering happens
(e.g., in the tag click/selection handler or the input change handler that
currently updates filterTag/hasTags) by calling onFilterTagChange(selectedTag)
when the user changes the filter; update any local state updates (filterTag) to
also invoke onFilterTagChange if you keep the prop, and add a null-check before
calling (onFilterTagChange && onFilterTagChange(tag)).

In `@mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx`:
- Line 7: The import for the lucide-react icon named "Minus" in the
TagAggregationPanel component is unused; remove the "Minus" import from the
import list in tag-aggregation-panel.tsx (where icons are imported) so the
component no longer includes an unreferenced symbol.
- Around line 49-62: Extract the duplicated getStatusDot function into a shared
helper module (e.g., create helpers.ts), export it, and update both callers to
import and use that single exported getStatusDot; ensure the helper file exports
or imports the EvalSuiteOverviewEntry type (or a compatible type alias) so
signatures match, and remove the duplicate implementations from
tag-aggregation-panel and the other file (ci-suite-list-sidebar) so there is one
authoritative getStatusDot used by both.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1307ab16-8ffd-4f56-b526-22f7d1f3b029

📥 Commits

Reviewing files that changed from the base of the PR and between abf2533 and 69b9b2b.

📒 Files selected for processing (11)
  • mcpjam-inspector/client/src/components/CiEvalsTab.tsx
  • mcpjam-inspector/client/src/components/evals/ci-suite-list-sidebar.tsx
  • mcpjam-inspector/client/src/components/evals/helpers.ts
  • mcpjam-inspector/client/src/components/evals/suite-header.tsx
  • mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx
  • mcpjam-inspector/client/src/components/evals/tag-editor.tsx
  • mcpjam-inspector/client/src/components/evals/types.ts
  • mcpjam-inspector/client/src/main.tsx
  • mcpjam-inspector/scripts/verify-local-sdk-link.mjs
  • sdk/src/eval-reporting-types.ts
  • sdk/src/report-eval-results.ts

Comment on lines +1 to +61
import fs from "node:fs";
import path from "node:path";

const inspectorDir = process.cwd();
const localSdkLink = path.join(inspectorDir, "node_modules", "@mcpjam", "sdk");
const expectedSdkDir = path.resolve(inspectorDir, "../sdk");

function normalizePath(targetPath) {
return process.platform === "win32"
? targetPath.replace(/\\/g, "/").toLowerCase()
: targetPath;
}

function fail(message) {
console.error(message);
process.exit(1);
}

if (!fs.existsSync(localSdkLink)) {
fail(`Expected linked SDK at ${localSdkLink}, but it does not exist.`);
}

if (!fs.existsSync(expectedSdkDir)) {
fail(
`Expected sibling SDK checkout at ${expectedSdkDir}, but it does not exist.`,
);
}

const linkStats = fs.lstatSync(localSdkLink);
if (!linkStats.isSymbolicLink()) {
fail(
`Expected ${localSdkLink} to be a local npm link/junction to ../sdk, but it is not a symlink.`,
);
}

const resolvedLinkedSdk = fs.realpathSync(localSdkLink);
const resolvedExpectedSdk = fs.realpathSync(expectedSdkDir);

if (normalizePath(resolvedLinkedSdk) !== normalizePath(resolvedExpectedSdk)) {
fail(
`Expected @mcpjam/sdk to resolve to ${resolvedExpectedSdk}, but it resolved to ${resolvedLinkedSdk}.`,
);
}

const sdkPackageJsonPath = path.join(resolvedLinkedSdk, "package.json");
if (!fs.existsSync(sdkPackageJsonPath)) {
fail(
`Expected SDK package manifest at ${sdkPackageJsonPath}, but it does not exist.`,
);
}

const sdkPackage = JSON.parse(fs.readFileSync(sdkPackageJsonPath, "utf8"));
if (sdkPackage.name !== "@mcpjam/sdk") {
fail(
`Expected linked package name to be @mcpjam/sdk, but found ${sdkPackage.name ?? "unknown"}.`,
);
}

console.log(
`Verified repo-local SDK link: ${localSdkLink} -> ${resolvedLinkedSdk}`,
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Wire this verifier into an actual execution path.

This script is solid, but it appears uninvoked: it is not referenced in mcpjam-inspector/package.json scripts or .github/workflows/test.yml. As-is, it won’t protect local/CI flows unless someone runs it manually.

Suggested wiring (outside this file)
# mcpjam-inspector/package.json
"scripts": {
+  "verify:sdk-link": "node scripts/verify-local-sdk-link.mjs",
   ...
}

# .github/workflows/test.yml (if link integrity is required in CI)
+ - name: Verify local SDK link
+   run: npm run verify:sdk-link
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/scripts/verify-local-sdk-link.mjs` around lines 1 - 61, Add
execution wiring so the verifier runs automatically: add an npm script entry
named e.g. "verify-local-sdk-link" in mcpjam-inspector/package.json that invokes
node scripts/verify-local-sdk-link.mjs, and update the CI workflow
(.github/workflows/test.yml) to run this npm script as a step early in the test
job (or add a dedicated job) so the check executes in CI; reference the script
file verify-local-sdk-link.mjs and the package.json "scripts" section when
making the changes.

@chelojimenez
Copy link
Copy Markdown
Contributor Author

chelojimenez commented Mar 12, 2026

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Code Security 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
mcpjam-inspector/scripts/verify-local-sdk-link.mjs (1)

4-6: Fragile working-directory assumption.

process.cwd() presumes invocation from the mcpjam-inspector root. Should the script be executed from the repository root or another directory, every subsequent path check will silently test the wrong locations and emit misleading errors.

Consider deriving the base path from import.meta.url for robustness:

🛠️ Proposed fix
-const inspectorDir = process.cwd();
+import { fileURLToPath } from "node:url";
+
+const __dirname = path.dirname(fileURLToPath(import.meta.url));
+const inspectorDir = path.resolve(__dirname, "..");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/scripts/verify-local-sdk-link.mjs` around lines 4 - 6, The
script assumes process.cwd() is the inspector root which is fragile; replace the
working-directory base with a path derived from the script file location (use
import.meta.url) and recompute inspectorDir, localSdkLink, and expectedSdkDir
from that base so the checks always reference the inspector package location
regardless of where the script is invoked; locate the constants inspectorDir,
localSdkLink, and expectedSdkDir and update their computation to resolve from
import.meta.url (using URL and path utilities) instead of process.cwd().
sdk/src/report-eval-results.ts (1)

470-480: Normalize and dedupe tags before sending them.

The SDK now accepts tag text from arbitrary callers, while the inspector groups stored tags verbatim. Forwarding raw values here means duplicates, whitespace variants, and blank tags can turn into duplicate groups or empty chips downstream. Canonicalizing once before shouldUseOneShotUpload keeps both upload paths aligned.

♻️ Suggested shape
+function normalizeTags(tags?: string[]): string[] | undefined {
+  const normalized = Array.from(
+    new Set((tags ?? []).map((tag) => tag.trim()).filter(Boolean)),
+  );
+  return normalized.length > 0 ? normalized : undefined;
+}
+
 async function reportEvalResultsInternal(
   input: ReportEvalResultsInput
 ): Promise<ReportEvalResultsOutput> {
   if (!input.suiteName || input.suiteName.trim().length === 0) {
     throw new Error("suiteName is required");
@@
 
   const config = createRuntimeConfig(input);
+  const tags = normalizeTags(input.tags);
   const uploadedResults = await uploadWidgetSnapshots(config, input.results);
   const externalRunId = input.externalRunId ?? generateExternalRunId();
   const resultsWithIterationIds = withExternalIterationIds(
     uploadedResults,
     externalRunId
@@
     shouldUseOneShotUpload(
-      { ...input, externalRunId, results: resultsWithIterationIds },
+      { ...input, externalRunId, results: resultsWithIterationIds, tags },
       config
     )
   ) {
@@
-        tags: input.tags,
+        tags,
         results: resultsWithIterationIds,
       }
     );
   }
@@
-    tags: input.tags,
+    tags,
   });

Also applies to: 513-540

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/src/report-eval-results.ts` around lines 470 - 480, The tags array from
input.tags should be normalized and deduplicated before constructing the request
body and before calling shouldUseOneShotUpload; implement a small normalization
step (trim each tag, filter out empty strings, convert to a canonical case or
format as agreed, then dedupe preserving order) and use the resulting
normalizedTags when building the body object (replace input.tags with
normalizedTags) and in the call/site that evaluates shouldUseOneShotUpload so
both upload paths receive the same canonical tag set; update any references to
input.tags within reportEvalResults (or the function that contains the shown
body) to use normalizedTags.
mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx (1)

149-175: Use stable series ids instead of raw tag text to prevent Recharts path parsing issues.

Recharts resolves string dataKey values as lodash-compatible paths, so tags containing dots or brackets (e.g., release.v1, foo[0]) will be interpreted as nested paths rather than literal keys. Using generated series identifiers shields the chart from this gotcha while preserving the original tag as the display label.

🧭 Safer series-key pattern
+  const seriesKeyByTag = useMemo(
+    () => new Map(visibleGroups.map((g, i) => [g.tag, `series_${i}`])),
+    [visibleGroups],
+  );
+
   const multiLineTrendData = useMemo(() => {
     if (!hasTrendData) return [];
     const maxLen = Math.max(
       ...visibleGroups.map((g) => groupTrends.get(g.tag)?.length ?? 0),
     );
     return Array.from({ length: maxLen }, (_, i) => {
       const point: Record<string, string | number> = { index: `#${i + 1}` };
       for (const g of visibleGroups) {
         const trend = groupTrends.get(g.tag) ?? [];
+        const seriesKey = seriesKeyByTag.get(g.tag)!;
         if (i < trend.length) {
-          point[g.tag] = toPercent(trend[i]);
+          point[seriesKey] = toPercent(trend[i]);
         }
       }
       return point;
     });
-  }, [hasTrendData, visibleGroups, groupTrends]);
+  }, [hasTrendData, visibleGroups, groupTrends, seriesKeyByTag]);
 
   const multiLineChartConfig = useMemo(() => {
     const config: Record<string, { label: string; color: string }> = {};
     visibleGroups.forEach((g, i) => {
-      config[g.tag] = {
+      const seriesKey = seriesKeyByTag.get(g.tag)!;
+      config[seriesKey] = {
         label: g.tag,
         color: GROUP_COLORS[i % GROUP_COLORS.length],
       };
     });
     return config;
-  }, [visibleGroups]);
+  }, [visibleGroups, seriesKeyByTag]);
@@
                 {visibleGroups.map((g, i) => (
                   <Area
                     key={g.tag}
                     type="monotone"
-                    dataKey={g.tag}
+                    dataKey={seriesKeyByTag.get(g.tag)!}
+                    name={g.tag}
                     stroke={GROUP_COLORS[i % GROUP_COLORS.length]}
                     fill={GROUP_COLORS[i % GROUP_COLORS.length]}

Also applies to: 392-405

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx`
around lines 149 - 175, The chart code currently uses raw tag text as data keys
(in multiLineTrendData and multiLineChartConfig), which breaks when tags contain
dots/brackets; instead generate stable series IDs (e.g., slug or uuid per
visibleGroups entry) and use those IDs as the keys in the point objects and in
the config map while preserving the original tag as the label (e.g., config[id]
= { label: g.tag, color: ... }); update the useMemo that builds
multiLineTrendData to lookup trend arrays by g.tag but assign point[id] =
toPercent(...), and update multiLineChartConfig to use id keys; apply the same
change to the other chart block with the same pattern so Recharts receives
literal keys, not path-like tag strings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mcpjam-inspector/client/src/components/CiEvalsTab.tsx`:
- Line 37: The component CiEvalsTab keeps filterTag across workspace/tag changes
causing stale filters; add a useEffect that watches allTags (and/or hasTags) and
calls setFilterTag(null) when the current filterTag is no longer present in
allTags or when hasTags becomes false so the UI recovers; update the same logic
where similar tag state is managed (the other tag-related state at the block
referenced around lines 89-95) to reconcile and clear stale tag values when
available tags change.

In `@mcpjam-inspector/scripts/verify-local-sdk-link.mjs`:
- Around line 52-57: The JSON.parse/fs.readFileSync call that reads
sdkPackageJsonPath can throw a raw SyntaxError; wrap the read+parse of
sdkPackageJsonPath in a try/catch around the existing use of sdkPackage so that
any read or parse error is caught and reported via the existing fail(...)
pattern instead of letting a raw stack trace surface. In the catch, call
fail(...) with a clear message that includes sdkPackageJsonPath and the caught
error.message (and optionally the original error) so downstream logic using
sdkPackage and the fail function remains consistent.

---

Nitpick comments:
In `@mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx`:
- Around line 149-175: The chart code currently uses raw tag text as data keys
(in multiLineTrendData and multiLineChartConfig), which breaks when tags contain
dots/brackets; instead generate stable series IDs (e.g., slug or uuid per
visibleGroups entry) and use those IDs as the keys in the point objects and in
the config map while preserving the original tag as the label (e.g., config[id]
= { label: g.tag, color: ... }); update the useMemo that builds
multiLineTrendData to lookup trend arrays by g.tag but assign point[id] =
toPercent(...), and update multiLineChartConfig to use id keys; apply the same
change to the other chart block with the same pattern so Recharts receives
literal keys, not path-like tag strings.

In `@mcpjam-inspector/scripts/verify-local-sdk-link.mjs`:
- Around line 4-6: The script assumes process.cwd() is the inspector root which
is fragile; replace the working-directory base with a path derived from the
script file location (use import.meta.url) and recompute inspectorDir,
localSdkLink, and expectedSdkDir from that base so the checks always reference
the inspector package location regardless of where the script is invoked; locate
the constants inspectorDir, localSdkLink, and expectedSdkDir and update their
computation to resolve from import.meta.url (using URL and path utilities)
instead of process.cwd().

In `@sdk/src/report-eval-results.ts`:
- Around line 470-480: The tags array from input.tags should be normalized and
deduplicated before constructing the request body and before calling
shouldUseOneShotUpload; implement a small normalization step (trim each tag,
filter out empty strings, convert to a canonical case or format as agreed, then
dedupe preserving order) and use the resulting normalizedTags when building the
body object (replace input.tags with normalizedTags) and in the call/site that
evaluates shouldUseOneShotUpload so both upload paths receive the same canonical
tag set; update any references to input.tags within reportEvalResults (or the
function that contains the shown body) to use normalizedTags.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 182500a1-b9ca-4680-97db-99b2cca40b35

📥 Commits

Reviewing files that changed from the base of the PR and between 69b9b2b and 9a0a8f8.

📒 Files selected for processing (10)
  • mcpjam-inspector/client/src/components/CiEvalsTab.tsx
  • mcpjam-inspector/client/src/components/evals/ci-suite-list-sidebar.tsx
  • mcpjam-inspector/client/src/components/evals/helpers.ts
  • mcpjam-inspector/client/src/components/evals/suite-header.tsx
  • mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx
  • mcpjam-inspector/client/src/components/evals/tag-editor.tsx
  • mcpjam-inspector/client/src/components/evals/types.ts
  • mcpjam-inspector/scripts/verify-local-sdk-link.mjs
  • sdk/src/eval-reporting-types.ts
  • sdk/src/report-eval-results.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • mcpjam-inspector/client/src/components/evals/helpers.ts
  • mcpjam-inspector/client/src/components/evals/tag-editor.tsx
  • mcpjam-inspector/client/src/components/evals/suite-header.tsx


const [deletingSuiteId, setDeletingSuiteId] = useState<string | null>(null);
const [deletingRunId, setDeletingRunId] = useState<string | null>(null);
const [filterTag, setFilterTag] = useState<string | null>(null);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Clear stale tag filters when the available tags change.

filterTag survives workspace switches and tag edits, but it is never reconciled with allTags. If the selected tag disappears, the sidebar keeps filtering by a value the UI no longer exposes, and when hasTags flips false the user is left with an empty list and no control to recover.

🩹 Minimal fix
   const allTags = useMemo(
     () =>
       Array.from(new Set(sdkSuites.flatMap((e) => e.suite.tags ?? []))).sort(),
     [sdkSuites],
   );
+
+  useEffect(() => {
+    if (filterTag && !allTags.includes(filterTag)) {
+      setFilterTag(null);
+    }
+  }, [filterTag, allTags]);

Also applies to: 89-95

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/CiEvalsTab.tsx` at line 37, The
component CiEvalsTab keeps filterTag across workspace/tag changes causing stale
filters; add a useEffect that watches allTags (and/or hasTags) and calls
setFilterTag(null) when the current filterTag is no longer present in allTags or
when hasTags becomes false so the UI recovers; update the same logic where
similar tag state is managed (the other tag-related state at the block
referenced around lines 89-95) to reconcile and clear stale tag values when
available tags change.

Comment on lines +52 to +57
const sdkPackage = JSON.parse(fs.readFileSync(sdkPackageJsonPath, "utf8"));
if (sdkPackage.name !== "@mcpjam/sdk") {
fail(
`Expected linked package name to be @mcpjam/sdk, but found ${sdkPackage.name ?? "unknown"}.`,
);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Unguarded JSON parsing may yield ungraceful exit.

A malformed package.json will throw a raw SyntaxError with a stack trace, diverging from the structured fail() pattern employed elsewhere.

🛡️ Proposed fix
-const sdkPackage = JSON.parse(fs.readFileSync(sdkPackageJsonPath, "utf8"));
+let sdkPackage;
+try {
+  sdkPackage = JSON.parse(fs.readFileSync(sdkPackageJsonPath, "utf8"));
+} catch {
+  fail(`Failed to parse SDK manifest at ${sdkPackageJsonPath}.`);
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/scripts/verify-local-sdk-link.mjs` around lines 52 - 57, The
JSON.parse/fs.readFileSync call that reads sdkPackageJsonPath can throw a raw
SyntaxError; wrap the read+parse of sdkPackageJsonPath in a try/catch around the
existing use of sdkPackage so that any read or parse error is caught and
reported via the existing fail(...) pattern instead of letting a raw stack trace
surface. In the catch, call fail(...) with a clear message that includes
sdkPackageJsonPath and the caught error.message (and optionally the original
error) so downstream logic using sdkPackage and the fail function remains
consistent.

@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Mar 14, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
sdk/src/report-eval-results.ts (1)

470-480: ⚠️ Potential issue | 🟠 Major

Canonicalize tags before sending them upstream.

UI-authored tags are trimmed, lowercased, and deduped in mcpjam-inspector/client/src/components/evals/tag-editor.tsx, but programmatic SDK callers hit this path directly. Forwarding input.tags unchanged lets values like Prod, prod, prod, duplicates, or empty strings turn into separate groups and blank chips in the inspector, because grouping is exact-string based.

🧹 Suggested normalization
+function normalizeTags(tags?: string[]): string[] | undefined {
+  const normalized = Array.from(
+    new Set(
+      (tags ?? [])
+        .map((tag) => tag.trim().toLowerCase())
+        .filter(Boolean),
+    ),
+  );
+  return normalized.length > 0 ? normalized : undefined;
+}
...
   const config = createRuntimeConfig(input);
   const uploadedResults = await uploadWidgetSnapshots(config, input.results);
   const externalRunId = input.externalRunId ?? generateExternalRunId();
+  const tags = normalizeTags(input.tags);
...
     shouldUseOneShotUpload(
-      { ...input, externalRunId, results: resultsWithIterationIds },
+      { ...input, tags, externalRunId, results: resultsWithIterationIds },
       config
     )
...
-        tags: input.tags,
+        tags,
...
-    tags: input.tags,
+    tags,

Also applies to: 510-540

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/src/report-eval-results.ts` around lines 470 - 480, Normalize input.tags
before constructing the request body: replace using a small canonicalization
step that maps each tag string via trim(), toLowerCase(), filters out empty
strings, and removes duplicates (e.g., using a Set) and then use that normalized
array when assigning body.tags in report-eval-results.ts (referencing the body
variable and input.tags). Apply the same normalization wherever tags from input
are forwarded in this file (the other request construction block around the
later results handling) so SDK callers send trimmed, lowercased, deduped,
non-empty tags to the server.
♻️ Duplicate comments (2)
mcpjam-inspector/client/src/components/CiEvalsTab.tsx (1)

37-37: ⚠️ Potential issue | 🟠 Major

Clear filterTag when the available tags change.

This state survives workspace switches and tag edits, but it is never reconciled with allTags/hasTags. If the selected tag disappears—or the last tagged suite is untagged—the sidebar stays filtered by a value the UI no longer exposes and can land in an empty state with no way back.

🩹 Minimal fix
   const allTags = useMemo(
     () =>
       Array.from(new Set(sdkSuites.flatMap((e) => e.suite.tags ?? []))).sort(),
     [sdkSuites],
   );
+
+  useEffect(() => {
+    if (filterTag && (!hasTags || !allTags.includes(filterTag))) {
+      setFilterTag(null);
+    }
+  }, [filterTag, hasTags, allTags]);

Also applies to: 89-95

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/CiEvalsTab.tsx` at line 37, The
filterTag state (filterTag / setFilterTag) can outlive available tags (allTags /
hasTags) and leave the UI filtered by a tag that no longer exists; add
reconciliation logic that clears filterTag when the available tags
change—specifically, add an effect that runs on changes to allTags and hasTags
and calls setFilterTag(null) if filterTag is not present in allTags or if
hasTags becomes false (also ensure the same check is applied where similar state
is used around lines 89-95).
mcpjam-inspector/scripts/verify-local-sdk-link.mjs (1)

52-57: ⚠️ Potential issue | 🟡 Minor

Keep manifest read/parse failures on the existing fail(...) path.

A malformed or unreadable package.json still throws a raw stack trace here instead of the structured failure message used everywhere else.

🛡️ Minimal fix
-const sdkPackage = JSON.parse(fs.readFileSync(sdkPackageJsonPath, "utf8"));
+let sdkPackage;
+try {
+  sdkPackage = JSON.parse(fs.readFileSync(sdkPackageJsonPath, "utf8"));
+} catch (error) {
+  fail(
+    `Failed to read SDK manifest at ${sdkPackageJsonPath}: ${
+      error instanceof Error ? error.message : String(error)
+    }`,
+  );
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/scripts/verify-local-sdk-link.mjs` around lines 52 - 57, The
code reads and parses the SDK package manifest directly which throws raw errors;
wrap the fs.readFileSync/JSON.parse for sdkPackageJsonPath in a try/catch and on
any error call the existing fail(...) path with a clear message (e.g., "Failed
to read/parse SDK package.json at <sdkPackageJsonPath>: <error.message>")
instead of letting the exception bubble, then proceed to the existing name check
against sdkPackage.name; reference sdkPackageJsonPath, sdkPackage, and the
fail(...) call when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mcpjam-inspector/client/src/components/evals/suite-header.tsx`:
- Around line 534-549: The TagEditor updates are overwriting each other because
it uses suite.tags directly and waits for updateSuite to return; fix by
introducing a local optimistic state and an in-flight flag: create local state
(e.g., localTags) initialized from suite.tags and an isUpdating boolean, pass
localTags to TagEditor (and disable edits via a prop or ignore inputs when
isUpdating), update localTags synchronously on user edits, set isUpdating=true,
call updateSuite({ suiteId: suite._id, tags: newTags }), then set
isUpdating=false and reconcile localTags with the returned/authoritative value
(or suite.tags when the query refreshes); alternatively serialize updates by
rejecting new edits while isUpdating is true. Ensure you update references to
TagEditor, updateSuite, and suite.tags accordingly.

In `@mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx`:
- Around line 154-160: The chart data currently uses raw tag strings (g.tag) as
object keys and dataKey values, which Recharts treats as Lodash-style paths and
breaks series with dots/brackets; update the data construction (in the function
that builds points using visibleGroups, groupTrends, toPercent) to assign values
to generated safe keys like series_0, series_1, ... (index derived from the
visibleGroups iteration) instead of g.tag, and create a separate mapping from
those generated keys back to display labels (and use that mapping when rendering
legends/labels and in any places where dataKey is set, replacing dataKey={g.tag}
with dataKey={generatedKey}). Ensure the same change is applied to the other
occurrences noted (the blocks around lines 166-175 and 392-404) so all data
objects and dataKey props use the safe series_<index> identifiers while
retaining g.tag only for display.

In `@mcpjam-inspector/client/src/components/evals/tag-editor.tsx`:
- Around line 77-82: The icon-only tag buttons in the TagEditor component (the
remove button calling handleRemove(tag) and the corresponding add button further
down) lack accessible labels; update those button elements to include
descriptive accessible labels (e.g., aria-label or title) that reference the tag
(for example "Remove tag {tag}" or "Add tag {tag}") so assistive tech can
identify the control—locate the buttons that call handleRemove(tag) and the add
handler (e.g., handleAdd or similar) and add aria-label attributes with the tag
name.

In `@mcpjam-inspector/scripts/verify-local-sdk-link.mjs`:
- Around line 4-6: The script uses process.cwd() to build inspectorDir, which
breaks when run from another working directory; change inspectorDir to the
script's directory by deriving it from import.meta.url (via fileURLToPath and
path.dirname) so localSdkLink and expectedSdkDir are resolved relative to the
script file itself; update the top of the module to import fileURLToPath from
'url' if not already present and replace inspectorDir, localSdkLink, and
expectedSdkDir computations to use that script-dir-based inspectorDir.

---

Outside diff comments:
In `@sdk/src/report-eval-results.ts`:
- Around line 470-480: Normalize input.tags before constructing the request
body: replace using a small canonicalization step that maps each tag string via
trim(), toLowerCase(), filters out empty strings, and removes duplicates (e.g.,
using a Set) and then use that normalized array when assigning body.tags in
report-eval-results.ts (referencing the body variable and input.tags). Apply the
same normalization wherever tags from input are forwarded in this file (the
other request construction block around the later results handling) so SDK
callers send trimmed, lowercased, deduped, non-empty tags to the server.

---

Duplicate comments:
In `@mcpjam-inspector/client/src/components/CiEvalsTab.tsx`:
- Line 37: The filterTag state (filterTag / setFilterTag) can outlive available
tags (allTags / hasTags) and leave the UI filtered by a tag that no longer
exists; add reconciliation logic that clears filterTag when the available tags
change—specifically, add an effect that runs on changes to allTags and hasTags
and calls setFilterTag(null) if filterTag is not present in allTags or if
hasTags becomes false (also ensure the same check is applied where similar state
is used around lines 89-95).

In `@mcpjam-inspector/scripts/verify-local-sdk-link.mjs`:
- Around line 52-57: The code reads and parses the SDK package manifest directly
which throws raw errors; wrap the fs.readFileSync/JSON.parse for
sdkPackageJsonPath in a try/catch and on any error call the existing fail(...)
path with a clear message (e.g., "Failed to read/parse SDK package.json at
<sdkPackageJsonPath>: <error.message>") instead of letting the exception bubble,
then proceed to the existing name check against sdkPackage.name; reference
sdkPackageJsonPath, sdkPackage, and the fail(...) call when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b74be7bb-7e99-4988-a1b1-634470a128e2

📥 Commits

Reviewing files that changed from the base of the PR and between 9a0a8f8 and 50a98ed.

📒 Files selected for processing (12)
  • mcpjam-inspector/client/src/components/CiEvalsTab.tsx
  • mcpjam-inspector/client/src/components/evals/ci-suite-list-sidebar.tsx
  • mcpjam-inspector/client/src/components/evals/helpers.ts
  • mcpjam-inspector/client/src/components/evals/suite-header.tsx
  • mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx
  • mcpjam-inspector/client/src/components/evals/tag-editor.tsx
  • mcpjam-inspector/client/src/components/evals/types.ts
  • mcpjam-inspector/client/src/hooks/__tests__/use-chat-session.minimal-mode.test.tsx
  • mcpjam-inspector/scripts/verify-local-sdk-link.mjs
  • mcpjam-inspector/server/routes/web/__tests__/chat-v2.guest.test.ts
  • sdk/src/eval-reporting-types.ts
  • sdk/src/report-eval-results.ts
✅ Files skipped from review due to trivial changes (1)
  • mcpjam-inspector/client/src/hooks/tests/use-chat-session.minimal-mode.test.tsx
🚧 Files skipped from review as they are similar to previous changes (2)
  • mcpjam-inspector/client/src/components/evals/helpers.ts
  • mcpjam-inspector/client/src/components/evals/types.ts

Comment on lines +534 to +549
{!readOnlyConfig && (
<TagEditor
tags={suite.tags ?? []}
onTagsChange={async (newTags) => {
try {
await updateSuite({
suiteId: suite._id,
tags: newTags,
});
} catch (error) {
toast.error("Failed to update tags");
console.error("Failed to update tags:", error);
}
}}
/>
)}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Serialize tag updates or keep an optimistic local copy.

TagEditor computes the next value from its tags prop, but this handler writes straight to Convex and waits for the suite query to catch up. Two quick edits can therefore be based on stale suite.tags and overwrite each other—for example, removing a tag and immediately adding another can re-send the removed tag. Pass TagEditor an optimistic tag array, or disable edits while an update is in flight.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/evals/suite-header.tsx` around lines
534 - 549, The TagEditor updates are overwriting each other because it uses
suite.tags directly and waits for updateSuite to return; fix by introducing a
local optimistic state and an in-flight flag: create local state (e.g.,
localTags) initialized from suite.tags and an isUpdating boolean, pass localTags
to TagEditor (and disable edits via a prop or ignore inputs when isUpdating),
update localTags synchronously on user edits, set isUpdating=true, call
updateSuite({ suiteId: suite._id, tags: newTags }), then set isUpdating=false
and reconcile localTags with the returned/authoritative value (or suite.tags
when the query refreshes); alternatively serialize updates by rejecting new
edits while isUpdating is true. Ensure you update references to TagEditor,
updateSuite, and suite.tags accordingly.

Comment on lines +154 to +160
return Array.from({ length: maxLen }, (_, i) => {
const point: Record<string, string | number> = { index: `#${i + 1}` };
for (const g of visibleGroups) {
const trend = groupTrends.get(g.tag) ?? [];
if (i < trend.length) {
point[g.tag] = toPercent(trend[i]);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "tag-aggregation-panel.tsx" | head -5

Repository: MCPJam/inspector

Length of output: 132


🏁 Script executed:

cat -n mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx | sed -n '150,180p'

Repository: MCPJam/inspector

Length of output: 1335


🏁 Script executed:

cat -n mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx | sed -n '385,410p'

Repository: MCPJam/inspector

Length of output: 1135


🌐 Web query:

Recharts dataKey special characters path expression handling

💡 Result:

In Recharts, dataKey is ultimately resolved by getValueByDataKey(), which:

  • if dataKey is a string/number, calls get(obj, dataKey, defaultValue) (where get is es-toolkit/compat/get) [1]
  • if dataKey is a function, calls dataKey(obj) [1]

Recharts’ public API documents dataKey as string | number | function (so functions are officially supported) [2].

What this means for “special characters” in keys

Because Recharts uses a Lodash-compatible get implementation [3], string dataKeys behave like _.get paths (dot/bracket path syntax) [4]. So if your actual field name contains characters that conflict with path parsing (most commonly .), a plain dataKey="a.b" will be interpreted as “property b inside a”, not a literal key named "a.b".

Recommended ways to handle it

  1. Use a function dataKey (most reliable for literal keys with dots, spaces, brackets, etc.):

    <Line dataKey={(d) => d["a.b"]} />

    (Functions are supported by Recharts’ dataKey type [2] and are handled directly in getValueByDataKey [1].)

  2. If you want to keep string paths, you must use Lodash-style bracket path quoting for “weird” keys (since string paths are parsed like _.get) [4]. Example patterns include a['b.c'] / ['a.b'] (depending on where the key appears).

If you share one concrete example of your data shape + the exact key name (e.g. "cpu.util%", "a.b", "foo[bar]"), I can give the exact dataKey string (if possible) and the function form.

Sources: [1] [2] [3] [4]


Replace raw tag names with safe series identifiers for chart data keys.

dataKey={g.tag} passes user-supplied tag strings directly to Recharts, which interprets them as Lodash-style paths. Tags containing dots (e.g., gpt-4.1) or brackets (e.g., team[alpha]) will be parsed as nested property access, causing those series to silently vanish from the chart. Store data under generated keys like series_0, series_1, etc., and map them back to display labels separately.

Also applies to: 166-175, 392-404

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx`
around lines 154 - 160, The chart data currently uses raw tag strings (g.tag) as
object keys and dataKey values, which Recharts treats as Lodash-style paths and
breaks series with dots/brackets; update the data construction (in the function
that builds points using visibleGroups, groupTrends, toPercent) to assign values
to generated safe keys like series_0, series_1, ... (index derived from the
visibleGroups iteration) instead of g.tag, and create a separate mapping from
those generated keys back to display labels (and use that mapping when rendering
legends/labels and in any places where dataKey is set, replacing dataKey={g.tag}
with dataKey={generatedKey}). Ensure the same change is applied to the other
occurrences noted (the blocks around lines 166-175 and 392-404) so all data
objects and dataKey props use the safe series_<index> identifiers while
retaining g.tag only for display.

Comment on lines +77 to +82
<button
onClick={() => handleRemove(tag)}
className="ml-0.5 hover:text-destructive"
>
<X className="h-2.5 w-2.5" />
</button>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Label the icon-only tag actions.

When tags already exist, both add/remove actions are exposed as icon-only buttons, so assistive tech gets unlabeled controls here.

♿ Minimal fix
           <button
+            type="button"
             onClick={() => handleRemove(tag)}
+            aria-label={`Remove tag ${tag}`}
             className="ml-0.5 hover:text-destructive"
           >
             <X className="h-2.5 w-2.5" />
           </button>
...
         <button
+          type="button"
           onClick={() => setIsAdding(true)}
+          aria-label={tags.length === 0 ? "Add tag" : "Add another tag"}
           className="flex items-center gap-0.5 text-[10px] text-muted-foreground hover:text-foreground"
         >

Also applies to: 96-102

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/evals/tag-editor.tsx` around lines 77
- 82, The icon-only tag buttons in the TagEditor component (the remove button
calling handleRemove(tag) and the corresponding add button further down) lack
accessible labels; update those button elements to include descriptive
accessible labels (e.g., aria-label or title) that reference the tag (for
example "Remove tag {tag}" or "Add tag {tag}") so assistive tech can identify
the control—locate the buttons that call handleRemove(tag) and the add handler
(e.g., handleAdd or similar) and add aria-label attributes with the tag name.

Comment on lines +4 to +6
const inspectorDir = process.cwd();
const localSdkLink = path.join(inspectorDir, "node_modules", "@mcpjam", "sdk");
const expectedSdkDir = path.resolve(inspectorDir, "../sdk");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Resolve the repo paths from the script file, not the current working directory.

This verifier only works when it is launched from mcpjam-inspector/. Running it from the repo root—or from a CI step in another directory—points both paths at the wrong locations and yields a false failure.

🛠️ Minimal fix
 import fs from "node:fs";
 import path from "node:path";
+import { fileURLToPath } from "node:url";
 
-const inspectorDir = process.cwd();
+const scriptDir = path.dirname(fileURLToPath(import.meta.url));
+const inspectorDir = path.resolve(scriptDir, "..");
 const localSdkLink = path.join(inspectorDir, "node_modules", "@mcpjam", "sdk");
 const expectedSdkDir = path.resolve(inspectorDir, "../sdk");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const inspectorDir = process.cwd();
const localSdkLink = path.join(inspectorDir, "node_modules", "@mcpjam", "sdk");
const expectedSdkDir = path.resolve(inspectorDir, "../sdk");
import fs from "node:fs";
import path from "node:path";
import { fileURLToPath } from "node:url";
const scriptDir = path.dirname(fileURLToPath(import.meta.url));
const inspectorDir = path.resolve(scriptDir, "..");
const localSdkLink = path.join(inspectorDir, "node_modules", "@mcpjam", "sdk");
const expectedSdkDir = path.resolve(inspectorDir, "../sdk");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/scripts/verify-local-sdk-link.mjs` around lines 4 - 6, The
script uses process.cwd() to build inspectorDir, which breaks when run from
another working directory; change inspectorDir to the script's directory by
deriving it from import.meta.url (via fileURLToPath and path.dirname) so
localSdkLink and expectedSdkDir are resolved relative to the script file itself;
update the top of the module to import fileURLToPath from 'url' if not already
present and replace inspectorDir, localSdkLink, and expectedSdkDir computations
to use that script-dir-based inspectorDir.

@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Mar 16, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (6)
mcpjam-inspector/scripts/verify-local-sdk-link.mjs (3)

52-52: ⚠️ Potential issue | 🟡 Minor

Guard manifest read/parse with fail(...) for consistent exits.

Line 52 can throw a raw exception (readFileSync/JSON.parse) and bypass your structured failure path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/scripts/verify-local-sdk-link.mjs` at line 52, Wrap the
manifest read and parse for sdkPackage in a try/catch and route any errors to
the existing fail(...) path so we get a consistent exit; specifically surround
the readFileSync/JSON.parse of sdkPackageJsonPath (the sdkPackage assignment)
with a try block and in catch call fail(...) with a helpful message and the
caught error (or stringify it) so raw throws don't bypass your structured
failure handling.

4-6: ⚠️ Potential issue | 🟠 Major

Resolve paths from the script location, not cwd.

Line 4 makes this check caller-directory dependent. Running from repo root (or CI in another working dir) can fail incorrectly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/scripts/verify-local-sdk-link.mjs` around lines 4 - 6, The
script uses process.cwd() (inspectorDir) making localSdkLink and expectedSdkDir
caller-dependent; change inspectorDir to the script's directory by deriving it
from import.meta.url (e.g., using fileURLToPath(import.meta.url) and
path.dirname) and then compute localSdkLink and expectedSdkDir relative to that
inspectorDir so the checks are deterministic regardless of the current working
directory; update references to inspectorDir, localSdkLink, and expectedSdkDir
accordingly.

1-61: ⚠️ Potential issue | 🟠 Major

Wire this verifier into npm scripts/CI or it won’t protect workflows.

The file is solid, but as currently integrated it appears manual-only, so regressions can slip through unattended runs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/scripts/verify-local-sdk-link.mjs` around lines 1 - 61, Add
the verifier script into project automation so it runs automatically: add a
package.json script entry named something like "verify-local-sdk-link" that runs
the scripts/verify-local-sdk-link.mjs file (so developers can run npm run
verify-local-sdk-link) and then invoke that script from your CI workflow (e.g.,
add a step in .github/workflows/ci.yml to run npm ci and npm run
verify-local-sdk-link) and/or add it to relevant prepublish/prepack or pretest
scripts in package.json to ensure the check blocks regressions; reference the
script file name verify-local-sdk-link.mjs, the package.json scripts section,
and the CI workflow (e.g., .github/workflows/ci.yml) when making the changes.
mcpjam-inspector/client/src/components/CiEvalsTab.tsx (1)

37-37: ⚠️ Potential issue | 🟠 Major

Reconcile filterTag when available tags change.

filterTag survives data/workspace changes but is never validated against allTags, so stale filters can strand the UI in an empty state.

🧭 Minimal fix
   const allTags = useMemo(
     () =>
       Array.from(new Set(sdkSuites.flatMap((e) => e.suite.tags ?? []))).sort(),
     [sdkSuites],
   );
+
+  useEffect(() => {
+    if (filterTag && !allTags.includes(filterTag)) {
+      setFilterTag(null);
+    }
+  }, [filterTag, allTags]);

Also applies to: 89-95

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/CiEvalsTab.tsx` at line 37, filterTag
can become stale when available tags (allTags) change; add a useEffect in the
CiEvalsTab component that watches allTags (or the tags state used for the
dropdown) and reconciles filterTag by calling setFilterTag(null) or
setFilterTag(toValidTag) when the current filterTag is not present in allTags.
Reference the filterTag and setFilterTag state variables and the allTags (or
tags list used around lines 89-95) to locate where to add this validation and
update so the UI does not remain empty after workspace/data changes.
mcpjam-inspector/client/src/components/evals/tag-editor.tsx (1)

77-82: ⚠️ Potential issue | 🟠 Major

Add accessible labels (and explicit button types) to icon-only tag actions.

At Line 77 and Line 96, these controls are unlabeled for assistive tech when rendered as icon-only. Please add aria-label and type="button".

♿ Minimal fix
           <button
+            type="button"
             onClick={() => handleRemove(tag)}
+            aria-label={`Remove tag ${tag}`}
             className="ml-0.5 hover:text-destructive"
           >
             <X className="h-2.5 w-2.5" />
           </button>
...
         <button
+          type="button"
           onClick={() => setIsAdding(true)}
+          aria-label={tags.length === 0 ? "Add tag" : "Add another tag"}
           className="flex items-center gap-0.5 text-[10px] text-muted-foreground hover:text-foreground"
         >

Also applies to: 96-102

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/evals/tag-editor.tsx` around lines 77
- 82, The icon-only tag action buttons (the remove button that calls
handleRemove(tag) and the similar edit/remove controls further down) lack
accessible labels and explicit types; update each icon-only <button> (e.g., the
one rendering <X className=... /> and the corresponding edit/remove buttons) to
include type="button" and an appropriate aria-label (for example
aria-label={`Remove tag ${tag}`} and aria-label={`Edit tag ${tag}`}) so screen
readers can describe the action and the buttons behave correctly.
mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx (1)

154-160: ⚠️ Potential issue | 🟠 Major

Use synthetic series keys for Recharts dataKey instead of raw tag text.

At Line 159/396, user tag values are used directly as object keys and dataKey. Tags containing path-like characters can break series resolution in Recharts.

🛠️ Safe-key pattern
+ const seriesKeyByTag = new Map(
+   visibleGroups.map((g, i) => [g.tag, `series_${i}`]),
+ );

  return Array.from({ length: maxLen }, (_, i) => {
    const point: Record<string, string | number> = { index: `#${i + 1}` };
    for (const g of visibleGroups) {
      const trend = groupTrends.get(g.tag) ?? [];
      if (i < trend.length) {
-       point[g.tag] = toPercent(trend[i]);
+       point[seriesKeyByTag.get(g.tag)!] = toPercent(trend[i]);
      }
    }
    return point;
  });

- config[g.tag] = { label: g.tag, color: GROUP_COLORS[i % GROUP_COLORS.length] };
+ const key = seriesKeyByTag.get(g.tag)!;
+ config[key] = { label: g.tag, color: GROUP_COLORS[i % GROUP_COLORS.length] };

- <Area dataKey={g.tag} ... />
+ <Area dataKey={seriesKeyByTag.get(g.tag)!} ... />
In Recharts 2.15.4, are string `dataKey` values resolved as object paths (e.g., dot/bracket notation), and is using function or synthetic keys recommended for literal keys containing `.` or `[]`?

Also applies to: 166-175, 392-404

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx`
around lines 154 - 160, The code builds chart points using raw tag text as
object keys (see tag-aggregation-panel.tsx: visibleGroups, groupTrends,
toPercent, maxLen) which can break Recharts since dataKey strings may be
interpreted as object paths; create a stable synthetic key map (e.g., const
safeKey = `tag_${index}` or an encoded version of g.tag) for each visible group
and use that synthetic key as the property name when building point
(point[safeKey] = toPercent(...)) and as the Recharts dataKey, while preserving
the original g.tag in a parallel map/array for legends, tooltips and labels so
you can resolve back to the literal tag when rendering.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mcpjam-inspector/client/src/components/evals/helpers.ts`:
- Around line 252-255: In the bucket-building logic where you iterate over
`tags` and push `entry` into `buckets` (the loop using `for (const tag of tags)`
with `buckets.get(tag)` and `buckets.set(tag, bucket)`), deduplicate the
per-suite `tags` array before iterating so a suite with duplicate tag values is
only counted once; e.g., produce a unique set/array of tags for the current
`entry` and iterate that instead. Apply the same deduplication change to the
other occurrence that starts at the block using `tags` around lines 271-273 to
prevent inflated `suiteCount` and totals. Ensure you still use the same
`buckets`, `tag`, and `entry` variables so the rest of the logic is unchanged.

---

Duplicate comments:
In `@mcpjam-inspector/client/src/components/CiEvalsTab.tsx`:
- Line 37: filterTag can become stale when available tags (allTags) change; add
a useEffect in the CiEvalsTab component that watches allTags (or the tags state
used for the dropdown) and reconciles filterTag by calling setFilterTag(null) or
setFilterTag(toValidTag) when the current filterTag is not present in allTags.
Reference the filterTag and setFilterTag state variables and the allTags (or
tags list used around lines 89-95) to locate where to add this validation and
update so the UI does not remain empty after workspace/data changes.

In `@mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx`:
- Around line 154-160: The code builds chart points using raw tag text as object
keys (see tag-aggregation-panel.tsx: visibleGroups, groupTrends, toPercent,
maxLen) which can break Recharts since dataKey strings may be interpreted as
object paths; create a stable synthetic key map (e.g., const safeKey =
`tag_${index}` or an encoded version of g.tag) for each visible group and use
that synthetic key as the property name when building point (point[safeKey] =
toPercent(...)) and as the Recharts dataKey, while preserving the original g.tag
in a parallel map/array for legends, tooltips and labels so you can resolve back
to the literal tag when rendering.

In `@mcpjam-inspector/client/src/components/evals/tag-editor.tsx`:
- Around line 77-82: The icon-only tag action buttons (the remove button that
calls handleRemove(tag) and the similar edit/remove controls further down) lack
accessible labels and explicit types; update each icon-only <button> (e.g., the
one rendering <X className=... /> and the corresponding edit/remove buttons) to
include type="button" and an appropriate aria-label (for example
aria-label={`Remove tag ${tag}`} and aria-label={`Edit tag ${tag}`}) so screen
readers can describe the action and the buttons behave correctly.

In `@mcpjam-inspector/scripts/verify-local-sdk-link.mjs`:
- Line 52: Wrap the manifest read and parse for sdkPackage in a try/catch and
route any errors to the existing fail(...) path so we get a consistent exit;
specifically surround the readFileSync/JSON.parse of sdkPackageJsonPath (the
sdkPackage assignment) with a try block and in catch call fail(...) with a
helpful message and the caught error (or stringify it) so raw throws don't
bypass your structured failure handling.
- Around line 4-6: The script uses process.cwd() (inspectorDir) making
localSdkLink and expectedSdkDir caller-dependent; change inspectorDir to the
script's directory by deriving it from import.meta.url (e.g., using
fileURLToPath(import.meta.url) and path.dirname) and then compute localSdkLink
and expectedSdkDir relative to that inspectorDir so the checks are deterministic
regardless of the current working directory; update references to inspectorDir,
localSdkLink, and expectedSdkDir accordingly.
- Around line 1-61: Add the verifier script into project automation so it runs
automatically: add a package.json script entry named something like
"verify-local-sdk-link" that runs the scripts/verify-local-sdk-link.mjs file (so
developers can run npm run verify-local-sdk-link) and then invoke that script
from your CI workflow (e.g., add a step in .github/workflows/ci.yml to run npm
ci and npm run verify-local-sdk-link) and/or add it to relevant
prepublish/prepack or pretest scripts in package.json to ensure the check blocks
regressions; reference the script file name verify-local-sdk-link.mjs, the
package.json scripts section, and the CI workflow (e.g.,
.github/workflows/ci.yml) when making the changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c733974d-0819-473f-a4d0-5d478dd3c28a

📥 Commits

Reviewing files that changed from the base of the PR and between 50a98ed and 6907244.

📒 Files selected for processing (10)
  • mcpjam-inspector/client/src/components/CiEvalsTab.tsx
  • mcpjam-inspector/client/src/components/evals/ci-suite-list-sidebar.tsx
  • mcpjam-inspector/client/src/components/evals/helpers.ts
  • mcpjam-inspector/client/src/components/evals/suite-header.tsx
  • mcpjam-inspector/client/src/components/evals/tag-aggregation-panel.tsx
  • mcpjam-inspector/client/src/components/evals/tag-editor.tsx
  • mcpjam-inspector/client/src/components/evals/types.ts
  • mcpjam-inspector/scripts/verify-local-sdk-link.mjs
  • sdk/src/eval-reporting-types.ts
  • sdk/src/report-eval-results.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • mcpjam-inspector/client/src/components/evals/suite-header.tsx
  • sdk/src/eval-reporting-types.ts

Comment on lines +252 to +255
for (const tag of tags) {
const bucket = buckets.get(tag) ?? [];
bucket.push(entry);
buckets.set(tag, bucket);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Deduplicate per-suite tags before bucketing to prevent inflated aggregates.

At Line 252, iterating raw tags can double-count a suite when duplicate tag values exist in payloads, which inflates suiteCount and totals.

🩹 Minimal fix
-      for (const tag of tags) {
+      for (const tag of new Set(tags)) {
         const bucket = buckets.get(tag) ?? [];
         bucket.push(entry);
         buckets.set(tag, bucket);
       }

Also applies to: 271-273

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/evals/helpers.ts` around lines 252 -
255, In the bucket-building logic where you iterate over `tags` and push `entry`
into `buckets` (the loop using `for (const tag of tags)` with `buckets.get(tag)`
and `buckets.set(tag, bucket)`), deduplicate the per-suite `tags` array before
iterating so a suite with duplicate tag values is only counted once; e.g.,
produce a unique set/array of tags for the current `entry` and iterate that
instead. Apply the same deduplication change to the other occurrence that starts
at the block using `tags` around lines 271-273 to prevent inflated `suiteCount`
and totals. Ensure you still use the same `buckets`, `tag`, and `entry`
variables so the rest of the logic is unchanged.

@prathmeshpatel prathmeshpatel merged commit f7c6a0b into main Mar 16, 2026
8 checks passed
@prathmeshpatel prathmeshpatel deleted the sdk-group-charts branch March 16, 2026 08:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants