
@mldangelo
Member

Summary

Add the ability to duplicate evaluations with all results, configuration, prompts, and relationships via the UI menu.

Changes

Backend

  • New Eval.copy() method - Deep-copies an eval, copying results in batches of 1,000 rows to handle large evaluations
  • API endpoint - POST /api/eval/:id/copy with Zod validation and proper error handling
  • Relationship copying - Preserves all relationships for prompts, tags, and datasets
  • Transaction-based - Ensures atomicity with rollback on failure

Frontend

  • Reusable dialog component - ConfirmEvalNameDialog handles both copy and rename operations
  • Size warnings - Alerts users when copying large evaluations (>10K results)
  • Copy menu item - Added to eval actions menu with ContentCopy icon
  • UX improvements - Opens copied eval in new tab (Google Docs pattern), keyboard shortcuts (Enter/Esc), auto-focus and text selection
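The size-warning tiers can be expressed as a small pure function. This is a minimal sketch, not the PR's code. The helper name `getSizeWarningSeverity` is hypothetical. The thresholds come from the review notes: an info notice above 10K results and a stronger warning at 50K. Treating both boundaries as strict "greater than" is an assumption.

```typescript
// Hypothetical helper mirroring the copy dialog's size-warning tiers.
type SizeWarningSeverity = 'none' | 'info' | 'warning';

// Thresholds from the PR discussion; strict ">" comparisons are an assumption.
const LARGE_EVAL_RESULTS = 10_000;
const VERY_LARGE_EVAL_RESULTS = 50_000;

function getSizeWarningSeverity(resultCount: number | undefined): SizeWarningSeverity {
  if (resultCount === undefined) {
    return 'none'; // rename flow: no count is passed, so no warning is shown
  }
  if (resultCount > VERY_LARGE_EVAL_RESULTS) {
    return 'warning';
  }
  if (resultCount > LARGE_EVAL_RESULTS) {
    return 'info';
  }
  return 'none';
}
```

Keeping the thresholds in one place lets the dialog and any future tests share them.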

Implementation Details

Batching Strategy:

  • Copies results in 1,000-row batches to prevent memory exhaustion
  • Handles large evals efficiently with progress logging
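The following minimal in-memory sketch illustrates the batching idea. It assumes a simplified, hypothetical `ResultRow` shape, and a plain array stands in for `evalResultsTable`. The sketch pages through source rows ordered by `id` using a keyset cursor; that paging style is an assumption, and the PR may paginate differently. Each copied row gets a fresh ID and timestamp. Nothing is committed until every batch has been read, which approximates the transaction's rollback-on-failure behavior.

```typescript
import { randomUUID } from 'node:crypto';

// Simplified, hypothetical row shape; the real results table has many more columns.
interface ResultRow {
  id: string;
  evalId: string;
  createdAt: number;
  output: string;
}

const BATCH_SIZE = 1000;

// Emulates `SELECT ... WHERE eval_id = ? AND id > ? ORDER BY id LIMIT ?` on an array.
function readBatch(table: ResultRow[], evalId: string, afterId: string | null): ResultRow[] {
  return table
    .filter((row) => row.evalId === evalId && (afterId === null || row.id > afterId))
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
    .slice(0, BATCH_SIZE);
}

// Copies all results of `sourceEvalId` into `targetEvalId`, BATCH_SIZE rows at a time.
// New rows are staged and appended only after the loop finishes, so an exception
// part-way through leaves `table` unchanged (a stand-in for transaction rollback).
function copyResultsInBatches(table: ResultRow[], sourceEvalId: string, targetEvalId: string): number {
  const staged: ResultRow[] = [];
  let cursor: string | null = null;
  while (true) {
    const batch: ResultRow[] = readBatch(table, sourceEvalId, cursor);
    if (batch.length === 0) {
      break;
    }
    for (const row of batch) {
      staged.push({ ...row, id: randomUUID(), evalId: targetEvalId, createdAt: Date.now() });
    }
    cursor = batch[batch.length - 1].id; // keyset cursor: resume after the last id seen
    console.log(`copied ${staged.length} results so far`); // progress logging
  }
  for (const row of staged) {
    table.push(row); // "commit"
  }
  return staged.length;
}
```

Ordering by a unique key keeps each page stable, so no row is skipped or copied twice between batches.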

Deduplication:

  • Prompts, tags, and datasets are shared resources (deduplicated)
  • Copy operation relinks via junction tables rather than duplicating
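Relinking works because each junction row is keyed by the (eval, resource) pair. Re-inserting an existing link is a no-op, and the shared prompt, tag, or dataset rows are never duplicated. The sketch below models this in memory. The `JunctionTable` class and `relinkSharedResources` are hypothetical stand-ins for the Drizzle inserts with `onConflictDoNothing()`.

```typescript
// Hypothetical in-memory stand-in for a junction table such as evals_to_prompts.
// The (evalId, resourceId) pair acts as the composite primary key.
class JunctionTable {
  private readonly links = new Map<string, Set<string>>();

  // Mirrors INSERT ... ON CONFLICT DO NOTHING: returns false if the link already exists.
  insertOrIgnore(evalId: string, resourceId: string): boolean {
    let resources = this.links.get(evalId);
    if (!resources) {
      resources = new Set<string>();
      this.links.set(evalId, resources);
    }
    if (resources.has(resourceId)) {
      return false;
    }
    resources.add(resourceId);
    return true;
  }

  resourcesFor(evalId: string): string[] {
    return Array.from(this.links.get(evalId) ?? []);
  }
}

// Points the copy at the same shared prompts/tags/datasets; no resource rows are duplicated.
function relinkSharedResources(junction: JunctionTable, sourceEvalId: string, targetEvalId: string): number {
  let created = 0;
  for (const resourceId of junction.resourcesFor(sourceEvalId)) {
    if (junction.insertOrIgnore(targetEvalId, resourceId)) {
      created++;
    }
  }
  return created;
}
```

Because the insert ignores conflicts, the relink step is idempotent. Running it twice creates no extra links.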

Component Reusability:

  • Replaced separate EditEvalNameDialog with flexible ConfirmEvalNameDialog
  • Single component handles both copy and rename use cases

Files Changed

  • src/models/eval.ts - Added copy() method
  • src/server/apiSchemas.ts - Added Copy schemas
  • src/server/routes/eval.ts - Added POST endpoint
  • src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsx - New reusable dialog
  • src/app/src/pages/eval/components/ResultsView.tsx - UI integration
  • docs/plans/2025-10-31-eval-copy-design.md - Implementation plan
  • CHANGELOG.md - Documented feature
  • Removed: EditEvalNameDialog.tsx and its test (replaced with flexible component)

Testing

  • ✅ TypeScript compilation passes
  • ✅ Linting passes
  • ✅ Formatting applied
  • ✅ Sanitized logging throughout

Future Work

  • Backend tests for Eval.copy() method
  • Frontend tests for ConfirmEvalNameDialog component

- Add EditEvalNameDialog component with proper MUI styling
- Replace window.prompt() with modern dialog interface
- Add comprehensive test coverage (14 tests)
- Improve UX with loading states, error handling, and keyboard support
- Validate input before saving (trim whitespace, prevent empty names)

Add the ability to duplicate evaluations with all results, configuration, prompts, and relationships. Implements backend copy method with batched result copying (1000 rows per batch) to handle large evals, API endpoint for copy operations, and reusable frontend dialog component that handles both copy and rename operations with size warnings for large evaluations.

Backend changes include new Eval.copy() method with transaction-based atomicity, POST /api/eval/:id/copy endpoint with Zod validation, and proper relationship copying for prompts, tags, and datasets.

Frontend changes include flexible ConfirmEvalNameDialog component with keyboard shortcuts and loading states, Copy menu item in ResultsView, and integration that opens copied eval in new tab following Google Docs pattern.
@mldangelo mldangelo requested a review from Copilot October 31, 2025 12:01

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 139 to 142
onClick={handleConfirm}
variant="contained"
disabled={isLoading || !name.trim() || (name.trim() === currentName && !showSizeWarning)}
startIcon={isLoading ? <CircularProgress size={20} /> : null}


P1: Enable copy button when using default name

The confirm dialog disables the primary button whenever the entered name equals currentName unless showSizeWarning is true. In the copy flow the dialog is opened with currentName prefilled to ${description} (Copy) and showSizeWarning is only set for very large evals, so for the common case (<10k results) the button stays disabled and the user cannot create a copy unless they change the name. This makes the new "Copy" action effectively unusable for most evaluations.
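One possible fix applies the unchanged-name check only when renaming. The sketch below assumes the dialog learns which flow it serves through a hypothetical `mode` prop, which is not part of the PR. `isConfirmDisabled` is likewise an illustrative name.

```typescript
// Hypothetical: a `mode` prop (not in the PR) tells the dialog which flow it serves.
type DialogMode = 'copy' | 'rename';

interface ConfirmButtonState {
  mode: DialogMode;
  name: string;
  currentName: string;
  isLoading: boolean;
}

function isConfirmDisabled({ mode, name, currentName, isLoading }: ConfirmButtonState): boolean {
  const trimmed = name.trim();
  if (isLoading || trimmed === '') {
    return true;
  }
  // Renaming to the same name is a no-op, but copying with the pre-filled default name is valid.
  return mode === 'rename' && trimmed === currentName;
}
```

With this shape, the copy flow no longer depends on `showSizeWarning` to enable the button.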


Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds the ability to copy/duplicate evaluations in the OSS promptfoo application, adapting the feature from the cloud implementation. Users can now copy evaluations with all results, configuration, prompts, and relationships through a new UI menu option and API endpoint.

  • Implements a backend Eval.copy() method with batched result copying to handle large evaluations efficiently
  • Adds a POST /api/eval/:id/copy API endpoint with Zod schema validation
  • Creates a reusable ConfirmEvalNameDialog component for both copy and rename operations
  • Integrates the copy functionality into the ResultsView UI with size warnings for large evaluations

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Summary per file:

  • src/models/eval.ts: Adds copy() method with batched copying of results and relationships within a transaction
  • src/server/routes/eval.ts: Implements POST endpoint for eval copying with validation and error handling
  • src/server/apiSchemas.ts: Defines Zod schemas for copy API params, request, and response
  • src/app/src/pages/eval/components/ResultsView.tsx: Integrates copy functionality into UI menu and replaces window.prompt() with dialog component
  • src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsx: New reusable dialog component for both copy and rename operations with size warnings
  • docs/plans/2025-10-31-eval-copy-design.md: Comprehensive design document detailing implementation approach
  • CHANGELOG.md: Documents new eval copy feature


@coderabbitai
Contributor

coderabbitai bot commented Oct 31, 2025

📝 Walkthrough


This pull request introduces a new evaluation copy feature spanning frontend and backend layers. It includes a design document, a new React dialog component (ConfirmEvalNameDialog) for confirming copy/rename operations, UI integration in ResultsView with menu options and handlers, a backend model method (Eval.copy) that deep clones evaluation data and relationships using batched result copying, and corresponding API schema and POST endpoint definitions for the copy operation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • ConfirmEvalNameDialog component: Validation logic, size warning thresholds (10000 and 50000 items), Enter key handling without Shift modifier, and async error handling flow
  • Eval.copy() method: Transaction handling for relationships, batched result copying logic (chunks of 1000), result ID/timestamp remapping, and progress logging
  • ResultsView integration: Dialog state management for both edit name and copy operations, error propagation from async handlers, and proper cleanup
  • API error handling: Distinction between Zod validation errors (400) and generic failures (500), and logging of copy operations

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title Check (✅ Passed): The pull request title "feat(webui): add eval copy functionality" directly and accurately summarizes the main change in the changeset. The title uses conventional commit formatting (feat scope), is concise and clear, and specifically identifies the primary feature addition: eval copy capability in the WebUI. A developer scanning the git history would immediately understand this PR adds evaluation duplication functionality through the UI menu. The title avoids vagueness, noise, and generic phrasing.
  • Description Check (✅ Passed): The pull request description is directly related to the changeset and provides meaningful information about the implementation. It clearly explains the feature (duplicating evaluations with all results, configuration, prompts, and relationships via the UI menu), covers both backend and frontend changes, describes the implementation strategy with specific details about batching, deduplication, and component reusability, and lists affected files. The description is comprehensive and addresses the core functionality demonstrated in the raw summary across all modified components.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
CHANGELOG.md (1)

11-11: Bring the entry in line with the guidelines: add the PR number, use “eval(s)”, and include API/core bullets.

  • Add PR number (#6079).
  • Prefer “eval(s)” over “evaluations” for consistency.
  • This PR also adds a public POST endpoint and model method; add concise Added bullets for api/core.

Suggested edits:

- - feat(webui): add eval copy functionality to duplicate evaluations with all results, configuration, and relationships via UI menu
+ - feat(webui): add Copy action to Results menu to duplicate evals (deep‑copies results, config, prompts, tags, datasets) (#6079)
+ - feat(api): add POST /api/eval/:id/copy to duplicate evals atomically with batched result copying (#6079)
+ - feat(core): add Eval.copy() to deep‑copy eval data and relationships with progress logging (#6079)
src/server/apiSchemas.ts (1)

65-76: Consider adding validation constraints to the id parameter.

The Copy schema is clean, but the Params.id field lacks validation constraints. For consistency with MetadataKeys.Params (line 56), consider adding length constraints:

 Copy: {
   Params: z.object({
-    id: z.string(),
+    id: z.string().min(3).max(128),
   }),
   Request: z.object({
     description: z.string().optional(),
   }),

This prevents edge cases with empty or excessively long IDs at the validation layer.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between baf9d23 and cd94ac7.

📒 Files selected for processing (7)
  • CHANGELOG.md (1 hunks)
  • docs/plans/2025-10-31-eval-copy-design.md (1 hunks)
  • src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsx (1 hunks)
  • src/app/src/pages/eval/components/ResultsView.tsx (7 hunks)
  • src/models/eval.ts (1 hunks)
  • src/server/apiSchemas.ts (1 hunks)
  • src/server/routes/eval.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (11)
src/server/**/*.ts

📄 CodeRabbit inference engine (src/server/CLAUDE.md)

src/server/**/*.ts: Sanitize all logged request/response data. Do not stringify or interpolate req/res directly; pass structured objects to the logger so sensitive fields are auto-redacted.
Use Drizzle ORM (with schema from src/database/schema.ts and helpers like eq) for database access instead of raw SQL.
Implement the server using Express 5 APIs for HTTP handling.

Files:

  • src/server/routes/eval.ts
  • src/server/apiSchemas.ts
src/server/routes/**/*.ts

📄 CodeRabbit inference engine (src/server/CLAUDE.md)

src/server/routes/**/*.ts: Always validate request bodies with Zod schemas before processing route handlers.
Wrap all HTTP responses in the standard ApiResponse shape: { success, data? } on success and { success: false, error } on failure.
Use try/catch in route handlers; log errors and return 400 for validation errors and 500 for unexpected errors.
Use appropriate HTTP status codes (200, 201, 400, 404, 500) for API responses.
Organize API endpoint handlers in src/server/routes (e.g., routes/eval.ts, routes/config.ts, routes/results.ts, routes/share.ts).

Files:

  • src/server/routes/eval.ts
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

Prefer not to introduce new TypeScript types; reuse existing interfaces where possible

**/*.{ts,tsx}: Maintain consistent import order (Biome handles sorting)
Use consistent curly braces for all control statements
Prefer const over let and avoid var
Use object shorthand syntax when possible
Use async/await for asynchronous code
Use consistent error handling with proper type checks

**/*.{ts,tsx}: Use TypeScript with strict type checking enabled
When logging, pass sensitive data via the logger context object so it is auto-sanitized; avoid interpolating secrets into message strings
Manually sanitize sensitive objects with sanitizeObject before storing or emitting outside logging contexts

Files:

  • src/server/routes/eval.ts
  • src/server/apiSchemas.ts
  • src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsx
  • src/models/eval.ts
  • src/app/src/pages/eval/components/ResultsView.tsx
src/**

📄 CodeRabbit inference engine (AGENTS.md)

Place core application/library logic under src/

Files:

  • src/server/routes/eval.ts
  • src/server/apiSchemas.ts
  • src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsx
  • src/models/eval.ts
  • src/app/src/pages/eval/components/ResultsView.tsx
src/server/apiSchemas.ts

📄 CodeRabbit inference engine (src/server/CLAUDE.md)

Define and maintain request/response Zod schemas in src/server/apiSchemas.ts and import them in routes.

Files:

  • src/server/apiSchemas.ts
src/app/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (src/app/CLAUDE.md)

src/app/src/**/*.{ts,tsx}: Never use fetch() directly; always use callApi() from @app/utils/api for all HTTP requests
Access Zustand state outside React components via store.getState(); do not call hooks outside components
Use the @app/* path alias for internal imports as configured in Vite

Files:

  • src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsx
  • src/app/src/pages/eval/components/ResultsView.tsx
src/app/src/{components,pages}/**/*.tsx

📄 CodeRabbit inference engine (src/app/CLAUDE.md)

src/app/src/{components,pages}/**/*.tsx: Use the class-based ErrorBoundary component (@app/components/ErrorBoundary) to wrap error-prone UI
Access theme via useTheme() from @mui/material/styles instead of hardcoding theme values
Use useMemo/useCallback only when profiling indicates benefit; avoid unnecessary memoization
Implement explicit loading and error states for components performing async operations
Prefer MUI composition and the sx prop for styling over ad-hoc inline styles

Files:

  • src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsx
  • src/app/src/pages/eval/components/ResultsView.tsx
**/*.{tsx,jsx}

📄 CodeRabbit inference engine (.cursor/rules/react-components.mdc)

**/*.{tsx,jsx}: Use icons from @mui/icons-material
Prefer commonly used icons from @mui/icons-material for intuitive experience

Files:

  • src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsx
  • src/app/src/pages/eval/components/ResultsView.tsx
src/app/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

In React app code under src/app, use callApi from @app/utils/api for all API requests; do not call fetch() directly

Files:

  • src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsx
  • src/app/src/pages/eval/components/ResultsView.tsx
src/app/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

React hooks: use useMemo for computed values (non-callables) and useCallback for stable function references (callables)

Files:

  • src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsx
  • src/app/src/pages/eval/components/ResultsView.tsx
CHANGELOG.md

📄 CodeRabbit inference engine (.cursor/rules/changelog.mdc)

CHANGELOG.md: Document all user-facing changes in CHANGELOG.md
Every pull request must add or update an entry in CHANGELOG.md under the [Unreleased] section
Follow Keep a Changelog structure under [Unreleased] with sections: Added, Changed, Fixed, Dependencies, Documentation, Tests, Removed
Each changelog entry must include the PR number formatted as (#1234) or temporary placeholder (#XXXX)
Each changelog entry must use a Conventional Commit prefix: feat:, fix:, chore:, docs:, test:, or refactor:
Each changelog entry must be concise and on a single line
Each changelog entry must be user-focused, describing what changed and why it matters to users
Each changelog entry must include a scope in parentheses, e.g., feat(providers): or fix(evaluator):
Use common scopes for consistency: providers, evaluator, webui or app, cli, redteam, core, assertions, config, database
Place all dependency updates under the Dependencies category
Place all test changes under the Tests category
Use categories consistently: Added for new features, Changed for modifications/refactors/CI, Fixed for bug fixes, Removed for removed features
After a PR number is assigned, replace (#XXXX) placeholders with the actual PR number
Be specific, use active voice, include context, and avoid repeating the PR title in changelog entries
Group related changes with multiple bullets in the same category when needed; use one entry per logical change

CHANGELOG.md: All user-facing changes require a CHANGELOG.md entry before creating a PR

Files:

  • CHANGELOG.md
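The per-entry rules above can be checked mechanically: a conventional prefix, a required scope, a single line, and a PR number or `(#XXXX)` placeholder. The regex below is a hypothetical lint sketch. The repository does not ship it, and it only accepts the prefixes the rules list.

```typescript
// Hypothetical lint pattern for a single CHANGELOG entry:
//   "- <prefix>(<scope>): <description> (#1234)"  or  "... (#XXXX)"
// `.` does not match newlines and `$` anchors at end of input, so multi-line entries fail.
const CHANGELOG_ENTRY = /^- (feat|fix|chore|docs|test|refactor)\([a-z0-9-]+\): .+ \(#(\d+|XXXX)\)$/;

function isValidChangelogEntry(line: string): boolean {
  return CHANGELOG_ENTRY.test(line);
}
```

The original entry in this PR fails this check because it has no PR number. The suggested replacements pass.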
🧠 Learnings (13)
📓 Common learnings
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/src/pages/**/*.md : Use 'eval' instead of 'evaluation' in all documentation; when referring to command line usage, use 'npx promptfoo eval' rather than 'npx promptfoo evaluation'; maintain consistency with this terminology across all examples, code blocks, and explanations.
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/src/pages/**/*.mdx : Use 'eval' instead of 'evaluation' in all documentation; when referring to command line usage, use 'npx promptfoo eval' rather than 'npx promptfoo evaluation'; maintain consistency with this terminology across all examples, code blocks, and explanations.
📚 Learning: 2025-10-05T17:00:16.553Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/server/CLAUDE.md:0-0
Timestamp: 2025-10-05T17:00:16.553Z
Learning: Applies to src/server/routes/**/*.ts : Organize API endpoint handlers in src/server/routes (e.g., routes/eval.ts, routes/config.ts, routes/results.ts, routes/share.ts).

Applied to files:

  • src/server/routes/eval.ts
📚 Learning: 2025-10-05T17:00:16.553Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/server/CLAUDE.md:0-0
Timestamp: 2025-10-05T17:00:16.553Z
Learning: Applies to src/server/routes/**/*.ts : Use try/catch in route handlers; log errors and return 400 for validation errors and 500 for unexpected errors.

Applied to files:

  • src/server/routes/eval.ts
📚 Learning: 2025-10-05T17:00:16.553Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/server/CLAUDE.md:0-0
Timestamp: 2025-10-05T17:00:16.553Z
Learning: Applies to src/server/apiSchemas.ts : Define and maintain request/response Zod schemas in src/server/apiSchemas.ts and import them in routes.

Applied to files:

  • src/server/routes/eval.ts
  • src/server/apiSchemas.ts
📚 Learning: 2025-10-05T16:55:26.262Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: site/docs/CLAUDE.md:0-0
Timestamp: 2025-10-05T16:55:26.262Z
Learning: Applies to site/docs/**/*.{md,mdx} : Use the term "eval" not "evaluation" in documentation and examples

Applied to files:

  • docs/plans/2025-10-31-eval-copy-design.md
📚 Learning: 2025-07-18T17:24:58.606Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/src/pages/**/*.md : Use 'eval' instead of 'evaluation' in all documentation; when referring to command line usage, use 'npx promptfoo eval' rather than 'npx promptfoo evaluation'; maintain consistency with this terminology across all examples, code blocks, and explanations.

Applied to files:

  • docs/plans/2025-10-31-eval-copy-design.md
  • src/models/eval.ts
📚 Learning: 2025-07-18T17:24:58.606Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/docs/**/*.md : Use 'eval' instead of 'evaluation' in all documentation; when referring to command line usage, use 'npx promptfoo eval' rather than 'npx promptfoo evaluation'; maintain consistency with this terminology across all examples, code blocks, and explanations.

Applied to files:

  • docs/plans/2025-10-31-eval-copy-design.md
📚 Learning: 2025-10-06T15:44:51.431Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/react-components.mdc:0-0
Timestamp: 2025-10-06T15:44:51.431Z
Learning: Applies to **/*.{tsx,jsx} : Use icons from @mui/icons-material

Applied to files:

  • src/app/src/pages/eval/components/ResultsView.tsx
📚 Learning: 2025-10-06T15:44:51.431Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/react-components.mdc:0-0
Timestamp: 2025-10-06T15:44:51.431Z
Learning: Applies to **/*.{tsx,jsx} : Prefer commonly used icons from @mui/icons-material for intuitive experience

Applied to files:

  • src/app/src/pages/eval/components/ResultsView.tsx
📚 Learning: 2025-10-27T08:53:44.103Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-27T08:53:44.103Z
Learning: Applies to src/app/**/*.{ts,tsx,js,jsx} : In React app code under src/app, use callApi from @app/utils/api for all API requests; do not call fetch() directly

Applied to files:

  • src/app/src/pages/eval/components/ResultsView.tsx
📚 Learning: 2025-10-06T03:43:01.653Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-06T03:43:01.653Z
Learning: Applies to src/app/**/*.{ts,tsx} : In the React app (src/app), use callApi from @app/utils/api for all API calls instead of fetch()

Applied to files:

  • src/app/src/pages/eval/components/ResultsView.tsx
📚 Learning: 2025-10-05T16:56:39.114Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/app/CLAUDE.md:0-0
Timestamp: 2025-10-05T16:56:39.114Z
Learning: Applies to src/app/src/{components,pages}/**/*.tsx : Access theme via useTheme() from @mui/material/styles instead of hardcoding theme values

Applied to files:

  • src/app/src/pages/eval/components/ResultsView.tsx
📚 Learning: 2025-10-05T16:56:39.114Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/app/CLAUDE.md:0-0
Timestamp: 2025-10-05T16:56:39.114Z
Learning: Applies to src/app/src/**/*.{ts,tsx} : Never use fetch() directly; always use callApi() from @app/utils/api for all HTTP requests

Applied to files:

  • src/app/src/pages/eval/components/ResultsView.tsx
🧬 Code graph analysis (3)
src/server/routes/eval.ts (2)
src/server/apiSchemas.ts (1)
  • ApiSchemas (5-78)
src/models/eval.ts (1)
  • Eval (166-1110)
src/models/eval.ts (2)
src/globalConfig/accounts.ts (1)
  • getUserEmail (35-38)
src/database/tables.ts (6)
  • evalsTable (56-77)
  • evalsToPromptsTable (156-171)
  • tagsTable (41-52)
  • evalsToTagsTable (177-192)
  • evalsToDatasetsTable (223-240)
  • evalResultsTable (79-154)
src/app/src/pages/eval/components/ResultsView.tsx (1)
src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsx (1)
  • ConfirmEvalNameDialog (28-149)
🪛 LanguageTool
docs/plans/2025-10-31-eval-copy-design.md

[style] ~145-~145: ‘New Records’ might be wordy. Consider a shorter alternative.
Context: ... ## What Gets Copied? ### ✅ Must Copy (New Records with New IDs) 1. evalsTable record...

(EN_WORDINESS_PREMIUM_NEW_RECORDS)


[style] ~380-~380: Consider a different adjective to strengthen your wording.
Context: ... - ✅ Handled (check existence) 6. Deep mutation issues - Use `structuredC...

(DEEP_PROFOUND)


[style] ~702-~702: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...shold** - Show warning at 10K results, "very large" at 50K 5. ✅ Open in new tab - Yes ...

(EN_WEAK_ADJECTIVE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: CodeQL analysis (javascript-typescript)
  • GitHub Check: Agent
  • GitHub Check: Tusk Tester
  • GitHub Check: Redteam (Production API)
  • GitHub Check: Build Docs
  • GitHub Check: webui tests
  • GitHub Check: Test on Node 24.x and windows-latest
  • GitHub Check: Redteam (Staging API)
  • GitHub Check: Test on Node 22.x and windows-latest
  • GitHub Check: Test on Node 20.x and windows-latest
  • GitHub Check: Test on Node 22.x and macOS-latest
  • GitHub Check: Test on Node 24.x and ubuntu-latest
  • GitHub Check: Test on Node 22.x and ubuntu-latest
  • GitHub Check: Test on Node 20.x and macOS-latest
  • GitHub Check: Test on Node 20.x and ubuntu-latest
  • GitHub Check: Share Test
  • GitHub Check: Build on Node 24.x
  • GitHub Check: Build on Node 22.x
  • GitHub Check: Build on Node 20.x
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (14)
src/server/routes/eval.ts (1)

610-655: LGTM! Well-structured copy endpoint.

The implementation follows best practices:

  • Zod validation for params and request body
  • Appropriate status codes (201, 400, 404, 500)
  • Structured logging without sensitive data
  • Clean error handling with type-specific responses
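The 400/404/500 split can be sketched as a small mapping into the project's `{ success, data? }` / `{ success: false, error }` response shape. This is a hypothetical sketch. The error names are illustrative; the real route would detect Zod validation failures directly. Returning a generic message on 500 is an assumed choice to avoid leaking internals.

```typescript
// Standard response envelope described in the server guidelines.
type ApiResponse<T> = { success: true; data?: T } | { success: false; error: string };

// Hypothetical error names used only for this sketch.
const VALIDATION_ERROR = 'ValidationError';
const NOT_FOUND_ERROR = 'NotFoundError';

function namedError(name: string, message: string): Error {
  const err = new Error(message);
  err.name = name;
  return err;
}

// Maps a value thrown by the copy handler to an HTTP status and response body.
function toErrorResponse(err: unknown): { status: number; body: ApiResponse<never> } {
  if (err instanceof Error && err.name === VALIDATION_ERROR) {
    return { status: 400, body: { success: false, error: err.message } };
  }
  if (err instanceof Error && err.name === NOT_FOUND_ERROR) {
    return { status: 404, body: { success: false, error: err.message } };
  }
  // Unexpected failure: keep the client message generic; log details server-side instead.
  return { status: 500, body: { success: false, error: 'Failed to copy eval' } };
}
```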
src/models/eval.ts (4)

943-968: LGTM! Clean setup with proper deep cloning.

The method signature and initialization logic are well-designed:

  • structuredClone() prevents mutation issues with nested objects
  • Default description pattern "${description} (Copy)" is user-friendly
  • Logging includes useful context (source/target IDs, distinctTestCount)
  • getUserEmail() integration follows OSS patterns
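The deep clone and the default `(Copy)` suffix can be sketched together with a simplified config type. The `EvalConfig` shape and the `buildCopyConfig` name are assumptions for illustration, as is the `'Untitled eval'` fallback for a missing description. `structuredClone` is the standard global available in Node 17+ and modern browsers.

```typescript
// Hypothetical, simplified config shape; the real eval config is much larger.
interface EvalConfig {
  description?: string;
  tags: Record<string, string>;
  prompts: string[];
}

// Builds the config for a copy: a deep clone, so edits to the copy cannot mutate the
// source through shared nested objects, plus a "(Copy)" description default.
function buildCopyConfig(source: EvalConfig, description?: string): EvalConfig {
  const config = structuredClone(source);
  // 'Untitled eval' is an assumed fallback when the source has no description.
  config.description = description ?? `${source.description ?? 'Untitled eval'} (Copy)`;
  return config;
}
```

A shallow spread (`{ ...source }`) would still share the `tags` and `prompts` objects, which is the mutation issue the clone avoids.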

970-984: LGTM! Eval record creation follows established patterns.

The new eval record is properly initialized:

  • All required fields are set with appropriate values
  • Uses sanitizeRuntimeOptions() to prevent serialization issues
  • Empty results object is correct for v4 architecture
  • Mirrors the pattern from Eval.create() method

986-1048: LGTM! Relationship copying with proper deduplication.

The transaction-based relationship copying is well-designed:

  • Wraps all operations for atomicity (rollback on failure)
  • Correctly relinks to existing shared resources (prompts, tags, datasets)
  • onConflictDoNothing() handles deduplication elegantly
  • Follows the same pattern as Eval.create() for consistency

1050-1101: LGTM! Efficient batched copying with progress tracking.

The batched results copying is well-implemented:

  • BATCH_SIZE = 1000 prevents memory exhaustion on large evals
  • Stable ordering (orderBy(evalResultsTable.id)) ensures consistent pagination
  • New UUIDs and timestamps for each copied result
  • Progress logging aids debugging without overwhelming logs
  • Returns properly hydrated Eval instance via findById()

All operations within the transaction ensure atomicity.

src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsx (4)

13-40: LGTM! Well-designed reusable interface.

The component props are thoughtfully structured:

  • Clear separation of required vs. optional props
  • onConfirm returns Promise<void> for proper async error handling
  • Size warning props (showSizeWarning, itemCount, itemLabel) are optional, making the component flexible for both copy and rename use cases
  • Default itemLabel = 'items' provides sensible fallback

41-61: LGTM! Proper state management and focus handling.

The state and effects are well-implemented:

  • Appropriate state for tracking name, loading, and errors
  • Derived state (isLargeOperation, isVeryLargeOperation) keeps logic clean
  • Reset on open ensures clean state for each dialog session
  • setTimeout(100) allows dialog to render before focusing
  • inputRef.current?.select() provides excellent UX by selecting all text

63-93: LGTM! Robust handler logic with good UX.

The event handlers are well-implemented:

  • Validation prevents empty/whitespace-only inputs
  • Short-circuit for unchanged names (rename case) avoids unnecessary API calls
  • Proper async error handling with user-friendly error messages
  • event.key === 'Enter' && !event.shiftKey correctly excludes Shift+Enter
  • Loading state prevents double-submission
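The keyboard rules can be isolated into a pure function, which also makes them straightforward to unit test; the PR lists dialog tests as future work. This is a sketch. `keyAction` and its return values are hypothetical. Ignoring Escape while a request is in flight is an assumed design choice, not something the PR states.

```typescript
// Hypothetical action type; in the dialog these would map to handleConfirm / onClose.
type KeyAction = 'confirm' | 'cancel' | 'none';

interface KeyLike {
  key: string;
  shiftKey: boolean;
}

function keyAction(event: KeyLike, isLoading: boolean): KeyAction {
  if (isLoading) {
    return 'none'; // a request is in flight: ignore keys to prevent double submission
  }
  if (event.key === 'Enter' && !event.shiftKey) {
    return 'confirm';
  }
  if (event.key === 'Escape') {
    return 'cancel';
  }
  return 'none';
}
```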

95-149: LGTM! Clean and accessible UI implementation.

The render logic is well-structured:

  • Conditional size warning with appropriate severity levels (info at 10k, warning at 50k)
  • TextField properly wired with error states, helper text, and disabled state
  • Button disabled logic correctly handles all cases: isLoading || !name.trim() || (name.trim() === currentName && !showSizeWarning)
  • Loading indicator on button provides clear visual feedback
  • MUI components used correctly throughout
src/app/src/pages/eval/components/ResultsView.tsx (4)

9-9: LGTM! Clean imports and state management.

The new imports and state follow project conventions:

  • ContentCopyIcon from @mui/icons-material (as per guidelines)
  • Import uses @app/* path alias correctly
  • State variables clearly named for their purpose

Also applies to: 39-39, 342-343


391-408: LGTM! Clean implementation following project patterns.

The rename handler is well-implemented:

  • Uses callApi() from @app/utils/api (as per guidelines)
  • invariant ensures config is loaded before update
  • PATCH method with proper headers and body
  • Throws error for dialog to catch and display
  • Updates local state with setConfig() to keep UI in sync

410-432: LGTM! Copy handler follows best practices.

The copy handler is well-designed:

  • Uses callApi() correctly (as per guidelines)
  • POST method with proper JSON payload
  • window.open() opens copy in new tab (Google Docs pattern)
  • Success toast with distinctTestCount.toLocaleString() provides clear feedback
  • Error handling throws for dialog component to display

793-798: LGTM! Clean UI integration with proper wiring.

The menu and dialog integration is well-executed:

  • Menu items include helpful tooltips for discoverability
  • ContentCopyIcon provides clear visual cue for copy action
  • Dialog props correctly configured for each use case:
    • Edit name: simple dialog without size warning
    • Copy: includes showSizeWarning and itemCount={totalResultsCount}
  • Both handlers properly wired to respective dialogs

Also applies to: 826-838, 904-924

docs/plans/2025-10-31-eval-copy-design.md (1)

1-730: LGTM! Comprehensive and well-aligned design document.

The design document is excellent:

  • Clear database architecture analysis with all relevant tables
  • Implementation strategy matches the actual code in the PR
  • API design section aligns with apiSchemas.ts and route implementation
  • Frontend UI design matches the ConfirmEvalNameDialog component
  • Testing strategy provides clear guidance for future test additions
  • Proper use of "eval" terminology (per coding guidelines)

This document will serve as valuable reference for maintainers.

itemLabel?: string;
}

export const ConfirmEvalNameDialog = ({
Contributor

The props make it seem like this dialog is designed to be generic (e.g. accepting name and actionButtonText) but it's unclear when and for what it might be used.

Member Author

The ConfirmEvalNameDialog is intentionally designed to be reusable for both rename and copy operations.

} catch (error) {
console.error('Failed to update table:', error);
}
const handleSaveEvalName = async (newName: string) => {
Contributor

You'll need to wrap this function in useCallback to ensure the invariant does not receive stale data.

};

const handleCopyEval = async (description: string) => {
invariant(evalId, 'Eval ID must be set before copying');
Contributor

Same comment as above re: wrapping in useCallback.

});

if (!response.ok) {
throw new Error('Failed to update eval name');
Contributor

Show an error toast.

});

if (!response.ok) {
throw new Error('Failed to copy evaluation');
Contributor

Show an error toast.
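One way to act on this suggestion is to wrap each dialog handler so a failure raises an error toast and is then re-thrown, so the dialog can still show its inline error. This is a sketch under assumptions: `ShowToast` is a hypothetical signature for the app's toast hook, and whether the real handlers re-throw after toasting is not shown in the thread.

```typescript
type ToastSeverity = 'success' | 'error';
// Hypothetical toast signature; the app's real toast API may differ.
type ShowToast = (message: string, severity: ToastSeverity) => void;

// Wraps an async dialog handler: on failure, show an error toast with the
// error's message, then re-throw so the caller (the dialog) also sees it.
function withErrorToast<A extends unknown[]>(
  handler: (...args: A) => Promise<void>,
  showToast: ShowToast,
): (...args: A) => Promise<void> {
  return async (...args: A): Promise<void> => {
    try {
      await handler(...args);
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      showToast(message, 'error');
      throw error;
    }
  };
}
```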

actionButtonText="Save"
onConfirm={handleSaveEvalName}
/>
<ConfirmEvalNameDialog
Contributor

Why is a name confirmation dialog being used for copying?

Member Author

I reused the component. The rename flow was bad, and this also makes it nicer.

onClose={() => setCopyDialogOpen(false)}
title="Copy Evaluation"
label="Description"
currentName={`${config?.description || 'Evaluation'} (Copy)`}
Contributor

The inclusion of "(Copy)" is confusing here, since currentName should be the current, original name.

const db = getDb();

// Create the new eval record first
db.insert(evalsTable)
Contributor

Why does this insertion occur prior to the transaction?


const response = ApiSchemas.Eval.Copy.Response.parse({
id: newEval.id,
distinctTestCount,
Contributor

Why is this included?

Member Author

It's used for user feedback in the success toast ("Copied 1,234 results successfully"). It may not be necessary or worth the complexity.


logger.error('Failed to copy eval', {
error,
evalId: req.params.id,
Contributor

Ideally this should be consistent with the above, i.e. use id.

Member Author

I had to use req.params.id in the error handler at src/server/routes/eval.ts:651 because the id variable (from the Zod parse) is scoped inside the try block.
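One common way to make the error log consistent is to declare `id` before the `try` block so the `catch` block can see it, falling back to the raw parameter only when parsing itself failed. The thread does not show how the later "Fix logging consistency" commit did it, so treat this as a sketch: `parseIdParam` stands in for the Zod parse, and `copyRoute`, the `Log` type, and the 201/500 status codes are assumptions. A real route would more likely answer a failed parse with 400; this sketch folds every failure into 500 for brevity.

```typescript
// Hypothetical logger signature; the real route uses the project's logger.
type Log = (message: string, meta: Record<string, unknown>) => void;

// Stand-in for the Zod params parse: throws if id is missing or not a string.
function parseIdParam(params: Record<string, unknown>): string {
  const id = params.id;
  if (typeof id !== 'string' || id.length === 0) {
    throw new Error('Invalid eval id');
  }
  return id;
}

async function copyRoute(
  params: Record<string, unknown>,
  copy: (id: string) => Promise<string>,
  log: Log,
): Promise<{ status: number; body: unknown }> {
  // Declared outside try so the catch block can log the parsed id.
  let id: string | undefined;
  try {
    id = parseIdParam(params);
    const newId = await copy(id);
    return { status: 201, body: { id: newId } };
  } catch (error) {
    // Fall back to the raw param only when parsing itself failed.
    log('Failed to copy eval', { error, evalId: id ?? params.id });
    return { status: 500, body: { error: 'Failed to copy eval' } };
  }
}
```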

Contributor

@will-holley will-holley left a comment

Mostly good aside from some slop logic. Did not test locally.

Fixed 2 critical bugs and 1 performance issue identified in self-review:

1. **Atomicity violation** (src/models/eval.ts): Moved eval record
   creation inside transaction to prevent orphaned records if copy
   fails partway through. Previously, eval was created outside the
   transaction, causing data integrity issues on failure.

2. **Dialog logic bug** (ConfirmEvalNameDialog.tsx): Fixed early return
   condition that prevented copying small evals (<10K results) when
   using default name. Changed from checking `!showSizeWarning` to
   checking `itemCount === undefined` to properly distinguish rename
   vs copy operations.

3. **Performance optimization** (eval.ts, routes/eval.ts): Removed
   duplicate `getResultsCount()` query by passing count as optional
   parameter to `copy()` method.
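The atomicity fix in item 1 can be illustrated with a toy in-memory store. `MemoryDb`, its snapshot-based `transaction()`, and `copyEvalAtomically` are hypothetical. A real SQLite transaction rolls back inside the database, and this class only imitates that so the effect is runnable. The point is ordering: the eval record is inserted inside the transaction callback, so a failure while copying results also undoes it.

```typescript
// Hypothetical row shape for the illustration.
interface Row {
  id: string;
  evalId: string;
}

// Toy store: transaction() snapshots state and restores it if fn throws.
class MemoryDb {
  evals: string[] = [];
  results: Row[] = [];

  transaction<T>(fn: (db: MemoryDb) => T): T {
    const snapshot = { evals: [...this.evals], results: [...this.results] };
    try {
      return fn(this);
    } catch (error) {
      this.evals = snapshot.evals;
      this.results = snapshot.results;
      throw error;
    }
  }
}

// The eval record is created INSIDE the transaction, so a failure while
// copying results rolls back the new eval too (no orphaned record).
function copyEvalAtomically(
  db: MemoryDb,
  sourceResults: Row[],
  newEvalId: string,
  failAt?: number, // test hook: throw when reaching this row index
): void {
  db.transaction((tx) => {
    tx.evals.push(newEvalId);
    sourceResults.forEach((row, i) => {
      if (failAt !== undefined && i === failAt) {
        throw new Error('simulated failure mid-copy');
      }
      tx.results.push({ id: `${row.id}-copy`, evalId: newEvalId });
    });
  });
}
```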

See docs/plans/2025-10-31-eval-copy-critical-review.md for full analysis.
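The corrected early-return rule from item 2 can be written as a pure decision function, assuming rename mode is identified solely by `itemCount` being undefined, as the commit message describes. `confirmAction`, `ConfirmInput`, and the returned labels are hypothetical names; the real component performs this check inline.

```typescript
interface ConfirmInput {
  name: string;
  currentName: string;
  itemCount?: number; // passed only by the copy dialog
}

// Decides what pressing confirm should do. Rename mode is detected by the
// absence of itemCount, so a copy that keeps the default name still submits,
// whatever its size.
function confirmAction(input: ConfirmInput): 'invalid' | 'skip' | 'submit' {
  const trimmed = input.name.trim();
  if (!trimmed) {
    return 'invalid'; // empty or whitespace-only
  }
  const isRename = input.itemCount === undefined;
  if (isRename && trimmed === input.currentName) {
    return 'skip'; // unchanged rename: close without calling the API
  }
  return 'submit';
}
```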
- Add useCallback wrappers for handleSaveEvalName and handleCopyEval to prevent unnecessary re-renders
- Add error toast notifications for failed copy/rename operations
- Fix Date.now() inefficiency in Eval.copy() batch loop by calling once per batch instead of per result
- Replace deprecated onKeyPress with onKeyDown in ConfirmEvalNameDialog
- Fix logging consistency to use 'id' variable instead of 'req.params.id' in error handler
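The batching and the per-batch `Date.now()` change can be sketched together. This is an illustration, not `Eval.copy()` itself: `copyInBatches`, the row shapes, and the injected `insertBatch` and `now` parameters are hypothetical. The source here is an in-memory array for simplicity, whereas a copy meant to avoid memory exhaustion would read each batch from the database.

```typescript
// Hypothetical row shapes; real result rows carry many more columns.
interface SourceResult {
  testIdx: number;
  score: number;
}
interface CopiedResult extends SourceResult {
  evalId: string;
  createdAt: number;
}

// Copies rows in fixed-size batches, reading the clock once per batch.
// Returns the number of rows copied.
function copyInBatches(
  rows: SourceResult[],
  newEvalId: string,
  insertBatch: (batch: CopiedResult[]) => void,
  batchSize = 1000,
  now: () => number = Date.now,
): number {
  if (batchSize < 1) {
    throw new RangeError('batchSize must be at least 1');
  }
  let copied = 0;
  for (let start = 0; start < rows.length; start += batchSize) {
    const createdAt = now(); // one clock read per batch, not per row
    const batch: CopiedResult[] = rows
      .slice(start, start + batchSize)
      .map((row) => ({ ...row, evalId: newEvalId, createdAt }));
    insertBatch(batch);
    copied += batch.length;
  }
  return copied;
}
```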
Use more conventional 'Copy of [name]' pattern instead of '[name] (Copy)'
to match user expectations (e.g., Google Docs pattern)
…e]' pattern"

This reverts commit d764496.

The '[name] (Copy)' pattern is preferred.
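The default-name expression passed as `currentName` to the copy dialog (`${config?.description || 'Evaluation'} (Copy)`) can be pulled into a helper; `defaultCopyName` is a hypothetical name. One detail worth keeping in mind is that `||`, unlike `??`, also treats an empty description as missing.

```typescript
// Default description for a copy, using the '[name] (Copy)' pattern.
// `||` (not `??`) means an empty description also falls back to 'Evaluation'.
function defaultCopyName(description: string | undefined): string {
  return `${description || 'Evaluation'} (Copy)`;
}
```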
Add 40 tests covering:
- Rendering with various props and states
- Size warnings for large/very large operations
- Input validation (empty, whitespace, valid input)
- Rename mode vs copy mode behavior
- Loading states and async operations
- Error handling and display
- Keyboard interactions (Enter, Shift+Enter)
- Dialog lifecycle and state management
- Cancel button behavior
- Edge cases (boundary values)
Rewrote tests to focus on:
- Component rendering and props
- Size warning display logic
- Button states based on props
- Edge case thresholds (10K, 50K)

Removed tests that relied on fireEvent for user interaction, which
were causing timeouts in CI due to MUI's complex async behavior.
Tests now run reliably in ~300ms instead of timing out.
Applied Biome formatter to collapse multi-line statements.
@use-tusk
Contributor

use-tusk bot commented Oct 31, 2025

❌ Generated 21 tests - 19 passed, 2 failed (281369b)

Test Summary

  • ConfirmEvalNameDialog - 14 ✅, 2 ❌
  • ResultsView - 5 ✅

Results

Tusk's tests show solid coverage of the eval copy feature with 19 passing tests across two components. The ConfirmEvalNameDialog component has strong test coverage for core workflows—input validation, async handling, error recovery, and keyboard shortcuts all pass. However, two critical failures indicate state management issues: the dialog's isLoading state doesn't reset when reopened, and error messages don't clear when users modify input after an error. The ResultsView tests all pass, validating the integration points like size warnings for large evaluations (>10K results), API response parsing, and menu/dialog interactions. Overall, the feature is well-tested at the integration level, but the dialog component has state lifecycle bugs that need fixing before this is production-ready.

Key Points

  • Dialog loading state persists across reopen - When a user closes the dialog during a confirm operation and reopens it, isLoading remains true, leaving the confirm button stuck in a disabled state. This breaks the UX for rapid successive operations.

  • Error messages don't clear on input change - After a failed confirmation, error text persists even when the user modifies the input field. Users can't see that they've recovered from the error, creating confusion and poor error recovery UX.

  • Size warning works correctly - Large evaluations (>10K results) properly display a size warning in the copy dialog, so users don't unknowingly copy massive datasets.

  • API integration solid - The copy endpoint correctly returns id and distinctTestCount, which are properly used to open the new eval in a new tab and display success feedback.

  • Core dialog interactions pass - Input trimming, Enter key confirmation, loading state UI feedback, and async sequencing all work as expected. The component handles the happy path and most error cases well.
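Both state-lifecycle findings above can be addressed by resetting state when the dialog opens and clearing the error on every edit. The sketch below swaps in a reducer (usable with React's `useReducer`) for whatever state hooks the component actually uses. `DialogState`, the action names, and the idea of dispatching `open` whenever the `open` prop becomes true are assumptions, not the PR's code.

```typescript
interface DialogState {
  name: string;
  isLoading: boolean;
  error: string | null;
}

type DialogAction =
  | { type: 'open'; initialName: string }
  | { type: 'input'; value: string }
  | { type: 'submitStart' }
  | { type: 'submitFailed'; message: string }
  | { type: 'submitDone' };

// Opening resets loading and error state; editing the input clears a stale error.
function dialogReducer(state: DialogState, action: DialogAction): DialogState {
  switch (action.type) {
    case 'open':
      return { name: action.initialName, isLoading: false, error: null };
    case 'input':
      return { ...state, name: action.value, error: null };
    case 'submitStart':
      return { ...state, isLoading: true, error: null };
    case 'submitFailed':
      return { ...state, isLoading: false, error: action.message };
    case 'submitDone':
      return { ...state, isLoading: false };
    default:
      return state;
  }
}
```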



Commit | Status | Output | Created (UTC)
cd94ac7 | ❌ Generated 22 tests - 21 passed, 1 failed | Tests | Oct 31, 2025 12:01PM
14f1c9f | ⏩ Skipped due to new commit on branch | Output | Oct 31, 2025 2:59PM
9e53811 | ⏩ Skipped due to new commit on branch | Output | Oct 31, 2025 3:01PM
0b9ba7c | ⏩ Skipped due to new commit on branch | Output | Oct 31, 2025 3:03PM
0ffca23 | ⏩ Skipped due to new commit on branch | Output | Oct 31, 2025 3:17PM
06f2c12 | ⏩ Skipped due to new commit on branch | Output | Oct 31, 2025 3:22PM
f1b11cb | ⏩ Skipped due to new commit on branch | Output | Oct 31, 2025 3:28PM
d764496 | 🔄 Running Tusk Tester | Output | Oct 31, 2025 3:38PM
2e6f784 | ⏩ Skipped due to new commit on branch | Output | Oct 31, 2025 4:18PM
ea3457e | ⏩ No tests generated | Output | Oct 31, 2025 4:28PM
b8e45e4 | ❌ Generated 22 tests - 21 passed, 1 failed | Tests | Oct 31, 2025 4:36PM
e3488e4 | ⏩ Skipped due to new commit on branch | Output | Oct 31, 2025 7:42PM
a8185dd | ⏩ Skipped due to new commit on branch | Output | Oct 31, 2025 7:47PM
281369b | ❌ Generated 21 tests - 19 passed, 2 failed | Tests | Oct 31, 2025 8:00PM

Labels

None yet

Projects

None yet

3 participants