-
-
Notifications
You must be signed in to change notification settings - Fork 755
feat(webui): add eval copy functionality #6079
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add EditEvalNameDialog component with proper MUI styling - Replace window.prompt() with modern dialog interface - Add comprehensive test coverage (14 tests) - Improve UX with loading states, error handling, and keyboard support - Validate input before saving (trim whitespace, prevent empty names)
Add the ability to duplicate evaluations with all results, configuration, prompts, and relationships. Implements backend copy method with batched result copying (1000 rows per batch) to handle large evals, API endpoint for copy operations, and reusable frontend dialog component that handles both copy and rename operations with size warnings for large evaluations. Backend changes include new Eval.copy() method with transaction-based atomicity, POST /api/eval/:id/copy endpoint with Zod validation, and proper relationship copying for prompts, tags, and datasets. Frontend changes include flexible ConfirmEvalNameDialog component with keyboard shortcuts and loading states, Copy menu item in ResultsView, and integration that opens copied eval in new tab following Google Docs pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| onClick={handleConfirm} | ||
| variant="contained" | ||
| disabled={isLoading || !name.trim() || (name.trim() === currentName && !showSizeWarning)} | ||
| startIcon={isLoading ? <CircularProgress size={20} /> : null} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enable copy button when using default name
The confirm dialog disables the primary button whenever the entered name equals currentName unless showSizeWarning is true. In the copy flow the dialog is opened with currentName prefilled to ${description} (Copy) and showSizeWarning is only set for very large evals, so for the common case (<10k results) the button stays disabled and the user cannot create a copy unless they change the name. This makes the new "Copy" action effectively unusable for most evaluations.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds the ability to copy/duplicate evaluations in the OSS promptfoo application, adapting the feature from the cloud implementation. Users can now copy evaluations with all results, configuration, prompts, and relationships through a new UI menu option and API endpoint.
- Implements a backend
Eval.copy()method with batched result copying to handle large evaluations efficiently - Adds a
POST /api/eval/:id/copyAPI endpoint with Zod schema validation - Creates a reusable
ConfirmEvalNameDialogcomponent for both copy and rename operations - Integrates the copy functionality into the ResultsView UI with size warnings for large evaluations
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/models/eval.ts | Adds copy() method with batched copying of results and relationships within a transaction |
| src/server/routes/eval.ts | Implements POST endpoint for eval copying with validation and error handling |
| src/server/apiSchemas.ts | Defines Zod schemas for copy API params, request, and response |
| src/app/src/pages/eval/components/ResultsView.tsx | Integrates copy functionality into UI menu and replaces window.prompt() with dialog component |
| src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsx | New reusable dialog component for both copy and rename operations with size warnings |
| docs/plans/2025-10-31-eval-copy-design.md | Comprehensive design document detailing implementation approach |
| CHANGELOG.md | Documents new eval copy feature |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
📝 WalkthroughWalkthroughThis pull request introduces a new evaluation copy feature spanning frontend and backend layers. It includes a design document, a new React dialog component (ConfirmEvalNameDialog) for confirming copy/rename operations, UI integration in ResultsView with menu options and handlers, a backend model method (Eval.copy) that deep clones evaluation data and relationships using batched result copying, and corresponding API schema and POST endpoint definitions for the copy operation. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
CHANGELOG.md (1)
11-11: Bring entry to guidelines: add PR number, use “eval(s)”, and include API/core bullets.
- Add PR number (#6079).
- Prefer “eval(s)” over “evaluations” for consistency.
- This PR also adds a public POST endpoint and model method; add concise Added bullets for api/core.
Suggested edits:
- - feat(webui): add eval copy functionality to duplicate evaluations with all results, configuration, and relationships via UI menu + - feat(webui): add Copy action to Results menu to duplicate evals (deep‑copies results, config, prompts, tags, datasets) (#6079) + - feat(api): add POST /api/eval/:id/copy to duplicate evals atomically with batched result copying (#6079) + - feat(core): add Eval.copy() to deep‑copy eval data and relationships with progress logging (#6079)src/server/apiSchemas.ts (1)
65-76: Consider adding validation constraints to the id parameter.The Copy schema is clean, but the
Params.idfield lacks validation constraints. For consistency withMetadataKeys.Params(line 56), consider adding length constraints:Copy: { Params: z.object({ - id: z.string(), + id: z.string().min(3).max(128), }), Request: z.object({ description: z.string().optional(), }),This prevents edge cases with empty or excessively long IDs at the validation layer.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
CHANGELOG.md(1 hunks)docs/plans/2025-10-31-eval-copy-design.md(1 hunks)src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsx(1 hunks)src/app/src/pages/eval/components/ResultsView.tsx(7 hunks)src/models/eval.ts(1 hunks)src/server/apiSchemas.ts(1 hunks)src/server/routes/eval.ts(1 hunks)
🧰 Additional context used
📓 Path-based instructions (11)
src/server/**/*.ts
📄 CodeRabbit inference engine (src/server/CLAUDE.md)
src/server/**/*.ts: Sanitize all logged request/response data. Do not stringify or interpolate req/res directly; pass structured objects to the logger so sensitive fields are auto-redacted.
Use Drizzle ORM (with schema from src/database/schema.ts and helpers like eq) for database access instead of raw SQL.
Implement the server using Express 5 APIs for HTTP handling.
Files:
src/server/routes/eval.tssrc/server/apiSchemas.ts
src/server/routes/**/*.ts
📄 CodeRabbit inference engine (src/server/CLAUDE.md)
src/server/routes/**/*.ts: Always validate request bodies with Zod schemas before processing route handlers.
Wrap all HTTP responses in the standard ApiResponse shape: { success, data? } on success and { success: false, error } on failure.
Use try/catch in route handlers; log errors and return 400 for validation errors and 500 for unexpected errors.
Use appropriate HTTP status codes (200, 201, 400, 404, 500) for API responses.
Organize API endpoint handlers in src/server/routes (e.g., routes/eval.ts, routes/config.ts, routes/results.ts, routes/share.ts).
Files:
src/server/routes/eval.ts
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)
Prefer not to introduce new TypeScript types; reuse existing interfaces where possible
**/*.{ts,tsx}: Maintain consistent import order (Biome handles sorting)
Use consistent curly braces for all control statements
Prefer const over let and avoid var
Use object shorthand syntax when possible
Use async/await for asynchronous code
Use consistent error handling with proper type checks
**/*.{ts,tsx}: Use TypeScript with strict type checking enabled
Follow consistent import order (Biome will sort imports)
Use consistent curly braces for all control statements
Prefer const over let; avoid var
Use object property shorthand when possible
Use async/await for asynchronous code instead of raw promises/callbacks
When logging, pass sensitive data via the logger context object so it is auto-sanitized; avoid interpolating secrets into message strings
Manually sanitize sensitive objects with sanitizeObject before storing or emitting outside logging contexts
Files:
src/server/routes/eval.tssrc/server/apiSchemas.tssrc/app/src/pages/eval/components/ConfirmEvalNameDialog.tsxsrc/models/eval.tssrc/app/src/pages/eval/components/ResultsView.tsx
src/**
📄 CodeRabbit inference engine (AGENTS.md)
Place core application/library logic under src/
Files:
src/server/routes/eval.tssrc/server/apiSchemas.tssrc/app/src/pages/eval/components/ConfirmEvalNameDialog.tsxsrc/models/eval.tssrc/app/src/pages/eval/components/ResultsView.tsx
src/server/apiSchemas.ts
📄 CodeRabbit inference engine (src/server/CLAUDE.md)
Define and maintain request/response Zod schemas in src/server/apiSchemas.ts and import them in routes.
Files:
src/server/apiSchemas.ts
src/app/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (src/app/CLAUDE.md)
src/app/src/**/*.{ts,tsx}: Never use fetch() directly; always use callApi() from @app/utils/api for all HTTP requests
Access Zustand state outside React components via store.getState(); do not call hooks outside components
Use the @app/* path alias for internal imports as configured in Vite
Files:
src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsxsrc/app/src/pages/eval/components/ResultsView.tsx
src/app/src/{components,pages}/**/*.tsx
📄 CodeRabbit inference engine (src/app/CLAUDE.md)
src/app/src/{components,pages}/**/*.tsx: Use the class-based ErrorBoundary component (@app/components/ErrorBoundary) to wrap error-prone UI
Access theme via useTheme() from @mui/material/styles instead of hardcoding theme values
Use useMemo/useCallback only when profiling indicates benefit; avoid unnecessary memoization
Implement explicit loading and error states for components performing async operations
Prefer MUI composition and the sx prop for styling over ad-hoc inline styles
Files:
src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsxsrc/app/src/pages/eval/components/ResultsView.tsx
**/*.{tsx,jsx}
📄 CodeRabbit inference engine (.cursor/rules/react-components.mdc)
**/*.{tsx,jsx}: Use icons from @mui/icons-material
Prefer commonly used icons from @mui/icons-material for intuitive experience
Files:
src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsxsrc/app/src/pages/eval/components/ResultsView.tsx
src/app/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
In React app code under src/app, use callApi from @app/utils/api for all API requests; do not call fetch() directly
Files:
src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsxsrc/app/src/pages/eval/components/ResultsView.tsx
src/app/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
React hooks: use useMemo for computed values (non-callables) and useCallback for stable function references (callables)
Files:
src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsxsrc/app/src/pages/eval/components/ResultsView.tsx
CHANGELOG.md
📄 CodeRabbit inference engine (.cursor/rules/changelog.mdc)
CHANGELOG.md: Document all user-facing changes in CHANGELOG.md
Every pull request must add or update an entry in CHANGELOG.md under the [Unreleased] section
Follow Keep a Changelog structure under [Unreleased] with sections: Added, Changed, Fixed, Dependencies, Documentation, Tests, Removed
Each changelog entry must include the PR number formatted as (#1234) or temporary placeholder (#XXXX)
Each changelog entry must use a Conventional Commit prefix: feat:, fix:, chore:, docs:, test:, or refactor:
Each changelog entry must be concise and on a single line
Each changelog entry must be user-focused, describing what changed and why it matters to users
Each changelog entry must include a scope in parentheses, e.g., feat(providers): or fix(evaluator):
Use common scopes for consistency: providers, evaluator, webui or app, cli, redteam, core, assertions, config, database
Place all dependency updates under the Dependencies category
Place all test changes under the Tests category
Use categories consistently: Added for new features, Changed for modifications/refactors/CI, Fixed for bug fixes, Removed for removed features
After a PR number is assigned, replace (#XXXX) placeholders with the actual PR number
Be specific, use active voice, include context, and avoid repeating the PR title in changelog entries
Group related changes with multiple bullets in the same category when needed; use one entry per logical change
CHANGELOG.md: All user-facing changes require a CHANGELOG.md entry before creating a PR
Add entries under [Unreleased] in appropriate category (Added, Changed, Fixed, Dependencies, Documentation, Tests)
Each changelog entry must include PR number (#1234) or placeholder (#XXXX)
Use conventional commit prefixes in changelog entries (feat:, fix:, chore:, docs:, test:, refactor:)
CHANGELOG.md: Document all user-facing changes in CHANGELOG.md
Changelog entries must include the PR number in format (#1234)
Use conventional commit prefixes in changelog entries: feat:,...
Files:
CHANGELOG.md
🧠 Learnings (13)
📓 Common learnings
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/src/pages/**/*.md : Use 'eval' instead of 'evaluation' in all documentation; when referring to command line usage, use 'npx promptfoo eval' rather than 'npx promptfoo evaluation'; maintain consistency with this terminology across all examples, code blocks, and explanations.
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/src/pages/**/*.mdx : Use 'eval' instead of 'evaluation' in all documentation; when referring to command line usage, use 'npx promptfoo eval' rather than 'npx promptfoo evaluation'; maintain consistency with this terminology across all examples, code blocks, and explanations.
📚 Learning: 2025-10-05T17:00:16.553Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/server/CLAUDE.md:0-0
Timestamp: 2025-10-05T17:00:16.553Z
Learning: Applies to src/server/routes/**/*.ts : Organize API endpoint handlers in src/server/routes (e.g., routes/eval.ts, routes/config.ts, routes/results.ts, routes/share.ts).
Applied to files:
src/server/routes/eval.ts
📚 Learning: 2025-10-05T17:00:16.553Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/server/CLAUDE.md:0-0
Timestamp: 2025-10-05T17:00:16.553Z
Learning: Applies to src/server/routes/**/*.ts : Use try/catch in route handlers; log errors and return 400 for validation errors and 500 for unexpected errors.
Applied to files:
src/server/routes/eval.ts
📚 Learning: 2025-10-05T17:00:16.553Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/server/CLAUDE.md:0-0
Timestamp: 2025-10-05T17:00:16.553Z
Learning: Applies to src/server/apiSchemas.ts : Define and maintain request/response Zod schemas in src/server/apiSchemas.ts and import them in routes.
Applied to files:
src/server/routes/eval.tssrc/server/apiSchemas.ts
📚 Learning: 2025-10-05T16:55:26.262Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: site/docs/CLAUDE.md:0-0
Timestamp: 2025-10-05T16:55:26.262Z
Learning: Applies to site/docs/**/*.{md,mdx} : Use the term "eval" not "evaluation" in documentation and examples
Applied to files:
docs/plans/2025-10-31-eval-copy-design.md
📚 Learning: 2025-07-18T17:24:58.606Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/src/pages/**/*.md : Use 'eval' instead of 'evaluation' in all documentation; when referring to command line usage, use 'npx promptfoo eval' rather than 'npx promptfoo evaluation'; maintain consistency with this terminology across all examples, code blocks, and explanations.
Applied to files:
docs/plans/2025-10-31-eval-copy-design.mdsrc/models/eval.ts
📚 Learning: 2025-07-18T17:24:58.606Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/docusaurus.mdc:0-0
Timestamp: 2025-07-18T17:24:58.606Z
Learning: Applies to site/docs/**/*.md : Use 'eval' instead of 'evaluation' in all documentation; when referring to command line usage, use 'npx promptfoo eval' rather than 'npx promptfoo evaluation'; maintain consistency with this terminology across all examples, code blocks, and explanations.
Applied to files:
docs/plans/2025-10-31-eval-copy-design.md
📚 Learning: 2025-10-06T15:44:51.431Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/react-components.mdc:0-0
Timestamp: 2025-10-06T15:44:51.431Z
Learning: Applies to **/*.{tsx,jsx} : Use icons from mui/icons-material
Applied to files:
src/app/src/pages/eval/components/ResultsView.tsx
📚 Learning: 2025-10-06T15:44:51.431Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/react-components.mdc:0-0
Timestamp: 2025-10-06T15:44:51.431Z
Learning: Applies to **/*.{tsx,jsx} : Prefer commonly used icons from mui/icons-material for intuitive experience
Applied to files:
src/app/src/pages/eval/components/ResultsView.tsx
📚 Learning: 2025-10-27T08:53:44.103Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-27T08:53:44.103Z
Learning: Applies to src/app/**/*.{ts,tsx,js,jsx} : In React app code under src/app, use callApi from app/utils/api for all API requests; do not call fetch() directly
Applied to files:
src/app/src/pages/eval/components/ResultsView.tsx
📚 Learning: 2025-10-06T03:43:01.653Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-06T03:43:01.653Z
Learning: Applies to src/app/**/*.{ts,tsx} : In the React app (src/app), use callApi from app/utils/api for all API calls instead of fetch()
Applied to files:
src/app/src/pages/eval/components/ResultsView.tsx
📚 Learning: 2025-10-05T16:56:39.114Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/app/CLAUDE.md:0-0
Timestamp: 2025-10-05T16:56:39.114Z
Learning: Applies to src/app/src/{components,pages}/**/*.tsx : Access theme via useTheme() from mui/material/styles instead of hardcoding theme values
Applied to files:
src/app/src/pages/eval/components/ResultsView.tsx
📚 Learning: 2025-10-05T16:56:39.114Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/app/CLAUDE.md:0-0
Timestamp: 2025-10-05T16:56:39.114Z
Learning: Applies to src/app/src/**/*.{ts,tsx} : Never use fetch() directly; always use callApi() from app/utils/api for all HTTP requests
Applied to files:
src/app/src/pages/eval/components/ResultsView.tsx
🧬 Code graph analysis (3)
src/server/routes/eval.ts (2)
src/server/apiSchemas.ts (1)
ApiSchemas(5-78)src/models/eval.ts (1)
Eval(166-1110)
src/models/eval.ts (2)
src/globalConfig/accounts.ts (1)
getUserEmail(35-38)src/database/tables.ts (6)
evalsTable(56-77)evalsToPromptsTable(156-171)tagsTable(41-52)evalsToTagsTable(177-192)evalsToDatasetsTable(223-240)evalResultsTable(79-154)
src/app/src/pages/eval/components/ResultsView.tsx (1)
src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsx (1)
ConfirmEvalNameDialog(28-149)
🪛 LanguageTool
docs/plans/2025-10-31-eval-copy-design.md
[style] ~145-~145: ‘New Records’ might be wordy. Consider a shorter alternative.
Context: ... ## What Gets Copied? ### ✅ Must Copy (New Records with New IDs) 1. evalsTable record...
(EN_WORDINESS_PREMIUM_NEW_RECORDS)
[style] ~380-~380: Consider a different adjective to strengthen your wording.
Context: ... - ✅ Handled (check existence) 6. Deep mutation issues - Use `structuredC...
(DEEP_PROFOUND)
[style] ~702-~702: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...shold** - Show warning at 10K results, "very large" at 50K 5. ✅ Open in new tab - Yes ...
(EN_WEAK_ADJECTIVE)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: CodeQL analysis (javascript-typescript)
- GitHub Check: Agent
- GitHub Check: Tusk Tester
- GitHub Check: Redteam (Production API)
- GitHub Check: Build Docs
- GitHub Check: webui tests
- GitHub Check: Test on Node 24.x and windows-latest
- GitHub Check: Redteam (Staging API)
- GitHub Check: Test on Node 22.x and windows-latest
- GitHub Check: Test on Node 20.x and windows-latest
- GitHub Check: Test on Node 22.x and macOS-latest
- GitHub Check: Test on Node 24.x and ubuntu-latest
- GitHub Check: Test on Node 22.x and ubuntu-latest
- GitHub Check: Test on Node 20.x and macOS-latest
- GitHub Check: Test on Node 20.x and ubuntu-latest
- GitHub Check: Share Test
- GitHub Check: Build on Node 24.x
- GitHub Check: Build on Node 22.x
- GitHub Check: Build on Node 20.x
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (14)
src/server/routes/eval.ts (1)
610-655: LGTM! Well-structured copy endpoint.The implementation follows best practices:
- Zod validation for params and request body
- Appropriate status codes (201, 400, 404, 500)
- Structured logging without sensitive data
- Clean error handling with type-specific responses
src/models/eval.ts (4)
943-968: LGTM! Clean setup with proper deep cloning.The method signature and initialization logic are well-designed:
structuredClone()prevents mutation issues with nested objects- Default description pattern
"${description} (Copy)"is user-friendly- Logging includes useful context (source/target IDs, distinctTestCount)
- getUserEmail() integration follows OSS patterns
970-984: LGTM! Eval record creation follows established patterns.The new eval record is properly initialized:
- All required fields are set with appropriate values
- Uses
sanitizeRuntimeOptions()to prevent serialization issues- Empty
resultsobject is correct for v4 architecture- Mirrors the pattern from
Eval.create()method
986-1048: LGTM! Relationship copying with proper deduplication.The transaction-based relationship copying is well-designed:
- Wraps all operations for atomicity (rollback on failure)
- Correctly relinks to existing shared resources (prompts, tags, datasets)
onConflictDoNothing()handles deduplication elegantly- Follows the same pattern as
Eval.create()for consistency
1050-1101: LGTM! Efficient batched copying with progress tracking.The batched results copying is well-implemented:
BATCH_SIZE = 1000prevents memory exhaustion on large evals- Stable ordering (
orderBy(evalResultsTable.id)) ensures consistent pagination- New UUIDs and timestamps for each copied result
- Progress logging aids debugging without overwhelming logs
- Returns properly hydrated
Evalinstance viafindById()All operations within the transaction ensure atomicity.
src/app/src/pages/eval/components/ConfirmEvalNameDialog.tsx (4)
13-40: LGTM! Well-designed reusable interface.The component props are thoughtfully structured:
- Clear separation of required vs. optional props
onConfirmreturnsPromise<void>for proper async error handling- Size warning props (
showSizeWarning,itemCount,itemLabel) are optional, making the component flexible for both copy and rename use cases- Default
itemLabel = 'items'provides sensible fallback
41-61: LGTM! Proper state management and focus handling.The state and effects are well-implemented:
- Appropriate state for tracking name, loading, and errors
- Derived state (
isLargeOperation,isVeryLargeOperation) keeps logic clean- Reset on open ensures clean state for each dialog session
setTimeout(100)allows dialog to render before focusinginputRef.current?.select()provides excellent UX by selecting all text
63-93: LGTM! Robust handler logic with good UX.The event handlers are well-implemented:
- Validation prevents empty/whitespace-only inputs
- Short-circuit for unchanged names (rename case) avoids unnecessary API calls
- Proper async error handling with user-friendly error messages
event.key === 'Enter' && !event.shiftKeycorrectly excludes Shift+Enter- Loading state prevents double-submission
95-149: LGTM! Clean and accessible UI implementation.The render logic is well-structured:
- Conditional size warning with appropriate severity levels (info at 10k, warning at 50k)
- TextField properly wired with error states, helper text, and disabled state
- Button disabled logic correctly handles all cases:
isLoading || !name.trim() || (name.trim() === currentName && !showSizeWarning)- Loading indicator on button provides clear visual feedback
- MUI components used correctly throughout
src/app/src/pages/eval/components/ResultsView.tsx (4)
9-9: LGTM! Clean imports and state management.The new imports and state follow project conventions:
ContentCopyIconfrom@mui/icons-material(as per guidelines)- Import uses
@app/*path alias correctly- State variables clearly named for their purpose
Also applies to: 39-39, 342-343
391-408: LGTM! Clean implementation following project patterns.The rename handler is well-implemented:
- Uses
callApi()from@app/utils/api(as per guidelines)invariantensures config is loaded before update- PATCH method with proper headers and body
- Throws error for dialog to catch and display
- Updates local state with
setConfig()to keep UI in sync
410-432: LGTM! Copy handler follows best practices.The copy handler is well-designed:
- Uses
callApi()correctly (as per guidelines)- POST method with proper JSON payload
window.open()opens copy in new tab (Google Docs pattern)- Success toast with
distinctTestCount.toLocaleString()provides clear feedback- Error handling throws for dialog component to display
793-798: LGTM! Clean UI integration with proper wiring.The menu and dialog integration is well-executed:
- Menu items include helpful tooltips for discoverability
ContentCopyIconprovides clear visual cue for copy action- Dialog props correctly configured for each use case:
- Edit name: simple dialog without size warning
- Copy: includes
showSizeWarninganditemCount={totalResultsCount}- Both handlers properly wired to respective dialogs
Also applies to: 826-838, 904-924
docs/plans/2025-10-31-eval-copy-design.md (1)
1-730: LGTM! Comprehensive and well-aligned design document.The design document is excellent:
- Clear database architecture analysis with all relevant tables
- Implementation strategy matches the actual code in the PR
- API design section aligns with
apiSchemas.tsand route implementation- Frontend UI design matches the
ConfirmEvalNameDialogcomponent- Testing strategy provides clear guidance for future test additions
- Proper use of "eval" terminology (per coding guidelines)
This document will serve as valuable reference for maintainers.
| itemLabel?: string; | ||
| } | ||
|
|
||
| export const ConfirmEvalNameDialog = ({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The props make it seem like this dialog is designed to be generic (e.g. accepting name and actionButtonText) but it's unclear when and for what it might be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ConfirmEvalNameDialog is intentionally designed to be reusable for both rename and copy operations
| } catch (error) { | ||
| console.error('Failed to update table:', error); | ||
| } | ||
| const handleSaveEvalName = async (newName: string) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to wrap this function in useCallback to ensure the invariant does not receive stale data.
| }; | ||
|
|
||
| const handleCopyEval = async (description: string) => { | ||
| invariant(evalId, 'Eval ID must be set before copying'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above re: wrapping in useCallback.
| }); | ||
|
|
||
| if (!response.ok) { | ||
| throw new Error('Failed to update eval name'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Show an error toast.
| }); | ||
|
|
||
| if (!response.ok) { | ||
| throw new Error('Failed to copy evaluation'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Show an error toast.
| actionButtonText="Save" | ||
| onConfirm={handleSaveEvalName} | ||
| /> | ||
| <ConfirmEvalNameDialog |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is a name confirmation dialog being used for copying?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I re-used the component. the re-name flow was bad and this also makes it nicer.
| onClose={() => setCopyDialogOpen(false)} | ||
| title="Copy Evaluation" | ||
| label="Description" | ||
| currentName={`${config?.description || 'Evaluation'} (Copy)`} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inclusion of "(Copy)" is confusing here since this is current, original name.
src/models/eval.ts
Outdated
| const db = getDb(); | ||
|
|
||
| // Create the new eval record first | ||
| db.insert(evalsTable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this insertion occur prior to the transaction?
|
|
||
| const response = ApiSchemas.Eval.Copy.Response.parse({ | ||
| id: newEval.id, | ||
| distinctTestCount, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this included?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used for user feedback in the success toast: "Copied 1,234 results successfully". May not be necessary / worth the complexity.
|
|
||
| logger.error('Failed to copy eval', { | ||
| error, | ||
| evalId: req.params.id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this should be consistent with the above i.e. use id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to use req.params.id in error handler at src/server/routes/eval.ts:651 because the id variable is scoped inside the try block (from Zod parse).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly good aside from some slop logic. Did not test locally.
Fixed 2 critical bugs identified in self-review: 1. **Atomicity violation** (src/models/eval.ts): Moved eval record creation inside transaction to prevent orphaned records if copy fails partway through. Previously, eval was created outside the transaction, causing data integrity issues on failure. 2. **Dialog logic bug** (ConfirmEvalNameDialog.tsx): Fixed early return condition that prevented copying small evals (<10K results) when using default name. Changed from checking `!showSizeWarning` to checking `itemCount === undefined` to properly distinguish rename vs copy operations. 3. **Performance optimization** (eval.ts, routes/eval.ts): Removed duplicate `getResultsCount()` query by passing count as optional parameter to `copy()` method. See docs/plans/2025-10-31-eval-copy-critical-review.md for full analysis.
0ffca23 to
06f2c12
Compare
- Add useCallback wrappers for handleSaveEvalName and handleCopyEval to prevent unnecessary re-renders - Add error toast notifications for failed copy/rename operations - Fix Date.now() inefficiency in Eval.copy() batch loop by calling once per batch instead of per result - Replace deprecated onKeyPress with onKeyDown in ConfirmEvalNameDialog - Fix logging consistency to use 'id' variable instead of 'req.params.id' in error handler
06f2c12 to
f1b11cb
Compare
Use more conventional 'Copy of [name]' pattern instead of '[name] (Copy)' to match user expectations (e.g., Google Docs pattern)
…e]' pattern" This reverts commit d764496. The '[name] (Copy)' pattern is preferred.
Add 40 tests covering: - Rendering with various props and states - Size warnings for large/very large operations - Input validation (empty, whitespace, valid input) - Rename mode vs copy mode behavior - Loading states and async operations - Error handling and display - Keyboard interactions (Enter, Shift+Enter) - Dialog lifecycle and state management - Cancel button behavior - Edge cases (boundary values)
Rewrote tests to focus on: - Component rendering and props - Size warning display logic - Button states based on props - Edge case thresholds (10K, 50K) Removed tests that relied on fireEvent for user interaction, which were causing timeouts in CI due to MUI's complex async behavior. Tests now run reliably in ~300ms instead of timing out.
Applied Biome formatter to collapse multi-line statements.
a8185dd to
281369b
Compare
❌ Generated 21 tests - 19 passed, 2 failed (281369b) View tests ↗Test Summary
ResultsTusk's tests show solid coverage of the eval copy feature with 19 passing tests across two components. The Key Points
View check history
Was Tusk helpful? Give feedback by reacting with 👍 or 👎 |
Summary
Add the ability to duplicate evaluations with all results, configuration, prompts, and relationships via the UI menu.
Changes
Backend
Eval.copy()method - Deep copies eval with batched result copying (1000 rows per batch) to handle large evaluationsPOST /api/eval/:id/copywith Zod validation and proper error handlingFrontend
ConfirmEvalNameDialoghandles both copy and rename operationsImplementation Details
Batching Strategy:
Deduplication:
Component Reusability:
EditEvalNameDialogwith flexibleConfirmEvalNameDialogFiles Changed
src/models/eval.ts- Addedcopy()methodsrc/server/apiSchemas.ts- Added Copy schemassrc/server/routes/eval.ts- Added POST endpointsrc/app/src/pages/eval/components/ConfirmEvalNameDialog.tsx- New reusable dialogsrc/app/src/pages/eval/components/ResultsView.tsx- UI integrationdocs/plans/2025-10-31-eval-copy-design.md- Implementation planCHANGELOG.md- Documented featureEditEvalNameDialog.tsxand its test (replaced with flexible component)Testing
Future Work
Eval.copy()methodConfirmEvalNameDialogcomponent