Closed
Labels
Evaluation, Frontend, Internal Team, UX, enhancement, size:M
Description
Parent: #3051
Summary
Implement a guided tour that helps users run their first evaluation. This is the primary activation target.
Tour ID: first-evaluation
Completion event: evaluation_ran
Behind env var: NEXT_PUBLIC_ENABLE_WALKTHROUGHS=true
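A minimal sketch of the flag check, assuming the tour is gated by a small helper. The helper name `walkthroughsEnabled` and the `Env` type are hypothetical; only the variable name and the `true` value come from this issue.

```typescript
// Hypothetical helper: name and signature are assumptions, not existing code.
type Env = Record<string, string | undefined>;

function walkthroughsEnabled(env: Env): boolean {
    // Only the exact string "true" enables tours; unset, "false" or "1" keep them off.
    return env["NEXT_PUBLIC_ENABLE_WALKTHROUGHS"] === "true";
}
```

Next.js inlines `NEXT_PUBLIC_` variables at build time only when they are referenced literally, so a caller would likely pass `{ NEXT_PUBLIC_ENABLE_WALKTHROUGHS: process.env.NEXT_PUBLIC_ENABLE_WALKTHROUGHS }` rather than the whole `process.env` object.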
Steps
| # | Icon | Title | Content | Selector |
|---|---|---|---|---|
| 1 | | Run Your First Evaluation | Evaluations help you measure how well your prompts perform. Let's run one together. | (centered) |
| 2 | | Open the Evaluation Modal | Click "Run Evaluation" to start. | [data-tour="run-evaluation-button"] |
| 3 | | Select a Test Set | Choose a test set. We have created one for you to get started. | [data-tour="testset-select"] |
| 4 | | Choose an Evaluator | Select "Exact Match" to compare outputs against expected answers. You can create custom evaluators later. | [data-tour="evaluator-select"] |
| 5 | | Run the Evaluation | Click Run to start the evaluation. | [data-tour="run-eval-confirm"] |
| 6 | | View Your Results | Here are your results. You can see how each test case performed and the overall score. | [data-tour="eval-results"] |
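The steps above could be expressed as a plain data definition. This is a sketch only: the `TourStep` and `TourDefinition` shapes are hypothetical and may not match the types the onboarding code actually uses. The ID, completion event, copy, and selectors are taken from this issue.

```typescript
// Hypothetical shapes for illustration; the project's real tour types may differ.
interface TourStep {
    title: string;
    content: string;
    // null means the step is centered instead of anchored to an element.
    selector: string | null;
}

interface TourDefinition {
    id: string;
    completionEvent: string;
    steps: TourStep[];
}

const firstEvaluationTour: TourDefinition = {
    id: "first-evaluation",
    completionEvent: "evaluation_ran",
    steps: [
        {
            title: "Run Your First Evaluation",
            content: "Evaluations help you measure how well your prompts perform. Let's run one together.",
            selector: null,
        },
        {
            title: "Open the Evaluation Modal",
            content: 'Click "Run Evaluation" to start.',
            selector: '[data-tour="run-evaluation-button"]',
        },
        {
            title: "Select a Test Set",
            content: "Choose a test set. We have created one for you to get started.",
            selector: '[data-tour="testset-select"]',
        },
        {
            title: "Choose an Evaluator",
            content: 'Select "Exact Match" to compare outputs against expected answers. You can create custom evaluators later.',
            selector: '[data-tour="evaluator-select"]',
        },
        {
            title: "Run the Evaluation",
            content: "Click Run to start the evaluation.",
            selector: '[data-tour="run-eval-confirm"]',
        },
        {
            title: "View Your Results",
            content: "Here are your results. You can see how each test case performed and the overall score.",
            selector: '[data-tour="eval-results"]',
        },
    ],
};
```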
Implementation Tasks
- Add `data-tour` attributes to evaluation UI elements
- Create tour definition in `components/Onboarding/tours/`
- Register tour with `tourRegistry`
- Fire `evaluation_ran` event on completion
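For the last two tasks, here is a rough sketch of how registration and the completion event might fit together. `SketchTourRegistry`, `completeTour`, and the `TrackEvent` callback are stand-ins invented for illustration; the real `tourRegistry` API and analytics call are not described in this issue and will likely differ.

```typescript
// Stand-in types: not the project's actual tourRegistry or analytics API.
type TrackEvent = (eventName: string) => void;

interface RegisteredTour {
    id: string;
    completionEvent: string;
}

class SketchTourRegistry {
    private tours = new Map<string, RegisteredTour>();

    register(tour: RegisteredTour): void {
        // Reject duplicate IDs so two tours cannot silently overwrite each other.
        if (this.tours.has(tour.id)) {
            throw new Error(`Tour already registered: ${tour.id}`);
        }
        this.tours.set(tour.id, tour);
    }

    get(id: string): RegisteredTour | undefined {
        return this.tours.get(id);
    }
}

// Fires the tour's completion event once; returns false if the tour is unknown.
function completeTour(registry: SketchTourRegistry, id: string, track: TrackEvent): boolean {
    const tour = registry.get(id);
    if (tour === undefined) {
        return false;
    }
    track(tour.completionEvent);
    return true;
}
```

Passing the tracker in as a callback keeps the sketch independent of whichever analytics client the frontend uses.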
Notes
- Step 3 uses the pre-created test set (see Create initial entities for lower onboarding friction #3057)
- If the user has no prompt, guide them to create one first
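The no-prompt case could be handled with a small guard before the tour starts. This is a sketch under assumptions: `decideFirstEvaluationEntry` and the `TourDecision` type are hypothetical, and how the "create a prompt first" guidance is shown is left to the caller.

```typescript
// Hypothetical decision type; only the tour ID comes from this issue.
type TourDecision =
    | { kind: "start-tour"; tourId: "first-evaluation" }
    | { kind: "needs-prompt" };

function decideFirstEvaluationEntry(promptCount: number): TourDecision {
    // Without a prompt there is nothing to evaluate, so send the user to create one first.
    if (promptCount <= 0) {
        return { kind: "needs-prompt" };
    }
    return { kind: "start-tour", tourId: "first-evaluation" };
}
```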
Acceptance Criteria
- Tour only runs when `NEXT_PUBLIC_ENABLE_WALKTHROUGHS=true`
- All 6 steps render with correct copy
- Selectors target correct elements
- Completion event fires when tour finishes
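One way to check the selector criterion is a helper that reports which selectors do not resolve. This is only a sketch: `findMissingSelectors` and the `ElementExists` callback are hypothetical names, and the lookup is injected so the helper can run outside a browser.

```typescript
// Hypothetical helper for illustration; the selector strings come from the Steps table.
type ElementExists = (selector: string) => boolean;

const expectedSelectors: string[] = [
    '[data-tour="run-evaluation-button"]',
    '[data-tour="testset-select"]',
    '[data-tour="evaluator-select"]',
    '[data-tour="run-eval-confirm"]',
    '[data-tour="eval-results"]',
];

// Returns the selectors for which the lookup finds no element, in input order.
function findMissingSelectors(selectors: string[], exists: ElementExists): string[] {
    return selectors.filter((selector) => !exists(selector));
}
```

In a browser the lookup could be `(s) => document.querySelector(s) !== null`. The modal and results elements are probably not all mounted at the same time, so the check may need to run per step rather than once for the whole list.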