[feature] Improve evaluation runs page(s) / table(s)#3016
ardaerzin merged 272 commits into release/v0.66.0
Pull Request Overview
This PR introduces significant improvements to the evaluation runs page(s) and table(s) in the frontend, implementing a comprehensive redesign of the evaluation run details interface. The changes add new views (Overview, Scenarios, Configuration), enhanced comparison capabilities, and improved data visualization components.
Key Changes
- Added Overview view with metric comparisons, spider charts, and temporal metrics visualization
- Implemented Configuration view with detailed run metadata, evaluator settings, and variant information
- Enhanced table functionality with focus drawer for detailed scenario inspection
- Added run comparison features with support for multiple run comparisons
Reviewed Changes
Copilot reviewed 117 out of 324 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| OverviewPlaceholders.tsx | Loading and empty state placeholders with animated radar chart mock |
| OverviewMetricComparison.tsx | Metric comparison logic aggregating data across runs |
| MetricComparisonCard.tsx | Chart component for displaying metric distributions across runs |
| MetadataSummaryTable.tsx | Comprehensive metadata table showing run details and metrics |
| EvaluatorTemporalMetricsChart.tsx | Time-series chart for evaluator metrics with area/line visualization |
| BaseRunMetricsSection.tsx | Section displaying base run metrics with temporal and static views |
| AggregatedOverviewSection.tsx | Aggregated overview combining metadata table and spider chart |
| OverviewView.tsx | Main overview view component orchestrating run comparisons |
| ConfigurationView/utils.ts | Utility functions for parsing run configuration data |
| ConfigurationView/index.tsx | Main configuration view with synchronized scrolling columns |
| TestsetSection.tsx | Testset configuration display component |
| QuerySection.tsx | Query configuration with filters and sampling rate display |
| PromptConfigCard.tsx | Prompt configuration card with message normalization |
| InvocationSection.tsx | Invocation configuration with variant details |
| GeneralSection.tsx | General run information with editable name/description |
| EvaluatorSection.tsx | Evaluator configuration display with JSON toggle |
| Reference components | Reusable components for displaying application/variant/testset references |
| TableHeaders/StepGroupHeader.tsx | Dynamic table headers with reference resolution |
| TableCells | Cell renderers for metrics, invocations, inputs, and actions |
| FocusDrawer components | Drawer for detailed scenario inspection with navigation |
| EvaluatorMetricsSpiderChart | Spider/radar chart for multi-dimensional metric visualization |
| EvaluatorMetricsChart | Chart components for evaluator metric distributions |
| CompareRunsMenu.tsx | UI for selecting and managing run comparisons |
| Page.tsx | Main page component with tab navigation |
| Various atoms/state | State management for comparison, focus drawer, and table data |
...components/EvalRunDetails2/components/views/OverviewView/components/OverviewPlaceholders.tsx
web/oss/src/components/EvalRunDetails2/components/views/ConfigurationView/utils.ts
...omponents/EvalRunDetails2/components/views/ConfigurationView/components/PromptConfigCard.tsx
web/oss/src/components/EvalRunDetails2/components/TableCells/MetricCell.tsx
Pull Request Overview
Copilot reviewed 117 out of 326 changed files in this pull request and generated no new comments.
Pull Request Overview
Copilot reviewed 120 out of 336 changed files in this pull request and generated 1 comment.
…de-daytona-sandbox
…com/Agenta-AI/agenta into frontend-feat/new-evaluations-pages
…de-daytona-sandbox
…ogram bins, improve duration formatting
- Only use invocation step (type="invocation") for shared analytics keys (duration, tokens, costs) to avoid showing evaluator execution metrics instead of LLM metrics
- Add invocationStepKeys parameter to flattenRunLevelMetricData to identify correct step
- Aggregate histogram bins when count exceeds MAX_DISPLAY_BINS (6) for clearer visualization
- Fix duration formatting in metric popover by
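The bin aggregation described in this commit message could look roughly like the sketch below. This is not the code from the PR: the `HistogramBin` shape, the `aggregateBins` name, and the strategy of merging adjacent bins into equal-sized groups are all assumptions made for illustration. Only the `MAX_DISPLAY_BINS` name and its value of 6 come from the commit message.

```typescript
// Hypothetical bin shape; the PR's actual types are not shown in this thread.
interface HistogramBin {
  from: number;  // lower edge of the bin
  to: number;    // upper edge of the bin
  count: number; // number of samples that fell into the bin
}

// Value taken from the commit message.
const MAX_DISPLAY_BINS = 6;

// Merge adjacent bins so that at most `maxBins` remain.
// Assumes `bins` is sorted by `from` and the bins are contiguous.
function aggregateBins(
  bins: HistogramBin[],
  maxBins: number = MAX_DISPLAY_BINS,
): HistogramBin[] {
  if (maxBins < 1 || bins.length <= maxBins) return bins.slice();

  // Number of original bins folded into each displayed bin.
  const groupSize = Math.ceil(bins.length / maxBins);
  const merged: HistogramBin[] = [];

  for (let i = 0; i < bins.length; i += groupSize) {
    const group = bins.slice(i, i + groupSize);
    merged.push({
      from: group[0].from,
      to: group[group.length - 1].to,
      count: group.reduce((sum, bin) => sum + bin.count, 0),
    });
  }
  return merged;
}
```

Grouping by a fixed `groupSize` keeps the merged bins contiguous and preserves the total count, at the cost of a possibly narrower last bin when the input length is not a multiple of the group size.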
…and refactor utility functions
- Remove commented debug logs and unused code blocks across multiple components
- Add error handling and user feedback for testset name fetching failures
- Add input validation for scenario and run IDs to prevent SSRF attacks
- Improve prompt key resolution with helper function in PromptConfigCard
- Add clarifying comments for fallback logic and depth limits
- Refactor sample rate formatting with
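As an illustration of the ID validation mentioned in this commit message, the sketch below rejects any run or scenario ID that is not a UUID before it is interpolated into a request path. The thread does not show the PR's real checks, so the UUID format, the `isSafeId` and `buildScenarioPath` names, and the `/runs/.../scenarios/...` path are all hypothetical.

```typescript
// Assumption: IDs are UUIDs. The PR's real validation rules are not shown here.
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Returns true only for strings that match the UUID pattern exactly,
// which rules out path separators, dots, and embedded URLs.
function isSafeId(value: string): boolean {
  return UUID_PATTERN.test(value);
}

// Hypothetical path builder: validates both IDs, then encodes them
// before placing them in the path as a second line of defence.
function buildScenarioPath(runId: string, scenarioId: string): string {
  if (!isSafeId(runId) || !isSafeId(scenarioId)) {
    throw new Error("Invalid run or scenario ID");
  }
  return `/runs/${encodeURIComponent(runId)}/scenarios/${encodeURIComponent(scenarioId)}`;
}
```

Validating against a strict allowlist pattern, instead of trying to strip dangerous characters, means a crafted ID such as `../admin` or a full URL is rejected before any request is built.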
…-evaluations-pages
- Remove word splitting, filtering, and humanization logic from humanizeEvaluatorName
- Return evaluator label unchanged instead of processing it
…com/Agenta-AI/agenta into frontend-feat/new-evaluations-pages
tba...