[feature] Improve evaluation runs page(s) / table(s) #3016

Merged: ardaerzin merged 272 commits into release/v0.66.0 from frontend-feat/new-evaluations-pages on Dec 8, 2025
Conversation

@ardaerzin (Contributor)

tba...

Copilot AI review requested due to automatic review settings November 19, 2025 14:28
@vercel bot commented Nov 19, 2025

The latest updates on your projects (Vercel for GitHub):

Project: agenta-documentation | Status: Ready | Preview: Ready | Comments: Preview Comment | Updated (UTC): Dec 5, 2025 3:53pm

@CLAassistant commented Nov 19, 2025

CLA assistant check: All committers have signed the CLA.

Copilot AI left a comment (Contributor)

Pull Request Overview

This PR introduces significant improvements to the evaluation runs page(s) and table(s) in the frontend, implementing a comprehensive redesign of the evaluation run details interface. The changes add new views (Overview, Scenarios, Configuration), enhanced comparison capabilities, and improved data visualization components.

Key Changes

  • Added Overview view with metric comparisons, spider charts, and temporal metrics visualization
  • Implemented Configuration view with detailed run metadata, evaluator settings, and variant information
  • Enhanced table functionality with focus drawer for detailed scenario inspection
  • Added run comparison features with support for multiple run comparisons
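
To make the spider-chart comparison above concrete, here is a minimal hand-written sketch, not code from this PR: the function name `normalizeForSpiderChart` and the data shape are assumptions. The idea is to scale each run's metric values to a shared [0, 1] range per metric axis so several runs can be overlaid on one radar chart.

```typescript
// Hypothetical sketch (names and shapes assumed, not taken from the PR):
// normalize per-run metric values so each metric axis spans [0, 1].
function normalizeForSpiderChart(
    runs: Record<string, Record<string, number>>,
): Record<string, Record<string, number>> {
    // Find the per-metric maximum across all runs.
    const max: Record<string, number> = {}
    for (const metrics of Object.values(runs)) {
        for (const [name, value] of Object.entries(metrics)) {
            max[name] = Math.max(max[name] ?? 0, value)
        }
    }
    // Divide each value by its metric's maximum (a 0 maximum maps to 0).
    const out: Record<string, Record<string, number>> = {}
    for (const [runId, metrics] of Object.entries(runs)) {
        out[runId] = Object.fromEntries(
            Object.entries(metrics).map(([name, v]) => [name, max[name] ? v / max[name] : 0]),
        )
    }
    return out
}
```

Normalizing per axis (rather than globally) keeps metrics with different units, such as cost and accuracy, comparable on one chart.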

Reviewed Changes

Copilot reviewed 117 out of 324 changed files in this pull request and generated 5 comments.

Summary per file:

  • OverviewPlaceholders.tsx: Loading and empty state placeholders with animated radar chart mock
  • OverviewMetricComparison.tsx: Metric comparison logic aggregating data across runs
  • MetricComparisonCard.tsx: Chart component for displaying metric distributions across runs
  • MetadataSummaryTable.tsx: Comprehensive metadata table showing run details and metrics
  • EvaluatorTemporalMetricsChart.tsx: Time-series chart for evaluator metrics with area/line visualization
  • BaseRunMetricsSection.tsx: Section displaying base run metrics with temporal and static views
  • AggregatedOverviewSection.tsx: Aggregated overview combining metadata table and spider chart
  • OverviewView.tsx: Main overview view component orchestrating run comparisons
  • ConfigurationView/utils.ts: Utility functions for parsing run configuration data
  • ConfigurationView/index.tsx: Main configuration view with synchronized scrolling columns
  • TestsetSection.tsx: Testset configuration display component
  • QuerySection.tsx: Query configuration with filters and sampling rate display
  • PromptConfigCard.tsx: Prompt configuration card with message normalization
  • InvocationSection.tsx: Invocation configuration with variant details
  • GeneralSection.tsx: General run information with editable name/description
  • EvaluatorSection.tsx: Evaluator configuration display with JSON toggle
  • Reference components: Reusable components for displaying application/variant/testset references
  • TableHeaders/StepGroupHeader.tsx: Dynamic table headers with reference resolution
  • TableCells: Cell renderers for metrics, invocations, inputs, and actions
  • FocusDrawer components: Drawer for detailed scenario inspection with navigation
  • EvaluatorMetricsSpiderChart: Spider/radar chart for multi-dimensional metric visualization
  • EvaluatorMetricsChart: Chart components for evaluator metric distributions
  • CompareRunsMenu.tsx: UI for selecting and managing run comparisons
  • Page.tsx: Main page component with tab navigation
  • Various atoms/state: State management for comparison, focus drawer, and table data
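
The comparison state mentioned in the last entry could be modeled roughly as below. This is a hand-written sketch with assumed names (`ComparisonState`, `toggleComparedRun`); the PR's actual atoms and state library are not shown here.

```typescript
// Hypothetical sketch of run-comparison state (names assumed, not from the PR):
// one base run plus a toggleable set of compared runs.
type RunId = string

interface ComparisonState {
    baseRunId: RunId | null
    comparedRunIds: RunId[]
}

// Pure update function: add the run if absent, remove it if present.
function toggleComparedRun(state: ComparisonState, id: RunId): ComparisonState {
    const comparedRunIds = state.comparedRunIds.includes(id)
        ? state.comparedRunIds.filter((r) => r !== id)
        : [...state.comparedRunIds, id]
    return {...state, comparedRunIds}
}
```

Keeping the update pure makes it easy to wrap in whatever state container the app uses (an atom, a reducer, or plain React state).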


Copilot AI review requested due to automatic review settings November 19, 2025 15:13
Copilot AI left a comment (Contributor)

Pull Request Overview

Copilot reviewed 117 out of 326 changed files in this pull request and generated no new comments.



Copilot AI review requested due to automatic review settings November 20, 2025 15:27
Copilot AI left a comment (Contributor)

Pull Request Overview

Copilot reviewed 120 out of 336 changed files in this pull request and generated 1 comment.



@junaway changed the title from "[Frontend / Feat] Improve evaluation runs page(s) / table(s)" to "[feature] Improve evaluation runs page(s) / table(s)" on Nov 20, 2025
Copilot AI review requested due to automatic review settings November 21, 2025 11:12
Copilot AI left a comment (Contributor)

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment (Contributor)

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

ardaerzin and others added 30 commits December 4, 2025 18:26
…ogram bins, improve duration formatting

- Only use invocation step (type="invocation") for shared analytics keys (duration, tokens, costs) to avoid showing evaluator execution metrics instead of LLM metrics
- Add invocationStepKeys parameter to flattenRunLevelMetricData to identify correct step
- Aggregate histogram bins when count exceeds MAX_DISPLAY_BINS (6) for clearer visualization
- Fix duration formatting in metric popover by
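
The bin-aggregation bullet above can be sketched as follows. Only the MAX_DISPLAY_BINS = 6 threshold comes from the commit message; the `Bin` shape and `aggregateBins` helper are assumptions for illustration.

```typescript
// Hypothetical sketch (names assumed, not from the PR): merge adjacent
// histogram bins into at most MAX_DISPLAY_BINS groups, summing counts.
interface Bin {
    start: number
    end: number
    count: number
}

const MAX_DISPLAY_BINS = 6

function aggregateBins(bins: Bin[]): Bin[] {
    if (bins.length <= MAX_DISPLAY_BINS) return bins
    // Group consecutive bins so the output has at most MAX_DISPLAY_BINS entries.
    const groupSize = Math.ceil(bins.length / MAX_DISPLAY_BINS)
    const out: Bin[] = []
    for (let i = 0; i < bins.length; i += groupSize) {
        const group = bins.slice(i, i + groupSize)
        out.push({
            start: group[0].start,
            end: group[group.length - 1].end,
            count: group.reduce((sum, b) => sum + b.count, 0),
        })
    }
    return out
}
```

Merging adjacent bins preserves the total count while capping the number of bars the chart has to render.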
…and refactor utility functions

- Remove commented debug logs and unused code blocks across multiple components
- Add error handling and user feedback for testset name fetching failures
- Add input validation for scenario and run IDs to prevent SSRF attacks
- Improve prompt key resolution with helper function in PromptConfigCard
- Add clarifying comments for fallback logic and depth limits
- Refactor sample rate formatting with
- Remove word splitting, filtering, and humanization logic from humanizeEvaluatorName
- Return evaluator label unchanged instead of processing it
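
The SSRF-hardening bullet above can be illustrated with a minimal sketch. The names, the endpoint shape, and the assumption that IDs are UUIDs are all mine, not from the PR: the point is simply to reject any scenario/run ID that is not a plain identifier before interpolating it into a request path.

```typescript
// Hypothetical sketch: validate IDs before building request URLs so that
// path-traversal or host-injection payloads never reach the fetch layer.
// Assumes IDs are UUIDs; the PR's actual validation may differ.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i

function isSafeId(id: string): boolean {
    return UUID_RE.test(id)
}

function scenarioUrl(runId: string, scenarioId: string): string {
    // Hypothetical endpoint path, for illustration only.
    if (!isSafeId(runId) || !isSafeId(scenarioId)) {
        throw new Error("Invalid run or scenario id")
    }
    return `/api/evaluations/runs/${runId}/scenarios/${scenarioId}`
}
```

Validating with a strict allowlist pattern (rather than escaping) means strings like `../admin` or `evil.example/%2e%2e` are rejected outright.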

Labels

Evaluation, size:XXL (this PR changes 1000+ lines, ignoring generated files)

Projects

None yet


6 participants