# Evals (beta)

View and analyze your evaluation results in Pydantic Logfire's web interface. Evals provide observability into how your AI systems perform across different test cases and experiments over time.

!!! note "Code-First Evaluation"

    Evals are created and run using the [Pydantic Evals](https://ai.pydantic.dev/evals/) package, which is developed in tandem with Pydantic AI. Logfire serves as an observability layer where you can view and compare results.

To get started, refer to the [Pydantic Evals installation guide](https://ai.pydantic.dev/evals/#installation).

## What are Evals?

Evals help you systematically test and evaluate AI systems by running them against predefined test cases. Each evaluation experiment appears in Logfire automatically when you run the `pydantic_evals.Dataset.evaluate` method with Logfire integration enabled.
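
Below is a minimal sketch of what such a run can look like, following the Pydantic Evals quickstart; the task function and case data are illustrative:

```python
import logfire
from pydantic_evals import Case, Dataset

# Send the experiment to Logfire if a write token is configured.
logfire.configure(send_to_logfire='if-token-present')

# A tiny illustrative dataset: one case with an input and an expected output.
dataset = Dataset(
    cases=[
        Case(
            name='capital_of_france',
            inputs='What is the capital of France?',
            expected_output='Paris',
        ),
    ],
)


async def answer_question(question: str) -> str:
    """Stand-in for the AI system under test."""
    return 'Paris'


# `evaluate_sync` is the synchronous wrapper around `Dataset.evaluate`.
# Running it creates an experiment that appears in the Evals tab.
report = dataset.evaluate_sync(answer_question)
report.print()
```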

For the data model, examples, and full documentation on creating and running Evals, read the [Pydantic Evals docs](https://ai.pydantic.dev/evals/).

## Viewing Experiments

The Evals tab shows all experiments for your project available within your data retention period. Each experiment represents a single run of a dataset against a task function.

### Experiment List

Each experiment displays:

- **Experiment name** - Auto-generated by Logfire (e.g., "gentle-sniff-buses")
- **Task name** - The function being evaluated
- **Span link** - Direct link to the detailed trace
- **Created timestamp** - When the experiment was run

Click on any experiment to view detailed results.

### Experiment Details

Individual experiment pages show comprehensive results, including:

- **Test cases** with inputs, expected outputs, and actual outputs
- **Assertion results** - Pass/fail status for each evaluator
- **Performance metrics** - Duration, token usage, and custom scores
- **Evaluation scores** - Detailed scoring from all evaluators (see the evaluator sketch below)
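
The pass/fail assertions and scores come from evaluators attached to the dataset. Here is a sketch of a custom evaluator following the pattern in the Pydantic Evals docs; the `ExactMatch` name and matching logic are illustrative:

```python
from dataclasses import dataclass

from pydantic_evals.evaluators import Evaluator, EvaluatorContext


@dataclass
class ExactMatch(Evaluator):
    """Compare the task output to the expected output for each case."""

    def evaluate(self, ctx: EvaluatorContext) -> bool:
        # Boolean results are shown as pass/fail assertions in Logfire;
        # returning a float instead would be shown as a score.
        return ctx.output == ctx.expected_output
```

Attach the evaluator when constructing the dataset, e.g. `Dataset(cases=[...], evaluators=[ExactMatch()])`; each result then appears as an assertion or score on the experiment page.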

## Comparing Experiments

Use the experiment comparison view to analyze performance across different runs:

1. Select multiple experiments from the list
2. Click **Compare selected**
3. View side-by-side results for the same test cases

The comparison view highlights:

- **Differences in outputs** between experiment runs
- **Score variations** across evaluators
- **Performance changes** in metrics like duration and token usage
- **Regression detection** when comparing baseline vs. current implementations (see the sketch after this list)
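
For the regression check above, a common workflow is to run the same dataset against a baseline task and a candidate task, producing two experiments you can then select and compare. A sketch, with illustrative task functions and the same `Dataset` API as above:

```python
import logfire
from pydantic_evals import Case, Dataset

logfire.configure(send_to_logfire='if-token-present')

dataset = Dataset(
    cases=[Case(name='greeting', inputs='Say hello', expected_output='Hello!')],
)


async def baseline_task(prompt: str) -> str:
    """Current production behaviour."""
    return 'Hello!'


async def candidate_task(prompt: str) -> str:
    """New implementation you want to validate."""
    return 'Hello there!'


# Each call creates a separate experiment in Logfire; select both in the
# Evals tab and click "Compare selected" to see the cases side by side.
baseline_report = dataset.evaluate_sync(baseline_task)
candidate_report = dataset.evaluate_sync(candidate_task)
```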

## Integration with Traces

Every evaluation experiment generates detailed OpenTelemetry traces that appear in Logfire:

- **Experiment span** - Root span containing all evaluation metadata
- **Case execution spans** - Individual test case runs with full context
- **Task function spans** - Detailed tracing of your AI system under test
- **Evaluator spans** - Scoring and assessment execution details

Navigate from experiment results to full trace details using the span links.
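
Spans you create inside the task function with the Logfire SDK nest under the case execution spans, so they show up when you follow a span link into the experiment's trace. A minimal sketch, with an illustrative task and span names:

```python
import logfire

logfire.configure(send_to_logfire='if-token-present')


def lookup_context(question: str) -> str:
    """Stand-in retrieval step; replace with your own logic."""
    return 'France is a country in Europe; its capital is Paris.'


@logfire.instrument('answer_question')
async def answer_question(question: str) -> str:
    # Spans opened here become children of the case execution span that
    # pydantic_evals creates while running this task.
    with logfire.span('retrieve context'):
        context = lookup_context(question)
    with logfire.span('generate answer'):
        return f'Answer derived from: {context}'
```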