Commit 91c67b9

Evals docs in Logfire (points to Pydantic AI) (#1374)
Co-authored-by: David Montague <[email protected]>
1 parent 4bde7f1 commit 91c67b9

File tree

5 files changed: +72 −0 lines changed

docs/guides/web-ui/evals.md

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
# Evals (beta)

View and analyze your evaluation results in Pydantic Logfire's web interface. Evals provide observability into how your AI systems perform across different test cases and experiments over time.

!!! note "Code-First Evaluation"

    Evals are created and run using the [Pydantic Evals](https://ai.pydantic.dev/evals/) package, which is developed in tandem with Pydantic AI. Logfire serves as an observability layer where you can view and compare results.

    To get started, refer to the [Pydantic Evals installation guide](https://ai.pydantic.dev/evals/#installation).

## What are Evals?

Evals help you systematically test and evaluate AI systems by running them against predefined test cases. Each evaluation experiment appears in Logfire automatically when you run the `pydantic_evals.Dataset.evaluate` method with Logfire integration enabled.

For the data model, examples, and full documentation on creating and running Evals, read the [Pydantic Evals docs](https://ai.pydantic.dev/evals/).
## Viewing Experiments

![Evals overview](../../images/guide/evals-overview.webp)

The Evals tab shows all experiments for your project available within your data retention period. Each experiment represents a single run of a dataset against a task function.

### Experiment List

Each experiment displays:

- **Experiment name** - Auto-generated by Logfire (e.g., "gentle-sniff-buses")
- **Task name** - The function being evaluated
- **Span link** - Direct link to the detailed trace
- **Created timestamp** - When the experiment was run

Click on any experiment to view detailed results.

### Experiment Details

Individual experiment pages show comprehensive results, including:

- **Test cases** with inputs, expected outputs, and actual outputs
- **Assertion results** - Pass/fail status for each evaluator
- **Performance metrics** - Duration, token usage, and custom scores
- **Evaluation scores** - Detailed scoring from all evaluators

![Experiment details](../../images/guide/evals-preview1.webp)
## Comparing Experiments

Use the experiment comparison view to analyze performance across different runs:

1. Select multiple experiments from the list
2. Click **Compare selected**
3. View side-by-side results for the same test cases

![Experiment comparison](../../images/guide/evals-preview2.webp)

The comparison view highlights:

- **Differences in outputs** between experiment runs
- **Score variations** across evaluators
- **Performance changes** in metrics like duration and token usage
- **Regression detection** when comparing baseline vs. current implementations
## Integration with Traces

Every evaluation experiment generates detailed OpenTelemetry traces that appear in Logfire:

- **Experiment span** - Root span containing all evaluation metadata
- **Case execution spans** - Individual test case runs with full context
- **Task function spans** - Detailed tracing of your AI system under test
- **Evaluator spans** - Scoring and assessment execution details

Navigate from experiment results to full trace details using the span links.

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -82,6 +82,7 @@ nav:
       - Live View: guides/web-ui/live.md
       - LLM Panels: guides/web-ui/llm-panels.md
       - Dashboards: guides/web-ui/dashboards.md
+      - Evals (Beta): guides/web-ui/evals.md
       - Issues (Beta): guides/web-ui/issues.md
       - Alerts: guides/web-ui/alerts.md
       - Saved Searches: guides/web-ui/saved-searches.md
