
Commit 6da8107

Add more detail to the PydanticEvals readme.md (#1320)
1 parent f6b267d commit 6da8107

1 file changed: pydantic_evals/README.md (84 additions, 2 deletions)

@@ -18,5 +18,87 @@ use of Python syntax.

Full documentation is available at [ai.pydantic.dev/evals](https://ai.pydantic.dev/evals).

## Example

While you'd typically use Pydantic Evals with more complex functions (such as PydanticAI agents or graphs), here's a quick example that evaluates a simple function against a test case using both custom and built-in evaluators:

```python
from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import Evaluator, EvaluatorContext, IsInstance

# Define a test case with inputs and expected output
case = Case(
    name='capital_question',
    inputs='What is the capital of France?',
    expected_output='Paris',
)

# Define a custom evaluator
class MatchAnswer(Evaluator[str, str]):
    def evaluate(self, ctx: EvaluatorContext[str, str]) -> float:
        if ctx.output == ctx.expected_output:
            return 1.0
        elif isinstance(ctx.output, str) and ctx.expected_output.lower() in ctx.output.lower():
            return 0.8
        return 0.0

# Create a dataset with the test case and evaluators
dataset = Dataset(
    cases=[case],
    evaluators=[IsInstance(type_name='str'), MatchAnswer()],
)

# Define the function to evaluate
async def answer_question(question: str) -> str:
    return 'Paris'

# Run the evaluation
report = dataset.evaluate_sync(answer_question)
report.print(include_input=True, include_output=True)
"""
                                   Evaluation Summary: answer_question
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Case ID          ┃ Inputs                         ┃ Outputs ┃ Scores            ┃ Assertions ┃ Duration ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩
│ capital_question │ What is the capital of France? │ Paris   │ MatchAnswer: 1.00 │ ✔          │     10ms │
├──────────────────┼────────────────────────────────┼─────────┼───────────────────┼────────────┼──────────┤
│ Averages         │                                │         │ MatchAnswer: 1.00 │ 100.0% ✔   │     10ms │
└──────────────────┴────────────────────────────────┴─────────┴───────────────────┴────────────┴──────────┘
"""
```

Using the library with more complex functions, such as PydanticAI agents, is similar — all you need to do is define a task function wrapping the function you want to evaluate, with a signature that matches the inputs and outputs of your test cases.

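For instance, a task function wrapping a PydanticAI agent could look like the sketch below. This sketch is not part of the README: the agent, its model string, and the `result.output` attribute (older PydanticAI releases expose `result.data` instead) are illustrative assumptions, so adapt them to your own code.

```python
from pydantic_ai import Agent

from pydantic_evals import Case, Dataset

# A hypothetical agent; the model name and prompt are illustrative assumptions.
agent = Agent('openai:gpt-4o', system_prompt='Answer geography questions in a single word.')


# The task function's signature matches the dataset's input and output types (str -> str).
async def answer_question(question: str) -> str:
    result = await agent.run(question)
    return result.output  # use `result.data` on older PydanticAI versions


dataset = Dataset(
    cases=[Case(inputs='What is the capital of France?', expected_output='Paris')],
)
report = dataset.evaluate_sync(answer_question)
report.print(include_input=True, include_output=True)
```
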
## Logfire Integration

Pydantic Evals uses OpenTelemetry to record traces for each case in your evaluations.

You can send these traces to any OpenTelemetry-compatible backend. For the best experience, we recommend [Pydantic Logfire](https://logfire.pydantic.dev/docs), which includes custom views for evals:

<div style="display: flex; gap: 1rem; flex-wrap: wrap;">
  <img src="https://ai.pydantic.dev/img/logfire-evals-overview.png" alt="Logfire Evals Overview" width="48%">
  <img src="https://ai.pydantic.dev/img/logfire-evals-case.png" alt="Logfire Evals Case View" width="48%">
</div>

You'll see full details about the inputs, outputs, token usage, execution durations, etc. And you'll have access to the full trace for each case — ideal for debugging, writing path-aware evaluators, or running similar evaluations against production traces.

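As an illustration of what a path-aware evaluator can look like, the sketch below inspects the recorded spans through `ctx.span_tree`. This is not part of the README: it assumes tracing has been configured (so the span tree is recorded) and that `SpanTree` exposes a `find(predicate)` helper as described in the evals documentation, so verify the exact API before relying on it.

```python
from pydantic_evals.evaluators import Evaluator, EvaluatorContext


class NoRetrySpans(Evaluator[str, str]):
    """Pass only if the traced run never entered a span whose name mentions 'retry'."""

    def evaluate(self, ctx: EvaluatorContext[str, str]) -> bool:
        # ctx.span_tree holds the OpenTelemetry spans recorded for this case
        # (assumes tracing was configured, e.g. via logfire.configure as shown below).
        retry_spans = ctx.span_tree.find(lambda node: 'retry' in node.name)
        return len(retry_spans) == 0
```
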
Basic setup:

```python {test="skip" lint="skip" format="skip"}
import logfire

logfire.configure(
    send_to_logfire='if-token-present',
    environment='development',
    service_name='evals',
)

...

my_dataset.evaluate_sync(my_task)
```

[Read more about the Logfire integration here.](https://ai.pydantic.dev/evals/#logfire-integration)
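
If you want to send the traces to an OpenTelemetry-compatible backend other than Logfire, one option is to point an OTLP exporter at it via the standard OpenTelemetry SDK before running your evaluation. The sketch below is not from the README: it assumes the `opentelemetry-sdk` and `opentelemetry-exporter-otlp-proto-http` packages and a collector listening on localhost, and the exact wiring may differ depending on how the Logfire SDK is configured in your application.

```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Export spans to a local OTLP/HTTP collector; the endpoint is an assumption.
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint='http://localhost:4318/v1/traces'))
)
trace.set_tracer_provider(provider)

# ...then run the evaluation as usual, e.g. my_dataset.evaluate_sync(my_task)
```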
