Include report averages #3053

cetra3 · 2025-10-01T02:11:47Z

Allows the evals page in Logfire to present some nice stats in the main comparison table.

github-actions · 2025-10-01T02:19:30Z

Docs Preview

commit:	`53b4af4`
Preview URL:	https://dc8262b5-pydantic-ai-previews.pydantic.workers.dev

DouweM

@cetra3 A few comments. More generally, was this added for a specific frontend feature that will look at these attrs and subfields, or more just because it's useful info?

DouweM · 2025-10-01T16:31:51Z

pydantic_evals/pydantic_evals/dataset.py

            )
            if (averages := report.averages()) is not None and averages.assertions is not None:
+                experiment_metadata = {'n_cases': len(self.cases), 'averages': averages}
+                eval_span.set_attribute('experiment.metadata', experiment_metadata)


Can you add a `logfire.` prefix here, since it's not general OTel but specific to our platform?

cetra3 · 2025-10-02

I'd argue it's specific to pydantic evals rather than to logfire itself.

We use the `experiment.metadata` field to ingest information about the test run. Are you saying we should change this to `logfire.experiment.metadata`, or that we should have a `logfire` field underneath `experiment.metadata`?

DouweM · 2025-10-03

@cetra3 Yeah, I was suggesting (based on @alexmojaki's suggestion in Slack) changing it to `logfire.experiment.metadata`. But I see your (implied) point that even though right now only Logfire will use these attributes, they could be useful to users sending pydantic_evals data to a different platform as well.

@alexmojaki What do you think about a `pydantic_evals` prefix?

alexmojaki · 2025-10-03

Everything here is specific to pydantic evals. This is particularly specific not just to logfire but to fusionfire popped attributes. `n_cases` is being repeated here so that it can be displayed cheaply in the frontend. We may add or remove things here freely depending on what the frontend needs; it's not intended to be queried. I think `logfire.` makes sense for that.
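
For readers following the naming discussion, here is a minimal sketch of the options, not the code from this PR. `build_experiment_attributes` and its `prefix` parameter are hypothetical names introduced for illustration; only the `experiment.metadata` key, `n_cases`, and `averages` come from the diff above, and the averages value is a plain dict standing in for the real report averages.

```python
from typing import Any


def build_experiment_attributes(
    n_cases: int,
    averages: dict[str, Any],
    prefix: str = '',
) -> dict[str, Any]:
    """Return span attributes keyed by an optionally prefixed metadata name.

    Hypothetical helper: `prefix` would be 'logfire.' or 'pydantic_evals.'
    depending on which proposal from the thread is adopted.
    """
    key = f'{prefix}experiment.metadata'
    # n_cases is repeated inside the metadata so a frontend can read it cheaply.
    return {key: {'n_cases': n_cases, 'averages': averages}}


# Placeholder values, chosen only to show the resulting shape.
attrs = build_experiment_attributes(2, {'assertions': 1.0}, prefix='logfire.')
print(attrs)
```

Under these assumptions, switching between the proposals only changes the attribute key; the payload under it stays the same.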

DouweM · 2025-10-01T16:34:49Z

pydantic_evals/pydantic_evals/dataset.py

                trace_id=trace_id,
            )
            if (averages := report.averages()) is not None and averages.assertions is not None:
+                experiment_metadata = {'n_cases': len(self.cases), 'averages': averages}


From the test below, it looks like we already had `n_cases` in here. Do we need to repeat it, or could we drop it there and have it just under the metadata?

DouweM · 2025-10-01T16:35:53Z

tests/evals/test_dataset.py

+                            'experiment.metadata': {
+                                'type': 'object',
+                                'properties': {
+                                    'averages': {


I'd expect n_cases to also be listed here

DouweM · 2025-10-01T16:37:35Z

tests/evals/test_dataset.py

                    'logfire.msg': 'evaluate mock_async_task',
+                    'experiment.metadata': {
+                        'n_cases': 2,
+                        'averages': {


If we're going to be using this on the frontend, I don't know if it's wise to pass along the entire `ReportCaseAggregate` object rather than just the subset we need.
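
To illustrate the trade-off raised here, the following is a rough sketch of sending only selected fields instead of the whole aggregate. `AggregateStandIn` is a hypothetical stand-in, not the real `ReportCaseAggregate`; its fields, the `FRONTEND_FIELDS` selection, and the `frontend_subset` helper are all assumptions chosen for illustration.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class AggregateStandIn:
    """Hypothetical stand-in for the aggregate; the fields are assumptions."""

    name: str
    scores: dict[str, float] = field(default_factory=dict)
    assertions: Optional[float] = None
    task_duration: float = 0.0


# Assumed selection of fields a frontend might need; adjust as required.
FRONTEND_FIELDS = ('assertions', 'task_duration')


def frontend_subset(
    aggregate: AggregateStandIn,
    fields: tuple[str, ...] = FRONTEND_FIELDS,
) -> dict[str, Any]:
    """Keep only the named fields so the attribute payload stays small."""
    full = asdict(aggregate)
    return {name: full[name] for name in fields}


averages = AggregateStandIn(
    name='Averages',
    scores={'accuracy': 0.5},
    assertions=1.0,
    task_duration=0.25,
)
metadata = {'n_cases': 2, 'averages': frontend_subset(averages)}
print(metadata)
```

An explicit field list like this would make the payload a deliberate contract with the frontend, at the cost of having to update the list whenever the frontend needs another value.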

Include report averages

7fdab58

cetra3 requested review from DouweM and dmontagu October 1, 2025 02:11

Adjust test

53b4af4

DouweM requested changes Oct 1, 2025

DouweM self-assigned this Oct 1, 2025

DouweM added the awaiting author revision label Oct 1, 2025
