[docs] eval SDK docs #2940

jp-agenta · 2025-11-12T11:47:53Z

[docs] eval SDK docs

vercel · 2025-11-12T11:47:57Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
agenta-documentation	Error			Nov 12, 2025 11:49am

CLAassistant · 2025-11-12T11:48:01Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

GitHub CI seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copilot

Pull Request Overview

This PR contains documentation updates and code formatting improvements for the SDK evaluation system, along with version rollbacks and cleanup of deprecated features.

Key changes:

- Added comprehensive SDK evaluation documentation with quick-start guides and detailed configuration pages
- Reformatted multi-line assert statements and string concatenations for better readability
- Rolled back version numbers from 0.62.1 to 0.61.0 across multiple packages
- Removed deprecated LLM-as-a-judge output schema customization feature and related documentation
- Reorganized example notebooks into subdirectories and added new evaluation quick-start notebook

Reviewed Changes

Copilot reviewed 284 out of 671 changed files in this pull request and generated 3 comments.

File	Description
docs/docs/evaluation/evaluation-from-sdk/*	New SDK evaluation documentation pages including quick-start, evaluator configuration, and testset management guides
examples/jupyter/evaluation/quick-start.ipynb	New comprehensive evaluation quick-start notebook demonstrating SDK usage
sdk/pyproject.toml, api/pyproject.toml, web/ee/package.json	Version rollback from 0.62.1 to 0.61.0
sdk/agenta/sdk/workflows/handlers.py	Changed error handling from returning failure objects to raising exceptions
sdk/agenta/sdk/tracing/exporters.py	Changed default OTLP async export from "true" to "false"
Multiple test files (sdk/tests/, api/oss/tests/)	Reformatted multi-line assert statements for better readability
docs/blog/entries/customize-llm-as-a-judge-output-schemas.mdx	Removed documentation for deprecated feature
hosting/docker-compose//docker-compose.yml	Removed cron service configurations
api/oss/src/services/converters.py	Deleted deprecated converter functions

Comments suppressed due to low confidence (1)

sdk/agenta/sdk/workflows/handlers.py:454

This change from returning {'success': False} to raising exceptions is a breaking behavioral change that could affect existing code. Ensure all callers are prepared to handle these exceptions instead of checking the success field in return values.

    if not isinstance(outputs, str) and not isinstance(outputs, dict):
        raise InvalidOutputsV0Error(expected=["dict", "str"], got=outputs)
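
For callers that still check a `success` field, one way to absorb the change is a thin adapter that catches the exception and rebuilds the old return shape. The sketch below is illustrative only: `InvalidOutputsV0Error` is redefined locally as a stand-in (the SDK's real class may differ), and `validate_outputs` and `run_handler_safely` are hypothetical names that do not come from the PR.

```python
class InvalidOutputsV0Error(Exception):
    """Local stand-in for the SDK error type; the real class may differ."""

    def __init__(self, expected, got):
        super().__init__(f"expected one of {expected}, got {type(got).__name__}")
        self.expected = expected
        self.got = got


def validate_outputs(outputs):
    # Same check as the reviewed snippet: only str and dict are accepted.
    if not isinstance(outputs, str) and not isinstance(outputs, dict):
        raise InvalidOutputsV0Error(expected=["dict", "str"], got=outputs)
    return outputs


def run_handler_safely(outputs):
    """Hypothetical adapter that restores a success-flag result for legacy callers."""
    try:
        return {"success": True, "outputs": validate_outputs(outputs)}
    except InvalidOutputsV0Error as exc:
        return {"success": False, "error": str(exc)}
```

New code would more likely catch the exception directly at the call site; the adapter is only a bridge for code that cannot be migrated at once.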

Copilot · 2025-11-12T11:49:00Z

api/pyproject.toml

 ecdsa = "^0.19.1"
 bson = "^0.5.10"
-agenta = ">=0.61.0"
+agenta = "^0.60.1"


The agenta dependency constraint ^0.60.1 is inconsistent with the API version being rolled back to 0.61.0 (line 3). Consider updating this constraint to ^0.61.0 or explaining why a lower version constraint is needed.

Suggested change

-agenta = "^0.60.1"
+agenta = "^0.61.0"
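
To make the inconsistency concrete, here is a minimal sketch of how a caret constraint expands into a version range, assuming Poetry-style caret semantics (the upper bound bumps the leftmost non-zero component). The helper names `caret_bounds` and `satisfies` are made up for illustration, and the sketch only handles full `MAJOR.MINOR.PATCH` versions.

```python
def caret_bounds(constraint):
    """Return (lower, upper) version tuples for a caret constraint like "^0.60.1"."""
    parts = tuple(int(p) for p in constraint.lstrip("^").split("."))
    if len(parts) != 3:
        raise ValueError("this sketch only handles MAJOR.MINOR.PATCH")
    for i, part in enumerate(parts):
        if part != 0:
            # Bump the leftmost non-zero component and zero out the rest.
            upper = parts[:i] + (part + 1,) + (0,) * (2 - i)
            break
    else:
        # All-zero input: treated as a patch-only range (an assumption of this sketch).
        upper = (0, 0, 1)
    return parts, upper


def satisfies(version, constraint):
    """Check lower <= version < upper using tuple comparison."""
    lower, upper = caret_bounds(constraint)
    candidate = tuple(int(p) for p in version.split("."))
    return lower <= candidate < upper
```

Under these assumptions, `satisfies("0.61.0", "^0.60.1")` is false because the range stops below 0.61.0, while `satisfies("0.61.0", "^0.61.0")` is true.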

Copilot · 2025-11-12T11:49:00Z

sdk/agenta/sdk/tracing/exporters.py



 log = get_module_logger(__name__)



Changing the default from 'true' to 'false' for async export is a significant behavioral change that could impact performance characteristics. This should be documented in release notes and migration guides, as it affects how telemetry data is exported.

Suggested change

# NOTE: The default for async export has changed from 'true' to 'false'.
# This is a significant behavioral change that could impact performance characteristics.
# Ensure this change is documented in release notes and migration guides,
# as it affects how telemetry data is exported.
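
As an illustration of why a flipped string default matters, the sketch below parses a boolean-like flag from the environment. The variable name `EXAMPLE_OTLP_ASYNC_EXPORT` and the function `async_export_enabled` are hypothetical; the PR excerpt does not show the setting's real name or how the SDK reads it.

```python
import os


def async_export_enabled(env=None, default="false"):
    """Read a boolean-like flag, falling back to the given string default."""
    env = os.environ if env is None else env
    # Hypothetical variable name, used only for this example.
    raw = env.get("EXAMPLE_OTLP_ASYNC_EXPORT", default)
    return raw.strip().lower() in {"1", "true", "yes"}
```

A deployment that never set the flag silently changes behavior when the default flips, whereas one that sets it explicitly is unaffected. That difference is why a release note helps here.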

Copilot · 2025-11-12T11:49:01Z

api/oss/src/services/app_manager.py

+            evaluation_db = await db_manager_ee.fetch_evaluation_by_id(
                project_id=project_id,
                evaluation_id=object_id,
            )


Changed from db_manager.fetch_evaluation_by_id to db_manager_ee.fetch_evaluation_by_id. This suggests the function was moved to the EE (Enterprise Edition) module. Verify that db_manager_ee is properly imported and available, otherwise this will cause a NameError.
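
One way to make the availability check explicit is a guarded import that fails with a clear message instead of a `NameError` at call time. This is a sketch under assumptions: the module path `hypothetical_ee_package.db_manager_ee` and the helpers `load_optional` and `require_ee_db_manager` are invented for the example and are not taken from the repository.

```python
import importlib


def load_optional(module_name):
    """Import a module by name, returning None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Hypothetical module path; the real EE module location is not shown in the PR.
db_manager_ee = load_optional("hypothetical_ee_package.db_manager_ee")


def require_ee_db_manager():
    """Return the EE database manager or fail with an explicit error."""
    if db_manager_ee is None:
        raise RuntimeError("EE database manager is not available in this build")
    return db_manager_ee
```

With this pattern the name `db_manager_ee` is always bound, so a missing EE install surfaces as a descriptive `RuntimeError` at the point of use.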

docs/evaluation-sdk-docs

9c116fc

Copilot AI review requested due to automatic review settings November 12, 2025 11:47

vercel bot had a problem deploying to Preview November 12, 2025 11:49 Failure

Copilot AI reviewed Nov 12, 2025

mmabrouk closed this Nov 12, 2025

4 participants