@jp-agenta
Member

[docs] eval SDK docs

Copilot AI review requested due to automatic review settings November 12, 2025 11:47
@vercel

vercel bot commented Nov 12, 2025

The latest updates on your projects.

Project Deployment Preview Comments Updated (UTC)
agenta-documentation Error Error Nov 12, 2025 11:49am

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


GitHub CI does not appear to be a GitHub user. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

Contributor

Copilot AI left a comment

Pull Request Overview

This PR contains documentation updates and code formatting improvements for the SDK evaluation system, along with version rollbacks and cleanup of deprecated features.

Key changes:

  • Added comprehensive SDK evaluation documentation with quick-start guides and detailed configuration pages
  • Reformatted multi-line assert statements and string concatenations for better readability
  • Rolled back version numbers from 0.62.1 to 0.61.0 across multiple packages
  • Removed deprecated LLM-as-a-judge output schema customization feature and related documentation
  • Reorganized example notebooks into subdirectories and added new evaluation quick-start notebook

Reviewed Changes

Copilot reviewed 284 out of 671 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| docs/docs/evaluation/evaluation-from-sdk/* | New SDK evaluation documentation pages including quick-start, evaluator configuration, and testset management guides |
| examples/jupyter/evaluation/quick-start.ipynb | New comprehensive evaluation quick-start notebook demonstrating SDK usage |
| sdk/pyproject.toml, api/pyproject.toml, web/ee/package.json | Version rollback from 0.62.1 to 0.61.0 |
| sdk/agenta/sdk/workflows/handlers.py | Changed error handling from returning failure objects to raising exceptions |
| sdk/agenta/sdk/tracing/exporters.py | Changed default OTLP async export from "true" to "false" |
| Multiple test files (sdk/tests/, api/oss/tests/) | Reformatted multi-line assert statements for better readability |
| docs/blog/entries/customize-llm-as-a-judge-output-schemas.mdx | Removed documentation for deprecated feature |
| hosting/docker-compose//docker-compose.yml | Removed cron service configurations |
| api/oss/src/services/converters.py | Deleted deprecated converter functions |

Comments suppressed due to low confidence (1)

sdk/agenta/sdk/workflows/handlers.py:454

  • This change from returning {'success': False} to raising exceptions is a breaking behavioral change that could affect existing code. Ensure all callers are prepared to handle these exceptions instead of checking the success field in return values.
    if not isinstance(outputs, str) and not isinstance(outputs, dict):
        raise InvalidOutputsV0Error(expected=["dict", "str"], got=outputs)
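
For callers that still check a `success` field, one possible migration path is a thin adapter that catches the new exception and rebuilds a dictionary result. The sketch below is illustrative only: `InvalidOutputsError` is a hypothetical stand-in for the SDK's `InvalidOutputsV0Error`, and the `outputs` and `error` keys in the returned dictionary are assumptions, not the SDK's actual return shape.

```python
class InvalidOutputsError(Exception):
    """Hypothetical stand-in for the SDK's InvalidOutputsV0Error."""

    def __init__(self, expected, got):
        super().__init__(f"expected one of {expected}, got {type(got).__name__}")
        self.expected = expected
        self.got = got


def validate_outputs(outputs):
    """Mirror the new behavior: raise instead of returning {'success': False}."""
    if not isinstance(outputs, (str, dict)):
        raise InvalidOutputsError(expected=["dict", "str"], got=outputs)
    return outputs


def call_handler_safely(outputs):
    """Adapter for callers that still branch on a 'success' field.

    The 'outputs' and 'error' keys are assumptions made for this sketch.
    """
    try:
        return {"success": True, "outputs": validate_outputs(outputs)}
    except InvalidOutputsError as exc:
        return {"success": False, "error": str(exc)}
```

New code would more naturally catch the exception directly; the adapter only helps where the old return-value contract is hard to change at once.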

  ecdsa = "^0.19.1"
  bson = "^0.5.10"
- agenta = ">=0.61.0"
+ agenta = "^0.60.1"

Copilot AI Nov 12, 2025

The agenta dependency constraint ^0.60.1 is inconsistent with the API version being rolled back to 0.61.0 (line 3). Consider updating this constraint to ^0.61.0 or explaining why a lower version constraint is needed.

Suggested change
- agenta = "^0.60.1"
+ agenta = "^0.61.0"
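
To make the inconsistency concrete, here is a minimal sketch of caret-range matching, assuming the file uses Poetry-style caret constraints (where `^0.60.1` means `>=0.60.1,<0.61.0`). The helper names are hypothetical and the code handles only plain three-part versions; it is not a replacement for a real version-specifier library.

```python
def parse(version):
    """Split a plain 'X.Y.Z' version string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def caret_upper_bound(version):
    """Exclusive upper bound of a caret range: bump the leftmost non-zero part."""
    major, minor, patch = parse(version)
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def satisfies_caret(candidate, base):
    """Return True if `candidate` falls inside the caret range `^base`."""
    return parse(base) <= parse(candidate) < caret_upper_bound(base)


# ^0.60.1 excludes 0.61.0, while ^0.61.0 accepts it.
print(satisfies_caret("0.61.0", "0.60.1"))  # False
print(satisfies_caret("0.61.0", "0.61.0"))  # True
```

Under that assumption, a package pinned to `^0.60.1` could not resolve against a 0.61.0 release, which is the mismatch the comment points at.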


log = get_module_logger(__name__)


Copilot AI Nov 12, 2025

Changing the default from 'true' to 'false' for async export is a significant behavioral change that could impact performance characteristics. This should be documented in release notes and migration guides, as it affects how telemetry data is exported.

Suggested change
# NOTE: The default for async export has changed from 'true' to 'false'.
# This is a significant behavioral change that could impact performance characteristics.
# Ensure this change is documented in release notes and migration guides,
# as it affects how telemetry data is exported.
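
A rough sketch of why the default matters: when a flag is read from the environment with a string default, every deployment that never set the variable silently picks up the new value. The variable name `EXAMPLE_OTLP_ASYNC_EXPORT` and the helper `env_flag` are hypothetical and used only for illustration; the actual name and parsing logic in `exporters.py` are not shown in this thread.

```python
import os


def env_flag(name, default, environ=None):
    """Read a boolean flag from the environment, falling back to a string default."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name, default)
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Hypothetical variable name, for illustration only.
FLAG = "EXAMPLE_OTLP_ASYNC_EXPORT"

# With the variable unset, the result follows whatever default the code ships with.
old_behavior = env_flag(FLAG, "true", environ={})
new_behavior = env_flag(FLAG, "false", environ={})

# Setting the variable explicitly makes the behavior independent of the default.
pinned = env_flag(FLAG, "false", environ={FLAG: "true"})
```

Deployments that depend on asynchronous export could set the flag explicitly rather than rely on the default, assuming the SDK reads it from the environment in a similar way.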

Comment on lines +45 to 48
evaluation_db = await db_manager_ee.fetch_evaluation_by_id(
    project_id=project_id,
    evaluation_id=object_id,
)

Copilot AI Nov 12, 2025

Changed from db_manager.fetch_evaluation_by_id to db_manager_ee.fetch_evaluation_by_id. This suggests the function was moved to the EE (Enterprise Edition) module. Verify that db_manager_ee is properly imported and available, otherwise this will cause a NameError.

@mmabrouk mmabrouk closed this Nov 12, 2025
4 participants