[docs] eval SDK docs #2940
Pull Request Overview
This PR contains documentation updates and code formatting improvements for the SDK evaluation system, along with version rollbacks and cleanup of deprecated features.
Key changes:
- Added comprehensive SDK evaluation documentation with quick-start guides and detailed configuration pages
- Reformatted multi-line assert statements and string concatenations for better readability
- Rolled back version numbers from 0.62.1 to 0.61.0 across multiple packages
- Removed deprecated LLM-as-a-judge output schema customization feature and related documentation
- Reorganized example notebooks into subdirectories and added new evaluation quick-start notebook
Reviewed Changes
Copilot reviewed 284 out of 671 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| docs/docs/evaluation/evaluation-from-sdk/* | New SDK evaluation documentation pages including quick-start, evaluator configuration, and testset management guides |
| examples/jupyter/evaluation/quick-start.ipynb | New comprehensive evaluation quick-start notebook demonstrating SDK usage |
| sdk/pyproject.toml, api/pyproject.toml, web/ee/package.json | Version rollback from 0.62.1 to 0.61.0 |
| sdk/agenta/sdk/workflows/handlers.py | Changed error handling from returning failure objects to raising exceptions |
| sdk/agenta/sdk/tracing/exporters.py | Changed default OTLP async export from "true" to "false" |
| Multiple test files (sdk/tests/, api/oss/tests/) | Reformatted multi-line assert statements for better readability |
| docs/blog/entries/customize-llm-as-a-judge-output-schemas.mdx | Removed documentation for deprecated feature |
| hosting/docker-compose//docker-compose.yml | Removed cron service configurations |
| api/oss/src/services/converters.py | Deleted deprecated converter functions |
Comments suppressed due to low confidence (1)
sdk/agenta/sdk/workflows/handlers.py:454
- This change from returning `{'success': False}` to raising exceptions is a breaking behavioral change that could affect existing code. Ensure all callers are prepared to handle these exceptions instead of checking the `success` field in return values.
if not isinstance(outputs, str) and not isinstance(outputs, dict):
    raise InvalidOutputsV0Error(expected=["dict", "str"], got=outputs)
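For callers that previously inspected a `success` field, the migration might look like the minimal sketch below. The `InvalidOutputsV0Error` class here is a local stand-in written for illustration (the SDK's real definition is not shown in this excerpt and may differ), and `validate_outputs` and `run_caller` are hypothetical names.

```python
class InvalidOutputsV0Error(Exception):
    """Local stand-in for the SDK exception; the real class may differ."""

    def __init__(self, expected, got):
        self.expected = expected
        self.got = got
        super().__init__(f"expected one of {expected}, got {type(got).__name__}")


def validate_outputs(outputs):
    """Hypothetical handler step mirroring the check quoted above."""
    if not isinstance(outputs, str) and not isinstance(outputs, dict):
        raise InvalidOutputsV0Error(expected=["dict", "str"], got=outputs)
    return outputs


def run_caller(outputs):
    """Hypothetical caller: catches the exception instead of reading 'success'."""
    try:
        return {"ok": True, "outputs": validate_outputs(outputs)}
    except InvalidOutputsV0Error as exc:
        # Old code would have checked result["success"] here.
        return {"ok": False, "error": str(exc)}
```

Any call site that only checked the returned dictionary would let the new exception propagate, so each one needs an explicit `try`/`except` or a deliberate decision to let it bubble up.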
  ecdsa = "^0.19.1"
  bson = "^0.5.10"
- agenta = ">=0.61.0"
+ agenta = "^0.60.1"
Copilot AI · Nov 12, 2025
The agenta dependency constraint ^0.60.1 is inconsistent with the API version being rolled back to 0.61.0 (line 3). Consider updating this constraint to ^0.61.0 or explaining why a lower version constraint is needed.
Suggested change:
- agenta = "^0.60.1"
+ agenta = "^0.61.0"
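To see why the two constraints differ, here is a small sketch of caret-range resolution, assuming the project resolves `^` the way Poetry does (the upper bound bumps the leftmost non-zero component). The helpers `parse`, `caret_bounds`, and `satisfies` are hypothetical and handle only plain numeric versions.

```python
def parse(version):
    """Split a plain numeric version such as '0.61.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def caret_bounds(constraint):
    """Return (lower, upper) for a caret constraint; upper is exclusive."""
    lower = parse(constraint.lstrip("^"))
    for i, part in enumerate(lower):
        if part != 0:
            # Bump the leftmost non-zero component and zero out the rest.
            upper = lower[:i] + (part + 1,) + (0,) * (len(lower) - i - 1)
            break
    else:
        # All components are zero: bump the last one.
        upper = lower[:-1] + (lower[-1] + 1,)
    return lower, upper


def satisfies(version, constraint):
    """Check whether a version falls inside the caret range."""
    lower, upper = caret_bounds(constraint)
    return lower <= parse(version) < upper
```

Under these assumptions `^0.60.1` resolves to `>=0.60.1,<0.61.0`, so `satisfies("0.61.0", "^0.60.1")` is false while `satisfies("0.61.0", "^0.61.0")` is true, which is the inconsistency the comment points at.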
log = get_module_logger(__name__)
Copilot AI · Nov 12, 2025
Changing the default from 'true' to 'false' for async export is a significant behavioral change that could impact performance characteristics. This should be documented in release notes and migration guides, as it affects how telemetry data is exported.
Suggested change:
+ # NOTE: The default for async export has changed from 'true' to 'false'.
+ # This is a significant behavioral change that could impact performance characteristics.
+ # Ensure this change is documented in release notes and migration guides,
+ # as it affects how telemetry data is exported.
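How the exporter reads this default is not shown in this excerpt. Assuming it is a string-valued environment flag, the effect of flipping the default can be sketched as follows; the variable name `EXAMPLE_OTLP_ASYNC_EXPORT` and the function `async_export_enabled` are hypothetical, not the SDK's actual names.

```python
import os

# Hypothetical variable name, used only for illustration.
ENV_VAR = "EXAMPLE_OTLP_ASYNC_EXPORT"


def async_export_enabled(environ=None, default="false"):
    """Return True when the flag (or its default) is a truthy string."""
    environ = os.environ if environ is None else environ
    value = environ.get(ENV_VAR, default)
    return value.strip().lower() in {"true", "1", "yes"}
```

With the default at `"false"`, deployments that never set the flag would silently move from asynchronous to synchronous export, so anyone relying on the old behavior would need to set the flag explicitly.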
evaluation_db = await db_manager_ee.fetch_evaluation_by_id(
    project_id=project_id,
    evaluation_id=object_id,
)
Copilot AI · Nov 12, 2025
Changed from db_manager.fetch_evaluation_by_id to db_manager_ee.fetch_evaluation_by_id. This suggests the function was moved to the EE (Enterprise Edition) module. Verify that db_manager_ee is properly imported and available, otherwise this will cause a NameError.