[Docs] AGE-3418 Docs for evaluation SDK #2943

mmabrouk · 2025-11-12T12:13:44Z

This commit introduces a new guide on running evaluations using the Agenta SDK. It provides an overview of the process, enhancing user understanding of programmatic evaluation execution.

…r feedback and Langchain observability. Adjusted paths in existing documentation to reflect new file structure.

…atalayer/results path.

This commit introduces a new guide on managing testsets, including creating, listing, retrieving, and upserting testsets using the Agenta SDK. Additionally, a Jupyter notebook is added to demonstrate these functionalities with practical examples.

…entation and examples related to evaluation processes in Agenta.

This commit introduces a comprehensive guide on creating custom evaluators and utilizing built-in evaluators to assess application outputs. The new documentation covers the structure, inputs, return values, and practical examples for both custom and built-in evaluators, enhancing the overall understanding of evaluation processes in Agenta.

This commit introduces a new guide on defining and configuring applications for evaluation with the Agenta SDK. It covers the basic application structure, input handling, return values, and provides practical examples, enhancing user understanding of application setup and usage.

linear · 2025-11-12T12:13:47Z

AGE-3418 Write new evaluation SDK docs

vercel · 2025-11-12T12:13:51Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
agenta-documentation	Ready	Preview	Comment	Nov 12, 2025 1:54pm

mmabrouk · 2025-11-12T12:14:10Z

#2932

…luation from SDK

… API key setup and refining the evaluation process steps. This update improves user experience by ensuring necessary credentials are configured for LLM-based evaluators and clarifies the overall evaluation workflow.

…-judge evaluator and refine testset description. This enhances clarity for users configuring evaluations with the Agenta SDK.

…arameters in the Agenta SDK. This update improves clarity for users running evaluations and ensures proper configuration of evaluation details.

…ameter and updating example outputs. This enhances clarity and consistency in the Jupyter notebook and markdown files related to testset creation.

… Agenta SDK This commit introduces two new guides: one for managing testsets, detailing creation, listing, and retrieval processes, and another for configuring evaluators, covering both custom and built-in evaluators. These additions enhance user understanding of evaluation workflows and improve the overall documentation structure by replacing the previous running evaluations guide.

… structure. This change enhances readability and maintains focus on the evaluation creation process using the Agenta SDK.

mmabrouk added 9 commits November 12, 2025 13:10

Add new category for 'Evaluation from SDK' in JSON configuration

622aed4

Add quick start guide and example notebook for Agenta SDK evaluations

e4680e0

Update tutorial links and add new Jupyter notebooks for capturing use…

b27daa8

…r feedback and Langchain observability. Adjusted paths in existing documentation to reflect new file structure.

Update .gitignore to include all files and directories in the tests/d…

5ec8ccf

…atalayer/results path.

Remove Jupyter notebook for evaluations with SDK, consolidating docum…

d974709

…entation and examples related to evaluation processes in Agenta.

Add documentation for running evaluations programmatically from the SDK

f963c72

This commit introduces a new guide on running evaluations using the Agenta SDK. It provides an overview of the process, enhancing user understanding of programmatic evaluation execution.

vercel bot had a problem deploying to Preview November 12, 2025 12:14 Failure

Update quick start guide to include troubleshooting resources for eva…

1141565

…luation from SDK

vercel bot deployed to Preview November 12, 2025 12:17

mmabrouk added 4 commits November 12, 2025 14:07

Update quick start guide to include OpenAI API key setup for LLM-as-a…

ed0cfdc

…-judge evaluator and refine testset description. This enhances clarity for users configuring evaluations with the Agenta SDK.

Enhance quick start guide by adding evaluation name and description p…

ce934da

…arameters in the Agenta SDK. This update improves clarity for users running evaluations and ensures proper configuration of evaluation details.

Refactor testset management documentation by removing description par…

bd8d5e1

…ameter and updating example outputs. This enhances clarity and consistency in the Jupyter notebook and markdown files related to testset creation.

vercel bot deployed to Preview November 12, 2025 13:39

vercel bot deployed to Preview November 12, 2025 13:51

mmabrouk changed the title ~~AGE-3418-docs-for-new-evaluation-sdk~~ [Docs] AGE-3418 Docs for evaluation SDK Nov 12, 2025

Refactor quick start guide by removing redundant header and improving…

74e6e0d

… structure. This change enhances readability and maintains focus on the evaluation creation process using the Agenta SDK.

vercel bot deployed to Preview November 12, 2025 13:54

mmabrouk marked this pull request as ready for review November 12, 2025 13:54

dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Nov 12, 2025

mmabrouk requested a review from junaway November 12, 2025 13:54

mmabrouk changed the base branch from main to release/v0.62.2 November 12, 2025 13:55

mmabrouk changed the base branch from release/v0.62.2 to main November 12, 2025 13:55

mmabrouk enabled auto-merge November 12, 2025 13:55

ashrafchowdury approved these changes Nov 12, 2025

mmabrouk merged commit bc4cf0f into main Nov 12, 2025
11 checks passed

dosubot bot added documentation Improvements or additions to documentation Evaluation labels Nov 12, 2025
