README.md (4 changes: 2 additions & 2 deletions)
@@ -83,11 +83,11 @@ Agenta is a platform for building production-grade LLM applications. It helps **
Collaborate with Subject Matter Experts (SMEs) on prompt engineering and make sure nothing breaks in production.

- **Interactive Playground**: Compare prompts side by side against your test cases
-- **Multi-Model Support**: Experiment with 50+ LLM models or [bring-your-own models](https://docs.agenta.ai/prompt-engineering/playground/adding-custom-providers?utm_source=github&utm_medium=referral&utm_campaign=readme)
+- **Multi-Model Support**: Experiment with 50+ LLM models or [bring-your-own models](https://docs.agenta.ai/prompt-engineering/playground/custom-providers?utm_source=github&utm_medium=referral&utm_campaign=readme)
- **Version Control**: Version prompts and configurations with branching and environments
- **Complex Configurations**: Enable SMEs to collaborate on [complex configuration schemas](https://docs.agenta.ai/custom-workflows/overview?utm_source=github&utm_medium=referral&utm_campaign=readme) beyond simple prompts

-[Explore prompt management →](https://docs.agenta.ai/prompt-engineering/overview?utm_source=github&utm_medium=referral&utm_campaign=readme)
+[Explore prompt management →](https://docs.agenta.ai/prompt-engineering/concepts?utm_source=github&utm_medium=referral&utm_campaign=readme)

### 📊 Evaluation & Testing
Evaluate your LLM applications systematically with both human and automated feedback.
api/pyproject.toml (2 changes: 1 addition & 1 deletion)
@@ -1,6 +1,6 @@
[project]
name = "api"
-version = "0.59.11"
+version = "0.59.12"
description = "Agenta API"
authors = [
{ name = "Mahmoud Mabrouk", email = "[email protected]" },
docs/blog/entries/annotate-your-llm-response-preview.mdx (2 changes: 1 addition & 1 deletion)
@@ -20,7 +20,7 @@ This is useful to:
- Run custom evaluation workflows
- Measure application performance in real-time

-Check out the how to [annotate traces from API](/evaluation/annotate-api) for more details. Or try our new tutorial (available as [jupyter notebook](https://github.com/Agenta-AI/agenta/blob/main/examples/jupyter/capture_user_feedback.ipynb)) [here](/tutorials/cookbooks/capture-user-feedback).
+Check out the how to [annotate traces from API](/observability/trace-with-python-sdk/annotate-traces) for more details. Or try our new tutorial (available as [jupyter notebook](https://github.com/Agenta-AI/agenta/blob/main/examples/jupyter/capture_user_feedback.ipynb)) [here](/tutorials/cookbooks/capture-user-feedback).

<Image
style={{
docs/blog/entries/multiple-metrics-in-human-evaluation.mdx (2 changes: 1 addition & 1 deletion)
@@ -22,6 +22,6 @@ This unlocks a whole new set of use cases:
- Use human evaluation to bootstrap automatic evaluation. You can annotate your outputs with the expected answer or a rubric, then use it to set up an automatic evaluation.


-Watch the video below and read the [post](/changelog/multiple-metrics-in-human-evaluation) for more details. Or check out the [docs](/evaluation/human_evaluation) to learn how to use the new human evaluation workflow.
+Watch the video below and read the [post](/changelog/multiple-metrics-in-human-evaluation) for more details. Or check out the [docs](/evaluation/human-evaluation/quick-start) to learn how to use the new human evaluation workflow.

---
docs/blog/entries/observability-and-prompt-management.mdx (4 changes: 2 additions & 2 deletions)
@@ -40,11 +40,11 @@ We’ll publish a full blog post soon, but here’s a quick look at what the new

**Next: Prompt Management**

-We’ve completely rewritten the [prompt management SDK](/prompt-engineering/overview), giving you full CRUD capabilities for prompts and configurations. This includes creating, updating, reading history, deploying new versions, and deleting old ones. You can find a first tutorial for this [here](/tutorials/sdk/manage-prompts-with-SDK).
+We’ve completely rewritten the [prompt management SDK](/prompt-engineering/managing-prompts-programatically/setup), giving you full CRUD capabilities for prompts and configurations. This includes creating, updating, reading history, deploying new versions, and deleting old ones. You can find a first tutorial for this [here](/tutorials/sdk/manage-prompts-with-SDK).

**And finally: LLM-as-a-Judge Overhaul**

-Weve made significant upgrades to the [LLM-as-a-Judge evaluator](/evaluation/evaluators/llm-as-a-judge). It now supports prompts with multiple messages and has access to all variables in a test case. You can also switch models (currently supporting OpenAI and Anthropic). These changes make the evaluator much more flexible, and were seeing better results with it.
+We've made significant upgrades to the [LLM-as-a-Judge evaluator](/evaluation/configure-evaluators/llm-as-a-judge). It now supports prompts with multiple messages and has access to all variables in a test case. You can also switch models (currently supporting OpenAI and Anthropic). These changes make the evaluator much more flexible, and we're seeing better results with it.

<Image
style={{
@@ -18,7 +18,7 @@ Agenta is now fully OpenTelemetry-compliant. This means you can seamlessly integ

We've enhanced distributed tracing capabilities to better debug complex distributed agent systems. All HTTP interactions between agents—whether running within Agenta's SDK or externally—are automatically traced, making troubleshooting and monitoring easier.

-Detailed instructions and examples are available in our [distributed tracing documentation](/observability/opentelemetry).
+Detailed instructions and examples are available in our [distributed tracing documentation](/observability/trace-with-opentelemetry/distributed-tracing).
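
To make the distributed-tracing behaviour described above concrete, here is a generic OpenTelemetry sketch (not Agenta-specific code) of how one agent's HTTP call to another carries the trace context, so spans from both services land in the same trace. It assumes the `opentelemetry` and `requests` packages are installed and that a tracer provider and exporter are configured elsewhere; the downstream URL is a placeholder.

```python
# Generic OpenTelemetry context propagation over HTTP -- a sketch, not Agenta code.
import requests
from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer(__name__)

def call_downstream_agent(payload: dict) -> dict:
    # Span for the outgoing request; the downstream service's spans become its
    # children once the context is propagated in the headers.
    with tracer.start_as_current_span("call_downstream_agent"):
        headers: dict = {}
        inject(headers)  # writes the W3C traceparent header for the current span
        response = requests.post(
            "http://localhost:8001/agent",  # placeholder downstream agent URL
            json=payload,
            headers=headers,
        )
        return response.json()
```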

**Improved Custom Workflows**:

docs/blog/entries/prompt-and-configuration-registry.mdx (2 changes: 1 addition & 1 deletion)
@@ -21,7 +21,7 @@ config = agenta.get_config(base_id="xxxxx", environment="production", cache_time

```

-You can find additional documentation [here](/prompt-engineering/prompt-management/how-to-integrate-with-agenta).
+You can find additional documentation [here](/prompt-engineering/integrating-prompts/integrating-with-agenta).
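
As an aside for readers of this hunk: the `get_config` call shown in the hunk header above can be sketched in full like this. Only the arguments visible in the diff (`base_id`, `environment`) are used; the truncated `cache_time…` argument is left out rather than guessed, and any SDK initialization (API key, host) is assumed to have been done as described in the linked documentation.

```python
# Sketch based on the snippet in the hunk header above; "xxxxx" is a placeholder
# base_id exactly as in the original, and the truncated cache argument is omitted.
import agenta

config = agenta.get_config(base_id="xxxxx", environment="production")

# Inspect whatever parameters are stored for the production deployment.
print(config)
```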

**Improvements**

@@ -15,7 +15,7 @@ We're excited to announce two major features this week:

1. We've integrated [RAGAS evaluators](https://docs.ragas.io/) into agenta. Two new evaluators have been added: **RAG Faithfulness** (measuring how consistent the LLM output is with the context) and **Context Relevancy** (assessing how relevant the retrieved context is to the question). Both evaluators use intermediate outputs within the trace to calculate the final score.

-[Check out the tutorial](/evaluation/evaluators/rag-evaluators) to learn how to use RAG evaluators.
+[Check out the tutorial](/evaluation/configure-evaluators/rag-evaluators) to learn how to use RAG evaluators.

{" "}

docs/blog/entries/speed-improvements-in-the-playground.mdx (2 changes: 1 addition & 1 deletion)
@@ -9,7 +9,7 @@ tags: [v0.52.5]
We rewrote most of Agenta's frontend. You'll see much faster speeds when you create prompts or use the playground.

We also made many improvements and fixed bugs:
-- [LLM-as-a-judge](/evaluation/evaluators/llm-as-a-judge) now uses double curly braces `{{}}` instead of single curly braces `{` and `}`. This matches how normal prompts work. Old LLM-as-a-judge prompts with single curly braces still work. We updated the LLM-as-a-judge playground to make editing prompts easier.
+- [LLM-as-a-judge](/evaluation/configure-evaluators/llm-as-a-judge) now uses double curly braces `{{}}` instead of single curly braces `{` and `}`. This matches how normal prompts work. Old LLM-as-a-judge prompts with single curly braces still work. We updated the LLM-as-a-judge playground to make editing prompts easier.
- You can now use [an external Redis instance](/self-host/configuration#redis-caching) for caching by setting it as an environment variable
- Fixed the [custom workflow quick start tutorial](/custom-workflows/quick-start) and examples
- Fixed SDK compatibility issues with Python 3.9
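
To illustrate the double-curly-brace change called out in the first bullet above, here is a sketch of an LLM-as-a-judge prompt written with the new placeholder style. The placeholder names are illustrative only; they stand in for whatever columns your test cases actually contain.

```python
# Illustrative judge prompt using the new {{variable}} placeholders.
# The names question/correct_answer/prediction are assumptions for this sketch.
judge_prompt = """\
You are grading a model's answer.

Question: {{question}}
Expected answer: {{correct_answer}}
Model answer: {{prediction}}

Return a single score between 0 and 1, where 1 means the model answer
fully matches the expected answer.
"""
```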
docs/blog/entries/vertex-ai-provider-support.mdx (2 changes: 1 addition & 1 deletion)
@@ -29,7 +29,7 @@ To get started with Vertex AI, go to Settings → Model Hub and add your Vertex
- **Vertex Location**: The region for your models (e.g., `us-central1`, `europe-west4`)
- **Vertex Credentials**: Your service account key in JSON format

-For detailed setup instructions, see our [documentation on adding custom providers](/prompt-engineering/playground/adding-custom-providers#configuring-vertex-ai).
+For detailed setup instructions, see our [documentation on adding custom providers](/prompt-engineering/playground/custom-providers#configuring-vertex-ai).
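
As a rough illustration of the fields described above (a sketch of the expected value shapes, not Agenta code): the location is a Google Cloud region string, and the credentials field takes a service-account key in Google Cloud's standard JSON format. Every value below is a placeholder.

```python
# Placeholder values only; the key fields follow Google Cloud's standard
# service-account JSON layout, not an Agenta-specific schema.
vertex_location = "us-central1"  # one of the regions mentioned above

vertex_credentials = {
    "type": "service_account",
    "project_id": "my-gcp-project",
    "private_key_id": "0123456789abcdef",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "agenta-caller@my-gcp-project.iam.gserviceaccount.com",
    "token_uri": "https://oauth2.googleapis.com/token",
}
```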

### Security

docs/blog/main.mdx (18 changes: 9 additions & 9 deletions)
@@ -18,7 +18,7 @@ _24 October 2025_

We've added support for Google Cloud's Vertex AI platform. You can now use Gemini models and other Vertex AI partner models in the playground, configure them in the Model Hub, and access them through the Gateway using InVoke endpoints.

-Check out the documentation for [configuring Vertex AI models](/prompt-engineering/playground/adding-custom-providers#configuring-vertex-ai).
+Check out the documentation for [configuring Vertex AI models](/prompt-engineering/playground/custom-providers#configuring-vertex-ai).

---

@@ -98,7 +98,7 @@ We also made many improvements and fixed bugs:

**Improvements:**

-- [LLM-as-a-judge](/evaluation/evaluators/llm-as-a-judge) now uses double curly braces `{{}}` instead of single curly braces `{` and `}`. This matches how normal prompts work. Old LLM-as-a-judge prompts with single curly braces still work. We updated the LLM-as-a-judge playground to make editing prompts easier.
+- [LLM-as-a-judge](/evaluation/configure-evaluators/llm-as-a-judge) now uses double curly braces `{{}}` instead of single curly braces `{` and `}`. This matches how normal prompts work. Old LLM-as-a-judge prompts with single curly braces still work. We updated the LLM-as-a-judge playground to make editing prompts easier.

**Self-hosting:**

@@ -123,7 +123,7 @@ We rebuilt the human evaluation workflow from scratch. Now you can set multiple

This lets you evaluate the same output on different metrics like **relevance** or **completeness**. You can also create binary, numerical scores, or even use strings for **comments** or **expected answer**.

-Watch the video below and read the [post](/changelog/multiple-metrics-in-human-evaluation) for more details. Or check out the [docs](/evaluation/human_evaluation) to learn how to use the new human evaluation workflow.
+Watch the video below and read the [post](/changelog/multiple-metrics-in-human-evaluation) for more details. Or check out the [docs](/evaluation/human-evaluation/quick-start) to learn how to use the new human evaluation workflow.

<div style={{display: 'flex', justifyContent: 'center', marginTop: "20px", marginBottom: "20px", flexDirection: 'column', alignItems: 'center'}}>
<iframe
@@ -252,7 +252,7 @@ This is useful to:
- Run custom evaluation workflows
- Measure application performance in real-time

-Check out the how to [annotate traces from API](/evaluation/annotate-api) for more details. Or try our new tutorial (available as [jupyter notebook](https://github.com/Agenta-AI/agenta/blob/main/examples/jupyter/capture_user_feedback.ipynb)) [here](/tutorials/cookbooks/capture-user-feedback).
+Check out the how to [annotate traces from API](/observability/trace-with-python-sdk/annotate-traces) for more details. Or try our new tutorial (available as [jupyter notebook](https://github.com/Agenta-AI/agenta/blob/main/examples/jupyter/capture_user_feedback.ipynb)) [here](/tutorials/cookbooks/capture-user-feedback).

<Image
style={{
@@ -449,7 +449,7 @@ Agenta is now fully OpenTelemetry-compliant. This means you can seamlessly integ

We've enhanced distributed tracing capabilities to better debug complex distributed agent systems. All HTTP interactions between agents—whether running within Agenta's SDK or externally—are automatically traced, making troubleshooting and monitoring easier.

-Detailed instructions and examples are available in our [distributed tracing documentation](/observability/opentelemetry).
+Detailed instructions and examples are available in our [distributed tracing documentation](/observability/trace-with-opentelemetry/distributed-tracing).

**Improved Custom Workflows**:

@@ -676,11 +676,11 @@ We’ll publish a full blog post soon, but here’s a quick look at what the new

**Next: Prompt Management**

-We’ve completely rewritten the [prompt management SDK](/prompt-engineering/overview), giving you full CRUD capabilities for prompts and configurations. This includes creating, updating, reading history, deploying new versions, and deleting old ones. You can find a first tutorial for this [here](/tutorials/sdk/manage-prompts-with-SDK).
+We’ve completely rewritten the [prompt management SDK](/prompt-engineering/managing-prompts-programatically/setup), giving you full CRUD capabilities for prompts and configurations. This includes creating, updating, reading history, deploying new versions, and deleting old ones. You can find a first tutorial for this [here](/tutorials/sdk/manage-prompts-with-SDK).

**And finally: LLM-as-a-Judge Overhaul**

-Weve made significant upgrades to the [LLM-as-a-Judge evaluator](/evaluation/evaluators/llm-as-a-judge). It now supports prompts with multiple messages and has access to all variables in a test case. You can also switch models (currently supporting OpenAI and Anthropic). These changes make the evaluator much more flexible, and were seeing better results with it.
+We've made significant upgrades to the [LLM-as-a-Judge evaluator](/evaluation/configure-evaluators/llm-as-a-judge). It now supports prompts with multiple messages and has access to all variables in a test case. You can also switch models (currently supporting OpenAI and Anthropic). These changes make the evaluator much more flexible, and we're seeing better results with it.

<Image
style={{
@@ -822,7 +822,7 @@ We're excited to announce two major features this week:

1. We've integrated [RAGAS evaluators](https://docs.ragas.io/) into agenta. Two new evaluators have been added: **RAG Faithfulness** (measuring how consistent the LLM output is with the context) and **Context Relevancy** (assessing how relevant the retrieved context is to the question). Both evaluators use intermediate outputs within the trace to calculate the final score.

-[Check out the tutorial](/evaluation/evaluators/rag-evaluators) to learn how to use RAG evaluators.
+[Check out the tutorial](/evaluation/configure-evaluators/rag-evaluators) to learn how to use RAG evaluators.

{" "}

@@ -999,7 +999,7 @@ config = agenta.get_config(base_id="xxxxx", environment="production", cache_time

```

-You can find additional documentation [here](/prompt-engineering/prompt-management/how-to-integrate-with-agenta).
+You can find additional documentation [here](/prompt-engineering/integrating-prompts/integrating-with-agenta).

**Improvements**
