README.md (4 changes: 2 additions & 2 deletions)
@@ -83,11 +83,11 @@ Agenta is a platform for building production-grade LLM applications. It helps **
Collaborate with Subject Matter Experts (SMEs) on prompt engineering and make sure nothing breaks in production.

- **Interactive Playground**: Compare prompts side by side against your test cases
-- **Multi-Model Support**: Experiment with 50+ LLM models or [bring-your-own models](https://docs.agenta.ai/prompt-engineering/playground/adding-custom-providers?utm_source=github&utm_medium=referral&utm_campaign=readme)
+- **Multi-Model Support**: Experiment with 50+ LLM models or [bring-your-own models](https://docs.agenta.ai/prompt-engineering/playground/custom-providers?utm_source=github&utm_medium=referral&utm_campaign=readme)
- **Version Control**: Version prompts and configurations with branching and environments
- **Complex Configurations**: Enable SMEs to collaborate on [complex configuration schemas](https://docs.agenta.ai/custom-workflows/overview?utm_source=github&utm_medium=referral&utm_campaign=readme) beyond simple prompts

-[Explore prompt management →](https://docs.agenta.ai/prompt-engineering/overview?utm_source=github&utm_medium=referral&utm_campaign=readme)
+[Explore prompt management →](https://docs.agenta.ai/prompt-engineering/concepts?utm_source=github&utm_medium=referral&utm_campaign=readme)

### 📊 Evaluation & Testing
Evaluate your LLM applications systematically with both human and automated feedback.
api/pyproject.toml (2 changes: 1 addition & 1 deletion)
@@ -1,6 +1,6 @@
[project]
name = "api"
-version = "0.59.11"
+version = "0.59.12"
description = "Agenta API"
authors = [
{ name = "Mahmoud Mabrouk", email = "[email protected]" },
docs/blog/entries/annotate-your-llm-response-preview.mdx (2 changes: 1 addition & 1 deletion)
@@ -20,7 +20,7 @@ This is useful to:
- Run custom evaluation workflows
- Measure application performance in real-time

-Check out the how to [annotate traces from API](/evaluation/annotate-api) for more details. Or try our new tutorial (available as [jupyter notebook](https://github.com/Agenta-AI/agenta/blob/main/examples/jupyter/capture_user_feedback.ipynb)) [here](/tutorials/cookbooks/capture-user-feedback).
+Check out the how to [annotate traces from API](/observability/trace-with-python-sdk/annotate-traces) for more details. Or try our new tutorial (available as [jupyter notebook](https://github.com/Agenta-AI/agenta/blob/main/examples/jupyter/capture_user_feedback.ipynb)) [here](/tutorials/cookbooks/capture-user-feedback).

<Image
style={{
docs/blog/entries/multiple-metrics-in-human-evaluation.mdx (2 changes: 1 addition & 1 deletion)
@@ -22,6 +22,6 @@ This unlocks a whole new set of use cases:
- Use human evaluation to bootstrap automatic evaluation. You can annotate your outputs with the expected answer or a rubric, then use it to set up an automatic evaluation.


-Watch the video below and read the [post](/changelog/multiple-metrics-in-human-evaluation) for more details. Or check out the [docs](/evaluation/human_evaluation) to learn how to use the new human evaluation workflow.
+Watch the video below and read the [post](/changelog/multiple-metrics-in-human-evaluation) for more details. Or check out the [docs](/evaluation/human-evaluation/quick-start) to learn how to use the new human evaluation workflow.

---
docs/blog/entries/observability-and-prompt-management.mdx (4 changes: 2 additions & 2 deletions)
@@ -40,11 +40,11 @@ We’ll publish a full blog post soon, but here’s a quick look at what the new

**Next: Prompt Management**

-We’ve completely rewritten the [prompt management SDK](/prompt-engineering/overview), giving you full CRUD capabilities for prompts and configurations. This includes creating, updating, reading history, deploying new versions, and deleting old ones. You can find a first tutorial for this [here](/tutorials/sdk/manage-prompts-with-SDK).
+We’ve completely rewritten the [prompt management SDK](/prompt-engineering/managing-prompts-programatically/setup), giving you full CRUD capabilities for prompts and configurations. This includes creating, updating, reading history, deploying new versions, and deleting old ones. You can find a first tutorial for this [here](/tutorials/sdk/manage-prompts-with-SDK).

**And finally: LLM-as-a-Judge Overhaul**

-Weve made significant upgrades to the [LLM-as-a-Judge evaluator](/evaluation/evaluators/llm-as-a-judge). It now supports prompts with multiple messages and has access to all variables in a test case. You can also switch models (currently supporting OpenAI and Anthropic). These changes make the evaluator much more flexible, and were seeing better results with it.
+We've made significant upgrades to the [LLM-as-a-Judge evaluator](/evaluation/configure-evaluators/llm-as-a-judge). It now supports prompts with multiple messages and has access to all variables in a test case. You can also switch models (currently supporting OpenAI and Anthropic). These changes make the evaluator much more flexible, and we're seeing better results with it.

<Image
style={{
@@ -18,7 +18,7 @@ Agenta is now fully OpenTelemetry-compliant. This means you can seamlessly integ

We've enhanced distributed tracing capabilities to better debug complex distributed agent systems. All HTTP interactions between agents—whether running within Agenta's SDK or externally—are automatically traced, making troubleshooting and monitoring easier.

-Detailed instructions and examples are available in our [distributed tracing documentation](/observability/opentelemetry).
+Detailed instructions and examples are available in our [distributed tracing documentation](/observability/trace-with-opentelemetry/distributed-tracing).
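
To make the distributed-tracing behaviour described above concrete, here is a generic OpenTelemetry sketch (not Agenta-specific code) of how one agent's HTTP call to another carries the trace context, so spans from both services land in the same trace. It assumes the `opentelemetry` and `requests` packages are installed and that a tracer provider and exporter are configured elsewhere; the downstream URL is a placeholder.

```python
# Generic OpenTelemetry context propagation over HTTP -- a sketch, not Agenta code.
import requests
from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer(__name__)

def call_downstream_agent(payload: dict) -> dict:
    # Span for the outgoing request; the downstream service's spans become its
    # children once the context is propagated in the headers.
    with tracer.start_as_current_span("call_downstream_agent"):
        headers: dict = {}
        inject(headers)  # writes the W3C traceparent header for the current span
        response = requests.post(
            "http://localhost:8001/agent",  # placeholder downstream agent URL
            json=payload,
            headers=headers,
        )
        return response.json()
```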

**Improved Custom Workflows**:

docs/blog/entries/prompt-and-configuration-registry.mdx (2 changes: 1 addition & 1 deletion)
@@ -21,7 +21,7 @@ config = agenta.get_config(base_id="xxxxx", environment="production", cache_time

```

-You can find additional documentation [here](/prompt-engineering/prompt-management/how-to-integrate-with-agenta).
+You can find additional documentation [here](/prompt-engineering/integrating-prompts/integrating-with-agenta).
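
As an aside for readers of this hunk: the `get_config` call shown in the hunk header above can be sketched in full like this. Only the arguments visible in the diff (`base_id`, `environment`) are used; the truncated `cache_time…` argument is left out rather than guessed, and any SDK initialization (API key, host) is assumed to have been done as described in the linked documentation.

```python
# Sketch based on the snippet in the hunk header above; "xxxxx" is a placeholder
# base_id exactly as in the original, and the truncated cache argument is omitted.
import agenta

config = agenta.get_config(base_id="xxxxx", environment="production")

# Inspect whatever parameters are stored for the production deployment.
print(config)
```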

**Improvements**

@@ -15,7 +15,7 @@ We're excited to announce two major features this week:

1. We've integrated [RAGAS evaluators](https://docs.ragas.io/) into agenta. Two new evaluators have been added: **RAG Faithfulness** (measuring how consistent the LLM output is with the context) and **Context Relevancy** (assessing how relevant the retrieved context is to the question). Both evaluators use intermediate outputs within the trace to calculate the final score.

-[Check out the tutorial](/evaluation/evaluators/rag-evaluators) to learn how to use RAG evaluators.
+[Check out the tutorial](/evaluation/configure-evaluators/rag-evaluators) to learn how to use RAG evaluators.

{" "}

docs/blog/entries/speed-improvements-in-the-playground.mdx (2 changes: 1 addition & 1 deletion)
@@ -9,7 +9,7 @@ tags: [v0.52.5]
We rewrote most of Agenta's frontend. You'll see much faster speeds when you create prompts or use the playground.

We also made many improvements and fixed bugs:
-- [LLM-as-a-judge](/evaluation/evaluators/llm-as-a-judge) now uses double curly braces `{{}}` instead of single curly braces `{` and `}`. This matches how normal prompts work. Old LLM-as-a-judge prompts with single curly braces still work. We updated the LLM-as-a-judge playground to make editing prompts easier.
+- [LLM-as-a-judge](/evaluation/configure-evaluators/llm-as-a-judge) now uses double curly braces `{{}}` instead of single curly braces `{` and `}`. This matches how normal prompts work. Old LLM-as-a-judge prompts with single curly braces still work. We updated the LLM-as-a-judge playground to make editing prompts easier.
- You can now use [an external Redis instance](/self-host/configuration#redis-caching) for caching by setting it as an environment variable
- Fixed the [custom workflow quick start tutorial](/custom-workflows/quick-start) and examples
- Fixed SDK compatibility issues with Python 3.9
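
To illustrate the double-curly-brace change called out in the first bullet above, here is a sketch of an LLM-as-a-judge prompt written with the new placeholder style. The placeholder names are illustrative only; they stand in for whatever columns your test cases actually contain.

```python
# Illustrative judge prompt using the new {{variable}} placeholders.
# The names question/correct_answer/prediction are assumptions for this sketch.
judge_prompt = """\
You are grading a model's answer.

Question: {{question}}
Expected answer: {{correct_answer}}
Model answer: {{prediction}}

Return a single score between 0 and 1, where 1 means the model answer
fully matches the expected answer.
"""
```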
docs/blog/entries/vertex-ai-provider-support.mdx (2 changes: 1 addition & 1 deletion)
@@ -29,7 +29,7 @@ To get started with Vertex AI, go to Settings → Model Hub and add your Vertex
- **Vertex Location**: The region for your models (e.g., `us-central1`, `europe-west4`)
- **Vertex Credentials**: Your service account key in JSON format

-For detailed setup instructions, see our [documentation on adding custom providers](/prompt-engineering/playground/adding-custom-providers#configuring-vertex-ai).
+For detailed setup instructions, see our [documentation on adding custom providers](/prompt-engineering/playground/custom-providers#configuring-vertex-ai).
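
As a rough illustration of the fields described above (a sketch of the expected value shapes, not Agenta code): the location is a Google Cloud region string, and the credentials field takes a service-account key in Google Cloud's standard JSON format. Every value below is a placeholder.

```python
# Placeholder values only; the key fields follow Google Cloud's standard
# service-account JSON layout, not an Agenta-specific schema.
vertex_location = "us-central1"  # one of the regions mentioned above

vertex_credentials = {
    "type": "service_account",
    "project_id": "my-gcp-project",
    "private_key_id": "0123456789abcdef",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "agenta-caller@my-gcp-project.iam.gserviceaccount.com",
    "token_uri": "https://oauth2.googleapis.com/token",
}
```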

### Security

docs/blog/main.mdx (18 changes: 9 additions & 9 deletions)
@@ -18,7 +18,7 @@ _24 October 2025_

We've added support for Google Cloud's Vertex AI platform. You can now use Gemini models and other Vertex AI partner models in the playground, configure them in the Model Hub, and access them through the Gateway using InVoke endpoints.

-Check out the documentation for [configuring Vertex AI models](/prompt-engineering/playground/adding-custom-providers#configuring-vertex-ai).
+Check out the documentation for [configuring Vertex AI models](/prompt-engineering/playground/custom-providers#configuring-vertex-ai).

---

@@ -98,7 +98,7 @@ We also made many improvements and fixed bugs:

**Improvements:**

-- [LLM-as-a-judge](/evaluation/evaluators/llm-as-a-judge) now uses double curly braces `{{}}` instead of single curly braces `{` and `}`. This matches how normal prompts work. Old LLM-as-a-judge prompts with single curly braces still work. We updated the LLM-as-a-judge playground to make editing prompts easier.
+- [LLM-as-a-judge](/evaluation/configure-evaluators/llm-as-a-judge) now uses double curly braces `{{}}` instead of single curly braces `{` and `}`. This matches how normal prompts work. Old LLM-as-a-judge prompts with single curly braces still work. We updated the LLM-as-a-judge playground to make editing prompts easier.

**Self-hosting:**

@@ -123,7 +123,7 @@ We rebuilt the human evaluation workflow from scratch. Now you can set multiple

This lets you evaluate the same output on different metrics like **relevance** or **completeness**. You can also create binary, numerical scores, or even use strings for **comments** or **expected answer**.

-Watch the video below and read the [post](/changelog/multiple-metrics-in-human-evaluation) for more details. Or check out the [docs](/evaluation/human_evaluation) to learn how to use the new human evaluation workflow.
+Watch the video below and read the [post](/changelog/multiple-metrics-in-human-evaluation) for more details. Or check out the [docs](/evaluation/human-evaluation/quick-start) to learn how to use the new human evaluation workflow.

<div style={{display: 'flex', justifyContent: 'center', marginTop: "20px", marginBottom: "20px", flexDirection: 'column', alignItems: 'center'}}>
<iframe
@@ -252,7 +252,7 @@ This is useful to:
- Run custom evaluation workflows
- Measure application performance in real-time

-Check out the how to [annotate traces from API](/evaluation/annotate-api) for more details. Or try our new tutorial (available as [jupyter notebook](https://github.com/Agenta-AI/agenta/blob/main/examples/jupyter/capture_user_feedback.ipynb)) [here](/tutorials/cookbooks/capture-user-feedback).
+Check out the how to [annotate traces from API](/observability/trace-with-python-sdk/annotate-traces) for more details. Or try our new tutorial (available as [jupyter notebook](https://github.com/Agenta-AI/agenta/blob/main/examples/jupyter/capture_user_feedback.ipynb)) [here](/tutorials/cookbooks/capture-user-feedback).

<Image
style={{
@@ -449,7 +449,7 @@ Agenta is now fully OpenTelemetry-compliant. This means you can seamlessly integ

We've enhanced distributed tracing capabilities to better debug complex distributed agent systems. All HTTP interactions between agents—whether running within Agenta's SDK or externally—are automatically traced, making troubleshooting and monitoring easier.

-Detailed instructions and examples are available in our [distributed tracing documentation](/observability/opentelemetry).
+Detailed instructions and examples are available in our [distributed tracing documentation](/observability/trace-with-opentelemetry/distributed-tracing).

**Improved Custom Workflows**:

@@ -676,11 +676,11 @@ We’ll publish a full blog post soon, but here’s a quick look at what the new

**Next: Prompt Management**

-We’ve completely rewritten the [prompt management SDK](/prompt-engineering/overview), giving you full CRUD capabilities for prompts and configurations. This includes creating, updating, reading history, deploying new versions, and deleting old ones. You can find a first tutorial for this [here](/tutorials/sdk/manage-prompts-with-SDK).
+We’ve completely rewritten the [prompt management SDK](/prompt-engineering/managing-prompts-programatically/setup), giving you full CRUD capabilities for prompts and configurations. This includes creating, updating, reading history, deploying new versions, and deleting old ones. You can find a first tutorial for this [here](/tutorials/sdk/manage-prompts-with-SDK).

**And finally: LLM-as-a-Judge Overhaul**

-Weve made significant upgrades to the [LLM-as-a-Judge evaluator](/evaluation/evaluators/llm-as-a-judge). It now supports prompts with multiple messages and has access to all variables in a test case. You can also switch models (currently supporting OpenAI and Anthropic). These changes make the evaluator much more flexible, and were seeing better results with it.
+We've made significant upgrades to the [LLM-as-a-Judge evaluator](/evaluation/configure-evaluators/llm-as-a-judge). It now supports prompts with multiple messages and has access to all variables in a test case. You can also switch models (currently supporting OpenAI and Anthropic). These changes make the evaluator much more flexible, and we're seeing better results with it.

<Image
style={{
@@ -822,7 +822,7 @@ We're excited to announce two major features this week:

1. We've integrated [RAGAS evaluators](https://docs.ragas.io/) into agenta. Two new evaluators have been added: **RAG Faithfulness** (measuring how consistent the LLM output is with the context) and **Context Relevancy** (assessing how relevant the retrieved context is to the question). Both evaluators use intermediate outputs within the trace to calculate the final score.

-[Check out the tutorial](/evaluation/evaluators/rag-evaluators) to learn how to use RAG evaluators.
+[Check out the tutorial](/evaluation/configure-evaluators/rag-evaluators) to learn how to use RAG evaluators.

{" "}

@@ -999,7 +999,7 @@ config = agenta.get_config(base_id="xxxxx", environment="production", cache_time

```

-You can find additional documentation [here](/prompt-engineering/prompt-management/how-to-integrate-with-agenta).
+You can find additional documentation [here](/prompt-engineering/integrating-prompts/integrating-with-agenta).

**Improvements**
