diff --git a/README.md b/README.md index 6f9ef58382..e1ca6f49e6 100644 --- a/README.md +++ b/README.md @@ -83,11 +83,11 @@ Agenta is a platform for building production-grade LLM applications. It helps ** Collaborate with Subject Matter Experts (SMEs) on prompt engineering and make sure nothing breaks in production. - **Interactive Playground**: Compare prompts side by side against your test cases -- **Multi-Model Support**: Experiment with 50+ LLM models or [bring-your-own models](https://docs.agenta.ai/prompt-engineering/playground/adding-custom-providers?utm_source=github&utm_medium=referral&utm_campaign=readme) +- **Multi-Model Support**: Experiment with 50+ LLM models or [bring-your-own models](https://docs.agenta.ai/prompt-engineering/playground/custom-providers?utm_source=github&utm_medium=referral&utm_campaign=readme) - **Version Control**: Version prompts and configurations with branching and environments - **Complex Configurations**: Enable SMEs to collaborate on [complex configuration schemas](https://docs.agenta.ai/custom-workflows/overview?utm_source=github&utm_medium=referral&utm_campaign=readme) beyond simple prompts -[Explore prompt management →](https://docs.agenta.ai/prompt-engineering/overview?utm_source=github&utm_medium=referral&utm_campaign=readme) +[Explore prompt management →](https://docs.agenta.ai/prompt-engineering/concepts?utm_source=github&utm_medium=referral&utm_campaign=readme) ### 📊 Evaluation & Testing Evaluate your LLM applications systematically with both human and automated feedback. diff --git a/api/pyproject.toml b/api/pyproject.toml index 35b6013d1b..ad5cf17108 100644 --- a/api/pyproject.toml +++ b/api/pyproject.toml @@ -1,6 +1,6 @@ [project] name = "api" -version = "0.59.11" +version = "0.59.12" description = "Agenta API" authors = [ { name = "Mahmoud Mabrouk", email = "mahmoud@agenta.ai" }, diff --git a/docs/blog/entries/annotate-your-llm-response-preview.mdx b/docs/blog/entries/annotate-your-llm-response-preview.mdx index 7c70f71e8d..ac2443ce74 100644 --- a/docs/blog/entries/annotate-your-llm-response-preview.mdx +++ b/docs/blog/entries/annotate-your-llm-response-preview.mdx @@ -20,7 +20,7 @@ This is useful to: - Run custom evaluation workflows - Measure application performance in real-time -Check out the how to [annotate traces from API](/evaluation/annotate-api) for more details. Or try our new tutorial (available as [jupyter notebook](https://github.com/Agenta-AI/agenta/blob/main/examples/jupyter/capture_user_feedback.ipynb)) [here](/tutorials/cookbooks/capture-user-feedback). +Check out the how to [annotate traces from API](/observability/trace-with-python-sdk/annotate-traces) for more details. Or try our new tutorial (available as [jupyter notebook](https://github.com/Agenta-AI/agenta/blob/main/examples/jupyter/capture_user_feedback.ipynb)) [here](/tutorials/cookbooks/capture-user-feedback). - - -## Why use human evaluation? - -Automated metrics can't capture everything. Sometimes you need human experts to evaluate results and identify why errors occur. - -Human evaluation helps you: -- Get expert feedback to compare different versions of your application -- Collect human feedback and insights to improve your prompts and configuration -- Collect annotations to bootstrap automated evaluation - -## How human evaluation works - -Human evaluation follows the same process as automatic evaluation: -1. Choose a test set -2. Select the versions you want to evaluate -3. Pick your evaluators -4. 
Start the evaluation - -The only difference is that humans provide the evaluation scores instead of automated systems. - -## Single model evaluation - -### Start a new evaluation - -1. Go to the **Evaluations** page -2. Select the **Human annotation** tab -3. Click **Start new evaluation** - - - -### Configure your evaluation - -1. **Select your test set** - Choose the data you want to evaluate against -2. **Select your revision** - Pick the version of your application to test - -:::warning -Your test set columns must match the input variables in your revision. If they don't match, you'll see an error message. -::: - -3. **Choose evaluators** - Select how you want to measure performance - -Human evaluation configuration - -### Create evaluators (optional) - -If you don't have evaluators yet, click **Create new** in the **Evaluator** section. - -Each evaluator has: -- **Name** - What you're measuring (e.g., "correctness") -- **Description** - What the evaluator does -- **Feedback types** - How evaluators will score responses - -For example, a "correctness" evaluator might have: -- `is_correct` - A yes/no question about accuracy -- `error_type` - A multiple-choice field for categorizing mistakes - -Creating evaluators for human evaluation - -**Available feedback types:** -- **Boolean** - Yes/no questions -- **Integer** - Whole number ratings -- **Decimal** - Precise numerical scores -- **Single-choice** - Pick one option -- **Multi-choice** - Pick multiple options -- **String** - Free-text comments or notes - -:::tip -Evaluators can include multiple related feedback types. For example: - -**Correctness evaluator:** -- `is_correct` - Yes/no question about accuracy -- `error_type` - Multiple-choice field to categorize mistakes (only if incorrect) - -**Style adherence evaluator:** -- `is_adherent` - Yes/no question about style compliance -- `comment` - Text field explaining why the style doesn't match (if needed) - -This grouping helps you evaluate different aspects of your LLM's performance in an organized way. -::: - -### Run the evaluation - -After creating your evaluators: - -1. Select the evaluators you want to use -2. Click **Start evaluation** -3. You'll be redirected to the annotation interface -4. Click **Run all** to generate outputs and begin evaluation - - - -### Annotate responses - -For each test case: -1. Review the input and output -2. Use the evaluation form on the right to score the response -3. Click **Annotate** to save your assessment -4. Click **Next** to move to the next test case - -:::tip -Select the **Unannotated** tab to see only the test cases you haven't reviewed yet. -::: - -### Review results - -After completing all annotations: -- View results in the **Results** section -- Compare performance with other experiments -- Export results to CSV using **Export results** -- Save annotated data as a new test set with **Save test set** - -## A/B testing evaluation - -A/B testing lets you compare two versions of your application side-by-side. For each test case, you choose which version performs better. - -### Set up A/B testing - -1. Select **two versions** you want to compare -2. Choose your **test set** -3. For each test case, decide which version is better (or if they're equal) - - - -### Collaborate with your team - -You can invite team members to help with A/B testing by sharing the evaluation link. Team members must be added to your workspace first. 
- -### A/B testing features - -During A/B evaluation, you can: -- **Compare variants** - Score which version performs better for each test case -- **Add notes** - Include context or detailed feedback -- **Export results** - Download your evaluation data for further analysis - diff --git a/docs/docs/evaluation/_02-quick-start-sdk.mdx b/docs/docs/evaluation/_02-quick-start-sdk.mdx new file mode 100644 index 0000000000..854b62b6bc --- /dev/null +++ b/docs/docs/evaluation/_02-quick-start-sdk.mdx @@ -0,0 +1,10 @@ +--- +title: "Quick Start: Evaluation from SDK" +sidebar_label: "Quick Start (SDK)" +description: "Get started with evaluating your LLM applications programmatically using the Agenta SDK" +sidebar_position: 2 +--- + +import { Redirect } from '@docusaurus/router'; + + diff --git a/docs/docs/evaluation/_evaluation-from-sdk/01-quick-start.mdx b/docs/docs/evaluation/_evaluation-from-sdk/01-quick-start.mdx new file mode 100644 index 0000000000..02150aa368 --- /dev/null +++ b/docs/docs/evaluation/_evaluation-from-sdk/01-quick-start.mdx @@ -0,0 +1,78 @@ +--- +title: "Quick Start" +sidebar_label: "Quick Start" +description: "Quick start guide for running evaluations programmatically with the Agenta SDK" +sidebar_position: 1 +--- + +This quick start guide will help you run your first evaluation using the Agenta Python SDK. + +## Prerequisites + +- Python 3.8 or higher +- Agenta account with API key +- An LLM application deployed in Agenta + +## Installation + +```bash +pip install -U agenta +``` + +## What you'll learn + +- How to initialize the SDK +- How to create test sets programmatically +- How to configure and run evaluations +- How to retrieve evaluation results + +## Step-by-step tutorial + +For a comprehensive walkthrough of evaluation with the SDK, see our [Evaluate with SDK tutorial](/tutorials/sdk/evaluate-with-SDK). 
+ +## Quick example + +```python +import agenta as ag +from agenta.client.api import AgentaApi + +# Initialize the SDK +client = AgentaApi( + base_url="https://cloud.agenta.ai/api", + api_key="your-api-key" +) + +# Create a test set +test_set = client.testsets.create_testset( + request={ + "name": "my_test_set", + "csvdata": [ + {"input": "Hello", "expected": "Hi there!"}, + {"input": "How are you?", "expected": "I'm doing well!"} + ] + } +) + +# Run evaluation +evaluation = client.evaluations.create_evaluation( + app_id="your-app-id", + variant_ids=["variant-id"], + testset_id=test_set.id, + evaluators_configs=["evaluator-config-id"] +) + +# Check status +status = client.evaluations.fetch_evaluation_status(evaluation.id) +print(f"Evaluation status: {status}") + +# Get results when complete +results = client.evaluations.fetch_evaluation_results(evaluation.id) +print(results) +``` + +## Next steps + +- Learn about [setup and configuration](/evaluation/evaluation-from-sdk/setup-configuration) +- Explore [managing test sets with the SDK](/evaluation/evaluation-from-sdk/managing-test-sets) +- Understand [configuring evaluators](/evaluation/evaluation-from-sdk/configuring-evaluators) +- See how to [run evaluations](/evaluation/evaluation-from-sdk/running-evaluations) in detail diff --git a/docs/docs/evaluation/_evaluation-from-sdk/02-setup-configuration.mdx b/docs/docs/evaluation/_evaluation-from-sdk/02-setup-configuration.mdx new file mode 100644 index 0000000000..28b858b81a --- /dev/null +++ b/docs/docs/evaluation/_evaluation-from-sdk/02-setup-configuration.mdx @@ -0,0 +1,60 @@ +--- +title: "Setup and Configuration" +sidebar_label: "Setup & Configuration" +description: "Learn how to set up and configure the Agenta SDK for evaluation" +sidebar_position: 2 +--- + + + +## Setup + +First, install the Agenta SDK: + +```bash +pip install -U agenta +``` + +## Initialize the SDK client + +```python +from agenta.client.api import AgentaApi + +app_id = "667d8cfad1812781f7e375d9" + +# You can create the API key under the settings page. 
+# If you are using the OSS version, you should keep this as an empty string +api_key = "EUqJGOUu.xxxx" + +# Host +host = "https://cloud.agenta.ai" + +# Initialize the client +client = AgentaApi(base_url=host + "/api", api_key=api_key) +``` + +## Configuration options + + + +## Environment variables + +You can also configure the SDK using environment variables: + +```bash +export AGENTA_API_KEY="your-api-key" +export AGENTA_HOST="https://cloud.agenta.ai" +``` + +```python +import agenta as ag + +# Initialize will read from environment variables +ag.init() +``` + +## Next steps + +- Learn about [managing test sets](/evaluation/evaluation-from-sdk/managing-test-sets) +- Explore [configuring evaluators](/evaluation/evaluation-from-sdk/configuring-evaluators) +- Start [running evaluations](/evaluation/evaluation-from-sdk/running-evaluations) diff --git a/docs/docs/evaluation/_evaluation-from-sdk/03-managing-test-sets.mdx b/docs/docs/evaluation/_evaluation-from-sdk/03-managing-test-sets.mdx new file mode 100644 index 0000000000..d866710948 --- /dev/null +++ b/docs/docs/evaluation/_evaluation-from-sdk/03-managing-test-sets.mdx @@ -0,0 +1,42 @@ +--- +title: "Managing Test Sets" +sidebar_label: "Managing Test Sets" +description: "Learn how to create, load, and manage test sets using the SDK" +sidebar_position: 3 +--- + + + +## Creating test sets + +```python +from agenta.client.types.new_testset import NewTestset + +csvdata = [ + {"country": "france", "capital": "Paris"}, + {"country": "Germany", "capital": "Berlin"} +] + +response = client.testsets.create_testset( + request=NewTestset(name="test set", csvdata=csvdata) +) +test_set_id = response.id +``` + +## Loading existing test sets + + + +## Updating test sets + + + +## Deleting test sets + + + +## Next steps + +- Learn about [configuring evaluators](/evaluation/evaluation-from-sdk/configuring-evaluators) +- Start [running evaluations](/evaluation/evaluation-from-sdk/running-evaluations) +- Explore [viewing results](/evaluation/evaluation-from-sdk/viewing-results) diff --git a/docs/docs/evaluation/_evaluation-from-sdk/04-configuring-evaluators.mdx b/docs/docs/evaluation/_evaluation-from-sdk/04-configuring-evaluators.mdx new file mode 100644 index 0000000000..0de67a068f --- /dev/null +++ b/docs/docs/evaluation/_evaluation-from-sdk/04-configuring-evaluators.mdx @@ -0,0 +1,53 @@ +--- +title: "Configuring Evaluators" +sidebar_label: "Configuring Evaluators" +description: "Learn how to configure built-in and custom evaluators using the SDK" +sidebar_position: 4 +--- + + + +## Creating evaluators + +### Custom code evaluator + +Let's create a custom code evaluator that returns 1.0 if the first letter of the app output is uppercase: + +```python +code_snippet = """ +from typing import Dict + +def evaluate( + app_params: Dict[str, str], + inputs: Dict[str, str], + output: str, # output of the llm app + datapoint: Dict[str, str] # contains the testset row +) -> float: + if output and output[0].isupper(): + return 1.0 + else: + return 0.0 +""" + +response = client.evaluators.create_new_evaluator_config( + app_id=app_id, + name="capital_letter_evaluator", + evaluator_key="auto_custom_code_run", + settings_values={"code": code_snippet} +) +letter_match_eval_id = response.id +``` + +## Using built-in evaluators + + + +## Configuring evaluator settings + + + +## Next steps + +- Learn about [running evaluations](/evaluation/evaluation-from-sdk/running-evaluations) +- Explore [viewing results](/evaluation/evaluation-from-sdk/viewing-results) +- See all 
[evaluator types](/evaluation/configure-evaluators/overview) diff --git a/docs/docs/evaluation/_evaluation-from-sdk/05-running-evaluations.mdx b/docs/docs/evaluation/_evaluation-from-sdk/05-running-evaluations.mdx new file mode 100644 index 0000000000..aa9a523a02 --- /dev/null +++ b/docs/docs/evaluation/_evaluation-from-sdk/05-running-evaluations.mdx @@ -0,0 +1,61 @@ +--- +title: "Running Evaluations" +sidebar_label: "Running Evaluations" +description: "Learn how to run evaluations programmatically using the SDK" +sidebar_position: 5 +--- + + + +## Running an evaluation + +First, let's grab the first variant in the app: + +```python +response = client.apps.list_app_variants(app_id=app_id) +print(response) +myvariant_id = response[0].variant_id +``` + +Then, let's start the evaluation jobs: + +```python +from agenta.client.types.llm_run_rate_limit import LlmRunRateLimit + +rate_limit_config = LlmRunRateLimit( + batch_size=10, # number of rows to call in parallel + max_retries=3, # max number of time to retry a failed llm call + retry_delay=2, # delay before retrying a failed llm call + delay_between_batches=5, # delay between batches +) + +response = client.evaluations.create_evaluation( + app_id=app_id, + variant_ids=[myvariant_id], + testset_id=test_set_id, + evaluators_configs=[letter_match_eval_id], + rate_limit=rate_limit_config +) +print(response) +``` + +## Checking evaluation status + +Now we can check for the status of the job: + +```python +client.evaluations.fetch_evaluation_status('667d98fbd1812781f7e3761a') +``` + +## Configuring rate limits + + + +## Handling errors + + + +## Next steps + +- Learn about [viewing results](/evaluation/evaluation-from-sdk/viewing-results) +- Explore [advanced evaluation patterns](/evaluation/evaluation-from-sdk/setup-configuration) diff --git a/docs/docs/evaluation/_evaluation-from-sdk/06-viewing-results.mdx b/docs/docs/evaluation/_evaluation-from-sdk/06-viewing-results.mdx new file mode 100644 index 0000000000..7e8609e71f --- /dev/null +++ b/docs/docs/evaluation/_evaluation-from-sdk/06-viewing-results.mdx @@ -0,0 +1,47 @@ +--- +title: "Viewing Results" +sidebar_label: "Viewing Results" +description: "Learn how to retrieve and analyze evaluation results using the SDK" +sidebar_position: 6 +--- + + + +## Fetching overall results + +As soon as the evaluation is done, we can fetch the overall results: + +```python +response = client.evaluations.fetch_evaluation_results('667d98fbd1812781f7e3761a') + +results = [ + (evaluator["evaluator_config"]["name"], evaluator["result"]) + for evaluator in response["results"] +] +print(results) +``` + +## Fetching detailed results + +Get detailed results for each test case: + +```python +detailed_results = client.evaluations.fetch_evaluation_scenarios( + evaluations_ids='667d98fbd1812781f7e3761a' +) +print(detailed_results) +``` + +## Analyzing results + + + +## Exporting results + + + +## Next steps + +- Learn about [human evaluation](/evaluation/human-evaluation/quick-start) +- Explore [comparing evaluations in the UI](/evaluation/evaluation-from-ui/comparing-runs) +- See all [evaluator types](/evaluation/configure-evaluators/overview) diff --git a/docs/docs/evaluation/_evaluation-from-sdk/_category_.json b/docs/docs/evaluation/_evaluation-from-sdk/_category_.json new file mode 100644 index 0000000000..f2be6bfd52 --- /dev/null +++ b/docs/docs/evaluation/_evaluation-from-sdk/_category_.json @@ -0,0 +1,6 @@ +{ + "label": "Evaluation from SDK", + "position": 7, + "collapsible": true, + "collapsed": true +} diff 
--git a/docs/docs/evaluation/configure-evaluators/01-overview.mdx b/docs/docs/evaluation/configure-evaluators/01-overview.mdx new file mode 100644 index 0000000000..7cb1292849 --- /dev/null +++ b/docs/docs/evaluation/configure-evaluators/01-overview.mdx @@ -0,0 +1,85 @@ +--- +title: "Configure Evaluators" +sidebar_label: "Overview" +description: "Set up evaluators for your use case" +sidebar_position: 1 +--- + +import Image from "@theme/IdealImage"; + +This guide shows you how to configure evaluators for your LLM application. + +## Configuring evaluators + +To create a new evaluator, click the `Create New` button in the `Evaluators` page. + +## Selecting evaluators + +Agenta offers a growing list of pre-built evaluators suitable for most use cases. You can also [create custom evaluators](/evaluation/configure-evaluators/custom-evaluator) by writing your own Python function or [use webhooks](/evaluation/configure-evaluators/webhook-evaluator) for evaluation. + +
+Available Evaluators + +| **Evaluator Name** | **Use Case** | **Type** | **Description** | +| ------------------------------------------------------------------------------------------------------------- | -------------------------------- | ------------------ | -------------------------------------------------------------------------------- | +| [Exact Match](/evaluation/configure-evaluators/classification-entity-extraction#exact-match) | Classification/Entity Extraction | Pattern Matching | Checks if the output exactly matches the expected result. | +| [Contains JSON](/evaluation/configure-evaluators/classification-entity-extraction#contains-json) | Classification/Entity Extraction | Pattern Matching | Ensures the output contains valid JSON. | +| [Regex Test](/evaluation/configure-evaluators/regex-evaluator) | Classification/Entity Extraction | Pattern Matching | Checks if the output matches a given regex pattern. | +| [JSON Field Match](/evaluation/configure-evaluators/classification-entity-extraction#json-field-match) | Classification/Entity Extraction | Pattern Matching | Compares specific fields within JSON data. | +| [JSON Diff Match](/evaluation/configure-evaluators/classification-entity-extraction#json-diff-match) | Classification/Entity Extraction | Similarity Metrics | Compares generated JSON with a ground truth JSON based on schema or values. | +| [Similarity Match](/evaluation/configure-evaluators/semantic-similarity#similarity-match) | Text Generation / Chatbot | Similarity Metrics | Compares generated output with expected using Jaccard similarity. | +| [Semantic Similarity Match](/evaluation/configure-evaluators/semantic-similarity#semantic-similarity-match) | Text Generation / Chatbot | Semantic Analysis | Compares the meaning of the generated output with the expected result. | +| [Starts With](/evaluation/configure-evaluators/regex-evaluator) | Text Generation / Chatbot | Pattern Matching | Checks if the output starts with a specified prefix. | +| [Ends With](/evaluation/configure-evaluators/regex-evaluator) | Text Generation / Chatbot | Pattern Matching | Checks if the output ends with a specified suffix. | +| [Contains](/evaluation/configure-evaluators/regex-evaluator) | Text Generation / Chatbot | Pattern Matching | Checks if the output contains a specific substring. | +| [Contains Any](/evaluation/configure-evaluators/regex-evaluator) | Text Generation / Chatbot | Pattern Matching | Checks if the output contains any of a list of substrings. | +| [Contains All](/evaluation/configure-evaluators/regex-evaluator) | Text Generation / Chatbot | Pattern Matching | Checks if the output contains all of a list of substrings. | +| [Levenshtein Distance](/evaluation/configure-evaluators/semantic-similarity#levenshtein-distance) | Text Generation / Chatbot | Similarity Metrics | Calculates the Levenshtein distance between output and expected result. | +| [LLM-as-a-judge](/evaluation/configure-evaluators/llm-as-a-judge) | Text Generation / Chatbot | LLM-based | Sends outputs to an LLM model for critique and evaluation. | +| [RAG Faithfulness](/evaluation/configure-evaluators/rag-evaluators) | RAG / Text Generation / Chatbot | LLM-based | Evaluates if the output is faithful to the retrieved documents in RAG workflows. | +| [RAG Context Relevancy](/evaluation/configure-evaluators/rag-evaluators) | RAG / Text Generation / Chatbot | LLM-based | Measures the relevancy of retrieved documents to the given question in RAG. 
| +| [Custom Code Evaluation](/evaluation/configure-evaluators/custom-evaluator) | Custom Logic | Custom | Allows users to define their own evaluator in Python. | +| [Webhook Evaluator](/evaluation/configure-evaluators/webhook-evaluator) | Custom Logic | Custom | Sends output to a webhook for external evaluation. | + +
+ +
+ Create new evaluator +
+ +## Evaluators' playground + +Each evaluator comes with its unique playground. For instance, in the screen below, the LLM-as-a-judge evaluator requires you to specify the prompt to use for the evaluation. You'll find detailed information about these parameters on each evaluator's documentation page. + +
+ LLM-as-a-judge evaluator playground +
+ +The evaluator playground lets you test your evaluator with sample input to make sure it's configured correctly. + +To use it, follow these steps: +1. Load a test case from a test set +2. Select a prompt and run it +3. Run the evaluator to see the result + +You can adjust the configuration until you are happy with the result. When finished, commit your changes. + + +## Next steps + +Explore the different evaluator types: + +- [Classification and Entity Extraction](/evaluation/configure-evaluators/classification-entity-extraction) +- [Pattern Matching](/evaluation/configure-evaluators/regex-evaluator) +- [Semantic Similarity](/evaluation/configure-evaluators/semantic-similarity) +- [LLM as a Judge](/evaluation/configure-evaluators/llm-as-a-judge) +- [RAG Evaluators](/evaluation/configure-evaluators/rag-evaluators) +- [Custom Evaluators](/evaluation/configure-evaluators/custom-evaluator) +- [Webhook Evaluators](/evaluation/configure-evaluators/webhook-evaluator) diff --git a/docs/docs/evaluation/evaluators/01-classification-entiry-extraction.mdx b/docs/docs/evaluation/configure-evaluators/02-classification-entity-extraction.mdx similarity index 100% rename from docs/docs/evaluation/evaluators/01-classification-entiry-extraction.mdx rename to docs/docs/evaluation/configure-evaluators/02-classification-entity-extraction.mdx diff --git a/docs/docs/evaluation/evaluators/03-semantic-similarity.mdx b/docs/docs/evaluation/configure-evaluators/04-semantic-similarity.mdx similarity index 100% rename from docs/docs/evaluation/evaluators/03-semantic-similarity.mdx rename to docs/docs/evaluation/configure-evaluators/04-semantic-similarity.mdx diff --git a/docs/docs/evaluation/evaluators/04-llm-as-a-judge.mdx b/docs/docs/evaluation/configure-evaluators/05-llm-as-a-judge.mdx similarity index 96% rename from docs/docs/evaluation/evaluators/04-llm-as-a-judge.mdx rename to docs/docs/evaluation/configure-evaluators/05-llm-as-a-judge.mdx index 1543bb9fd7..14ead426f9 100644 --- a/docs/docs/evaluation/evaluators/04-llm-as-a-judge.mdx +++ b/docs/docs/evaluation/configure-evaluators/05-llm-as-a-judge.mdx @@ -4,7 +4,7 @@ title: "LLM-as-a-Judge" LLM-as-a-Judge is an evaluator that uses an LLM to assess LLM outputs. It's particularly useful for evaluating text generation tasks or chatbots where there's no single correct answer. 
-![Configuration of LLM-as-a-judge](/images/evaluation/llm-as-a-judge.png) +![Configuration of LLM-as-a-judge](/images/evaluation/configure-evaluators-3.png) The evaluator has the following parameters: diff --git a/docs/docs/evaluation/evaluators/05-rag-evaluators.mdx b/docs/docs/evaluation/configure-evaluators/06-rag-evaluators.mdx similarity index 100% rename from docs/docs/evaluation/evaluators/05-rag-evaluators.mdx rename to docs/docs/evaluation/configure-evaluators/06-rag-evaluators.mdx diff --git a/docs/docs/evaluation/evaluators/06-custom-evaluator.mdx b/docs/docs/evaluation/configure-evaluators/07-custom-evaluator.mdx similarity index 100% rename from docs/docs/evaluation/evaluators/06-custom-evaluator.mdx rename to docs/docs/evaluation/configure-evaluators/07-custom-evaluator.mdx diff --git a/docs/docs/evaluation/evaluators/07-webhook-evaluator.mdx b/docs/docs/evaluation/configure-evaluators/08-webhook-evaluator.mdx similarity index 100% rename from docs/docs/evaluation/evaluators/07-webhook-evaluator.mdx rename to docs/docs/evaluation/configure-evaluators/08-webhook-evaluator.mdx diff --git a/docs/docs/evaluation/configure-evaluators/_category_.json b/docs/docs/evaluation/configure-evaluators/_category_.json new file mode 100644 index 0000000000..838261f53c --- /dev/null +++ b/docs/docs/evaluation/configure-evaluators/_category_.json @@ -0,0 +1,6 @@ +{ + "label": "Configure Evaluators", + "position": 8, + "collapsible": true, + "collapsed": true +} diff --git a/docs/docs/evaluation/configure-evaluators/regex-evaluator.mdx b/docs/docs/evaluation/configure-evaluators/regex-evaluator.mdx new file mode 100644 index 0000000000..23e27ce4bc --- /dev/null +++ b/docs/docs/evaluation/configure-evaluators/regex-evaluator.mdx @@ -0,0 +1,16 @@ +--- +title: "Regex Evaluator" +--- + +Regular Expressions (Regex) are sequences of characters that define search patterns, often used for pattern matching +within strings. The `Regex Test` evaluator checks if the generated answer matches a regular expression pattern. + +The evaluator takes the regex pattern, and whether to match or not match for that pattern. + +Here are some examples: + +| Output | Regex | Match/Mismatch | Evaluator Output | +| -------------------------------------- | ------------- | -------------- | ---------------- | +| The iPhone 6 has a 1024px screen | `.*iphone.*` | `match` | True | +| The Samsung galaxy has a 1024px screen | `.*Samsung.*` | `mismatch` | False | + diff --git a/docs/docs/evaluation/evaluation-from-ui/01-quick-start.mdx b/docs/docs/evaluation/evaluation-from-ui/01-quick-start.mdx new file mode 100644 index 0000000000..a2354b42a0 --- /dev/null +++ b/docs/docs/evaluation/evaluation-from-ui/01-quick-start.mdx @@ -0,0 +1,111 @@ +--- +title: "Quick Start Evaluation from UI" +sidebar_label: "Quick Start" +description: "Quick start guide for running LLM and prompt evaluations from the Agenta UI" +sidebar_position: 1 +--- +import Image from "@theme/IdealImage"; + +This quick start guide will help you run your first evaluation using the Agenta UI. + +You will create a prompt that classifies tweets based on their sentiment (positive, negative, neutral). Then you will run an evaluation to measure its performance. + +## Prerequisites + +Before you get started, create the prompt and the test set: + +### Create the prompt + +Create a `completion` prompt that classifies tweets based on their sentiment. + +Use the following prompt: + +**_System prompt_** +``` +Classify this tweet based on its sentiment. Return only the sentiment. 
The sentiment should be one of the following: positive, negative, neutral. +``` +**_User prompt_** +``` +{{tweet}} +``` + +Commit the prompt to the default variant. + +Create prompt modal showing completion and chat application options + + +### Create the test set + +Download a test set with 50 tweets based on the [Sentiment140 kaggle dataset](https://www.kaggle.com/datasets/kazanova/sentiment140). +Download the test set [here](/examples/sentiment140_first50.csv). Then upload it to Agenta (see how to [upload a test set](/evaluation/managing-test-sets/upload-csv)). + +## Running the evaluation + +Navigate to the **Evaluations** page. Click on **Start new evaluation**. + +1. Choose the test set `sentiment140_first50`. +2. Choose the latest revision from the default variant. +3. Choose the evaluator `Exact Match`. +4. Click on **Start evaluation**. + +Start new evaluation modal + +:::tip + +Agenta offers a variety of built-in evaluators for most use cases. You can also create custom evaluators in Python. One of the most commonly used evaluators is `LLM as a Judge`. See more about [evaluators](/evaluation/configure-evaluators/overview). +::: + +## Viewing the results + +Once the evaluation completes, click on the evaluation run to see the results. + +The overview tab shows the aggregated results for all evaluators. +Evaluation results overview + + +The test set tab shows the results for each test case. + +Evaluation results test case + +Click the expand button in the top right corner of the output cell to open a drawer with the full output. Click the tree icon to open a drawer with the trace. + +Evaluation results test case drawer + + +## Making a change and comparing the results + +Make a small change to the prompt and compare the results. Modify the prompt to use gpt-4o-mini instead of gpt-4o. Commit the change. Then rerun the evaluation with the new prompt revision. + +Now compare the results between the two evaluations. Click the `+ Compare` button on the top right corner of the evaluation results page. Select the evaluation to compare to. + +Evaluation results comparison view + +You might find that `gpt-4o-mini` performs differently than `gpt-4o` on this task. It may cost less but could have lower accuracy or different latency. + +## Next steps + +You have run a simple evaluation for a classification task in Agenta. Next, learn more about the different evaluators and how to configure them. See more about [evaluators](/evaluation/configure-evaluators/overview). diff --git a/docs/docs/evaluation/evaluation-from-ui/02-running-evaluations.mdx b/docs/docs/evaluation/evaluation-from-ui/02-running-evaluations.mdx new file mode 100644 index 0000000000..018e795934 --- /dev/null +++ b/docs/docs/evaluation/evaluation-from-ui/02-running-evaluations.mdx @@ -0,0 +1,62 @@ +--- +title: "Running Evaluations" +sidebar_label: "Running Evaluations" +description: "Learn how to run evaluations from the Agenta web interface" +sidebar_position: 2 +--- + +import Image from "@theme/IdealImage"; + +This guide will show you how to run evaluations from the UI. + +## Prerequisites + +Before you get started, make sure that you have [created a test set](/evaluation/managing-test-sets/upload-csv) and [configured evaluators](/evaluation/configure-evaluators/overview) appropriate for your task. + +## Starting an evaluation + +To start an evaluation, navigate to the Evaluations page and click the `Start new evaluation` button. A modal will appear, allowing you to setup the evaluation. 
+ +Start new evaluation + +## Setting up evaluation parameters + +In the modal, specify the following: + +- **Testset**: Choose the testset(s) for your evaluation +- **Variants**: Choose one or more variants to evaluate +- **Evaluators**: Pick one or more evaluators for assessment + +New evaluation modal + +## Advanced configuration + +Additional settings allow you to adjust batching and retry parameters for LLM calls. This helps mitigate rate limit errors from your LLM provider. + +Advanced configuration options include: + +- **Batch Size**: Number of test cases to run concurrently in each batch (default: 10) +- **Retry Delay**: Time to wait before retrying a failed call (default: 3s) +- **Max Retries**: Maximum number of retry attempts for a failed call (default: 3) +- **Delay Between Batches**: Pause duration between batch runs (default: 5s) + +## Monitoring evaluation progress + +Once you start an evaluation: + +1. The evaluation will appear in the evaluations list +2. You'll see the status (Running, Completed, Failed) +3. Progress indicators show how many test cases have been processed +4. You can view partial results while the evaluation is running + +## Next steps + +- Learn how to [view evaluation results](/evaluation/evaluation-from-ui/viewing-results) +- Understand how to [compare evaluations](/evaluation/evaluation-from-ui/comparing-runs) +- Try [human evaluation](/evaluation/human-evaluation/quick-start) for expert feedback diff --git a/docs/docs/evaluation/evaluation-from-ui/03-viewing-results.mdx b/docs/docs/evaluation/evaluation-from-ui/03-viewing-results.mdx new file mode 100644 index 0000000000..8cfdd869d7 --- /dev/null +++ b/docs/docs/evaluation/evaluation-from-ui/03-viewing-results.mdx @@ -0,0 +1,78 @@ +--- +title: "Viewing Evaluation Results" +sidebar_label: "Viewing Results" +description: "Learn how to view and analyze evaluation results in Agenta" +sidebar_position: 3 +--- + +import Image from "@theme/IdealImage"; + +## Overview + +Once your evaluation completes, Agenta provides comprehensive views to analyze the results and understand your LLM application's performance. + +## Overview evaluation tab + +The main view offers an aggregated summary of results. + +Overview evaluation results + +- Average score per evaluator for each variant/test set combination +- Average latency +- Total cost +- Creation date + +## Test cases evaluation tab + +The test cases evaluation tab provides a detailed view of each test case. + +Detailed evaluation results + +The evaluation table columns show: + +- **Inputs**: The input data from your test set +- **Reference Answers**: The expected/correct answers used by evaluators +- **LLM Output**: The actual output from your application +- **Evaluator Results**: Scores or boolean values from each evaluator +- **Cost**: The cost of running this test case +- **Latency**: How long the test case took to execute + +If you click on a test case, you will see a drawer with the full output and the evaluator results. + +Detailed evaluation drawer + +## Prompt configuration tab + +The prompt configuration tab shows the prompt configuration used for this evaluation. + +Prompt configuration + +## Exporting results + +Export your evaluation results for further analysis: + +1. Click the **Export** button on the evaluation detail page +2. Choose CSV format +3. Open in your preferred analysis tool (Excel, Python, R, etc.) 
+ +## Next steps + +- Learn how to [compare multiple evaluations](/evaluation/evaluation-from-ui/comparing-runs) +- Try [human evaluation](/evaluation/human-evaluation/quick-start) for qualitative assessment +- Explore [evaluation from SDK](/tutorials/sdk/evaluate-with-SDK) for CI/CD integration diff --git a/docs/docs/evaluation/evaluation-from-ui/04-comparing-runs.mdx b/docs/docs/evaluation/evaluation-from-ui/04-comparing-runs.mdx new file mode 100644 index 0000000000..895e1dbf95 --- /dev/null +++ b/docs/docs/evaluation/evaluation-from-ui/04-comparing-runs.mdx @@ -0,0 +1,72 @@ +--- +title: "Comparing Evaluation Runs" +sidebar_label: "Comparing Runs" +description: "Learn how to compare multiple evaluation runs to find the best performing variant" +sidebar_position: 4 +--- + +import Image from "@theme/IdealImage"; + +## Overview + +Compare evaluations to understand which variant performs better. This helps you make data-driven decisions about your LLM application. + +## Prerequisites + +To compare evaluations, you need: + +- Two or more completed evaluations +- All evaluations must use the same test set + +## Starting a comparison + +After your evaluations complete, you can compare two or more of them: + +1. Go to the Evaluations page +2. Click the compare button in the top right corner of the evaluation results page +3. Select the evaluations you want to compare + + +## Overview comparison tab + +The overview comparison tab shows aggregated results for all evaluators. The figures let you compare results between evaluations. + +Animation showing how to compare evaluations in Agenta + +## Test set comparison tab + +The test set comparison tab shows results for each test case. The figures let you compare results between evaluations. + +Comparison view test set + +Click on a row to see a drawer with the full output and evaluator results side by side. + +Comparison view test set + +## Prompt configuration comparison tab + +The prompt configuration comparison tab shows the prompt configuration used for each evaluation. The figures let you compare prompt configurations between evaluations. + +Comparison view prompt configuration + +## Next steps + +- Learn about [human evaluation](/evaluation/human-evaluation/quick-start) for qualitative feedback +- Explore [evaluation from SDK](/tutorials/sdk/evaluate-with-SDK) for automated testing +- Understand [evaluator types](/evaluation/configure-evaluators/overview) to choose the right metrics diff --git a/docs/docs/evaluation/evaluation-from-ui/_category_.json b/docs/docs/evaluation/evaluation-from-ui/_category_.json new file mode 100644 index 0000000000..190fc0bdb3 --- /dev/null +++ b/docs/docs/evaluation/evaluation-from-ui/_category_.json @@ -0,0 +1,6 @@ +{ + "label": "Evaluation from UI", + "position": 6, + "collapsible": true, + "collapsed": true +} diff --git a/docs/docs/evaluation/evaluators/02-pattern-matching.mdx b/docs/docs/evaluation/evaluators/02-pattern-matching.mdx deleted file mode 100644 index 283c9e8165..0000000000 --- a/docs/docs/evaluation/evaluators/02-pattern-matching.mdx +++ /dev/null @@ -1,47 +0,0 @@ ---- -title: "Pattern Matching Evaluators" ---- - -Pattern Matching Evaluation is a method used to assess the performance of LLMs by identifying specific patterns within the output generated by the model. 
-In Agenta, to perform pattern matching evaluation you can make use of the following evaluators: - -- Regex Test -- Starts with -- Ends with -- Contains -- Contains Any -- Contains All - -## Regular Expression - -Regular Expressions (Regex) are sequences of characters that define search patterns, often used for pattern matching -within strings. The `Regex Test` evaluator checks if the generated answer matches a regular expression pattern. - -The evaluator takes the regex pattern, and whether to match or not match for that pattern. - -Here are some examples: - -| Output | Regex | Match/Mismatch | Evaluator Output | -| -------------------------------------- | ------------- | -------------- | ---------------- | -| The iPhone 6 has a 1024px screen | `.*iphone.*` | `match` | True | -| The Samsung galaxy has a 1024px screen | `.*Samsung.*` | `mismatch` | False | - -## Starts With - -**Starts With evaluator** checks if the output starts with a specified prefix, considering case sensitivity based on the settings. - -## Ends With - -**Ends With** evaluator checks if the output ends with a specified suffix, considering case sensitivity based on the settings. - -## Contains - -**Contains evaluator** checks if the output contains a specified substring, considering case sensitivity based on the settings. - -## Contains Any - -**Contains Any evaluator** checks if the output contains any of the specified substrings from a comma-separated list, considering case sensitivity based on the settings. - -## Contains All - -**Contains All evaluator** checks if the output contains all of the specified substrings from a comma-separated list, considering case sensitivity based on the settings. diff --git a/docs/docs/evaluation/evaluators/_category_.json b/docs/docs/evaluation/evaluators/_category_.json deleted file mode 100644 index 43e5ebe305..0000000000 --- a/docs/docs/evaluation/evaluators/_category_.json +++ /dev/null @@ -1,4 +0,0 @@ -{ - "position": 8, - "label": "Evaluators" -} diff --git a/docs/docs/evaluation/human-evaluation/01-quick-start.mdx b/docs/docs/evaluation/human-evaluation/01-quick-start.mdx new file mode 100644 index 0000000000..ae78c98295 --- /dev/null +++ b/docs/docs/evaluation/human-evaluation/01-quick-start.mdx @@ -0,0 +1,62 @@ +--- +title: "Quick Start" +sidebar_label: "Quick Start" +description: "Get started with human evaluation in Agenta" +sidebar_position: 1 +--- + +import Image from "@theme/IdealImage"; + +## Overview + +Human evaluation lets you evaluate your LLM application's performance using human judgment instead of automated metrics. + +
+ ⏯️ Watch a short demo of the human evaluation feature. + + +
+ +## Why use human evaluation? + +Automated metrics can't capture everything. Sometimes you need human experts to evaluate results and identify why errors occur. + +Human evaluation helps you: +- Get expert feedback to compare different versions of your application +- Collect human feedback and insights to improve your prompts and configuration +- Collect annotations to bootstrap automated evaluation + +## How human evaluation works + +Human evaluation follows the same process as automatic evaluation: +1. Choose a test set +2. Select the versions you want to evaluate +3. Pick your evaluators +4. Start the evaluation + +The only difference is that humans provide the evaluation scores instead of automated systems. + +## Quick workflow + +1. **Start evaluation**: Go to Evaluations → Human annotation → Start new evaluation +2. **Select test set**: Choose the data you want to evaluate against +3. **Select variant**: Pick the version of your application to test +4. **Configure evaluators**: Create or select evaluators (boolean, integer, multi-choice, etc.) +5. **Run**: Click "Start evaluation" and generate outputs +6. **Annotate**: Review each response and provide feedback +7. **Review results**: Analyze aggregated scores and export data + +## Next steps + +- Learn about [configuring evaluators](/evaluation/human-evaluation/configuring-evaluators) +- Understand [how to run evaluations](/evaluation/human-evaluation/running-evaluations) +- Explore [viewing results](/evaluation/human-evaluation/viewing-results) +- Try [A/B testing](/evaluation/human-evaluation/ab-testing) diff --git a/docs/docs/evaluation/human-evaluation/02-configuring-evaluators.mdx b/docs/docs/evaluation/human-evaluation/02-configuring-evaluators.mdx new file mode 100644 index 0000000000..209114c22b --- /dev/null +++ b/docs/docs/evaluation/human-evaluation/02-configuring-evaluators.mdx @@ -0,0 +1,71 @@ +--- +title: "Configuring Evaluators" +sidebar_label: "Configuring Evaluators" +description: "Learn how to configure evaluators for human evaluation" +sidebar_position: 2 +--- + +import Image from "@theme/IdealImage"; + +## Creating evaluators + +If you don't have evaluators yet, click **Create new** in the **Evaluator** section. + +Each evaluator has: +- **Name** - What you're measuring (e.g., "correctness") +- **Description** - What the evaluator does +- **Feedback types** - How evaluators will score responses + +For example, a "correctness" evaluator might have: +- `is_correct` - A yes/no question about accuracy +- `error_type` - A multiple-choice field for categorizing mistakes + +Creating evaluators for human evaluation + +## Available feedback types + +- **Boolean** - Yes/no questions +- **Integer** - Whole number ratings +- **Decimal** - Precise numerical scores +- **Single-choice** - Pick one option +- **Multi-choice** - Pick multiple options +- **String** - Free-text comments or notes + +## Grouping related feedback types + +:::tip +Evaluators can include multiple related feedback types. For example: + +**Correctness evaluator:** +- `is_correct` - Yes/no question about accuracy +- `error_type` - Multiple-choice field to categorize mistakes (only if incorrect) + +**Style adherence evaluator:** +- `is_adherent` - Yes/no question about style compliance +- `comment` - Text field explaining why the style doesn't match (if needed) + +This grouping helps you evaluate different aspects of your LLM's performance in an organized way. +::: + +## Selecting evaluators + +After creating evaluators: + +1. 
Select the evaluators you want to use +2. You can use multiple evaluators in a single evaluation +3. Each evaluator will appear in the annotation interface + +## Next steps + +- Learn about [running evaluations](/evaluation/human-evaluation/running-evaluations) +- Understand [how to view results](/evaluation/human-evaluation/viewing-results) +- Try [A/B testing](/evaluation/human-evaluation/ab-testing) diff --git a/docs/docs/evaluation/human-evaluation/03-running-evaluations.mdx b/docs/docs/evaluation/human-evaluation/03-running-evaluations.mdx new file mode 100644 index 0000000000..dbc5544269 --- /dev/null +++ b/docs/docs/evaluation/human-evaluation/03-running-evaluations.mdx @@ -0,0 +1,87 @@ +--- +title: "Running Evaluations" +sidebar_label: "Running Evaluations" +description: "Learn how to run human evaluation sessions in Agenta" +sidebar_position: 3 +--- + +import Image from "@theme/IdealImage"; + +## Starting a new evaluation + +1. Go to the **Evaluations** page +2. Select the **Human annotation** tab +3. Click **Start new evaluation** + + + +## Configuring your evaluation + +1. **Select your test set** - Choose the data you want to evaluate against +2. **Select your revision** - Pick the version of your application to test + +:::warning +Your test set columns must match the input variables in your revision. If they don't match, you'll see an error message. +::: + +3. **Choose evaluators** - Select how you want to measure performance + +Human evaluation configuration + +## Running the evaluation + +After configuring: + +1. Click **Start evaluation** +2. You'll be redirected to the annotation interface +3. Click **Run all** to generate outputs and begin evaluation + + + +## Annotating responses + +For each test case: +1. Review the input and output +2. Use the evaluation form on the right to score the response +3. Click **Annotate** to save your assessment +4. Click **Next** to move to the next test case + +:::tip +Select the **Unannotated** tab to see only the test cases you haven't reviewed yet. +::: + +## Collaboration + +You can invite team members to help with evaluation by sharing the evaluation link. Team members must be added to your workspace first. + +## Next steps + +- Learn about [viewing results](/evaluation/human-evaluation/viewing-results) +- Try [A/B testing](/evaluation/human-evaluation/ab-testing) to compare variants diff --git a/docs/docs/evaluation/human-evaluation/04-viewing-results.mdx b/docs/docs/evaluation/human-evaluation/04-viewing-results.mdx new file mode 100644 index 0000000000..e80c933b6f --- /dev/null +++ b/docs/docs/evaluation/human-evaluation/04-viewing-results.mdx @@ -0,0 +1,61 @@ +--- +title: "Viewing Results" +sidebar_label: "Viewing Results" +description: "Learn how to view and export human evaluation results" +sidebar_position: 4 +--- + +## Overview + +After completing annotations, you can review and export the results of your human evaluation. + +## Viewing results + +The **Results** section shows: + +- Aggregated scores across all test cases +- Individual annotations for each test case +- Evaluator performance metrics +- Comments and feedback provided by annotators + +## Comparing with other experiments + +You can compare human evaluation results with: + +- Automated evaluation runs +- Other human evaluation sessions +- Different variants or versions + +This helps you understand how human judgment aligns with automated metrics and identify areas for improvement. 
+ +## Exporting results + +### Export to CSV + +Click **Export results** to download your evaluation data in CSV format. The exported file includes: + +- Test case inputs +- LLM outputs +- All annotation scores and feedback +- Timestamp and annotator information + +### Saving as test set + +Click **Save test set** to create a new test set from annotated data. This is useful for: + +- Bootstrapping automated evaluation with human-validated examples +- Creating regression test suites +- Building training data for custom evaluators + +## Use cases for exported data + +- **Analysis**: Perform statistical analysis on evaluation results +- **Reporting**: Create reports for stakeholders +- **Training**: Use annotations to train or fine-tune models +- **Quality Assurance**: Track quality metrics over time + +## Next steps + +- Learn about [A/B testing](/evaluation/human-evaluation/ab-testing) +- Explore [automated evaluation](/evaluation/evaluation-from-ui/quick-start) +- Understand [configure evaluators](/evaluation/configure-evaluators/overview) diff --git a/docs/docs/evaluation/human-evaluation/05-ab-testing.mdx b/docs/docs/evaluation/human-evaluation/05-ab-testing.mdx new file mode 100644 index 0000000000..c2da594532 --- /dev/null +++ b/docs/docs/evaluation/human-evaluation/05-ab-testing.mdx @@ -0,0 +1,69 @@ +--- +title: "A/B Testing" +sidebar_label: "A/B Testing" +description: "Learn how to compare two variants using human evaluation" +sidebar_position: 5 +--- + +import Image from "@theme/IdealImage"; + +## Overview + +A/B testing lets you compare two versions of your application side-by-side. For each test case, you choose which version performs better. + +## Setting up A/B testing + +1. Select **two versions** you want to compare +2. Choose your **test set** +3. For each test case, decide which version is better (or if they're equal) + + + +## A/B testing features + +During A/B evaluation, you can: +- **Compare variants** - Score which version performs better for each test case +- **Add notes** - Include context or detailed feedback +- **Export results** - Download your evaluation data for further analysis + +## Collaborating on A/B tests + +You can invite team members to help with A/B testing by sharing the evaluation link. Team members must be added to your workspace first. 
+ +This is particularly useful for: +- Getting diverse perspectives on performance +- Reducing individual bias +- Speeding up evaluation with multiple annotators + +## Interpreting A/B test results + +After completing the A/B test, you'll see: + +- Win/loss/tie counts for each variant +- Percentage of cases where each variant performed better +- Specific test cases where variants differed significantly +- Notes and comments from annotators + +## Use cases + +A/B testing is ideal for: + +- **Prompt optimization**: Compare different prompt wordings +- **Model selection**: Evaluate different LLM models (GPT-4 vs Claude vs others) +- **Parameter tuning**: Test different temperature or max_tokens settings +- **Feature comparison**: Compare variants with different features enabled + +## Next steps + +- Learn about [exporting results](/evaluation/human-evaluation/viewing-results) +- Explore [automated evaluation](/evaluation/evaluation-from-ui/comparing-runs) for larger-scale comparisons +- Understand [evaluation concepts](/evaluation/concepts) diff --git a/docs/docs/evaluation/human-evaluation/_category_.json b/docs/docs/evaluation/human-evaluation/_category_.json new file mode 100644 index 0000000000..44fdaf5ba8 --- /dev/null +++ b/docs/docs/evaluation/human-evaluation/_category_.json @@ -0,0 +1,6 @@ +{ + "label": "Human Evaluation", + "position": 9, + "collapsible": true, + "collapsed": true +} diff --git a/docs/docs/evaluation/managing-test-sets/01-upload-csv.mdx b/docs/docs/evaluation/managing-test-sets/01-upload-csv.mdx new file mode 100644 index 0000000000..13591eddb7 --- /dev/null +++ b/docs/docs/evaluation/managing-test-sets/01-upload-csv.mdx @@ -0,0 +1,87 @@ +--- +title: "Upload Test Sets as CSVs" +sidebar_label: "Upload CSV/JSON" +description: "Learn how to upload test sets from CSV or JSON files" +sidebar_position: 1 +--- + +```mdx-code-block +import Image from "@theme/IdealImage"; +``` + +## Overview + +You can quickly create test sets by uploading CSV or JSON files. This is the fastest way to import existing test data into Agenta. + +## Uploading a file + +To create a test set from a CSV or JSON file: + +1. Go to `Test sets` +2. Click `Upload test sets` +3. Select either `CSV` or `JSON` + + + +## CSV Format + +We use CSV with commas (,) as separators and double quotes (") as quote characters. The first row should contain the header with column names. Each input should have its own column. The column containing the reference answer can have any name, but we use "correct_answer" by default. + +:::info +If you choose a different column name for the reference answer, you'll need to configure the evaluator later with that specific name. +::: + +Here's an example of a valid CSV: + +```csv +text,instruction,correct_answer +Hello,How are you?,I'm good. +"Tell me a joke.",Sure, here's one:... +``` + +## JSON Format + +The test set should be in JSON format with the following structure: + +1. A JSON file containing an array of objects. +2. Each object in the array represents a row, with keys as column headers and values as row data. 
Here's an example of a valid JSON file: + +```json +[ + { "recipe_name": "Chicken Parmesan", "correct_answer": "Chicken" }, + { "recipe_name": "a, special, recipe", "correct_answer": "Beef" } +] +``` + +## Test set schema for Chat Applications + +For chat applications created using the chat template in Agenta, the input should be saved in the column called `messages`, which would contain the input list of messages: + +```json +[ + { "content": "message.", "role": "user" }, + { "content": "message.", "role": "assistant" } + // Add more messages if necessary +] +``` + +In case the prompt includes other variables (e.g. `context`), make sure to have a column with the same name and the value of the variable. + +The reference answer column (by default `correct_answer`) should follow the same format: + +```json +{ "content": "message.", "role": "assistant" } +``` + +Here is an example of a valid CSV for testing the default chat prompt template: + +```csv chat_test_set.csv +context,messages,correct_answer +test,"[{""role"":""user"",""content"":""hi""}]","{""content"":""Hello! How can I assist you today?"",""role"":""assistant"",""annotations"":[]}" +``` + +## Next steps + +- [Create test sets programmatically](/evaluation/managing-test-sets/create-programatically) using the SDK or API +- [Create test sets from traces](/evaluation/managing-test-sets/create-from-traces) to capture real production data +- [Create test sets from playground](/evaluation/managing-test-sets/create-from-playground) during experimentation diff --git a/docs/docs/evaluation/managing-test-sets/02-create-programatically.mdx b/docs/docs/evaluation/managing-test-sets/02-create-programatically.mdx new file mode 100644 index 0000000000..79c271af22 --- /dev/null +++ b/docs/docs/evaluation/managing-test-sets/02-create-programatically.mdx @@ -0,0 +1,87 @@ +--- +title: "Create Test Sets Programmatically" +sidebar_label: "Create Programmatically" +description: "Learn how to create test sets using the API or SDK" +sidebar_position: 2 +--- + +## Overview + +Creating test sets programmatically allows you to automate test set generation, integrate with your CI/CD pipeline, or dynamically generate test cases from existing data sources. + +## Creating via API + +You can upload a test set using our API. Find the [API endpoint reference here](/reference/api/upload-file). 
+ +Here's an example of such a call: + +**HTTP Request:** + +``` +POST /testsets + +``` + +**Request Body:** + +```json +{ + "name": "testsetname", + "csvdata": [ + { "column1": "row1col1", "column2": "row1col2" }, + { "column1": "row2col1", "column2": "row2col2" } + ] +} +``` + +### Example with curl + +```bash +curl -X POST "https://cloud.agenta.ai/api/testsets" \ + -H "Content-Type: application/json" \ + -H "Authorization: ApiKey YOUR_API_KEY" \ + -d '{ + "name": "my_test_set", + "csvdata": [ + {"input": "Hello", "expected": "Hi there!"}, + {"input": "How are you?", "expected": "I am doing well!"} + ] + }' +``` + +## Creating via SDK + +```python +from agenta.client.api import AgentaApi +from agenta.client.types.new_testset import NewTestset + +# Initialize the client +client = AgentaApi( + base_url="https://cloud.agenta.ai/api", + api_key="your-api-key" +) + +# Create test set data +csvdata = [ + {"country": "France", "capital": "Paris"}, + {"country": "Germany", "capital": "Berlin"}, + {"country": "Spain", "capital": "Madrid"} +] + +# Create the test set +response = client.testsets.create_testset( + request=NewTestset( + name="countries_capitals", + csvdata=csvdata + ) +) + +test_set_id = response.id +print(f"Created test set with ID: {test_set_id}") +``` + +## Next steps + +- [Create test sets from UI](/evaluation/managing-test-sets/create-from-ui) for manual creation +- [Create test sets from traces](/evaluation/managing-test-sets/create-from-traces) to capture production data +- [Create test sets from playground](/evaluation/managing-test-sets/create-from-playground) during experimentation diff --git a/docs/docs/evaluation/managing-test-sets/03-create-from-ui.mdx b/docs/docs/evaluation/managing-test-sets/03-create-from-ui.mdx new file mode 100644 index 0000000000..d6146d5a67 --- /dev/null +++ b/docs/docs/evaluation/managing-test-sets/03-create-from-ui.mdx @@ -0,0 +1,48 @@ +--- +title: "Create Test Sets from UI" +sidebar_label: "Create from UI" +description: "Learn how to create and edit test sets using the Agenta web interface" +sidebar_position: 3 +--- + +```mdx-code-block +import Image from "@theme/IdealImage"; +``` + +## Overview + +The Agenta UI provides an intuitive interface for creating and editing test sets manually. This is ideal when you want to quickly create or edit small test sets. + +## Creating a test set from the UI + +To create a new test set from the UI: + +1. Go to `Test sets` in the navigation +2. Click `Create a test set with UI` +3. Name your test set +4. Specify the columns for input types +5. Add your test cases row by row +6. Click `Save test set` when done + + + +## Editing an existing test set + +To edit an existing test set: + +1. Go to `Test sets` +2. Select the test set you want to edit +3. Make your changes (add, edit, or delete rows) +4. Click `Save test set` to persist your changes + +:::tip +Remember to click `Save test set` before navigating away, or your changes will be lost! 
+::: + + +## Next steps + +- [Upload test sets as CSV](/evaluation/managing-test-sets/upload-csv) for bulk imports +- [Create test sets programmatically](/evaluation/managing-test-sets/create-programatically) using API/SDK +- [Create test sets from traces](/evaluation/managing-test-sets/create-from-traces) to capture production data +- Learn about [running evaluations](/evaluation/evaluation-from-ui/running-evaluations) with your test sets diff --git a/docs/docs/evaluation/managing-test-sets/04-create-from-traces.mdx b/docs/docs/evaluation/managing-test-sets/04-create-from-traces.mdx new file mode 100644 index 0000000000..32f4ba8da5 --- /dev/null +++ b/docs/docs/evaluation/managing-test-sets/04-create-from-traces.mdx @@ -0,0 +1,56 @@ +--- +title: "Create Test Sets from Traces" +sidebar_label: "Create from Traces" +description: "Learn how to create test sets from production traces in observability" +sidebar_position: 4 +--- + +```mdx-code-block +import { Stream } from '@cloudflare/stream-react'; +``` + +## Overview + +One of the most valuable sources of test cases is your production data. Traces captured in the Observability view represent real user interactions with your LLM application. + + +
+ +## Adding a Single Trace + +To add a single trace to a test set: + +1. Navigate to the **Observability** view in Agenta +2. Find a trace you want to add to a test set +3. Click the **Add to test set** button at the top of the trace +4. Choose to create a new test set or select an existing one +5. Review the mapping between trace data and test set columns + - Agenta will automatically map the inputs and outputs to appropriate columns + - You can edit the expected answer if you don't agree with the output +6. Click **Save** to add the trace to your test set + +## Adding Multiple Traces at Once + +To efficiently add multiple traces: + +1. In the Observability view, use the search function to filter traces + - For example, search for specific response patterns like "I don't have enough information" +2. Select all relevant traces by checking the boxes next to them +3. Click **Add to test set** +4. Choose an existing test set or create a new one +5. Review the mapping for the traces +6. Click **Save** to add all selected traces to your test set + +## Use cases + +Creating test sets from traces is particularly useful for: + +- **Edge Cases**: Capture unusual or problematic user interactions +- **Regression Testing**: Save examples of correct behavior to prevent future regressions +- **Error Analysis**: Collect failed cases to understand and fix issues + +## Next steps + +- [Create test sets from playground](/evaluation/managing-test-sets/create-from-playground) during experimentation +- [Upload test sets as CSVs](/evaluation/managing-test-sets/upload-csv) for bulk imports +- Learn about [running evaluations](/evaluation/evaluation-from-ui/running-evaluations) with your test sets diff --git a/docs/docs/evaluation/managing-test-sets/05-create-from-playground.mdx b/docs/docs/evaluation/managing-test-sets/05-create-from-playground.mdx new file mode 100644 index 0000000000..a9c8d5d603 --- /dev/null +++ b/docs/docs/evaluation/managing-test-sets/05-create-from-playground.mdx @@ -0,0 +1,34 @@ +--- +title: "Create Test Sets from Playground" +sidebar_label: "Create from Playground" +description: "Learn how to create test sets directly from the playground while experimenting" +sidebar_position: 5 +--- + +```mdx-code-block +import { Stream } from '@cloudflare/stream-react'; +``` + +## Overview + +The playground offers a convenient way to create and add data to a test set. This workflow is useful when you discover interesting cases or edge cases while experimenting with your LLM application. + + +
+
+## Adding data points from the playground
+
+To add a data point to a test set from the playground:
+
+1. Work with your application in the playground
+2. When you find an interesting case, click the `Add to test set` button located near the `Run` button
+3. A drawer will display showing the inputs and outputs from the playground
+4. You can modify inputs and correct answers if needed
+5. Select an existing test set to add to, or choose `+Add new` to create a new one
+6. Once you're satisfied, click `Add` to finalize
+
+## Next steps
+
+- [Upload test sets as CSVs](/evaluation/managing-test-sets/upload-csv) for bulk imports
+- [Create test sets from traces](/evaluation/managing-test-sets/create-from-traces) to capture production data
+- Learn about [running evaluations](/evaluation/evaluation-from-ui/running-evaluations) with your test sets
diff --git a/docs/docs/evaluation/managing-test-sets/_category_.json b/docs/docs/evaluation/managing-test-sets/_category_.json
new file mode 100644
index 0000000000..45dbdaf1b2
--- /dev/null
+++ b/docs/docs/evaluation/managing-test-sets/_category_.json
@@ -0,0 +1,6 @@
+{
+  "label": "Manage Test Sets",
+  "position": 5,
+  "collapsible": true,
+  "collapsed": true
+}
\ No newline at end of file
diff --git a/docs/docs/getting-started/01-introduction.mdx b/docs/docs/getting-started/01-introduction.mdx
index b27a8dcdec..c1c526d2b7 100644
--- a/docs/docs/getting-started/01-introduction.mdx
+++ b/docs/docs/getting-started/01-introduction.mdx
@@ -14,31 +14,45 @@ import Image from "@theme/IdealImage";
   alt="Screenshots of Agenta LLMOPS platform"
   loading="lazy"
 />
-Agenta is an open-source platform that helps **developers** and **product teams**
-build robust AI applications powered by LLMs. It offers all the tools for **prompt
-management and evaluation**.
+Agenta is an **open-source LLMOps platform** that helps **developers** and **product teams** build reliable LLM applications.
 
-### With Agenta, you can:
-1. Rapidly [**experiment** and **compare** prompts](/prompt-engineering/overview) on [any LLM workflow](/custom-workflows/overview) (chain-of-prompts, Retrieval Augmented Generation (RAG), LLM agents...)
-2. Rapidly [**create test sets**](/evaluation/create-test-sets) and **golden datasets** for evaluation
-3. **Evaluate** your application with pre-existing or **custom evaluators**
-4. **Annotate** and **A/B test** your applications with **human feedback**
-5. [**Collaborate with product teams**](/misc/team_management) for prompt engineering and evaluation
-6. [**Deploy your application**](/concepts/concepts#environments) in one-click in the UI, through CLI, or through github workflows.
+Agenta covers the entire LLM development lifecycle: **prompt management**, **evaluation**, and **observability**.
 
-Agenta focuses on increasing the speed of the development cycle of LLM applications by increasing the speed of experimentation.
+## Features
 
-## How is Agenta different?
+### Prompt Engineering and Management
 
-### Works with any LLM app workflow
+Agenta enables product teams to experiment with prompts, push them to production, run evaluations, and annotate their results.
+
+
+
+### Evaluation
+
+Agenta lets teams evaluate their LLM applications systematically using built-in or custom evaluators as well as human feedback, and compare results across versions.
+
+
+
+### Observability
+
+Agenta captures all inputs, outputs, and metadata from your LLM applications, so teams can debug issues, monitor cost and latency, and turn production traces into test sets.
-Agenta enables prompt engineering and evaluation on any LLM app architecture, such as **Chain of Prompts**, **RAG**, or **LLM agents**. It is compatible with any framework like **Langchain** or **LlamaIndex**, and works with any model provider, such as **OpenAI**, **Cohere**, or **local models**. -[Jump here](/custom-workflows/overview) to see how to use your own custom application with Agenta. + +## Why Agenta? ### Enable collaboration between developers and product teams Agenta empowers **non-developers** to iterate on the configuration of any custom LLM application, evaluate it, annotate it, A/B test it, and deploy it, all within the user interface. -By **adding a few lines to your application code**, you can create a prompt playground that allows non-developers to experiment with prompts for your application and use all the tools within Agenta. +### Open-source and MIT licensed + +Agenta is open-source and MIT licensed, so you can self-host it, modify it, and use it in commercial projects without restrictions. + +### Works with any LLM app workflow + +Agenta enables prompt engineering and evaluation on any LLM app architecture, such as **Chain of Prompts**, **RAG**, or **LLM agents**. It is compatible with any framework like **Langchain** or **LlamaIndex**, and works with any model provider, such as **OpenAI**, **Cohere**, or **local models**. + +[Jump here](/custom-workflows/overview) to see how to use your own custom application with Agenta. + + diff --git a/docs/docs/getting-started/02-quick-start.mdx b/docs/docs/getting-started/02-quick-start.mdx index ff1696ce2f..0535b21280 100644 --- a/docs/docs/getting-started/02-quick-start.mdx +++ b/docs/docs/getting-started/02-quick-start.mdx @@ -159,7 +159,7 @@ Evaluation helps you measure how well your prompt performs against your test cas /> :::info -Agenta comes with a [set of built-in evaluators](/evaluation/overview) that you can use to evaluate your prompt. You can also create your own custom evaluator [using code](/evaluation/evaluators/custom-evaluator) or [webhooks](/evaluation/evaluators/webhook-evaluator). +Agenta comes with a [set of built-in evaluators](/evaluation/concepts) that you can use to evaluate your prompt. You can also create your own custom evaluator [using code](/evaluation/configure-evaluators/custom-evaluator) or [webhooks](/evaluation/configure-evaluators/webhook-evaluator). ::: ### Analyzing the Results @@ -199,7 +199,7 @@ Agenta provides two main ways to integrate with your application: - **Use Agenta for Prompt Management**: Fetch the prompt configuration and use it in your own code - **Use Agenta as a Proxy**: Use Agenta as a middleware that forwards requests to the LLM -In this guide, we'll use the first approach. The second approach is covered in the [Integration Guide](/prompt-engineering/prompt-management/how-to-integrate-with-agenta). Its advantages are simpler integration and getting observability out of the box. The disadvantages are that it does not support streaming responses and adds a slight latency to the response (approximately 0.3 seconds). +In this guide, we'll use the first approach. The second approach is covered in the [Integration Guide](/prompt-engineering/integrating-prompts/integrating-with-agenta). Its advantages are simpler integration and getting observability out of the box. The disadvantages are that it does not support streaming responses and adds a slight latency to the response (approximately 0.3 seconds). 
::: We are going to fetch the prompt configuration and use it in our own code: @@ -279,7 +279,7 @@ response = client.chat.completions.create( - + ```javascript const fetchConfigs = async () => { @@ -444,6 +444,6 @@ Congratulations! You've successfully created, tested, deployed, and set up obser Here are some next steps to explore: -- [Set up your evaluation workflow](/evaluation/overview) +- [Set up your evaluation workflow](/evaluation/concepts) - [Set up observability](/observability/overview) and explore the different integrations -- [Set up custom workflows](/custom-workflows/overview) to enable product teams to run evaluations on complex applications from the UI \ No newline at end of file +- [Set up custom workflows](/custom-workflows/overview) to enable product teams to run evaluations on complex applications from the UI diff --git a/docs/docs/misc/02-faq.mdx b/docs/docs/misc/02-faq.mdx index d98b4dc8e6..7755a8f51e 100644 --- a/docs/docs/misc/02-faq.mdx +++ b/docs/docs/misc/02-faq.mdx @@ -24,7 +24,7 @@ Yes, Agenta can be used with TypeScript, though with varying levels of native su Agenta is fully OpenTelemetry compliant. You can auto-instrument your TypeScript application using Opentelemetry compatible integrations such as [OpenLLMetry](https://github.com/traceloop/openllmetry-js) which works well with TypeScript projects. - We have documentation on [setting up OpenTelemetry](/observability/opentelemetry#opentelemetry-tracing-without-agenta-sdk) with your API key. + We have documentation on [setting up OpenTelemetry](/observability/trace-with-opentelemetry/distributed-tracing#opentelemetry-tracing-without-agenta-sdk) with your API key. 3. **Evaluation** @@ -60,8 +60,7 @@ In addition it works natively with - [Azure OpenAI](https://azure.microsoft.com/en-us/products/ai-services/openai-service/) - [Vertex AI](https://cloud.google.com/vertex-ai) -You can add any OpenAI compatible endpoint, including self-hosted models and custom models (for instance using [Ollama](https://docs.agenta.ai/prompt-engineering/playground/adding-custom-providers#configuring-openai-compatible-endpoints-eg-ollama)). -You can also dynamically add new models to any provider already listed in the playground, such as [OpenRouter](/prompt-engineering/playground/adding-custom-providers#adding-models-to-a-provider-eg-openrouter), Anthropic, Gemini, Cohere, and others. - -You can learn more about setting up different models in the [documentation](/prompt-engineering/playground/adding-custom-providers). +You can add any OpenAI compatible endpoint, including self-hosted models and custom models (for instance using [Ollama](https://docs.agenta.ai/prompt-engineering/playground/custom-providers#configuring-openai-compatible-endpoints-eg-ollama)). +You can also dynamically add new models to any provider already listed in the playground, such as [OpenRouter](/prompt-engineering/playground/custom-providers#adding-models-to-a-provider-eg-openrouter), Anthropic, Gemini, Cohere, and others. +You can learn more about setting up different models in the [documentation](/prompt-engineering/playground/custom-providers). 
diff --git a/docs/docs/observability/02-overview.mdx b/docs/docs/observability/01-overview.mdx similarity index 50% rename from docs/docs/observability/02-overview.mdx rename to docs/docs/observability/01-overview.mdx index 4fe0a2a962..c8f5b34955 100644 --- a/docs/docs/observability/02-overview.mdx +++ b/docs/docs/observability/01-overview.mdx @@ -1,29 +1,29 @@ --- -title: Overview -description: Learn how to instrument your application with Agenta for enhanced observability. This guide covers the benefits of observability, how Agenta helps, and how to get started. +title: "Overview" +sidebar_label: "Overview" +description: "Complete guide to LLM observability with Agenta - monitor costs, performance, and trace every request with automatic instrumentation for your AI applications" +sidebar_position: 2 --- ```mdx-code-block -import DocCard from '@theme/DocCard'; +import CustomDocCard from '@site/src/components/CustomDocCard'; import clsx from 'clsx'; import Image from "@theme/IdealImage"; - ``` -## Why Observability? - Observability is the practice of monitoring and understanding the behavior of your LLM application. With Agenta, you can add a few lines of code to start tracking all inputs, outputs, and metadata of your application. + This allows you to: - **Debug Effectively**: View exact prompts sent and contexts retrieved. For complex workflows like agents, you can trace the data flow and quickly identify root causes. - **Bootstrap Test Sets**: Track real-world inputs and outputs and use them to bootstrap test sets in which you can continuously iterate. - **Find Edge Cases**: Identify latency spikes and cost increases. Understand performance bottlenecks to optimize your app's speed and cost-effectiveness. - **Track Costs and Latency Over Time**: Monitor how your app's expenses and response times change. -- **Compare App Versions**: Compare the behavior in productions of different versions of your application to see which performs better. +- **Compare App Versions**: Compare the behavior in production of different versions of your application to see which performs better. Illustration of observability @@ -37,35 +37,31 @@ Agenta's observability features are built on **OpenTelemetry (OTel)**, an open-s - **Proven Reliability**: Use a mature and actively maintained SDK that's trusted in the industry. - **Ease of Integration**: If you're familiar with OTel, you already know how to instrument your app with Agenta. No new concepts or syntax to learn—Agenta uses familiar OTel concepts like traces and spans. -## Key Concepts - -**Traces**: A trace represents the complete journey of a request through your application. In our context, a trace corresponds to a single request to your LLM application. - -**Spans**: A span is a unit of work within a trace. Spans can be nested, forming a tree-like structure. The root span represents the overall operation, while child spans represent sub-operations. Agenta enriches each span with cost information and metadata when you make LLM calls. - ## Next Steps
-
-
@@ -73,9 +69,8 @@ Agenta's observability features are built on **OpenTelemetry (OTel)**, an open-s ### Integrations
-
-
-
-
- +
+
+
+ +
+ +
+ +
+ +
+
+
+ +
+ +
+ +
+ +
+
+
+ +
+ +
+ +
+
diff --git a/docs/docs/observability/01-quickstart.mdx b/docs/docs/observability/01-quickstart.mdx deleted file mode 100644 index 1e00398f51..0000000000 --- a/docs/docs/observability/01-quickstart.mdx +++ /dev/null @@ -1,107 +0,0 @@ ---- -title: Quick Start ---- - -```mdx-code-block -import Tabs from '@theme/Tabs'; -import TabItem from '@theme/TabItem'; -import Image from "@theme/IdealImage"; - -``` - -Agenta enables you to capture all inputs, outputs, and metadata from your LLM applications, **whether they're hosted within Agenta or running in your own environment**. - -This guide will walk you through setting up observability for an OpenAI application running locally. - -:::note -If you create an application through the Agenta UI, tracing is enabled by default. No additional setup is required—simply go to the observability view to see all your requests. -::: - -## Step-by-Step Guide - -### 1. Install Required Packages - -First, install the Agenta SDK, OpenAI, and the OpenTelemetry instrumentor for OpenAI: - -```bash -pip install -U agenta openai opentelemetry-instrumentation-openai -``` - -### 2. Configure Environment Variables - - - -If you're using Agenta Cloud or Enterprise Edition, you'll need an API key: - -1. Visit the [Agenta API Keys page](https://cloud.agenta.ai/settings?tab=apiKeys). -2. Click on **Create New API Key** and follow the prompts. - -```python -import os - -os.environ["AGENTA_API_KEY"] = "YOUR_AGENTA_API_KEY" -os.environ["AGENTA_HOST"] = "https://cloud.agenta.ai" -``` - - - - -```python -import os - -os.environ["AGENTA_HOST"] = "http://localhost" -``` - - - - -### 3. Instrument Your Application - -Below is a sample script to instrument an OpenAI application: - -```python -# highlight-start -import agenta as ag -from opentelemetry.instrumentation.openai import OpenAIInstrumentor -import openai -# highlight-end - -# highlight-start -ag.init() -# highlight-end - -# highlight-start -OpenAIInstrumentor().instrument() -# highlight-end - -response = openai.ChatCompletion.create( - model="gpt-3.5-turbo", - messages=[ - {"role": "system", "content": "You are a helpful assistant."}, - {"role": "user", "content": "Write a short story about AI Engineering."}, - ], -) - -print(response.choices[0].message.content) -``` - -**Explanation**: - -- **Import Libraries**: Import Agenta, OpenAI, and the OpenTelemetry instrumentor. -- **Initialize Agenta**: Call `ag.init()` to initialize the Agenta SDK. -- **Instrument OpenAI**: Use `OpenAIInstrumentor().instrument()` to enable tracing for OpenAI calls. - -### 4. View Traces in the Agenta UI - -After running your application, you can view the captured traces in Agenta: - -1. Log in to your Agenta dashboard. -2. Navigate to the **Observability** section. -3. You'll see a list of traces corresponding to your application's requests. - -Illustration of observability diff --git a/docs/docs/observability/02-quickstart-python.mdx b/docs/docs/observability/02-quickstart-python.mdx new file mode 100644 index 0000000000..e947ed683a --- /dev/null +++ b/docs/docs/observability/02-quickstart-python.mdx @@ -0,0 +1,97 @@ +--- +title: Quick Start Guide - Python LLM Observability +sidebar_label: Quick Start (Python) +description: Implement LLM observability in Python with Agenta SDK. Learn to set up distributed tracing, monitor requests, and debug your applications step-by-step. 
+sidebar_position: 2 +--- + +```mdx-code-block +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import Image from "@theme/IdealImage"; +import GoogleColabButton from "@site/src/components/GoogleColabButton"; + +``` + +Agenta captures all inputs, outputs, and metadata from your LLM applications. You can use it whether your applications run inside Agenta or in your own environment. + +This guide shows you how to set up observability for an OpenAI application running locally. + + + + Open in Google Colaboratory + + +## Step-by-Step Guide + +### 1. Install Required Packages + +Install the Agenta SDK, OpenAI, and the OpenTelemetry instrumentor for OpenAI: + +```bash +pip install -U agenta openai opentelemetry-instrumentation-openai +``` + +### 2. Configure Environment Variables + +You need an API key to start tracing your application. Visit the Agenta API Keys page under settings. Click on Create New API Key and follow the prompts. + +```python +import os + +os.environ["AGENTA_API_KEY"] = "YOUR_AGENTA_API_KEY" +os.environ["AGENTA_HOST"] = "https://cloud.agenta.ai" # Change for self-hosted +``` + +### 3. Instrument Your Application + +Here is a sample script that instruments an OpenAI application: + +```python +# highlight-start +import agenta as ag +from opentelemetry.instrumentation.openai import OpenAIInstrumentor +import openai +# highlight-end + +# highlight-start +ag.init() +# highlight-end + +# highlight-start +OpenAIInstrumentor().instrument() +# highlight-end + +# highlight-start +@ag.instrument() +# highlight-end +def generate(): + response = openai.chat.completions.create( + model="gpt-3.5-turbo", + messages=[ + {"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": "Write a short story about AI Engineering."}, + ], + ) + return response.choices[0].message.content + +if __name__ == "__main__": + print(generate()) +``` + +The script uses two mechanisms to trace your application. + +First, `OpenAIInstrumentor().instrument()` automatically traces all OpenAI calls. It monkey patches the OpenAI library to capture each request. + +Second, the `@ag.instrument()` decorator traces the decorated function. It creates a span for the function and records its inputs and outputs. + +### 4. View Traces in the Agenta UI + +After you run your application, you can view the captured traces in Agenta. Log in to your Agenta dashboard and navigate to the Observability section. You will see a list of traces that correspond to your application's requests. + +Illustration of observability diff --git a/docs/docs/observability/03-observability-sdk.mdx b/docs/docs/observability/03-observability-sdk.mdx deleted file mode 100644 index d3d70798a6..0000000000 --- a/docs/docs/observability/03-observability-sdk.mdx +++ /dev/null @@ -1,317 +0,0 @@ ---- -title: Observability SDK ---- - -```mdx-code-block -import Tabs from '@theme/Tabs'; -import TabItem from '@theme/TabItem'; -import Image from "@theme/IdealImage"; - -``` - -This guide shows you how to use the Agenta observability SDK to instrument workflows in your application. - -If you're exclusively using a framework like [LangChain](/observability/integrations/langchain), you can use the auto-instrumentation packages to automatically instrument your application. 
- -However, if you need more flexibility, you can use the SDK to: - -- Instrument custom functions in your workflow -- Add additional **metadata** to the spans -- Link traces to **applications**, **variants**, and **environments** in Agenta - -## Setup - -**1. Install the Agenta SDK** - -```bash -pip install -U agenta -``` - -**2. Set environment variables** - - - - -1. Visit the [Agenta API Keys page](https://cloud.agenta.ai/settings?tab=apiKeys). -2. Click on **Create New API Key** and follow the prompts. - -```python -import os - -os.environ["AGENTA_API_KEY"] = "YOUR_AGENTA_API_KEY" -os.environ["AGENTA_HOST"] = "https://cloud.agenta.ai" -``` - - - - -```python -import os - -os.environ["AGENTA_HOST"] = "http://localhost" -``` - - - - -## Instrumenting functions with the decorator - -To instrument a function, add the `@ag.instrument()` decorator. This automatically captures all input and output data. - -The decorator has a `spankind` argument to categorize each span in the UI. Available types are: - -`agent`, `chain`, `workflow`, `tool`, `embedding`, `query`, `completion`, `chat`, `rerank` - -:::caution -The instrument decorator should be the top-most decorator on a function (i.e. the last decorator before the function call). -::: - -```python -@ag.instrument(spankind="task") -def myllmcall(country:str): - prompt = f"What is the capital of {country}" - response = client.chat.completions.create( - model='gpt-4', - messages=[ - {'role': 'user', 'content': prompt}, - ], - ) - return response.choices[0].text - -@ag.instrument(spankind="workflow") -def generate(country:str): - return myllmcall(country) - -``` - -Agenta automatically determines the parent span based on the function call and nests the spans accordingly. - -## Modify a span's metadata - -You can add additional information to a span's metadata using `ag.tracing.store_meta()`. This function accesses the active span from the context and adds the key-value pairs to the metadata. - -```python -@ag.instrument(spankind="task") -def compile_prompt(country:str): - prompt = f"What is the capital of {country}" - - # highlight-next-line - ag.tracing.store_meta({"prompt_template": prompt}) - - formatted_prompt = prompt.format(country=country) - return formatted_prompt - -``` - -Metadata is displayed in the span's raw data view. - -## Linking spans to applications, variants, and environments - -You can link a span to an application, variant, and environment by calling `ag.tracing.store_refs()`. - -Applications, variants, and environments can be referenced by their slugs, versions, and commit IDs (for specific versions). -You can link a span to an application and variant like this: - -```python - -@ag.instrument(spankind="workflow") -def generate(country:str): - prompt = f"What is the capital of {country}" - - - formatted_prompt = prompt.format(country=country) - - completion = client.chat.completions.create( - model='gpt-4', - messages=[ - {'role': 'user', 'content': formatted_prompt}, - ], - ) - - # highlight-start - ag.tracing.store_refs( - { - "application.slug": "capital-app", - "environment.slug": "production", - } - ) - # highlight-end - return completion.choices[0].message.content - -``` - -`ag.tracing.store_refs()` takes a dict with keys from `application.slug`, `application.id`, `application.version`, `variant.slug`, `variant.id`, `variant.version`, `environment.slug`, `environment.id` and `environment.commit_id`, with the values being the slug, id, version or commit id of the application, variant, and environment respectively. 
- -## Storing Internals - -Internals are additional data stored in the span. Compared to metadata, internals have the following differences: - -- Internals are saved within the span data and are searchable with plain text queries. -- Internals are shown by default in the span view in a collapsible section, while metadata is only shown as part of the JSON file with the raw data (i.e. better visibility with internals). -- **Internals can be used for evaluation**. For instance, you can save the retrieved context in the internals and then use it to evaluate the factuality of the response. - -As a rule of thumb, use metadata for additional information that is not used for evaluation and not elementary to understand the span, otherwise use internals. - -Internals can be stored similarly to metadata: - -```python -@ag.instrument(spankind="workflow") -def rag_workflow(query:str): - - context = retrieve_context(query) - - # highlight-start - ag.tracing.store_internals({"context": context}) - # highlight-end - - prompt = f"Answer the following question {query} based on the context: {context}" - - completion = client.chat.completions.create( - model='gpt-4', - messages=[ - {'role': 'user', 'content': formatted_prompt}, - ], - ) - return completion.choices[0].message.content - -``` - -## Redacting sensitive data: how to exclude data from capture - -In some cases, you may want to exclude parts of the inputs or outputs due to privacy concerns or because the data is too large to be stored in the span. - -You can do this by setting the `ignore_inputs` and/or `ignore_outputs` arguments to `True` in the instrument decorator. - -```python -@ag.instrument( - spankind="workflow", - ignore_inputs=True, - ignore_outputs=True -) -def rag_workflow(query:str): - ... -``` - -If you want more control, you can specify which parts of the inputs and outputs to exclude: - -```python -@ag.instrument( - spankind="workflow", - ignore_inputs=["user_id"], - ignore_outputs=["pii"], -) -def rag_workflow(query:str, user_id:str): - ... - return { - "result": ..., - "pii": ... - } -``` - -For even finer control, you can use a custom `redact()` callback, along with instructions in the case of errors. - -```python -def my_redact(name, field, data): - if name == "rag_workflow": - if field == "inputs": - del data["user_id"] - if field == "outputs": - del data["pii"] - - return data - - -@ag.instrument( - spankind="workflow", - redact=my_redact, - redact_on_error=False, -) -def rag_workflow(query:str, user_id:str): - ... - return { - "result": ..., - "pii": ... - } -``` - -Finally, if you want to set up global rules for redaction, you can provide a global `redact()` callback that applies everywhere. - -```python -def global_redact( - name:str, - field:str, - data: Dict[str, Any] -): - if "pii" in data: - del data["pii"] - - return data - - -ag.init( - redact=global_redact, - redact_on_error=True, -) - -def local_redact( - name:str, - field:str, - data: Dict[str, Any] -): - if name == "rag_workflow": - if field == "inputs": - del data["user_id"] - - return data - - -@ag.instrument( - spankind="workflow", - redact=local_redact, - redact_on_error=False, -) -def rag_workflow(query:str, user_id:str): - ... - return { - "result": ..., - "pii": ... - } -``` - - -## Troubleshooting - -### Payload Too Large - -If your collector receives a **413** response when posting to `/otlp/v1/traces`, the batch size is too large. Agenta accepts batches up to **5 MB** by default. 
- -Reduce the batch size or enable compression in your collector configuration to keep requests under this limit. - -### Missing Traces in Serverless Functions - -If you're running Agenta observability in short-lived environments like **AWS Lambda**, **Vercel Functions**, **Cloudflare Workers**, or **Google Cloud Functions**, you may notice that some traces are missing from the Agenta dashboard. - -**Why this happens:** - -OpenTelemetry uses background processes to batch and export spans for efficiency. However, serverless functions terminate abruptly, often before these background processes can finish sending the trace data to Agenta. The spans get buffered but never exported. - -**Solution: Force Flush Before Function Exit** - -Use OpenTelemetry's `force_flush()` method to ensure all spans are exported before your function terminates: - -```python -import agenta as ag -from opentelemetry.trace import get_tracer_provider - -# Initialize once (outside handler for warm containers) -ag.init() - -def handler(event, context): - try: - # Your instrumented application logic - result = your_instrumented_function() - return result - finally: - # Force export all pending spans before function exits - get_tracer_provider().force_flush() -``` diff --git a/docs/docs/observability/03-quick-start-opentelemetry.mdx b/docs/docs/observability/03-quick-start-opentelemetry.mdx new file mode 100644 index 0000000000..3b91f1f7a1 --- /dev/null +++ b/docs/docs/observability/03-quick-start-opentelemetry.mdx @@ -0,0 +1,245 @@ +--- +title: "Quick Start: OpenTelemetry for JavaScript/TypeScript (Node.js)" +sidebar_label: "Quick Start (OpenTelemetry JS/TS)" +description: "Set up LLM observability with OpenTelemetry in JavaScript, TypeScript, and Node.js. Learn how to instrument LLM apps, enable tracing and distributed tracing, and send telemetry to Agenta." +sidebar_position: 3 +--- + +```mdx-code-block +import Image from "@theme/IdealImage"; +import GitHubExampleButton from "@site/src/components/GitHubExampleButton"; +``` + +Agenta captures all inputs, outputs, and metadata from your LLM applications using OpenTelemetry. This guide shows you how to instrument a Node.js application with OpenTelemetry and send traces to Agenta. + + + View Complete Example on GitHub + + +## Step-by-Step Guide + +### 1. Install Required Packages + +Install OpenTelemetry packages, OpenAI, and the OpenInference instrumentation for OpenAI: + +```bash +npm install @opentelemetry/api \ + @opentelemetry/sdk-trace-node \ + @opentelemetry/exporter-trace-otlp-proto \ + @opentelemetry/instrumentation \ + @opentelemetry/resources \ + @opentelemetry/semantic-conventions \ + @arizeai/openinference-instrumentation-openai \ + @arizeai/openinference-semantic-conventions \ + openai +``` + + +### 2. Configure Environment Variables + +You need an API key to start tracing your application. Visit the Agenta API Keys page under settings and create a new API key. + +```bash +export AGENTA_API_KEY="YOUR_AGENTA_API_KEY" +export AGENTA_HOST="https://cloud.agenta.ai" # Change for self-hosted +export OPENAI_API_KEY="YOUR_OPENAI_API_KEY" +``` + +### 3. 
Set Up Instrumentation + +Create an `instrumentation.js` file to configure OpenTelemetry: + +```javascript +// instrumentation.js +// highlight-start +import { registerInstrumentations } from "@opentelemetry/instrumentation"; +import { OpenAIInstrumentation } from "@arizeai/openinference-instrumentation-openai"; +import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-proto"; +import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node"; +import { Resource } from "@opentelemetry/resources"; +import { SimpleSpanProcessor } from "@opentelemetry/sdk-trace-base"; +import { ATTR_SERVICE_NAME } from "@opentelemetry/semantic-conventions"; +import OpenAI from "openai"; +// highlight-end + +// highlight-start +// Configure the OTLP exporter to send traces to Agenta +const otlpExporter = new OTLPTraceExporter({ + url: `${process.env.AGENTA_HOST}/api/otlp/v1/traces`, + headers: { + Authorization: `ApiKey ${process.env.AGENTA_API_KEY}`, + }, +}); +// highlight-end + +// highlight-start +// Create and configure the tracer provider +const tracerProvider = new NodeTracerProvider({ + resource: new Resource({ + [ATTR_SERVICE_NAME]: "openai-quickstart", + }), +}); + +// Use SimpleSpanProcessor for immediate export (better for short scripts) +// For long-running services, use BatchSpanProcessor for better performance +tracerProvider.addSpanProcessor(new SimpleSpanProcessor(otlpExporter)); +tracerProvider.register(); +// highlight-end + +// highlight-start +// Register OpenAI instrumentation +const instrumentation = new OpenAIInstrumentation(); +instrumentation.manuallyInstrument(OpenAI); + +registerInstrumentations({ + instrumentations: [instrumentation], +}); +// highlight-end + +console.log("👀 OpenTelemetry instrumentation initialized"); +``` + +### 4. Instrument Your Application + +Create your application file `app.js`: + +```javascript +// app.js +// highlight-start +import OpenAI from "openai"; +import { trace } from "@opentelemetry/api"; +// highlight-end + +const openai = new OpenAI({ + apiKey: process.env.OPENAI_API_KEY, +}); + +// highlight-start +const tracer = trace.getTracer("my-app", "1.0.0"); +// highlight-end + +async function generate() { + // highlight-start + // Create a span using Agenta's semantic conventions + return tracer.startActiveSpan("generate", async (span) => { + try { + // Set span type + span.setAttribute("ag.type.node", "workflow"); + + const messages = [ + { role: "system", content: "You are a helpful assistant." }, + { role: "user", content: "Write a short story about AI Engineering." }, + ]; + + // Set inputs + span.setAttribute("ag.data.inputs", JSON.stringify({ + messages: messages, + model: "gpt-3.5-turbo" + })); + // highlight-end + + const response = await openai.chat.completions.create({ + model: "gpt-3.5-turbo", + messages: messages, + }); + + const content = response.choices[0].message.content; + + // highlight-start + // Set outputs + span.setAttribute("ag.data.outputs", JSON.stringify({ + content: content + })); + + return content; + } finally { + span.end(); + } + }); + // highlight-end +} + +async function main() { + const result = await generate(); + console.log(result); + + // Flush traces before exit + await trace.getTracerProvider().forceFlush(); +} + +main(); +``` + +### 5. 
Run Your Application + +Run your application with the instrumentation loaded first: + +```bash +node --import ./instrumentation.js app.js +``` + +Or add it to your `package.json`: + +```json +{ + "type": "module", + "scripts": { + "start": "node --import ./instrumentation.js app.js" + } +} +``` + +Then run: + +```bash +npm start +``` + +## How It Works + +The instrumentation uses two mechanisms to trace your application: + +1. **Auto-instrumentation**: `OpenAIInstrumentation` automatically captures all OpenAI API calls, including prompts, completions, tokens, and costs. + +2. **Manual spans**: You can create custom spans using `tracer.startActiveSpan()` to track your own functions and add metadata using [Agenta's semantic conventions](/observability/trace-with-opentelemetry/semantic-conventions). + +:::tip Span Processors +This guide uses `SimpleSpanProcessor` which sends spans immediately. This is ideal for: +- Short-lived scripts and CLI tools +- Development and debugging +- Ensuring traces are captured before process exit + +For long-running services (web servers, background workers), use `BatchSpanProcessor` for better performance by batching multiple spans before sending. +::: + +### Agenta Semantic Conventions + +The example uses Agenta's semantic conventions for proper trace display: + +- **`ag.type.node`** - Defines the operation type (workflow, task, tool, etc.) +- **`ag.data.inputs`** - Stores input parameters as JSON +- **`ag.data.outputs`** - Stores output results as JSON +- **`ag.data.internals`** - Stores intermediate values and metadata (optional) + +## View Traces in the Agenta UI + +After running your application, log in to your Agenta dashboard and navigate to the Observability section. You will see traces showing: + +- Complete execution timeline +- Input messages and parameters +- Output content +- Token usage and costs +- Latency metrics + +OpenTelemetry traces in Agenta + +## Next Steps + +- Learn about [semantic conventions](/observability/trace-with-opentelemetry/semantic-conventions) for better trace formatting +- Explore [distributed tracing](/observability/trace-with-opentelemetry/distributed-tracing) across services +- See [integration examples](/observability/integrations/openai) for other frameworks diff --git a/docs/docs/observability/04-concepts.mdx b/docs/docs/observability/04-concepts.mdx new file mode 100644 index 0000000000..8a226e0935 --- /dev/null +++ b/docs/docs/observability/04-concepts.mdx @@ -0,0 +1,163 @@ +--- +title: "Concepts of LLM Observability in Agenta" +sidebar_label: "Concepts" +description: "Understand key observability concepts in Agenta including traces, spans, and how OpenTelemetry powers our observability features" +sidebar_position: 3 +--- + +```mdx-code-block +import Image from "@theme/IdealImage"; +``` + +## Tracing in Agenta + +Agenta uses OpenTelemetry to track what happens in your LLM applications. OpenTelemetry is a free, open-source tool that makes monitoring applications easy. You write the monitoring code once, and it works with any observability platform. Read more about OpenTelemetry in [this guide we wrote for AI engineers](https://agenta.ai/blog/the-ai-engineer-s-guide-to-llm-observability-with-opentelemetry). + +
+ ⏯️ Watch a video about OpenTelemetry and tracing in Agenta. + + +
+ +## Getting Started: Basic Concepts + +### Traces + +A trace represents the complete journey of a request through your application. In our context, a trace corresponds to a single request to your LLM application. + +For example, when a user asks your chatbot a question, that entire interaction is captured as one trace. The trace includes receiving the query, processing it, and returning the response. + +### Spans + +A span is a unit of work within a trace. Spans can be nested, forming a tree-like structure. + +The **root span** represents the overall operation (like "handle user query"). **Child spans** represent sub-operations (like "retrieve context", "call LLM", or "format response"). + +Agenta enriches each span with cost information for LLM calls, latency measurements, input/output data, and custom metadata you add. + +### Span Kinds + +Agenta categorizes spans using span kinds. These help you understand different types of operations in your LLM workflow. + +Available span kinds: +- `agent` for autonomous agent operations +- `chain` for sequential operations +- `workflow` for complex multi-step processes +- `tool` for tool or function calls +- `embedding` for vector embedding generation +- `query` for database or search queries +- `completion` for LLM completions +- `chat` for chat-based LLM interactions +- `rerank` for re-ranking operations + +### Events + +Spans can contain events. These are timestamped records of things that happen during span execution. Agenta automatically logs exceptions as events, which helps you debug errors in your traces. + +## Working with OpenTelemetry + +### Direct OpenTelemetry Integration + +You can instrument your application using standard OpenTelemetry SDKs. Agenta accepts any OpenTelemetry span that follows the specification. For Agenta-specific features (like cost tracking and formatted messages), use attributes in our semantic conventions. See the [semantic conventions guide](/observability/trace-with-opentelemetry/semantic-conventions) for details. + +### Auto-Instrumentation Compatibility + +Agenta works with auto-instrumentation from popular libraries, even if they are not listed in our integrations. We support semantic conventions from [OpenInference](https://github.com/Arize-ai/openinference), [OpenLLMetry](https://github.com/traceloop/openllmetry), and [PydanticAI](/observability/integrations/pydanticai). + +When these libraries send spans to Agenta, we automatically translate their conventions to our format. No extra configuration is needed. This means that if you have any package that auto-instruments using these conventions, it will work with Agenta. + +## Understanding Span Types + +Agenta distinguishes between two types of spans. This separation helps you analyze application behavior independently from evaluation results. + +### Invocation Spans + +Invocation spans capture your application's actual work. They record what your LLM application does when it executes. + +Examples include LLM calls and completions, retrieval operations, tool executions, and agent reasoning steps. + +### Annotation Spans + +Annotation spans capture evaluations and feedback about invocations. They include automatic evaluations (like LLM-as-a-judge or custom metrics), human feedback and ratings, and evaluation results from test runs. + +When you evaluate a span or add feedback, Agenta creates an annotation span. The annotation span links to the original invocation span (explained in the Links section below). 
This keeps application traces clean while still capturing evaluation data. + +## Organizing and Filtering Traces + +### Attributes: Adding Metadata + +Attributes add information to spans. They are key-value pairs attached to each span. Agenta treats certain attributes specially for better UI experience. + +**Special attributes** use the `ag.` namespace. Cost and tokens get displayed prominently with user-friendly filtering. Model and system information appears in span details. Data attributes (inputs and outputs) are formatted based on span kind. + +**Custom attributes** can be any key-value pair you add. They are searchable and filterable, but they do not get special UI treatment. + +See all available attributes in our [semantic conventions guide](/observability/trace-with-opentelemetry/semantic-conventions). + +### References: Linking to Agenta Entities + +References connect spans to entities you have created in Agenta. They use a structured format and enable powerful organization. + +You can reference applications and their variants, environments (production, staging, development), test sets and test cases, and evaluators. + +Common use cases include filtering traces by application (like "show all traces from my chatbot-v2 variant"), comparing performance, and tracking prompt versions. + +Each reference can point to a specific variant and version. This gives you precise control over trace organization. References are especially useful for teams managing multiple applications and configurations. + +Learn more about using references in the [reference prompt versions guide](/observability/trace-with-python-sdk/reference-prompt-versions). + +### Links: Connecting Related Spans + +Links connect spans across different traces. Agenta uses them to connect annotations to invocations. + +When you evaluate a span, we cannot modify it because spans are immutable in OpenTelemetry. Instead, we create a new annotation span and link it to the original invocation span. This preserves the original trace while connecting evaluation results to the spans they evaluate. + +Links enable several features. You can view all evaluations for a specific application run. You can see feedback attached to the relevant invocation. You can filter traces by evaluation results. + +Links happen automatically when you use Agenta's evaluation features. + +## Applications, Variants, and Environments + +Agenta organizes your observability data around three key concepts: + +**Applications** are top-level containers for your LLM applications. An application could be a chatbot, a summarization tool, or any other LLM-powered feature. + +**Variants** are different versions or configurations of your application. You might have a "gpt-4-turbo" variant and a "claude-opus" variant. Or you might have variants for different prompts or parameters. + +**Environments** are deployment stages. Common environments include development, staging, and production. + +This organization helps you compare performance across different configurations and track behavior in different environments. + +## How Agenta Enhances OpenTelemetry + +Agenta uses standard OpenTelemetry for tracing. We add LLM-specific enhancements on top of it. + +### Automatic Cost Tracking and Token Counting + +We calculate costs for LLM calls based on model pricing. We track token usage (prompt tokens, completion tokens, and total) for each interaction. These metrics appear prominently in the UI and support user-friendly filtering. 
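Auto-instrumented LLM calls get these metrics filled in for you. If you create spans by hand, a minimal sketch of attaching cost and token totals yourself might look like the following. It reuses the `ag.metrics.unit.costs.total` and `ag.metrics.unit.tokens.total` attribute names from the OpenTelemetry guide elsewhere in this patch and assumes a tracer provider pointing at Agenta is already configured as in the quickstarts (check the semantic conventions page for the authoritative attribute list):

```python
from opentelemetry import trace

tracer = trace.get_tracer("my-app")

# Assumes an exporter/tracer provider targeting Agenta is already set up
with tracer.start_as_current_span("llm-call") as span:
    # ... call your model here and compute cost/token usage ...
    span.set_attribute("ag.metrics.unit.costs.total", 0.0021)  # cost of this call
    span.set_attribute("ag.metrics.unit.tokens.total", 244)    # total tokens used
```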
+ +### Prompt Versioning Integration + +You can link traces to specific prompt versions in your registry. This helps you understand which prompt configuration generated each trace. + +### Test Set Integration + +You can convert production traces into test cases with one click. This makes it easy to build test sets from real user interactions. + +### LLM-Aware UI + +The Agenta UI understands LLM-specific data. Chat messages are formatted nicely. You can filter by cost, tokens, model, and other LLM-specific attributes. The UI shows parent-child relationships in your agent workflows clearly. + +## Next Steps + +- [Get started with Python SDK](/observability/quickstart-python) +- [Learn about tracing with OpenTelemetry](/observability/trace-with-opentelemetry/getting-started) +- [Explore integrations](/observability/integrations/openai) for popular LLM frameworks diff --git a/docs/docs/observability/04-opentelemetry.mdx b/docs/docs/observability/04-opentelemetry.mdx deleted file mode 100644 index 8fc4f076d1..0000000000 --- a/docs/docs/observability/04-opentelemetry.mdx +++ /dev/null @@ -1,192 +0,0 @@ ---- -title: Distributed Tracing with OpenTelemetry -sidebar_label: Distributed Tracing in Otel -description: "Learn how to use OpenTelemetry to instrument your LLM application with Agenta for enhanced observability." ---- - -Agenta provides built-in OpenTelemetry instrumentation to simplify distributed tracing in your applications. This guide explains how to implement and manage distributed tracing using the Agenta SDK, and how to integrate external tracing setups with Agenta. - -## Using OpenTelemetry with Agenta -Agenta supports distributed tracing out of the box when using the provided SDK functions: - -### 1. Sending Requests (Propagation) - -When making requests to other services or sub-systems, use `agenta.tracing.inject()` to inject necessary headers: - -```python -method = "POST" -url = "https://example-service/api" -params = {} -headers = agenta.tracing.inject() # automatically injects 'Authorization', 'Traceparent', 'Baggage' -body = {"key": "value"} - -response = requests.request( - method=method, - url=url, - params=params, - headers=headers, - json=body, -) -``` -The `agenta.tracing.inject()` function returns headers containing: - -- `Authorization`: Authentication information -- `Traceparent`: Identifies the current trace and span -- `Baggage`: Contains application-specific context - -These headers can be modified before sending them as part of the request if needed. - -### 2. Receiving Requests (Extraction) - -Agenta simplifies receiving and handling incoming trace contexts: - -- If you're using `ag.route()` and `ag.instrument()`, extraction is automatic. -- For manual extraction, use `agenta.tracing.extract()`: - -```python -traceparent, baggage = agenta.tracing.extract() # includes 'Traceparent', 'Baggage' - -# Use traceparent and baggage to set up your OpenTelemetry context -# (Implementation depends on your specific use case) -``` - -:::note -`extract()` does not provide `Authorization` because there are many authentication methods (apikey, bearer, secret, access tokens), each requiring different handling. The middlewares and decorators in the Agenta SDK handle this automatically when you use `ag.route()` and `ag.instrument()`. -::: - -## OpenTelemetry Tracing without Agenta SDK -If you're working with systems that don't use the Agenta SDK, you can still integrate with Agenta's tracing infrastructure using standard OpenTelemetry. - -### 1. 
Setup Requirements -Install dependencies: -```bash -pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp -``` - -### 2. Configure Environment Variables - -```bash -# OTEL_PROPAGATORS = unset or "tracecontext,baggage" -# OTEL_EXPORTER_OTLP_COMPRESSION = unset or "gzip" -# OTEL_EXPORTER_OTLP_ENDPOINT = "https://cloud.agenta.ai/api/otlp" -# OTEL_EXPORTER_OTLP_HEADERS = "authorization=ApiKey xxx" -# OTEL_EXPORTER_OTLP_TRACES_ENDPOINT = "https://cloud.agenta.ai/api/otlp/v1/traces" -# OTEL_EXPORTER_OTLP_TRACES_HEADERS = "authorization=ApiKey xxx" -``` - -### 3. Setup in Code -```python -from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator -from opentelemetry.baggage.propagation import W3CBaggagePropagator -from opentelemetry.sdk.trace import TracerProvider, Span -from opentelemetry.sdk.trace.export import BatchSpanProcessor -from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter, Compression - -# Configuration -endpoint = "https://cloud.agenta.ai/api/otlp/v1/traces" -compression = Compression.Gzip -headers = { - "traceparent": "00-xxx-xxx-01", - "baggage": "ag.refs.application.id=xxx", - "authorization": "ApiKey xxx", -} - -# Set up provider, processor, and tracer -provider = TracerProvider() - -processor = BatchSpanProcessor( - OTLPSpanExporter( - endpoint=endpoint, - headers={"authorization": headers["authorization"]}, - compression=compression, - ) -) - -provider.add_span_processor(processor) - -tracer = provider.get_tracer("agenta.tracer") - -# Extract incoming trace context -carrier = {"traceparent": headers["traceparent"]} -context = TraceContextTextMapPropagator().extract(carrier=carrier, context=None) - -carrier = {"baggage": headers["baggage"]} -context = W3CBaggagePropagator().extract(carrier=carrier, context=context) - -# Create and use spans -with tracer.start_as_current_span(name="agenta", context=context) as span: - span: Span - - print(hex(span.get_span_context().trace_id)) - print(hex(span.get_span_context().span_id)) - print(span.name) -``` - -## Using an OTEL Collector - -If you're using an OpenTelemetry collector, you can configure it to export traces to Agenta. -Here's a sample configuration (`otel-collector-config.yml`): - -```yaml -receivers: - otlp: - protocols: - http: - endpoint: 0.0.0.0:4318 - -processors: - batch: {} - -exporters: - otlphttp/agenta: - endpoint: "https://cloud.agenta.ai/api/otlp" - -service: - pipelines: - traces: - receivers: [otlp] - processors: [batch] - exporters: [otlphttp/agenta] -``` - -With this configuration: - -- The collector receives traces via OTLP/HTTP on port 4318 -- It batches the spans for efficiency -- It exports them to Agenta's OTLP endpoint - -## Span Attributes - -When using OpenTelemetry without the Agenta SDK, you need to manually set the appropriate attributes on your spans to integrate properly with Agenta's ecosystem. - -### Namespace Convention - -Agenta uses the `ag.*` namespace for its attributes. Here are the key namespaces: - -- `ag.refs.*`: References to Agenta entities (applications, etc.) 
-- `ag.data.*`: Input, internal, and output data -- `ag.metrics.*`: Performance metrics and costs - -### Examples - -```python -# Reference to Agenta application -span.set_attribute("ag.refs.application.id", AGENTA_APPLICATION_ID) - -# Data attributes -span.set_attribute("ag.data.inputs.key", "Hello,") -span.set_attribute("ag.data.internals.key", "(Leo)") -span.set_attribute("ag.data.outputs.key", "World!") - -# Metrics - unit values -span.set_attribute("ag.metrics.unit.some_key", 3) -span.set_attribute("ag.metrics.acc.some_key", 15) - -# Cost and token metrics -span.set_attribute("ag.metrics.unit.costs.total", 1) -span.set_attribute("ag.metrics.unit.tokens.total", 100) -``` - -:::info -Apart from these custom attributes, standard OpenTelemetry events, links, status, and exceptions work as usual. -::: \ No newline at end of file diff --git a/docs/docs/observability/09-troubleshooting.mdx b/docs/docs/observability/09-troubleshooting.mdx new file mode 100644 index 0000000000..1894bf298d --- /dev/null +++ b/docs/docs/observability/09-troubleshooting.mdx @@ -0,0 +1,100 @@ +--- +title: "Troubleshooting" +sidebar_label: "Troubleshooting" +description: "Common issues and solutions for Agenta observability with Python SDK and OpenTelemetry" +sidebar_position: 9 +--- + +## Common Issues + +This page covers common issues you might encounter when using Agenta observability with either the Python SDK or OpenTelemetry. + +## Invalid Content Format + +You may receive a 500 error with the message "Failed to parse OTLP stream." This happens when you send trace data in JSON format instead of Protobuf. + +Agenta's OTLP endpoints accept only the Protobuf format (binary encoding). The server cannot parse JSON payloads. When you configure your OpenTelemetry exporter to use JSON encoding, the request will fail. + +### Solution + +Configure your OpenTelemetry exporter to use Protobuf encoding. Most OpenTelemetry exporters use Protobuf by default, so you typically don't need to specify it explicitly. + +For the Python SDK, import `OTLPSpanExporter` from the proto package (not the json package). The exporter class name should include "proto" in its import path. + +For the OpenTelemetry Collector, verify that the encoding field is set to `proto` or omit it entirely (since proto is the default). + +For JavaScript and Node.js, use `@opentelemetry/exporter-trace-otlp-proto` instead of the JSON variant. Avoid any exporter package with "json" in its name. + +Do not set `encoding: json` in your configuration files. Agenta does not support JSON-encoded OTLP payloads. + +## Payload Too Large + +Your collector may receive a 413 response when posting to `/otlp/v1/traces`. This means the batch size exceeds the limit. Agenta accepts batches up to 5 MB by default. + +Reduce the batch size in your collector configuration. You can also enable compression (such as gzip) to keep requests under the limit. + +## Missing Traces in Serverless Functions + +Some traces may not appear in the Agenta dashboard when you run observability in serverless environments. This includes AWS Lambda, Vercel Functions, Cloudflare Workers, and Google Cloud Functions. + +OpenTelemetry batches spans in background processes before exporting them. This improves efficiency. However, serverless functions terminate abruptly. They often stop before the background processes finish sending trace data to Agenta. The spans get buffered but never exported. + +### Solution + +Call the `force_flush()` method before your function terminates. 
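For an AWS Lambda-style handler, a minimal sketch of this pattern looks like the following (it mirrors the example from the SDK guide removed elsewhere in this patch; `your_instrumented_function` is a placeholder for your own instrumented code):

```python
import agenta as ag
from opentelemetry.trace import get_tracer_provider

# Initialize once, outside the handler, so warm containers reuse it
ag.init()

def handler(event, context):
    try:
        # Your instrumented application logic
        return your_instrumented_function()
    finally:
        # Export all pending spans before the function terminates
        get_tracer_provider().force_flush()
```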
This ensures all spans export before the function exits. Import `get_tracer_provider` from `opentelemetry.trace` and call `force_flush()` on it. Place this call in a finally block so it runs even if errors occur.
+
+## Traces Not Appearing in UI
+
+First, verify that your `AGENTA_API_KEY` is set correctly and has the necessary permissions.
+
+Next, check your endpoint configuration. Point to the correct Agenta host. For cloud deployments, use `https://cloud.agenta.ai`. For self-hosted instances, use your instance URL (such as `http://localhost`).
+
+Finally, confirm that you call `ag.init()` before any instrumented functions execute.
+
+## Authentication Errors
+
+You may receive 401 Unauthorized errors when sending traces. Check three things.
+
+First, verify that your API key is correct. Check for typos or missing characters.
+
+Second, confirm that the key has not expired. Some API keys have expiration dates.
+
+Third, ensure you use the correct format. The authorization header should follow this pattern: `ApiKey YOUR_KEY_HERE`.
+
+## Performance Issues
+
+### High memory usage
+
+You can reduce memory usage in three ways. Enable gzip compression for OTLP exports to reduce the size of data in memory. Lower the number of spans sent per batch. Implement sampling to avoid sending 100% of traces in high-volume scenarios.
+
+### High latency
+
+Instrumentation should not add significant latency. If it does, check three things. Ensure spans export in the background using async export. Tune your batch size and export intervals to find the right balance. Review custom instrumentation to verify that custom spans are not performing expensive operations.
+
+## OpenTelemetry-Specific Issues
+
+### Context propagation not working
+
+Distributed tracing may fail to work across services. Check three things to fix this. Verify that propagators are configured correctly (set `OTEL_PROPAGATORS=tracecontext,baggage`). Confirm that headers pass between services. Ensure all services use compatible OpenTelemetry versions.
+
+### Spans not nesting correctly
+
+Spans may appear flat instead of nested in the trace view. This indicates a context problem. Verify that context passes correctly between functions. Check that you use `start_as_current_span` with proper context. Make parent-child relationships explicit in your code.
+
+## Python SDK-Specific Issues
+
+### Decorator not capturing data
+
+The `@ag.instrument()` decorator may fail to capture inputs and outputs. This happens for three reasons. The decorator must be the top-most decorator on your function. You must call `ag.init()` before the function runs. The function must return a value (printing alone is not enough).
+
+### Metadata not appearing
+
+Data from `ag.tracing.store_meta()` may not show in the UI. Call this method only within an instrumented function. Check that the span context is active when you call it. Verify that your data format is JSON-serializable.
+
+## Need More Help?
+
+Check the [Agenta documentation](/observability/concepts) for more details about observability concepts. Review our [integration guides](/observability/integrations/openai) for framework-specific help. Visit our [GitHub issues](https://github.com/Agenta-AI/agenta/issues) to report bugs. Join our community for support.
+
+## Next steps
+
+Review the [setup instructions](/observability/trace-with-python-sdk/setup-tracing) to ensure correct configuration. 
Explore [distributed tracing](/observability/trace-with-opentelemetry/distributed-tracing) to understand how traces work across services. Check the [integrations](/observability/integrations/openai) page for your specific framework. diff --git a/docs/docs/observability/_using-the-ui/01-filtering-traces.mdx b/docs/docs/observability/_using-the-ui/01-filtering-traces.mdx new file mode 100644 index 0000000000..8a7e64686b --- /dev/null +++ b/docs/docs/observability/_using-the-ui/01-filtering-traces.mdx @@ -0,0 +1,17 @@ +--- +title: "Filtering Traces" +sidebar_label: "Filtering Traces" +description: "Learn how to filter and search traces in the Agenta observability UI" +sidebar_position: 1 +--- + + + +## Overview + +Learn how to filter and search traces in the Agenta observability UI + +## Next steps + +- Explore [query data options](/observability/query-data/query-api) +- Learn about [Python SDK tracing](/observability/trace-with-python-sdk/setup-tracing) diff --git a/docs/docs/observability/_using-the-ui/02-adding-comments.mdx b/docs/docs/observability/_using-the-ui/02-adding-comments.mdx new file mode 100644 index 0000000000..98e180d2a2 --- /dev/null +++ b/docs/docs/observability/_using-the-ui/02-adding-comments.mdx @@ -0,0 +1,17 @@ +--- +title: "Adding Comments" +sidebar_label: "Adding Comments" +description: "Learn how to add comments and notes to traces for team collaboration" +sidebar_position: 2 +--- + + + +## Overview + +Learn how to add comments and notes to traces for team collaboration + +## Next steps + +- Explore [query data options](/observability/query-data/query-api) +- Learn about [Python SDK tracing](/observability/trace-with-python-sdk/setup-tracing) diff --git a/docs/docs/observability/_using-the-ui/03-adding-annotations.mdx b/docs/docs/observability/_using-the-ui/03-adding-annotations.mdx new file mode 100644 index 0000000000..99fe14cdcd --- /dev/null +++ b/docs/docs/observability/_using-the-ui/03-adding-annotations.mdx @@ -0,0 +1,17 @@ +--- +title: "Adding Annotations" +sidebar_label: "Adding Annotations" +description: "Learn how to annotate traces with custom evaluations and feedback" +sidebar_position: 3 +--- + + + +## Overview + +Learn how to annotate traces with custom evaluations and feedback + +## Next steps + +- Explore [query data options](/observability/query-data/query-api) +- Learn about [Python SDK tracing](/observability/trace-with-python-sdk/setup-tracing) diff --git a/docs/docs/observability/_using-the-ui/04-exporting-data.mdx b/docs/docs/observability/_using-the-ui/04-exporting-data.mdx new file mode 100644 index 0000000000..47275e0d0c --- /dev/null +++ b/docs/docs/observability/_using-the-ui/04-exporting-data.mdx @@ -0,0 +1,17 @@ +--- +title: "Exporting Data" +sidebar_label: "Exporting Data" +description: "Learn how to export trace data from the Agenta UI for analysis" +sidebar_position: 4 +--- + + + +## Overview + +Learn how to export trace data from the Agenta UI for analysis + +## Next steps + +- Explore [query data options](/observability/query-data/query-api) +- Learn about [Python SDK tracing](/observability/trace-with-python-sdk/setup-tracing) diff --git a/docs/docs/observability/_using-the-ui/_category_.json b/docs/docs/observability/_using-the-ui/_category_.json new file mode 100644 index 0000000000..91b2c97fee --- /dev/null +++ b/docs/docs/observability/_using-the-ui/_category_.json @@ -0,0 +1,6 @@ +{ + "label": "Using the UI", + "position": 7, + "collapsible": true, + "collapsed": true +} diff --git 
a/docs/docs/observability/integrations/06-llamaindex.mdx b/docs/docs/observability/integrations/06-llamaindex.mdx index 98c3e29a9c..5253e193e3 100644 --- a/docs/docs/observability/integrations/06-llamaindex.mdx +++ b/docs/docs/observability/integrations/06-llamaindex.mdx @@ -238,4 +238,4 @@ def rag_pipeline(query: str): ## Next Steps -For more advanced observability features and configuration options, see our [complete observability documentation](/observability/observability-sdk). +For more advanced observability features and configuration options, see our [complete observability documentation](/observability/trace-with-python-sdk/setup-tracing). diff --git a/docs/docs/observability/integrations/07-langgraph.mdx b/docs/docs/observability/integrations/07-langgraph.mdx index d2ada0c780..5305b0450d 100644 --- a/docs/docs/observability/integrations/07-langgraph.mdx +++ b/docs/docs/observability/integrations/07-langgraph.mdx @@ -407,4 +407,4 @@ def content_moderator(user_content: str): ## Next Steps -For more advanced observability features and configuration options, see our [complete observability documentation](/observability/observability-sdk). +For more advanced observability features and configuration options, see our [complete observability documentation](/observability/trace-with-python-sdk/setup-tracing). diff --git a/docs/docs/observability/integrations/08-openai-agents.mdx b/docs/docs/observability/integrations/08-openai-agents.mdx index ba7cfd463f..9a4491915e 100644 --- a/docs/docs/observability/integrations/08-openai-agents.mdx +++ b/docs/docs/observability/integrations/08-openai-agents.mdx @@ -175,7 +175,7 @@ The trace provides visibility into your application's execution, helping you: - Analyze orchestration effectiveness and workflow optimization, and identify bottlenecks :::info -After setting up observability for your OpenAI Agents SDK application, you can use Agenta's [evaluation](/evaluation/overview) features to evaluate the performance of your agents. +After setting up observability for your OpenAI Agents SDK application, you can use Agenta's [evaluation](/evaluation/concepts) features to evaluate the performance of your agents. ::: ## Real-world Example @@ -276,4 +276,4 @@ async def content_creation_system(content_brief: str): ## Next Steps -For more detailed information about Agenta's observability features and advanced configuration options, visit the [Agenta Observability SDK Documentation](/observability/observability-sdk). \ No newline at end of file +For more detailed information about Agenta's observability features and advanced configuration options, visit the [Agenta Observability SDK Documentation](/observability/trace-with-python-sdk/setup-tracing). \ No newline at end of file diff --git a/docs/docs/observability/integrations/10-pydanticai.mdx b/docs/docs/observability/integrations/10-pydanticai.mdx index 96cbece97f..eec3ddd731 100644 --- a/docs/docs/observability/integrations/10-pydanticai.mdx +++ b/docs/docs/observability/integrations/10-pydanticai.mdx @@ -249,4 +249,4 @@ The trace provides comprehensive visibility into your application's execution, h ## Next Steps -For more detailed information about Agenta's observability features and advanced configuration options, visit the [Agenta Observability SDK Documentation](/observability/observability-sdk). +For more detailed information about Agenta's observability features and advanced configuration options, visit the [Agenta Observability SDK Documentation](/observability/trace-with-python-sdk/setup-tracing). 
diff --git a/docs/docs/observability/integrations/11-dspy.mdx b/docs/docs/observability/integrations/11-dspy.mdx index 0dbe0444f0..f32e59dfdd 100644 --- a/docs/docs/observability/integrations/11-dspy.mdx +++ b/docs/docs/observability/integrations/11-dspy.mdx @@ -276,4 +276,4 @@ def multi_step_reasoning(question: str): ## Next Steps -For more detailed information about Agenta's observability features and advanced configuration options, visit the [Agenta Observability SDK Documentation](/observability/observability-sdk). \ No newline at end of file +For more detailed information about Agenta's observability features and advanced configuration options, visit the [Agenta Observability SDK Documentation](/observability/trace-with-python-sdk/setup-tracing). \ No newline at end of file diff --git a/docs/docs/observability/integrations/12-agno.mdx b/docs/docs/observability/integrations/12-agno.mdx index ee850464a1..28c816c8fd 100644 --- a/docs/docs/observability/integrations/12-agno.mdx +++ b/docs/docs/observability/integrations/12-agno.mdx @@ -248,4 +248,4 @@ The trace provides comprehensive visibility into your application's execution, h ## Next Steps -For more detailed information about Agenta's observability features and advanced configuration options, visit the [Agenta Observability SDK Documentation](/observability/observability-sdk). \ No newline at end of file +For more detailed information about Agenta's observability features and advanced configuration options, visit the [Agenta Observability SDK Documentation](/observability/trace-with-python-sdk/setup-tracing). \ No newline at end of file diff --git a/docs/docs/observability/integrations/_category_.json b/docs/docs/observability/integrations/_category_.json index 52b3e501bb..72d05a3297 100644 --- a/docs/docs/observability/integrations/_category_.json +++ b/docs/docs/observability/integrations/_category_.json @@ -1,4 +1,4 @@ { - "position": 6, + "position": 9, "label": "Integrations" -} +} \ No newline at end of file diff --git a/docs/docs/observability/query-data/01-query-api.mdx b/docs/docs/observability/query-data/01-query-api.mdx new file mode 100644 index 0000000000..a5021300f8 --- /dev/null +++ b/docs/docs/observability/query-data/01-query-api.mdx @@ -0,0 +1,716 @@ +--- +title: "Query Trace Data with the Agenta API" +sidebar_label: "Query via API" +description: "Learn how to programmatically query and filter LLM traces and spans using the Agenta Query Data API with Python and JavaScript examples" +sidebar_position: 1 +--- + +```mdx-code-block +import GoogleColabButton from "@site/src/components/GoogleColabButton"; +``` + + + Open in Google Colaboratory + + +The Agenta API lets you query trace and span data programmatically. You can filter by attributes, time ranges, and status codes. The API returns data in two formats: flat spans or hierarchical trace trees. + +## API Endpoint + +``` +POST /api/preview/tracing/spans/query +``` + +Send queries as JSON in the request body. This approach works best for complex filters with nested conditions. + +## Authentication + +Include your API key in the request header: + +```http +Authorization: ApiKey YOUR_API_KEY +``` + +You can create API keys from the Settings page in your Agenta workspace. + +## Response Format Options + +### Focus Parameter + +Choose how the API returns your data: + +**`span`**: Returns individual spans in a flat list. Use this when you need span-level details. 
+ +```json +{ + "focus": "span" +} +``` + +Example response: + +```json +{ + "count": 2, + "spans": [ + { + "trace_id": "abc123", + "span_id": "span001", + "parent_id": "root", + "span_name": "openai.chat", + "status_code": "STATUS_CODE_OK", + "attributes": { + "ag": { + "metrics": { + "costs": { "cumulative": { "total": 0.0023 } }, + "tokens": { "cumulative": { "total": 450 } } + } + } + } + }, + { + "trace_id": "def456", + "span_id": "span002", + "parent_id": "root", + "span_name": "generate_response", + "status_code": "STATUS_CODE_OK", + "attributes": { "..." } + } + ] +} +``` + +**`trace`** (default): Returns complete traces as hierarchical trees. Each trace groups its spans by `trace_id`. Use this when you need to see the full request flow. + +```json +{ + "focus": "trace" +} +``` + +Example response: + +```json +{ + "count": 1, + "traces": { + "abc123": { + "spans": { + "generate_response": { + "span_id": "root", + "span_name": "generate_response", + "status_code": "STATUS_CODE_OK", + "spans": { + "openai.chat": { + "span_id": "span001", + "span_name": "openai.chat", + "parent_id": "root", + "status_code": "STATUS_CODE_OK", + "attributes": { + "ag": { + "metrics": { + "costs": { "cumulative": { "total": 0.0023 } }, + "tokens": { "cumulative": { "total": 450 } } + } + } + } + } + } + } + } + } + } +} +``` + +## Time Windows and Limits + +Control which traces you retrieve using time ranges and result limits. + +### Time Range + +Specify start and end timestamps using ISO 8601 format or Unix timestamps: + +```json +{ + "oldest": "2024-01-01T00:00:00Z", + "newest": "2024-01-31T23:59:59Z" +} +``` + +The `oldest` timestamp is included in results. The `newest` timestamp is excluded. + +You can also use Unix timestamps (in seconds): + +```json +{ + "oldest": 1704067200, + "newest": 1706745599 +} +``` + +### Result Limit + +Limit the number of results: + +```json +{ + "limit": 100 +} +``` + +Combine time ranges with limits for precise control: + +```json +{ + "oldest": "2024-01-01T00:00:00Z", + "newest": "2024-01-31T23:59:59Z", + "limit": 50 +} +``` + +## Filtering Data + +Filter traces and spans using conditions with various operators. + +### Basic Filter Structure + +Each filter needs an operator and conditions: + +```json +{ + "filter": { + "operator": "and", + "conditions": [ + { + "field": "span_name", + "operator": "is", + "value": "my_span" + } + ] + } +} +``` + +### Logical Operators + +Combine conditions using these operators: + +**`and`** (default): All conditions must match. + +**`or`**: At least one condition must match. + +**`not`**: Negates the condition. + +**`nand`**: At least one condition does not match. + +**`nor`**: No conditions match. 
+ +Example using OR logic: + +```json +{ + "filter": { + "operator": "or", + "conditions": [ + { "field": "span_name", "value": "chat" }, + { "field": "span_name", "value": "completion" } + ] + } +} +``` + +### Fields You Can Filter + +Filter by these standard fields: + +**`trace_id`**: Trace identifier + +**`span_id`**: Span identifier + +**`parent_id`**: Parent span identifier + +**`span_name`**: Span name + +**`span_kind`**: Type of span (`SPAN_KIND_UNSPECIFIED`, `SPAN_KIND_INTERNAL`, `SPAN_KIND_SERVER`, `SPAN_KIND_CLIENT`, `SPAN_KIND_PRODUCER`, `SPAN_KIND_CONSUMER`) + +**`start_time`**: Start timestamp + +**`end_time`**: End timestamp + +**`status_code`**: Status code (`STATUS_CODE_UNSET`, `STATUS_CODE_OK`, `STATUS_CODE_ERROR`) + +**`status_message`**: Status message text + +**`attributes`**: Span attributes (requires `key` parameter) + +**`links`**: Linked spans + +**`references`**: References to applications, variants, or revisions + +### Filtering by Attributes + +Access span attributes using the `key` parameter: + +```json +{ + "field": "attributes", + "key": "ag.type.span", + "operator": "is", + "value": "llm" +} +``` + +For nested attributes, use dot notation: + +```json +{ + "field": "attributes", + "key": "ag.metrics.unit.cost", + "operator": "gt", + "value": 0.01 +} +``` + +### Comparison Operators + +#### Equality + +**`is`** (default): Exact match + +**`is_not`**: Not equal + +```json +{ + "field": "span_name", + "operator": "is", + "value": "openai.chat" +} +``` + +#### Numeric Comparisons + +**`eq`**: Equal to + +**`neq`**: Not equal to + +**`gt`**: Greater than + +**`gte`**: Greater than or equal to + +**`lt`**: Less than + +**`lte`**: Less than or equal to + +**`btwn`**: Between (inclusive, provide array `[min, max]`) + +Example filtering by duration: + +```json +{ + "field": "attributes", + "key": "ag.metrics.unit.duration", + "operator": "gt", + "value": 1000 +} +``` + +Example using between: + +```json +{ + "field": "attributes", + "key": "ag.metrics.unit.duration", + "operator": "btwn", + "value": [500, 2000] +} +``` + +#### String Matching + +**`startswith`**: Starts with prefix + +**`endswith`**: Ends with suffix + +**`contains`**: Contains substring + +**`matches`**: Regular expression match + +**`like`**: SQL-like pattern (supports `%` wildcard) + +Example searching span names: + +```json +{ + "field": "span_name", + "operator": "contains", + "value": "api" +} +``` + +#### List Operations + +**`in`**: Value exists in the list + +**`not_in`**: Value does not exist in the list + +Example filtering multiple traces: + +```json +{ + "field": "trace_id", + "operator": "in", + "value": [ + "trace_id_1", + "trace_id_2", + "trace_id_3" + ] +} +``` + +#### Existence Checks + +**`exists`**: Field or attribute exists (any value, including null) + +**`not_exists`**: Field or attribute does not exist + +Example checking for cost tracking: + +```json +{ + "field": "attributes", + "key": "ag.metrics.unit.cost", + "operator": "exists" +} +``` + +Note: When using `exists` or `not_exists`, omit the `value` field. 
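+
+To see how these operators combine in practice, here is a minimal sketch of a query that mixes a string match, a numeric comparison, and an existence check, using the endpoint and `ApiKey` header described above. The host, API key, and attribute keys are placeholders; adjust them to whatever your spans actually record.
+
+```python
+import requests
+
+AGENTA_HOST = "https://cloud.agenta.ai"  # or your self-hosted instance URL
+API_KEY = "your_api_key_here"            # placeholder
+
+query = {
+    "focus": "span",
+    "limit": 50,
+    "filter": {
+        "operator": "and",
+        "conditions": [
+            # String matching: span names containing "chat"
+            {"field": "span_name", "operator": "contains", "value": "chat"},
+            # Numeric comparison: spans that cost more than $0.01
+            {
+                "field": "attributes",
+                "key": "ag.metrics.unit.cost",
+                "operator": "gt",
+                "value": 0.01,
+            },
+            # Existence check: note that no "value" field is provided
+            {"field": "attributes", "key": "ag.type.span", "operator": "exists"},
+        ],
+    },
+}
+
+response = requests.post(
+    f"{AGENTA_HOST}/api/preview/tracing/spans/query",
+    headers={
+        "Authorization": f"ApiKey {API_KEY}",
+        "Content-Type": "application/json",
+    },
+    json=query,
+)
+print(response.json()["count"])
+```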
+ +## Advanced Filtering Examples + +### Multiple Conditions + +Filter production traces with slow response times: + +```json +{ + "filter": { + "operator": "and", + "conditions": [ + { + "field": "attributes", + "key": "environment", + "value": "production" + }, + { + "field": "attributes", + "key": "ag.metrics.unit.duration", + "operator": "gt", + "value": 1000 + }, + { + "field": "status_code", + "operator": "is_not", + "value": "STATUS_CODE_ERROR" + } + ] + } +} +``` + +### Nested Logical Operators + +Find API calls that either errored or took too long: + +```json +{ + "filter": { + "operator": "and", + "conditions": [ + { + "field": "span_name", + "operator": "startswith", + "value": "api_" + }, + { + "operator": "or", + "conditions": [ + { + "field": "status_code", + "value": "STATUS_CODE_ERROR" + }, + { + "field": "attributes", + "key": "ag.metrics.unit.duration", + "operator": "gt", + "value": 5000 + } + ] + } + ] + } +} +``` + +### Filter by Application References + +Find traces for specific applications: + +```json +{ + "filter": { + "conditions": [ + { + "field": "references", + "operator": "in", + "value": [ + { "id": "application_id_1" }, + { "id": "application_id_2" } + ] + } + ] + } +} +``` + +### Filter by Linked Spans + +Find spans linked to specific traces: + +```json +{ + "filter": { + "conditions": [ + { + "field": "links", + "operator": "in", + "value": [ + { + "trace_id": "trace_id_value", + "span_id": "span_id_value" + } + ] + } + ] + } +} +``` + +## Complete Query Example + +Here's a full query that finds successful LLM calls from the last month: + +```json +{ + "focus": "trace", + "oldest": "2024-01-01T00:00:00Z", + "newest": "2024-01-31T23:59:59Z", + "limit": 50, + "filter": { + "operator": "and", + "conditions": [ + { + "field": "attributes", + "key": "ag.type.span", + "value": "llm" + }, + { + "field": "status_code", + "operator": "is_not", + "value": "STATUS_CODE_ERROR" + } + ] + } +} +``` + +## Response Structure + +The API returns JSON with these fields: + +```json +{ + "count": 42, + "spans": [...], // Present when focus=span + "traces": {...} // Present when focus=trace +} +``` + +**`count`**: Number of results returned + +**`spans`**: Array of spans (when `focus=span`) + +**`traces`**: Dictionary of trace trees indexed by trace_id (when `focus=trace`) + +### Span Format + +Each span contains: + +```json +{ + "trace_id": "...", + "span_id": "...", + "parent_id": "...", + "span_name": "openai.chat", + "span_kind": "SPAN_KIND_CLIENT", + "start_time": "2024-01-15T10:30:00Z", + "end_time": "2024-01-15T10:30:03Z", + "status_code": "STATUS_CODE_OK", + "attributes": { + "ag": { + "data": { + "inputs": {...}, + "outputs": {...} + }, + "metrics": { + "costs": {...}, + "tokens": {...}, + "duration": {...} + }, + "type": { + "span": "llm" + } + } + } +} +``` + +### Trace Format + +Each trace organizes spans hierarchically by name: + +```json +{ + "trace_id_1": { + "spans": { + "root_span": { + "span_id": "...", + "spans": { + "child_span": { + "span_id": "...", + "..." 
+ } + } + } + } + } +} +``` + +## Error Responses + +The API returns standard HTTP status codes: + +**200 OK**: Query succeeded + +**400 Bad Request**: Invalid filter syntax or parameters + +**401 Unauthorized**: Missing or invalid API key + +**422 Unprocessable Entity**: Validation errors in the request + +## Python Example + +Here's a complete example using Python: + +```python +import requests +from datetime import datetime, timedelta, timezone + +AGENTA_HOST = "https://cloud.agenta.ai" +API_KEY = "your_api_key_here" + +# Query spans from the last 7 days +now = datetime.now(timezone.utc) +week_ago = now - timedelta(days=7) + +query = { + "focus": "span", + "oldest": week_ago.isoformat(), + "newest": now.isoformat(), + "limit": 100, + "filter": { + "operator": "and", + "conditions": [ + { + "field": "attributes", + "key": "ag.type.span", + "value": "llm" + }, + { + "field": "status_code", + "value": "STATUS_CODE_OK" + } + ] + } +} + +response = requests.post( + f"{AGENTA_HOST}/api/preview/tracing/spans/query", + headers={ + "Authorization": f"ApiKey {API_KEY}", + "Content-Type": "application/json" + }, + json=query +) + +data = response.json() +print(f"Found {data['count']} spans") +``` + +## JavaScript Example + +Here's the same query in JavaScript: + +```javascript +const AGENTA_HOST = "https://cloud.agenta.ai"; +const API_KEY = "your_api_key_here"; + +// Query spans from the last 7 days +const now = new Date(); +const weekAgo = new Date(now.getTime() - 7 * 24 * 60 * 60 * 1000); + +const query = { + focus: "span", + oldest: weekAgo.toISOString(), + newest: now.toISOString(), + limit: 100, + filter: { + operator: "and", + conditions: [ + { + field: "attributes", + key: "ag.type.span", + value: "llm" + }, + { + field: "status_code", + value: "STATUS_CODE_OK" + } + ] + } +}; + +const response = await fetch( + `${AGENTA_HOST}/api/preview/tracing/spans/query`, + { + method: "POST", + headers: { + "Authorization": `ApiKey ${API_KEY}`, + "Content-Type": "application/json" + }, + body: JSON.stringify(query) + } +); + +const data = await response.json(); +console.log(`Found ${data.count} spans`); +``` + +## Next Steps + +Learn about the [Analytics Data API](/observability/query-data/analytics-data) for aggregated metrics and time-series data. + +Explore the [API Reference](/reference/api/category) for complete endpoint documentation. + +Check out [Filtering in the UI](/observability/concepts) to learn about the visual query builder. diff --git a/docs/docs/observability/query-data/02-analytics-data.mdx b/docs/docs/observability/query-data/02-analytics-data.mdx new file mode 100644 index 0000000000..3de7455510 --- /dev/null +++ b/docs/docs/observability/query-data/02-analytics-data.mdx @@ -0,0 +1,473 @@ +--- +title: "Analyze Observability Metrics with the Agenta Analytics API" +sidebar_label: "Analytics via API" +description: "Learn how to retrieve and analyze aggregated LLM performance metrics including costs, latency, token usage, and error rates using the Agenta Analytics API with Python and JavaScript examples" +sidebar_position: 2 +--- + +import GoogleColabButton from "@site/src/components/GoogleColabButton"; + + + Open in Google Colaboratory + + +## Overview + +The Agenta Analytics API retrieves aggregated metrics from your LLM traces. The API groups your data into time buckets and calculates metrics like cost, latency, token usage, and error rates. 
+ +Use analytics to: +- Track LLM costs over time +- Monitor performance trends +- Identify error patterns +- Analyze token consumption + +**Endpoint**: `POST /api/preview/tracing/spans/analytics` + +**Authentication**: You can create API keys from the Settings page in your Agenta workspace. + +## Quick Start + +### Python + +```python +import requests +from datetime import datetime, timedelta, timezone + +# Setup +AGENTA_HOST = "https://cloud.agenta.ai" +API_KEY = "your_api_key_here" +BASE_URL = f"{AGENTA_HOST}/api/preview/tracing/spans/analytics" + +headers = { + "Authorization": f"ApiKey {API_KEY}", + "Content-Type": "application/json" +} + +# Get analytics for last 7 days with daily buckets +newest = datetime.now(timezone.utc) +oldest = newest - timedelta(days=7) + +payload = { + "focus": "trace", + "interval": 1440, # 1440 minutes = daily buckets + "windowing": { + "oldest": oldest.isoformat(), + "newest": newest.isoformat() + } +} + +response = requests.post(BASE_URL, headers=headers, json=payload) +data = response.json() + +print(f"Found {data['count']} daily buckets") +for bucket in data['buckets']: + if bucket['total']['count'] > 0: + print(f"\nDate: {bucket['timestamp'][:10]}") + print(f" Traces: {bucket['total']['count']}") + print(f" Cost: ${bucket['total']['costs']:.4f}") + print(f" Avg Duration: {bucket['total']['duration'] / bucket['total']['count']:.0f}ms") + print(f" Errors: {bucket['errors']['count']}") +``` + +### JavaScript + +```javascript +const AGENTA_HOST = "https://cloud.agenta.ai"; +const API_KEY = "your_api_key_here"; +const BASE_URL = `${AGENTA_HOST}/api/preview/tracing/spans/analytics`; + +// Get analytics for last 7 days with daily buckets +const newest = new Date(); +const oldest = new Date(newest.getTime() - 7 * 24 * 60 * 60 * 1000); + +const payload = { + focus: "trace", + interval: 1440, // 1440 minutes = daily buckets + windowing: { + oldest: oldest.toISOString(), + newest: newest.toISOString() + } +}; + +const response = await fetch(BASE_URL, { + method: "POST", + headers: { + "Authorization": `ApiKey ${API_KEY}`, + "Content-Type": "application/json" + }, + body: JSON.stringify(payload) +}); + +const data = await response.json(); + +console.log(`Found ${data.count} daily buckets`); +data.buckets.forEach(bucket => { + if (bucket.total.count > 0) { + const date = bucket.timestamp.substring(0, 10); + const avgDuration = bucket.total.duration / bucket.total.count; + console.log(`\nDate: ${date}`); + console.log(` Traces: ${bucket.total.count}`); + console.log(` Cost: $${bucket.total.costs.toFixed(4)}`); + console.log(` Avg Duration: ${avgDuration.toFixed(0)}ms`); + console.log(` Errors: ${bucket.errors.count}`); + } +}); +``` + +## Request Parameters + +All parameters are sent in the JSON request body: + +### Focus (Required) + +Controls the aggregation level: + +- **`trace`**: Aggregate by complete traces (most common) +- **`span`**: Aggregate individual spans + +Most analytics queries use `trace` to analyze complete LLM requests. + +```python +payload = { + "focus": "trace", + "interval": 1440 +} +``` + +### Interval (Required) + +Bucket size in minutes. Common values: + +- **`60`** = Hourly buckets +- **`1440`** = Daily buckets (24 hours) +- **`10080`** = Weekly buckets (7 days) + +```python +payload = { + "focus": "trace", + "interval": 1440 # Daily buckets +} +``` + +### Windowing (Optional) + +Specify the time range for your analytics. If not provided, defaults to the last 30 days. 
+ +```python +from datetime import datetime, timedelta, timezone + +newest = datetime.now(timezone.utc) +oldest = newest - timedelta(days=30) + +payload = { + "focus": "trace", + "interval": 1440, + "windowing": { + "oldest": oldest.isoformat(), + "newest": newest.isoformat() + } +} +``` + +### Filter (Optional) + +Filter which traces to include in analytics. Uses the same filter syntax as the Query API. + +**Filter by status code:** +```python +payload = { + "focus": "trace", + "interval": 1440, + "filter": { + "conditions": [ + { + "field": "status.code", + "operator": "eq", + "value": "STATUS_CODE_OK" + } + ] + } +} +``` + +**Filter by span name:** +```python +payload = { + "focus": "trace", + "interval": 1440, + "filter": { + "conditions": [ + { + "field": "name", + "operator": "contains", + "value": "openai" + } + ] + } +} +``` + +**Multiple conditions:** +```python +payload = { + "focus": "trace", + "interval": 1440, + "filter": { + "operator": "and", + "conditions": [ + { + "field": "status.code", + "operator": "eq", + "value": "STATUS_CODE_OK" + }, + { + "field": "attributes.ag.metrics.costs.cumulative.total", + "operator": "gt", + "value": 0.01 + } + ] + } +} +``` + +## Response Format + +The API returns aggregated metrics grouped into time buckets: + +```json +{ + "count": 7, + "buckets": [ + { + "timestamp": "2025-10-24T00:00:00Z", + "interval": 1440, + "total": { + "count": 150, + "duration": 45000.5, + "costs": 0.0234, + "tokens": 1200.0 + }, + "errors": { + "count": 5, + "duration": 2300.0, + "costs": 0.0, + "tokens": 0.0 + } + }, + { + "timestamp": "2025-10-25T00:00:00Z", + "interval": 1440, + "total": { + "count": 200, + "duration": 60000.0, + "costs": 0.0312, + "tokens": 1600.0 + }, + "errors": { + "count": 3, + "duration": 1500.0, + "costs": 0.0, + "tokens": 0.0 + } + } + ] +} +``` + +### Response Fields + +- **`count`**: Number of time buckets returned +- **`buckets`**: Array of time-based aggregated metrics + +### Bucket Fields + +- **`timestamp`**: Start time of the bucket (ISO 8601) +- **`interval`**: Bucket size in minutes +- **`total`**: Aggregated metrics for all traces in this bucket + - **`count`**: Number of traces + - **`duration`**: Total duration in milliseconds + - **`costs`**: Total cost in USD + - **`tokens`**: Total tokens used +- **`errors`**: Aggregated metrics for failed traces only + - **`count`**: Number of failed traces + - **`duration`**: Total duration of failed traces in milliseconds + - **`costs`**: Total cost of failed traces (usually 0) + - **`tokens`**: Total tokens in failed traces (usually 0) + +## Common Use Cases + +### Monitor Daily Costs + +Track LLM spending over time: + +```python +import requests +from datetime import datetime, timedelta, timezone + +# Get daily costs for last 30 days +newest = datetime.now(timezone.utc) +oldest = newest - timedelta(days=30) + +payload = { + "focus": "trace", + "interval": 1440, # Daily buckets + "windowing": { + "oldest": oldest.isoformat(), + "newest": newest.isoformat() + } +} + +response = requests.post(BASE_URL, headers=headers, json=payload) +data = response.json() + +# Calculate totals +total_cost = sum(b['total']['costs'] for b in data['buckets']) +total_traces = sum(b['total']['count'] for b in data['buckets']) + +print(f"Total Cost (30 days): ${total_cost:.2f}") +print(f"Total Traces: {total_traces:,}") +print(f"Average Cost per Trace: ${total_cost/total_traces:.4f}") +``` + +### Analyze Error Trends + +Monitor error rates to identify reliability issues: + +```python +# Get hourly metrics 
for last 7 days +newest = datetime.now(timezone.utc) +oldest = newest - timedelta(days=7) + +payload = { + "focus": "trace", + "interval": 60, # Hourly buckets + "windowing": { + "oldest": oldest.isoformat(), + "newest": newest.isoformat() + } +} + +response = requests.post(BASE_URL, headers=headers, json=payload) +data = response.json() + +# Find high error rate periods +print("Hours with high error rates (>5%):") +for bucket in data['buckets']: + if bucket['total']['count'] > 0: + error_rate = (bucket['errors']['count'] / bucket['total']['count']) * 100 + if error_rate > 5: + print(f" {bucket['timestamp']}: {error_rate:.1f}%") +``` + +### Track Token Usage + +Monitor token consumption patterns: + +```python +# Get daily token usage for last 7 days +newest = datetime.now(timezone.utc) +oldest = newest - timedelta(days=7) + +payload = { + "focus": "trace", + "interval": 1440, # Daily buckets + "windowing": { + "oldest": oldest.isoformat(), + "newest": newest.isoformat() + } +} + +response = requests.post(BASE_URL, headers=headers, json=payload) +data = response.json() + +print("Daily Token Usage:") +for bucket in data['buckets']: + if bucket['total']['count'] > 0: + date = bucket['timestamp'][:10] + avg_tokens = bucket['total']['tokens'] / bucket['total']['count'] + print(f" {date}: {bucket['total']['tokens']:,.0f} total ({avg_tokens:.0f} avg)") +``` + +### Compare Performance + +Analyze latency trends over time: + +```python +# Get hourly performance for last 24 hours +newest = datetime.now(timezone.utc) +oldest = newest - timedelta(days=1) + +payload = { + "focus": "trace", + "interval": 60, # Hourly buckets + "windowing": { + "oldest": oldest.isoformat(), + "newest": newest.isoformat() + } +} + +response = requests.post(BASE_URL, headers=headers, json=payload) +data = response.json() + +print("Hourly Average Latency:") +latencies = [] +for bucket in data['buckets']: + if bucket['total']['count'] > 0: + avg_duration = bucket['total']['duration'] / bucket['total']['count'] + latencies.append(avg_duration) + hour = bucket['timestamp'][11:16] + print(f" {hour}: {avg_duration:.0f}ms") + +if latencies: + print(f"\nStatistics:") + print(f" Min: {min(latencies):.0f}ms") + print(f" Max: {max(latencies):.0f}ms") + print(f" Avg: {sum(latencies)/len(latencies):.0f}ms") +``` + +### Filter by Successful Traces Only + +Analyze only successful requests: + +```python +# Get metrics for successful traces only +newest = datetime.now(timezone.utc) +oldest = newest - timedelta(days=7) + +payload = { + "focus": "trace", + "interval": 1440, + "windowing": { + "oldest": oldest.isoformat(), + "newest": newest.isoformat() + }, + "filter": { + "conditions": [ + { + "field": "status.code", + "operator": "eq", + "value": "STATUS_CODE_OK" + } + ] + } +} + +response = requests.post(BASE_URL, headers=headers, json=payload) +data = response.json() + +# Calculate success metrics +total_count = sum(b['total']['count'] for b in data['buckets']) +total_cost = sum(b['total']['costs'] for b in data['buckets']) +total_duration = sum(b['total']['duration'] for b in data['buckets']) + +print("Successful Traces (Last 7 Days):") +print(f" Count: {total_count:,}") +print(f" Total Cost: ${total_cost:.4f}") +print(f" Avg Duration: {total_duration/total_count:.0f}ms") +``` + +## Next Steps + +- Learn about [Query API](/observability/query-data/query-api) for detailed trace data +- Explore [Using the UI](/observability/concepts) for visual analytics +- Read about [Semantic 
Conventions](/observability/trace-with-opentelemetry/semantic-conventions) for available metrics diff --git a/docs/docs/observability/query-data/_category_.json b/docs/docs/observability/query-data/_category_.json new file mode 100644 index 0000000000..93a5859395 --- /dev/null +++ b/docs/docs/observability/query-data/_category_.json @@ -0,0 +1,6 @@ +{ + "label": "Query Data", + "position": 8, + "collapsible": true, + "collapsed": true +} diff --git a/docs/docs/observability/trace-with-opentelemetry/01-getting-started.mdx b/docs/docs/observability/trace-with-opentelemetry/01-getting-started.mdx new file mode 100644 index 0000000000..2048c5c03f --- /dev/null +++ b/docs/docs/observability/trace-with-opentelemetry/01-getting-started.mdx @@ -0,0 +1,103 @@ +--- +title: "Getting Started with OpenTelemetry" +sidebar_label: "Getting Started" +description: "Learn how to configure OpenTelemetry to send traces to Agenta's observability platform" +sidebar_position: 1 +--- + +Agenta accepts traces via the OpenTelemetry Protocol (OTLP) endpoint. You can use any OpenTelemetry-compatible instrumentation library to send traces to Agenta. + +## OTLP Endpoint + +Agenta accepts traces via the **OTLP/HTTP protocol** using **protobuf** encoding: + +**Endpoint:** `https://cloud.agenta.ai/api/otlp/v1/traces` + +For self-hosted installations, replace `https://cloud.agenta.ai` with your instance URL. + +:::warning +Agenta does **not** support `gRPC` for the OpenTelemetry endpoint. Please use **HTTP/protobuf** instead. +::: + +## Authentication + +Agenta uses ApiKey-based authentication for the OTLP endpoint: + +```javascript +headers: { + Authorization: `ApiKey ${AGENTA_API_KEY}` +} +``` + +### Getting Your API Key + +1. Visit the [Agenta API Keys page](https://cloud.agenta.ai/settings?tab=apiKeys) +2. Click on **Create New API Key** and follow the prompts +3. Copy the API key and set it as an environment variable: + +```bash +export AGENTA_API_KEY="YOUR_AGENTA_API_KEY" +export AGENTA_HOST="https://cloud.agenta.ai" # Change for self-hosted +``` + +## Configuration + +When using OpenTelemetry SDKs directly (without the Agenta SDK), configure the OTLP exporter to point to Agenta: + +```bash +OTEL_EXPORTER_OTLP_ENDPOINT="https://cloud.agenta.ai/api/otlp" +OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://cloud.agenta.ai/api/otlp/v1/traces" +OTEL_EXPORTER_OTLP_HEADERS="Authorization=ApiKey ${AGENTA_API_KEY}" +``` + +:::info +If your collector requires signal-specific environment variables, use the trace-specific endpoint: + +```bash +OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://cloud.agenta.ai/api/otlp/v1/traces" +OTEL_EXPORTER_OTLP_TRACES_HEADERS="Authorization=ApiKey ${AGENTA_API_KEY}" +``` +::: + +## Supported Languages + +OpenTelemetry SDKs are available for many languages: + +- **Python**: Use the [Agenta Python SDK](/observability/quickstart-python) or OpenTelemetry SDK directly +- **Node.js / TypeScript**: See the [OpenTelemetry Quick Start](/observability/quick-start-opentelemetry) +- **Java**: Use the OpenTelemetry Java SDK +- **Go**: Use the OpenTelemetry Go SDK +- **.NET**: Use the OpenTelemetry .NET SDK +- **Ruby, PHP, Rust**: OpenTelemetry SDKs available for all + +All can send traces to Agenta using the OTLP endpoint above. + +## Using OpenTelemetry Instrumentation Libraries + +Agenta is compatible with many OpenTelemetry instrumentation libraries that extend language and framework support. 
These libraries work seamlessly with Agenta's OTLP endpoint: + +### Popular Libraries + +- **[OpenLLMetry](https://github.com/Arize-ai/openllmetry)**: Supports multiple LLMs (OpenAI, Anthropic, Azure, etc.) and frameworks (LangChain, LlamaIndex) +- **[OpenLIT](https://github.com/openlit/openlit)**: Comprehensive instrumentation for LLMs, vector DBs, and frameworks +- **[OpenInference](https://arize-ai.github.io/openinference/)**: Arize's OpenTelemetry instrumentation for LLM applications + +### Framework Integrations + +Many frameworks have OpenTelemetry support built-in or via plugins: + +- **LangChain**: OpenTelemetry instrumentation available +- **LlamaIndex**: OpenTelemetry support via plugins +- **AutoGen**: OpenTelemetry compatible +- **Semantic Kernel**: OpenTelemetry integration available +- **Spring AI**: Java framework with OpenTelemetry support + +See the [semantic conventions](/observability/trace-with-opentelemetry/semantic-conventions) page for details on how Agenta maps OpenTelemetry attributes. + +## Next Steps + +- Follow the [OpenTelemetry Quick Start](/observability/quick-start-opentelemetry) for a complete example +- Learn about [semantic conventions](/observability/trace-with-opentelemetry/semantic-conventions) for better trace display +- Explore [distributed tracing](/observability/trace-with-opentelemetry/distributed-tracing) across services +- Configure the [OpenTelemetry Collector](/observability/trace-with-opentelemetry/otel-collector-configuration) to forward traces + diff --git a/docs/docs/observability/05-otel-semconv.mdx b/docs/docs/observability/trace-with-opentelemetry/03-semantic-conventions.mdx similarity index 51% rename from docs/docs/observability/05-otel-semconv.mdx rename to docs/docs/observability/trace-with-opentelemetry/03-semantic-conventions.mdx index 440657179c..6fbe80c3de 100644 --- a/docs/docs/observability/05-otel-semconv.mdx +++ b/docs/docs/observability/trace-with-opentelemetry/03-semantic-conventions.mdx @@ -1,16 +1,18 @@ --- -title: Agenta OpenTelemetry Semantic Conventions -sidebar_label: Otel Semantic Conventions -description: "Learn about the OpenTelemetry semantic conventions used by Agenta's LLM observability system." +title: "Semantic Conventions" +sidebar_label: "Semantic Conventions" +description: "Learn about the OpenTelemetry semantic conventions used by Agenta's LLM observability system" +sidebar_position: 3 --- This document describes how Agenta applies domain-specific OpenTelemetry conventions to capture and analyze traces from LLM applications. ## How It Works + Agenta accepts any span that follows the OpenTelemetry specification. To unlock LLM-specific features—such as nicely formatted chat messages, per-request cost and latency, and links to prompt configurations or evaluators—we add attributes under the `ag.` namespace. :::info -The OTLP endpoint accepts batches up to **5 MB** by default (after decompression). Larger requests return an HTTP `413` status code. +The OTLP endpoint accepts batches up to **5 MB** by default (after decompression). Larger requests return an HTTP `413` status code. 
::: We support two primary instrumentation approaches: @@ -29,7 +31,7 @@ All Agenta-specific attributes are organized under the `ag` namespace to avoid c The `ag.data` namespace contains the core execution data for each span: - **`ag.data.inputs`**: Input parameters for the span -- **`ag.data.outputs`**: Output results from the span +- **`ag.data.outputs`**: Output results from the span - **`ag.data.internals`**: Internal variables and intermediate values #### Data Format @@ -56,8 +58,7 @@ The `ag.data` namespace contains the core execution data for each span: } ``` - -**Internals**: User-provided internal information such as context variables, intermediate calculations, or evaluation data that aren't part of the primary inputs/outputs. These are set by the user [in the SDK using](/observability/observability-sdk#storing-internals) `ag.store_internals()`. +**Internals**: User-provided internal information such as context variables, intermediate calculations, or evaluation data that aren't part of the primary inputs/outputs. These are set by the user [in the SDK using](/observability/trace-with-python-sdk/adding-metadata) `ag.tracing.store_internals()`. #### SDK Integration @@ -71,7 +72,7 @@ def my_function(input_param): return result ``` -The decorator automatically captures function inputs and outputs in `ag.data.inputs` and `ag.data.outputs` unless you choose to [mask sensitive data](/observability/observability-sdk). +The decorator automatically captures function inputs and outputs in `ag.data.inputs` and `ag.data.outputs` unless you choose to [mask sensitive data](/observability/trace-with-python-sdk/redact-sensitive-data). ### ag.meta @@ -128,84 +129,126 @@ Auto-instrumentation maps common semantic-convention keys—e.g. gen_ai.system, Metadata is displayed in the observability overview page as contextual information to help navigate and understand span execution. -### ag.refs +### References + +Use the top level `references` array to link spans to Agenta entities. Every entry represents one relationship and includes: + +- `attributes.key`: the reference category (for example `application`, `evaluator_variant`) +- `id`, `slug`, or `version`: supply whichever identifiers you have; you can include more than one field if available -The `ag.refs` namespace contains references to entities within the Agenta system: +Example payload: ```json { - "refs": { - "application": { - "id": "uuid" - }, - "variant": { - "id": "uuid", - "version": "1" - }, - "environment": { - "slug": "production", - "id": "uuid" - } - } + "references": [ + {"id": "019a0159-82d3-7760-9868-4f8c7da8e9c0", "attributes": {"key": "application"}}, + {"slug": "production", "attributes": {"key": "environment"}}, + {"id": "019a0159-82d3-7760-9868-4f8c7da8e9c1", "version": "4", "attributes": {"key": "application_variant"}} + ] } ``` -#### Reference Types +Supported categories: -- **Application**: Links to the Agenta application -- **Variant**: References specific prompt variants and versions -- **Environment**: Links to deployment environments (production, staging, etc.) 
-- **Test Set**: References to test datasets -- **Test Case**: Links to individual test cases -- **Evaluation Run**: References to evaluation executions -- **Evaluator**: Links to evaluation functions +- application, application_variant, application_revision +- environment, environment_variant, environment_revision +- evaluator, evaluator_variant, evaluator_revision +- testset, testset_variant, testset_revision, testcase +- query, query_variant, query_revision +- workflow, workflow_variant, workflow_revision -These references enable navigation within the Agenta UI and allow filtering spans by specific entities. +Consumers (UI, analytics, filtering) read from this array. Instrumentation libraries that cannot emit the array may still set the attribute form (`ag.references..`); the ingestion service converts that dictionary into the same array before storage. + +:::warning +The legacy `ag.refs.*` namespace is deprecated and will be removed after existing SDKs migrate. Do not rely on it. +::: ### ag.metrics -The `ag.metrics` namespace tracks performance and cost metrics: +The `ag.metrics` namespace tracks performance, cost, and error metrics: ```json { "metrics": { - "acc": { - "costs": { - "total": 0.00003925 + "costs": { + "cumulative": { + "total": 0.0070902, + "prompt": 0.00355, + "completion": 0.00354 }, - "tokens": { - "total": 157, - "prompt": 26, - "completion": 131 - }, - "duration": { - "total": 1251.157 + "incremental": { + "total": 0.0070902 } }, - "unit": { - "costs": { - "total": 0.00003925 + "tokens": { + "cumulative": { + "total": 992, + "prompt": 175, + "completion": 817 }, - "tokens": { - "total": 157, - "prompt": 26, - "completion": 131 + "incremental": { + "total": 992, + "prompt": 175, + "completion": 817 } - } + }, + "duration": { + "cumulative": 19889.343 + }, + "errors": {} } } ``` -#### Metric Types +#### Aggregation Types + +Metrics are tracked at two levels: + +- **`incremental`**: Metrics for this span only (excluding child spans) +- **`cumulative`**: Metrics for this span plus all child spans aggregated together -- **Accumulated (`acc`)**: Metrics rolled up across child spans -- **Unit (`unit`)**: Metrics specific to the individual span +This dual tracking allows you to see both the cost of individual operations and the total cost of complex workflows. -#### Available Metrics +#### Metric Categories + +##### Costs + +Tracks LLM API costs in USD with the following breakdown: + +**Cumulative (this span + children):** +- **`ag.metrics.costs.cumulative.total`**: Total cost across all LLM calls in this span and its children +- **`ag.metrics.costs.cumulative.prompt`**: Cost attributed to input tokens +- **`ag.metrics.costs.cumulative.completion`**: Cost attributed to output/completion tokens + +**Incremental (this span only):** +- **`ag.metrics.costs.incremental.total`**: Cost for this span's operations only +- **`ag.metrics.costs.incremental.prompt`**: Prompt cost for this span only +- **`ag.metrics.costs.incremental.completion`**: Completion cost for this span only + +:::info +Cost calculation uses the latest pricing for each model provider. Costs are automatically calculated when using standard LLM integrations. Cumulative metrics are automatically calculated by the backend by aggregating incremental values. 
+::: + +##### Tokens + +Tracks token usage at both aggregation levels: + +**Cumulative:** +- **`ag.metrics.tokens.cumulative.total`**: Total tokens across all operations +- **`ag.metrics.tokens.cumulative.prompt`**: Input tokens across all operations +- **`ag.metrics.tokens.cumulative.completion`**: Output tokens across all operations + +**Incremental:** +- **`ag.metrics.tokens.incremental.total`**: Tokens for this span only +- **`ag.metrics.tokens.incremental.prompt`**: Input tokens for this span only +- **`ag.metrics.tokens.incremental.completion`**: Output tokens for this span only + +##### Duration + +Tracks execution time in milliseconds: + +- **`ag.metrics.duration.cumulative`**: Total execution time including all child spans -- **Costs**: Total cost in USD for LLM API calls -- **Tokens**: Token usage breakdown (total, prompt, completion) -- **Duration**: Execution time in milliseconds ## Additional Agenta Attributes @@ -213,8 +256,7 @@ The `ag.metrics` namespace tracks performance and cost metrics: The `ag.type` namespace contains type information about the span: -`ag.type.node` can be `workflow`, `task`, `tool`, `embedding`, `query`, `completion`, `chat`, `rerank`. - +`ag.type.node` can be `workflow`, `task`, `tool`, `embedding`, `query`, `completion`, `chat`, `rerank`. ### ag.tags @@ -242,10 +284,15 @@ When using auto-instrumentation libraries, most attributes are saved twice - onc In addition to Agenta-specific conventions, traces include standard OpenTelemetry attributes: - **Links**: Relationships between spans -- **Events**: Timestamped events within spans +- **Events**: Timestamped events within spans - **Version**: OpenTelemetry version information - **Status Code**: Span completion status - **Start Time**: Span initiation timestamp - **Span Name**: Human-readable span identifier - **Span Kind**: Type of span (server, client, internal, etc.) +## Next steps + +- Learn about [distributed tracing](/observability/trace-with-opentelemetry/distributed-tracing) +- Explore [Python SDK tracing](/observability/trace-with-python-sdk/setup-tracing) for easier instrumentation +- See [integration guides](/observability/integrations/openai) for specific frameworks diff --git a/docs/docs/observability/trace-with-opentelemetry/04-distributed-tracing.mdx b/docs/docs/observability/trace-with-opentelemetry/04-distributed-tracing.mdx new file mode 100644 index 0000000000..57a3a99021 --- /dev/null +++ b/docs/docs/observability/trace-with-opentelemetry/04-distributed-tracing.mdx @@ -0,0 +1,85 @@ +--- +title: "Distributed Tracing" +sidebar_label: "Distributed Tracing" +description: "Learn how to implement distributed tracing across services with OpenTelemetry and Agenta" +sidebar_position: 4 +--- + +## OpenTelemetry Tracing without Agenta SDK + + +If you're working with systems that don't use the Agenta SDK, you can still integrate with Agenta's tracing infrastructure using standard OpenTelemetry. + +### 1. Setup Requirements + +Install dependencies: + +```bash +pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp +``` + +### 2. Configure Environment Variables + +```bash +# OTEL_PROPAGATORS = unset or "tracecontext,baggage" +# OTEL_EXPORTER_OTLP_COMPRESSION = unset or "gzip" +# OTEL_EXPORTER_OTLP_ENDPOINT = "https://cloud.agenta.ai/api/otlp" +# OTEL_EXPORTER_OTLP_HEADERS = "authorization=ApiKey xxx" +# OTEL_EXPORTER_OTLP_TRACES_ENDPOINT = "https://cloud.agenta.ai/api/otlp/v1/traces" +# OTEL_EXPORTER_OTLP_TRACES_HEADERS = "authorization=ApiKey xxx" +``` + +### 3. 
Setup in Code + +```python +from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator +from opentelemetry.baggage.propagation import W3CBaggagePropagator +from opentelemetry.sdk.trace import TracerProvider, Span +from opentelemetry.sdk.trace.export import BatchSpanProcessor +from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter, Compression + +# Configuration +endpoint = "https://cloud.agenta.ai/api/otlp/v1/traces" +compression = Compression.Gzip +headers = { + "traceparent": "00-xxx-xxx-01", + "baggage": "ag.refs.application.id=xxx", + "authorization": "ApiKey xxx", +} + +# Set up provider, processor, and tracer +provider = TracerProvider() + +processor = BatchSpanProcessor( + OTLPSpanExporter( + endpoint=endpoint, + headers={"authorization": headers["authorization"]}, + compression=compression, + ) +) + +provider.add_span_processor(processor) + +tracer = provider.get_tracer("agenta.tracer") + +# Extract incoming trace context +carrier = {"traceparent": headers["traceparent"]} +context = TraceContextTextMapPropagator().extract(carrier=carrier, context=None) + +carrier = {"baggage": headers["baggage"]} +context = W3CBaggagePropagator().extract(carrier=carrier, context=context) + +# Create and use spans +with tracer.start_as_current_span(name="agenta", context=context) as span: + span: Span + + print(hex(span.get_span_context().trace_id)) + print(hex(span.get_span_context().span_id)) + print(span.name) +``` + +## Next steps + +- Learn about [semantic conventions](/observability/trace-with-opentelemetry/semantic-conventions) +- Explore [collector configuration](/observability/trace-with-opentelemetry/otel-collector-configuration) +- See [Python SDK distributed tracing](/observability/trace-with-python-sdk/distributed-tracing) for Agenta SDK approach diff --git a/docs/docs/observability/trace-with-opentelemetry/05-otel-collector-configuration.mdx b/docs/docs/observability/trace-with-opentelemetry/05-otel-collector-configuration.mdx new file mode 100644 index 0000000000..45c6571dfc --- /dev/null +++ b/docs/docs/observability/trace-with-opentelemetry/05-otel-collector-configuration.mdx @@ -0,0 +1,69 @@ +--- +title: "OpenTelemetry Collector Configuration" +sidebar_label: "OTEL Collector Configuration" +description: "Configure the OpenTelemetry Collector to forward traces to Agenta" +sidebar_position: 5 +--- + +The [OpenTelemetry Collector](https://opentelemetry.io/docs/collector) is a vendor-agnostic service that receives, processes, and exports telemetry data. You can use it to collect traces from multiple sources and forward them to Agenta. + +## Configuration + +Here's a configuration file (`otel-collector-config.yml`) that receives traces via OTLP/HTTP and forwards them to Agenta: + +```yaml +receivers: + otlp: + protocols: + http: + endpoint: 0.0.0.0:4318 + +processors: + batch: + timeout: 5s + send_batch_size: 512 + +exporters: + otlphttp/agenta: + endpoint: "https://cloud.agenta.ai/api/otlp/v1/traces" + headers: + Authorization: "ApiKey ${AGENTA_API_KEY}" + +service: + pipelines: + traces: + receivers: [otlp] + processors: [batch] + exporters: [otlphttp/agenta] +``` + +### Configuration Details + +**Receivers**: The collector receives traces via OTLP/HTTP on port `4318`. 
+ +**Processors**: The `batch` processor collects spans and sends them in batches: +- `timeout`: Maximum time to wait before sending a batch (default: 5s) +- `send_batch_size`: Number of spans to collect before sending (default: 512) + +**Exporters**: The `otlphttp/agenta` exporter forwards traces to Agenta using HTTP/protobuf: +- Endpoint: `https://cloud.agenta.ai/api/otlp/v1/traces` +- Authentication: Uses `ApiKey` authentication header + +:::warning +Agenta only supports **HTTP/protobuf** for the OpenTelemetry endpoint. gRPC is not supported. +::: + +For self-hosted Agenta deployments, replace the endpoint in the exporter configuration: + +```yaml +exporters: + otlphttp/agenta: + endpoint: "http://your-agenta-instance:port/api/otlp/v1/traces" + headers: + Authorization: "ApiKey ${AGENTA_API_KEY}" +``` + +## Next Steps + +- Learn about [semantic conventions](/observability/trace-with-opentelemetry/semantic-conventions) for proper trace formatting +- Explore [distributed tracing](/observability/trace-with-opentelemetry/distributed-tracing) without the Agenta SDK diff --git a/docs/docs/observability/trace-with-opentelemetry/_category_.json b/docs/docs/observability/trace-with-opentelemetry/_category_.json new file mode 100644 index 0000000000..aef5391a04 --- /dev/null +++ b/docs/docs/observability/trace-with-opentelemetry/_category_.json @@ -0,0 +1,6 @@ +{ + "label": "Trace with OpenTelemetry", + "position": 6, + "collapsible": true, + "collapsed": true +} diff --git a/docs/docs/observability/trace-with-python-sdk/01-setup-tracing.mdx b/docs/docs/observability/trace-with-python-sdk/01-setup-tracing.mdx new file mode 100644 index 0000000000..bd063ec70b --- /dev/null +++ b/docs/docs/observability/trace-with-python-sdk/01-setup-tracing.mdx @@ -0,0 +1,63 @@ +--- +title: "Setup Tracing" +sidebar_label: "Setup Tracing" +description: "Learn how to set up tracing with the Agenta Python SDK for LLM observability" +sidebar_position: 1 +--- + +```mdx-code-block +import GoogleColabButton from "@site/src/components/GoogleColabButton"; +``` + + + Open in Google Colaboratory + + +The Agenta Observability SDK integrates with the OpenTelemetry SDK. It wraps OpenTelemetry and provides a user-friendly way to instrument your LLM applications. + +The SDK provides: + +- Automatic OpenTelemetry setup +- Easy function instrumentation using [decorators](/observability/trace-with-python-sdk/instrument-functions) +- [Reference prompt versions](/observability/trace-with-python-sdk/reference-prompt-versions) to link traces to applications, variants, and environments +- [Add attributes](/observability/trace-with-python-sdk/adding-metadata) to spans for additional metadata + +The SDK works with auto instrumentation. You should use auto instrumentation together with this SDK. + +## Installation + +**1. Install the Agenta SDK** + +```bash +pip install -U agenta +``` + +## Configuration + +**2. Set environment variables** + +1. Visit the [Agenta API Keys page](https://cloud.agenta.ai/settings?tab=apiKeys). +2. Click on **Create New API Key** and follow the prompts. + +```python +import os + +os.environ["AGENTA_API_KEY"] = "YOUR_AGENTA_API_KEY" +os.environ["AGENTA_HOST"] = "https://cloud.agenta.ai" +``` + +**3. Initialize the SDK** + +```python +import agenta as ag + +ag.init() +``` + +That's it! You're now ready to instrument your functions and start capturing traces. 
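+
+As a quick smoke test, here is a minimal sketch that initializes the SDK and traces a trivial function. It assumes your API key and host are set as shown above; the function name and return value are only illustrative, and no LLM call is needed to produce a trace.
+
+```python
+import os
+import agenta as ag
+
+os.environ["AGENTA_API_KEY"] = "YOUR_AGENTA_API_KEY"
+os.environ["AGENTA_HOST"] = "https://cloud.agenta.ai"
+
+ag.init()
+
+@ag.instrument(spankind="workflow")
+def smoke_test(name: str):
+    # Inputs and the return value are captured automatically by the decorator
+    return f"Hello, {name}!"
+
+smoke_test("Agenta")
+# The span should appear in the Observability view shortly after the script finishes
+```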
+ +## Next steps + +- Learn how to [instrument your functions](/observability/trace-with-python-sdk/instrument-functions) +- Link traces to [prompt versions](/observability/trace-with-python-sdk/reference-prompt-versions) +- Understand how to [redact sensitive data](/observability/trace-with-python-sdk/redact-sensitive-data) diff --git a/docs/docs/observability/trace-with-python-sdk/02-instrument-functions.mdx b/docs/docs/observability/trace-with-python-sdk/02-instrument-functions.mdx new file mode 100644 index 0000000000..86249b2b4b --- /dev/null +++ b/docs/docs/observability/trace-with-python-sdk/02-instrument-functions.mdx @@ -0,0 +1,55 @@ +--- +title: "Instrument Your Functions" +sidebar_label: "Instrument Functions" +description: "Learn how to instrument functions for LLM observability and tracing using the Agenta Python SDK decorator" +sidebar_position: 2 +--- + +```mdx-code-block +import GoogleColabButton from "@site/src/components/GoogleColabButton"; +``` + + + Open in Google Colaboratory + + +To instrument a function, add the `@ag.instrument()` decorator. This automatically captures all input and output data. + +The decorator has a `spankind` argument to categorize each span in the UI. Available types are: + +`agent`, `chain`, `workflow`, `tool`, `embedding`, `query`, `completion`, `chat`, `rerank` + +:::info +The default span kind is `workflow`. +::: + +:::caution +The instrument decorator should be the top-most decorator on a function (i.e. the last decorator before the function call). +::: + +```python +import agenta as ag + +@ag.instrument(spankind="task") +def my_llm_call(country: str): + prompt = f"What is the capital of {country}" + response = client.chat.completions.create( + model='gpt-4', + messages=[ + {'role': 'user', 'content': prompt}, + ], + ) + return response.choices[0].text + +@ag.instrument(spankind="workflow") +def generate(country: str): + return my_llm_call(country) +``` + +Agenta automatically determines the parent span based on the function call and nests the spans accordingly. + +## Next steps + +- Learn how to [add metadata and internals](/observability/trace-with-python-sdk/adding-metadata) +- Link traces to [prompt versions](/observability/trace-with-python-sdk/reference-prompt-versions) +- Understand how to [redact sensitive data](/observability/trace-with-python-sdk/redact-sensitive-data) diff --git a/docs/docs/observability/trace-with-python-sdk/03-adding-metadata.mdx b/docs/docs/observability/trace-with-python-sdk/03-adding-metadata.mdx new file mode 100644 index 0000000000..f0cbe08edc --- /dev/null +++ b/docs/docs/observability/trace-with-python-sdk/03-adding-metadata.mdx @@ -0,0 +1,75 @@ +--- +title: "Adding Metadata" +sidebar_label: "Adding Metadata" +description: "Learn how to add metadata and internals to spans for LLM observability and tracing" +sidebar_position: 3 +--- + +```mdx-code-block +import GoogleColabButton from "@site/src/components/GoogleColabButton"; +``` + + + Open in Google Colaboratory + + +You can add additional information to spans using metadata and internals. Both use semantic conventions under the `ag` namespace. Metadata is saved under `ag.meta`. Internals are saved under `ag.data.internals`. + +See the [semantic conventions guide](/observability/trace-with-opentelemetry/semantic-conventions) for more details on how attributes are organized. + +## Adding metadata + +Use `ag.tracing.store_meta()` to add metadata to a span. This function accesses the active span from the context and adds the key-value pairs to the metadata. 
+ +```python +@ag.instrument(spankind="task") +def compile_prompt(country: str): + prompt = f"What is the capital of {country}" + + # highlight-next-line + ag.tracing.store_meta({"prompt_template": prompt}) + + formatted_prompt = prompt.format(country=country) + return formatted_prompt +``` + +## Storing internals + +Use `ag.tracing.store_internals()` to store internals in a span: + +```python +@ag.instrument(spankind="workflow") +def rag_workflow(query: str): + + context = retrieve_context(query) + + # highlight-start + ag.tracing.store_internals({"context": context}) + # highlight-end + + prompt = f"Answer the following question {query} based on the context: {context}" + + completion = client.chat.completions.create( + model='gpt-4', + messages=[ + {'role': 'user', 'content': prompt}, + ], + ) + return completion.choices[0].message.content +``` + +## Differences between metadata and internals + +Both metadata and internals can be used for evaluation and filtering. The main differences are: + +1. Internals are searchable using plain text queries because they are saved under `ag.data`. +2. Internals are shown in the overview tab of the observability drawer together with inputs and outputs, making them easy to see. + +As a rule of thumb, if your context is short, put important information that helps understand the span into internals. + +## Next steps + +- Link traces to [prompt versions](/observability/trace-with-python-sdk/reference-prompt-versions) +- Understand how to [redact sensitive data](/observability/trace-with-python-sdk/redact-sensitive-data) +- Explore [distributed tracing](/observability/trace-with-python-sdk/distributed-tracing) across services + diff --git a/docs/docs/observability/trace-with-python-sdk/04-reference-prompt-versions.mdx b/docs/docs/observability/trace-with-python-sdk/04-reference-prompt-versions.mdx new file mode 100644 index 0000000000..19bbbb9d1a --- /dev/null +++ b/docs/docs/observability/trace-with-python-sdk/04-reference-prompt-versions.mdx @@ -0,0 +1,79 @@ +--- +title: "Reference Prompt Versions" +sidebar_label: "Reference Prompt Versions" +description: "Learn how to link traces to specific applications, variants, and environments in Agenta" +sidebar_position: 4 +--- + +```mdx-code-block +import GoogleColabButton from "@site/src/components/GoogleColabButton"; +``` + + + Open in Google Colaboratory + + +You can link a span to an application, variant, and environment by calling `ag.tracing.store_refs()`. + +Applications, variants, and environments can be referenced by their slugs, versions, and commit IDs (for specific versions). 
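+
+For instance, a span can be pinned to an exact variant version or simply tied to whatever is currently deployed in an environment. A short sketch with placeholder slugs, called from inside an instrumented function (the sections below cover the details):
+
+```python
+# Pin the span to a specific version of a variant
+ag.tracing.store_refs({
+    "application.slug": "my-app",
+    "variant.slug": "my-variant",
+    "variant.version": 3,
+})
+
+# Or link it to the version currently deployed to production
+ag.tracing.store_refs({
+    "application.slug": "my-app",
+    "environment.slug": "production",
+})
+```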
+ +## Basic usage + +You can link a span to an application and variant like this: + +```python +import agenta as ag + +@ag.instrument(spankind="workflow") +def generate(country: str): + prompt = f"What is the capital of {country}" + + formatted_prompt = prompt.format(country=country) + + completion = client.chat.completions.create( + model='gpt-4', + messages=[ + {'role': 'user', 'content': formatted_prompt}, + ], + ) + + # highlight-start + ag.tracing.store_refs( + { + "application.slug": "capital-app", + "environment.slug": "production", + } + ) + # highlight-end + return completion.choices[0].message.content +``` + +## Available reference keys + +`ag.tracing.store_refs()` takes a dict with keys from: + +- `application.slug` +- `application.id` +- `variant.slug` +- `variant.id` +- `variant.version` +- `environment.slug` +- `environment.id` +- `environment.version` + +The values should be the slug, id, or version of the application, variant, and environment respectively. + +## Why link traces? + +Linking traces to applications and variants allows you to: + +- **Filter traces** by application, variant, or environment in the UI +- **Compare performance** across different variants +- **Track production behavior** by environment +- **Create test sets** from production traces with proper context + +## Next steps + +- Learn how to [redact sensitive data](/observability/trace-with-python-sdk/redact-sensitive-data) +- Understand how to [track costs](/observability/trace-with-python-sdk/track-costs) +- Explore [distributed tracing](/observability/trace-with-python-sdk/distributed-tracing) across services diff --git a/docs/docs/observability/trace-with-python-sdk/05-redact-sensitive-data.mdx b/docs/docs/observability/trace-with-python-sdk/05-redact-sensitive-data.mdx new file mode 100644 index 0000000000..7b71f46203 --- /dev/null +++ b/docs/docs/observability/trace-with-python-sdk/05-redact-sensitive-data.mdx @@ -0,0 +1,139 @@ +--- +title: "Redact Sensitive Data" +sidebar_label: "Redact Sensitive Data" +description: "Learn how to exclude sensitive data from traces using the Agenta Python SDK" +sidebar_position: 5 +--- + +```mdx-code-block +import GoogleColabButton from "@site/src/components/GoogleColabButton"; +``` + + + Open in Google Colaboratory + + +In some cases, you may want to exclude parts of the inputs or outputs due to privacy concerns or because the data is too large to be stored in the span. + +## Simple redaction + +You can do this by setting the `ignore_inputs` and/or `ignore_outputs` arguments to `True` in the instrument decorator. + +```python +import agenta as ag + +@ag.instrument( + spankind="workflow", + ignore_inputs=True, + ignore_outputs=True +) +def rag_workflow(query: str): + ... +``` + +## Selective redaction + +If you want more control, you can specify which parts of the inputs and outputs to exclude: + +```python +@ag.instrument( + spankind="workflow", + ignore_inputs=["user_id"], + ignore_outputs=["pii"], +) +def rag_workflow(query: str, user_id: str): + ... + return { + "result": ..., + "pii": ... + } +``` + +## Custom redaction callback + +For even finer control, you can use a custom `redact()` callback, along with instructions in the case of errors. + +```python +def my_redact(name, field, data): + if name == "rag_workflow": + if field == "inputs": + del data["user_id"] + if field == "outputs": + del data["pii"] + + return data + + +@ag.instrument( + spankind="workflow", + redact=my_redact, + redact_on_error=False, +) +def rag_workflow(query: str, user_id: str): + ... 
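+    # When this span is recorded, the my_redact() callback above strips
+    # `user_id` from the captured inputs and `pii` from the captured outputs.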
+ return { + "result": ..., + "pii": ... + } +``` + +## Global redaction rules + +Finally, if you want to set up global rules for redaction, you can provide a global `redact()` callback that applies everywhere. + +```python +from typing import Dict, Any + +def global_redact( + name: str, + field: str, + data: Dict[str, Any] +): + if "pii" in data: + del data["pii"] + + return data + + +ag.init( + redact=global_redact, + redact_on_error=True, +) + +def local_redact( + name: str, + field: str, + data: Dict[str, Any] +): + if name == "rag_workflow": + if field == "inputs": + del data["user_id"] + + return data + + +@ag.instrument( + spankind="workflow", + redact=local_redact, + redact_on_error=False, +) +def rag_workflow(query: str, user_id: str): + ... + return { + "result": ..., + "pii": ... + } +``` + +## Best practices + +- **Use selective redaction** rather than blocking all inputs/outputs when possible +- **Test your redaction rules** to ensure they work as expected +- **Consider global rules** for organization-wide PII policies +- **Document redacted fields** so team members know what data is missing + +## Next steps + +- Understand how to [track costs](/observability/trace-with-python-sdk/track-costs) +- Explore [distributed tracing](/observability/trace-with-python-sdk/distributed-tracing) across services +- Learn how to [annotate traces](/observability/trace-with-python-sdk/annotate-traces) programmatically diff --git a/docs/docs/observability/trace-with-python-sdk/06-track-costs.mdx b/docs/docs/observability/trace-with-python-sdk/06-track-costs.mdx new file mode 100644 index 0000000000..9ca6bc1b01 --- /dev/null +++ b/docs/docs/observability/trace-with-python-sdk/06-track-costs.mdx @@ -0,0 +1,167 @@ +--- +title: "Track Costs" +sidebar_label: "Track Costs" +description: "Learn how Agenta automatically tracks and aggregates LLM costs, token usage, and performance metrics across your traces" +sidebar_position: 6 +--- + +Agenta automatically tracks costs, token usage, and performance metrics for your LLM applications. This data is captured in the `ag.metrics` namespace of each span. + +## Overview + +When you instrument your application with Agenta, we automatically collect cost and performance metrics for spans of type chat. + +Costs are calculated using the latest pricing for each model provider. Token usage is tracked separately for input (prompt) and output (completion) tokens. Execution time is measured in milliseconds for each operation. + +## Metrics Structure + +### Cost Metrics + +Costs are tracked in USD with the following breakdown: + +```json +{ + "metrics": { + "costs": { + "cumulative": { + "total": 0.0070902, + "prompt": 0.00355, + "completion": 0.00354 + } + } + } +} +``` + +The `total` field shows the total cost across all LLM calls in this span and its children. The `prompt` field shows the cost attributed to input tokens. The `completion` field shows the cost for output tokens. + +### Token Usage + +Token consumption is tracked with separate counts for input and output: + +```json +{ + "metrics": { + "tokens": { + "cumulative": { + "total": 992, + "prompt": 175, + "completion": 817 + } + } + } +} +``` + +The `total` field shows all tokens used (prompt plus completion). The `prompt` field shows input tokens consumed. The `completion` field shows output tokens generated. + +### Duration + +Execution time is measured in milliseconds: + +```json +{ + "metrics": { + "duration": { + "cumulative": 19889.343 + } + } +} +``` + +:::info +Agenta tracks metrics at two levels. 
**Incremental metrics** represent costs for a single span only. **Cumulative metrics** aggregate values from the current span plus all child spans. +::: + +## How to Track Costs + +### With Auto-Instrumentation + +When you use auto-instrumentation from [compatible libraries](/observability/concepts#auto-instrumentation-compatibility), prompts and tokens are automatically extracted and formatted. Costs are calculated when possible. + +```python +import agenta as ag +from openinference.instrumentation.openai import OpenAIInstrumentor + +ag.init() +OpenAIInstrumentor().instrument() + +@ag.instrument() +def generate_response(prompt: str): + response = client.chat.completions.create( + model="gpt-4", + messages=[{"role": "user", "content": prompt}] + ) + return response.choices[0].message.content +``` + +### With Manual Instrumentation + +You can manually add cost metrics to spans using incremental metrics: + +```python +import agenta as ag + +@ag.instrument() +def custom_llm_call(prompt: str): + # Your custom LLM call logic + response = my_custom_llm.generate(prompt) + + # Manually track incremental metrics (for this span only) + ag.tracing.store_metrics({ + "costs.incremental.total": 0.0025, + "costs.incremental.prompt": 0.0015, + "costs.incremental.completion": 0.001, + "tokens.incremental.total": 150, + "tokens.incremental.prompt": 100, + "tokens.incremental.completion": 50 + }) + + # Cumulative metrics are automatically calculated by the backend + + return response +``` + +## Automatic Cost Calculation + +Agenta calculates costs automatically for major LLM providers using the LiteLLM library. When the cost is not provided in the span and the span type is chat, we try to infer the cost from the number of tokens. + +### Custom Pricing + +For custom models or providers, you can manually set costs using incremental metrics: + +```python +import agenta as ag + +@ag.instrument() +def custom_model_call(prompt: str): + response = my_model.generate(prompt) + + # Calculate custom cost + prompt_tokens = len(prompt.split()) + completion_tokens = len(response.split()) + + # Custom pricing + cost_per_prompt_token = 0.00001 + cost_per_completion_token = 0.00002 + + prompt_cost = prompt_tokens * cost_per_prompt_token + completion_cost = completion_tokens * cost_per_completion_token + total_cost = prompt_cost + completion_cost + + # Set incremental metrics + ag.tracing.store_metrics({ + "costs.incremental.total": total_cost, + "costs.incremental.prompt": prompt_cost, + "costs.incremental.completion": completion_cost, + "tokens.incremental.total": prompt_tokens + completion_tokens, + "tokens.incremental.prompt": prompt_tokens, + "tokens.incremental.completion": completion_tokens + }) + + return response +``` + +## Next steps + +Learn about [adding metadata](/observability/trace-with-python-sdk/adding-metadata) to enrich your traces. diff --git a/docs/docs/observability/trace-with-python-sdk/07-distributed-tracing.mdx b/docs/docs/observability/trace-with-python-sdk/07-distributed-tracing.mdx new file mode 100644 index 0000000000..8423b80b66 --- /dev/null +++ b/docs/docs/observability/trace-with-python-sdk/07-distributed-tracing.mdx @@ -0,0 +1,67 @@ +--- +title: "Distributed Tracing" +sidebar_label: "Distributed Tracing" +description: "Learn how to implement distributed tracing across services with the Agenta SDK" +sidebar_position: 7 +--- + +When using the Agenta SDK, distributed tracing is handled automatically with the provided SDK functions. 
This guide shows you how to propagate trace context across services and extract it when receiving requests. + +## Using OpenTelemetry with Agenta SDK + +Agenta supports distributed tracing out of the box when using the provided SDK functions: + +### 1. Sending Requests (Propagation) + +When making requests to other services or sub-systems, use `agenta.tracing.inject()` to inject necessary headers: + +```python +import agenta as ag + +method = "POST" +url = "https://example-service/api" +params = {} +headers = agenta.tracing.inject() # automatically injects 'Authorization', 'Traceparent', 'Baggage' +body = {"key": "value"} + +response = requests.request( + method=method, + url=url, + params=params, + headers=headers, + json=body, +) +``` + +The `agenta.tracing.inject()` function returns headers containing: + +- `Authorization`: Authentication information +- `Traceparent`: Identifies the current trace and span +- `Baggage`: Contains application-specific context + +These headers can be modified before sending them as part of the request if needed. + +### 2. Receiving Requests (Extraction) + +Agenta simplifies receiving and handling incoming trace contexts: + +- If you're using `ag.route()` and `ag.instrument()`, extraction is automatic. +- For manual extraction, use `agenta.tracing.extract()`: + +```python +traceparent, baggage = agenta.tracing.extract() # includes 'Traceparent', 'Baggage' + +# Use traceparent and baggage to set up your OpenTelemetry context +# (Implementation depends on your specific use case) +``` + +:::note +`extract()` does not provide `Authorization` because there are many authentication methods (apikey, bearer, secret, access tokens), each requiring different handling. The middlewares and decorators in the Agenta SDK handle this automatically when you use `ag.route()` and `ag.instrument()`. +::: + +## Next Steps + +- Learn about [tracing without the Agenta SDK](/observability/trace-with-opentelemetry/distributed-tracing) for raw OpenTelemetry setup +- Explore [semantic conventions](/observability/trace-with-opentelemetry/semantic-conventions) for better trace formatting +- See [instrumenting functions](/observability/trace-with-python-sdk/instrument-functions) for automatic instrumentation + diff --git a/docs/docs/evaluation/07-annotate-api.mdx b/docs/docs/observability/trace-with-python-sdk/08-annotate-traces.mdx similarity index 95% rename from docs/docs/evaluation/07-annotate-api.mdx rename to docs/docs/observability/trace-with-python-sdk/08-annotate-traces.mdx index 289c6340e1..4240c70168 100644 --- a/docs/docs/evaluation/07-annotate-api.mdx +++ b/docs/docs/observability/trace-with-python-sdk/08-annotate-traces.mdx @@ -1,21 +1,23 @@ --- -title: Annotate Traces from API -sidebar_label: "Annotate Traces from API" -description: "Learn how to add annotations to traces using the Agenta API" +title: "Annotate Traces" +sidebar_label: "Annotate Traces" +description: "Learn how to programmatically add annotations to traces using the Agenta API - collect feedback, scores, and custom metrics" +sidebar_position: 8 --- ```mdx-code-block import Image from "@theme/IdealImage"; import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; +import GoogleColabButton from "@site/src/components/GoogleColabButton"; ``` -:::info -Annotations are currently in preview. The interface is subject to change. -::: - Annotations in Agenta let you enrich the traces created by your LLM applications. 
You can add scores, comments, expected answers and other metrics to help evaluate your application's performance. + + Open in Google Colaboratory + + - + ```javascript async function createAnnotation() { const baseUrl = 'https://cloud.agenta.ai'; - + const headers = { 'Content-Type': 'application/json', 'Authorization': 'ApiKey YOUR_API_KEY' @@ -206,7 +208,7 @@ async function createAnnotation() { }; try { - const response = await fetch(`${baseUrl}/api/preview/annotations`, { + const response = await fetch(`${baseUrl}/api/preview/annotations/`, { method: 'POST', headers: headers, body: JSON.stringify(annotationData) @@ -353,7 +355,7 @@ response = requests.post( ``` - + You can query annotations in several ways: @@ -362,7 +364,7 @@ You can query annotations in several ways: ```javascript async function queryAnnotation() { const baseUrl = 'https://cloud.agenta.ai'; - + const headers = { 'Content-Type': 'application/json', 'Authorization': 'ApiKey YOUR_API_KEY' @@ -404,7 +406,7 @@ queryAnnotation(); ```javascript async function queryAnnotationsForInvocation() { const baseUrl = 'https://cloud.agenta.ai'; - + const headers = { 'Content-Type': 'application/json', 'Authorization': 'ApiKey YOUR_API_KEY' @@ -481,12 +483,12 @@ else: ``` - + ```javascript async function deleteAnnotation() { const baseUrl = 'https://cloud.agenta.ai'; - + const headers = { 'Content-Type': 'application/json', 'Authorization': 'ApiKey YOUR_API_KEY' diff --git a/docs/docs/observability/trace-with-python-sdk/_06-sample-traces.mdx b/docs/docs/observability/trace-with-python-sdk/_06-sample-traces.mdx new file mode 100644 index 0000000000..45982d3e81 --- /dev/null +++ b/docs/docs/observability/trace-with-python-sdk/_06-sample-traces.mdx @@ -0,0 +1,23 @@ +--- +title: "Sample Traces" +sidebar_label: "Sample Traces" +description: "Learn how to implement sampling strategies to reduce trace volume in Agenta" +sidebar_position: 7 +--- + + + +## Overview + +Sampling allows you to control which traces are sent to Agenta, helping reduce costs and storage when dealing with high-volume applications. + +## Why sample? 
+ +- Reduce storage costs +- Minimize performance impact +- Focus on interesting traces (errors, slow requests) + +## Next steps + +- Learn about [batching traces](/observability/trace-with-python-sdk/batch-traces) +- Explore [cost tracking](/observability/trace-with-python-sdk/track-costs) diff --git a/docs/docs/observability/trace-with-python-sdk/_07-batch-traces.mdx b/docs/docs/observability/trace-with-python-sdk/_07-batch-traces.mdx new file mode 100644 index 0000000000..395f1835ef --- /dev/null +++ b/docs/docs/observability/trace-with-python-sdk/_07-batch-traces.mdx @@ -0,0 +1,17 @@ +--- +title: "Batch Traces" +sidebar_label: "Batch Traces" +description: "Learn how to batch traces for efficient transmission to Agenta" +sidebar_position: 8 +--- + + + +## Overview + +Learn how to batch traces for efficient transmission to Agenta + +## Next steps + +- Explore other [Python SDK features](/observability/trace-with-python-sdk/setup-tracing) +- Learn about [using the UI](/observability/using-the-ui/filtering-traces) diff --git a/docs/docs/observability/trace-with-python-sdk/_09-track-chat-sessions.mdx b/docs/docs/observability/trace-with-python-sdk/_09-track-chat-sessions.mdx new file mode 100644 index 0000000000..025e2e5df6 --- /dev/null +++ b/docs/docs/observability/trace-with-python-sdk/_09-track-chat-sessions.mdx @@ -0,0 +1,17 @@ +--- +title: "Track Chat Sessions" +sidebar_label: "Track Chat Sessions" +description: "Learn how to track multi-turn conversations and chat sessions" +sidebar_position: 10 +--- + + + +## Overview + +Learn how to track multi-turn conversations and chat sessions + +## Next steps + +- Explore other [Python SDK features](/observability/trace-with-python-sdk/setup-tracing) +- Learn about [using the UI](/observability/using-the-ui/filtering-traces) diff --git a/docs/docs/observability/trace-with-python-sdk/_10-track-users.mdx b/docs/docs/observability/trace-with-python-sdk/_10-track-users.mdx new file mode 100644 index 0000000000..6b0d4c476a --- /dev/null +++ b/docs/docs/observability/trace-with-python-sdk/_10-track-users.mdx @@ -0,0 +1,17 @@ +--- +title: "Track Users" +sidebar_label: "Track Users" +description: "Learn how to associate traces with specific users for better analytics" +sidebar_position: 11 +--- + + + +## Overview + +Learn how to associate traces with specific users for better analytics + +## Next steps + +- Explore other [Python SDK features](/observability/trace-with-python-sdk/setup-tracing) +- Learn about [using the UI](/observability/using-the-ui/filtering-traces) diff --git a/docs/docs/observability/trace-with-python-sdk/_category_.json b/docs/docs/observability/trace-with-python-sdk/_category_.json new file mode 100644 index 0000000000..0f24e70204 --- /dev/null +++ b/docs/docs/observability/trace-with-python-sdk/_category_.json @@ -0,0 +1,6 @@ +{ + "label": "Trace with Python SDK", + "position": 5, + "collapsible": true, + "collapsed": true +} diff --git a/docs/docs/prompt-engineering/01-quick-start.mdx b/docs/docs/prompt-engineering/01-quick-start.mdx index e724bc94bb..08b34dcca0 100644 --- a/docs/docs/prompt-engineering/01-quick-start.mdx +++ b/docs/docs/prompt-engineering/01-quick-start.mdx @@ -7,7 +7,7 @@ sidebar_position: 1 import Image from "@theme/IdealImage"; import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; -import DocCard from '@theme/DocCard'; +import CustomDocCard from '@site/src/components/CustomDocCard'; ``` In this tutorial, we'll walk through three simple steps to get started with Agenta: @@ -166,7 +166,7 @@ 
response = client.chat.completions.create( ) ``` - + ```javascript const fetchConfigs = async () => { @@ -276,7 +276,7 @@ Model names follow LiteLLM naming conventions: `provider/model` (e.g., `cohere/c ::: :::info -For simpler observability and cost tracking, Agenta also offers an endpoint to directly call LLMs with your prompt configuration. Learn more in the [proxy LLM calls](/prompt-engineering/prompt-management/proxy-calls) section. +For simpler observability and cost tracking, Agenta also offers an endpoint to directly call LLMs with your prompt configuration. Learn more in the [proxy LLM calls](/prompt-engineering/integrating-prompts/proxy-calls) section. ::: ## Next Steps @@ -287,24 +287,26 @@ To continue your journey with Agenta:
-
-
diff --git a/docs/docs/prompt-engineering/02-concepts.mdx b/docs/docs/prompt-engineering/02-concepts.mdx new file mode 100644 index 0000000000..33f69b2080 --- /dev/null +++ b/docs/docs/prompt-engineering/02-concepts.mdx @@ -0,0 +1,75 @@ +--- +title: "Prompt Management Concepts" +sidebar_label: "Concepts" +description: "Learn how to effectively manage, version, and deploy LLM prompts and configurations. Discover how prompt management helps teams collaborate, track changes, and maintain consistency across development and production environments" +sidebar_position: 2 +--- + + + +```mdx-code-block +import CustomDocCard from '@site/src/components/CustomDocCard'; +import clsx from 'clsx'; +import Image from "@theme/IdealImage"; + +``` + + +## Why do I need a prompt management system? + +A prompt management system lets everyone on your team collaborate on prompts. This includes product owners, developers, and subject matter experts. The system organizes your prompts so that: +- Product teams can change prompts without going through developers each time +- You can version prompts and roll back to a previous version if needed +- You can link LLM application spans to prompt versions +- You can run evaluations on prompts and compare them to each other + +You can read more about the benefits of prompt management in our [blog post](https://agenta.ai/blog/the-definitive-guide-to-prompt-management-systems). + + +:::info +Agenta allows you to version not only prompts, but **any configuration**. For instance, for a RAG pipeline, you can version a configuration with the parameters `chunk_size` or `embedding_model`. You can read more about this in [custom workflows](/custom-workflows/overview). +::: + +## Versioning in Agenta + +Taxonomy of entities in Agenta + +Agenta uses a Git-like structure for versioning. Instead of having one commit history for each prompt, you can create multiple branches called **variants**. Each variant has its own version history. You can then deploy specific versions to **environments** (development, staging, production). + +## Entities in Agenta + +### Applications + +Prompts are applications in Agenta. Each application has a different type that determines how it runs. You can create chat applications and completion applications from the UI using templates. You can also create custom applications from your own code using [custom workflows](/custom-workflows/overview). + +### Variants +Variants are similar to branches in **git**. Each variant is an independent branch with its own commit history. When you create a prompt from the UI, we create a `default` variant for you. Unlike git branches, you cannot merge variants together. Use variants to experiment with different approaches or configurations. + +### Versions + +Versions are similar to commits in **git**. Each version is an immutable snapshot of a variant. When you make changes to a variant, you create a new version. Each version has a **commit id** that uniquely identifies it. Each version contains a **configuration** with prompts and other parameters. + +### Environments + +Environments are deployment targets for your prompts. Agenta provides three environments: **development**, **staging**, and **production**. Each environment points to a specific version from a variant. When you deploy a new version to an environment, the environment updates to point to that version. Environments track their deployment history, so you can roll back to any previously deployed version. + + +## Best practices for Organizing Prompts + +1. 
Create a new variant for each experiment or approach you explore. Some teams create variants per model. Others create variants per user or per approach. When you try a new approach, create a new variant. +2. Add commit notes to each version. Explain the changes you made. +3. Deploy variant versions to staging when ready. Use at least a staging environment and a production environment. +4. Have your internal team test the staging environment before deploying to production. At minimum, do a vibe check on the prompt. + diff --git a/docs/docs/prompt-engineering/02-overview.mdx b/docs/docs/prompt-engineering/02-overview.mdx deleted file mode 100644 index 6864593dd9..0000000000 --- a/docs/docs/prompt-engineering/02-overview.mdx +++ /dev/null @@ -1,90 +0,0 @@ ---- -title: "Overview" -description: "Learn how to effectively manage, version, and deploy LLM prompts and configurations. Discover how prompt management helps teams collaborate, track changes, and maintain consistency across development and production environments" ---- - -```mdx-code-block -import DocCard from '@theme/DocCard'; -import clsx from 'clsx'; -import Image from "@theme/IdealImage"; - -``` - - -Building LLM-powered applications is an iterative process. In each iteration, you aim to improve the application's performance by refining prompts, adjusting configurations, and evaluating outputs. - -Illustration of the LLMOPs process - -### Why do I need a prompt management system? - -A prompt management system enables everyone on the team—from **product owners** to **subject matter experts**—to collaborate in creating prompts. Additionally it helps you answer the following questions: - -- Which prompts have we tried? -- What were the outputs of these prompts? -- How do the evaluation results of these prompts compare? -- Which prompt version was used for a specific generation in production? -- What was the effect of publishing the new version of this prompt in production? -- Who on the team made changes to a particular prompt version in production? - -### Features in agenta - -Agenta provides you with the following capabilities: - -- A playground where developers and subject matter experts can collaboratively create and test prompts and compare models -- A prompt management system where, you can: - - **Versioning Prompts**: Keeping track of different prompts you've tested and a history of changes in production. - - **Linking Prompts to Experiments**: Connecting each prompt version to its evaluation metrics to understand the effect of changes and determine the best variant. - - **Linking Prompts to Traces**: Monitoring how changes in prompt versions affect the traces and production metrics. - -:::info -Agenta goes beyond prompt management to encompass the entire configuration of your LLM applications. If your LLM workflow is more complex than a single prompt (e.g., Retrieval-Augmented Generation (RAG) or a chain of prompts), you can version the **whole configuration** together. - -In contrast to a **prompt**, a **configuration** of an LLM application can include additional parameters beyond prompt templates and models (with their parameters). For instance: - -- An LLM application using a **chain of two prompts** would have a configuration that includes the two prompts and their respective model parameters. -- An application that includes a **RAG pipeline** would have a configuration that includes parameters such as `top_k` and `embedding`. 
- -```json title="Example RAG configuration" -{ - "top_k": 3, - "embedding": "text-embedding-3-large", - "prompt-query": "We have provided context information below. {context_str}. Given this information, please answer the question: {query_str}\n", - "model-query": "openai/gpt-o1", - "temperature-query": "1.0" -} -``` -::: - - -### Get started - - -
-
- -
- -
- -
-
- diff --git a/docs/docs/prompt-engineering/prompt-management/_category_.json b/docs/docs/prompt-engineering/_category_.json similarity index 100% rename from docs/docs/prompt-engineering/prompt-management/_category_.json rename to docs/docs/prompt-engineering/_category_.json diff --git a/docs/docs/prompt-engineering/_managing-prompts-ui/01-create-and-commit.mdx b/docs/docs/prompt-engineering/_managing-prompts-ui/01-create-and-commit.mdx new file mode 100644 index 0000000000..58b906273b --- /dev/null +++ b/docs/docs/prompt-engineering/_managing-prompts-ui/01-create-and-commit.mdx @@ -0,0 +1,19 @@ +--- +title: "Create and Commit Prompts" +sidebar_position: 1 +sidebar_label: "Create and Commit Prompts" +description: "Learn how to create variants and commit changes using the Agenta UI." +draft: true + +--- + + + + +This section is under construction. For now, please refer to the [Quick Start guide](/prompt-engineering/quick-start) which covers creating and committing prompts from the UI. diff --git a/docs/docs/prompt-engineering/_managing-prompts-ui/01-creating-prompts.mdx b/docs/docs/prompt-engineering/_managing-prompts-ui/01-creating-prompts.mdx new file mode 100644 index 0000000000..6506c050b0 --- /dev/null +++ b/docs/docs/prompt-engineering/_managing-prompts-ui/01-creating-prompts.mdx @@ -0,0 +1,17 @@ +--- +title: "Creating Prompts" +sidebar_position: 1 +sidebar_label: "Creating Prompts" +description: "Learn how to create prompts from the Agenta UI." +draft: true + +--- + +import Image from "@theme/IdealImage"; + +This guide shows you how to create prompts from the Agenta UI. +## Creating a new application + +### Creating a new application from the UI + +Go to the App Management page and click on the **Create New Prompt** button, then choose the type of application you want to create (chat for multi-turn chat applications or completion for single-turn applications). diff --git a/docs/docs/prompt-engineering/_managing-prompts-ui/02-deploy.mdx b/docs/docs/prompt-engineering/_managing-prompts-ui/02-deploy.mdx new file mode 100644 index 0000000000..eda7f3df09 --- /dev/null +++ b/docs/docs/prompt-engineering/_managing-prompts-ui/02-deploy.mdx @@ -0,0 +1,20 @@ +--- +title: "Deploy to Environments" +sidebar_position: 2 +sidebar_label: "Deploy to Environments" +description: "Learn how to deploy variants to different environments using the Agenta UI." +draft: true + +--- + + + + +This section is under construction. For now, please refer to the [Quick Start guide](/prompt-engineering/quick-start) which covers deploying variants from the UI. diff --git a/docs/docs/prompt-engineering/_managing-prompts-ui/03-versions-rollback.mdx b/docs/docs/prompt-engineering/_managing-prompts-ui/03-versions-rollback.mdx new file mode 100644 index 0000000000..7bb5c5763b --- /dev/null +++ b/docs/docs/prompt-engineering/_managing-prompts-ui/03-versions-rollback.mdx @@ -0,0 +1,19 @@ +--- +title: "Manage Versions and Rollback" +sidebar_position: 3 +sidebar_label: "Manage Versions and Rollback" +description: "Learn how to view version history and rollback to previous versions using the Agenta UI." +draft: true +--- + + + + +This section is under construction. For now, please refer to the [Quick Start guide](/prompt-engineering/quick-start) and the [Concepts page](/prompt-engineering/concepts) for information on versioning. 
diff --git a/docs/docs/prompt-engineering/_managing-prompts-ui/_category_.json b/docs/docs/prompt-engineering/_managing-prompts-ui/_category_.json new file mode 100644 index 0000000000..c209af07e8 --- /dev/null +++ b/docs/docs/prompt-engineering/_managing-prompts-ui/_category_.json @@ -0,0 +1,5 @@ +{ + "position": 4, + "label": "Managing Prompts from UI", + "collapsed": true +} \ No newline at end of file diff --git a/docs/docs/prompt-engineering/integrating-prompts/01-integrating-with-agenta.mdx b/docs/docs/prompt-engineering/integrating-prompts/01-integrating-with-agenta.mdx new file mode 100644 index 0000000000..8a5291e6ff --- /dev/null +++ b/docs/docs/prompt-engineering/integrating-prompts/01-integrating-with-agenta.mdx @@ -0,0 +1,55 @@ +--- +title: "How to Integrate with Agenta" +description: "Integrate applications and prompts created in Agenta into your projects." +--- + +import Image from "@theme/IdealImage"; + +Agenta integrates with your workflow. You can use the latest version of your deployed prompt in your application. With Agenta, **you can update prompts directly from the web interface without modifying your code** each time. + +Here are two ways to use prompts from Agenta in your code: + +### [1. As a prompt management system](/prompt-engineering/integrating-prompts/fetch-prompt-programatically) + +Prompts are managed and stored in the Agenta backend. You use the Agenta SDK to fetch the latest deployed version of your prompt. Then you use it in your application. + +**Advantages**: + +- Agenta operates outside your application's critical path. +- You can fetch and cache the latest prompt version for zero latency usage. + +**Considerations**: + +- You need to set up [observability integration](/observability/quickstart-python) yourself. This is required if you want to trace your calls for debugging and cost tracking. + +A sequence diagram showing how to integrate with Agenta as a prompt management system + + + +### [2. As a middleware/gateway (invoking prompt)](/prompt-engineering/integrating-prompts/proxy-calls) + +You invoke your prompts directly through Agenta. Agenta provides you with an endpoint that forwards requests to the LLM on your behalf. + +**Advantages**: + +- Simplified deployment. +- Automatic tracing without any code changes. + +**Considerations**: + +- Adds slight latency to the response (approximately 0.3 seconds). +- Streaming is not supported for these endpoints. + +This approach works best for applications where latency is not critical. + +A sequence diagram showing how to integrate with Agenta as    a proxy diff --git a/docs/docs/prompt-engineering/integrating-prompts/02-fetch-prompt-programatically.mdx b/docs/docs/prompt-engineering/integrating-prompts/02-fetch-prompt-programatically.mdx new file mode 100644 index 0000000000..22ea6668c6 --- /dev/null +++ b/docs/docs/prompt-engineering/integrating-prompts/02-fetch-prompt-programatically.mdx @@ -0,0 +1,330 @@ +--- +title: "Fetch Prompts via SDK/API" +sidebar_position: 2 +sidebar_label: "Fetch Prompts via SDK/API" +description: "Learn how to fetch prompt configurations using the Agenta SDK or API." +--- + +import GoogleColabButton from "@site/src/components/GoogleColabButton"; +import Tabs from "@theme/Tabs"; +import TabItem from "@theme/TabItem"; + +This guide shows you how to fetch prompt configurations from variants or environments using the Agenta SDK. 
+ + + Open in Google Colaboratory + + +## Fetching a Prompt Configuration + +You can fetch the configurations from a variant reference (`app_slug`, `variant_slug`, `variant_version`) or an environment reference (`app_slug`, `environment_slug`). The default behavior when fetching is to fetch the latest configuration from the `production` environment. If you don't provide a `variant_version` parameter but only a `variant_slug` or an `environment_slug`, the SDK will fetch the latest version of the variant from the specified **environment/variant**. + +:::tip +Check the [reference](/reference/sdk/configuration-management#prompt-configuration-schema) section for more details on the data format used for prompts. +::: + +### Default Behavior when fetching + +If you don't provide either `variant` or `environment` identifiers, the SDK fetches the latest configuration deployed to the `production` environment. + + + +```python +config = ag.ConfigManager.get_from_registry( + app_slug="my-app-slug", + variant_slug="my-variant-slug", + variant_version=2 # Optional: fetches latest if not provided +) + +print("Fetched configuration from production:") +print(config) +``` +Example Output: + +```python +{ + "prompt": { + "messages": [ + { + "role": "system", + "content": "You are an assistant that provides concise answers" + }, + { + "role": "user", + "content": "Explain {{topic}} in simple terms" + } + ], + "llm_config": { + "model": "gpt-3.5-turbo", + "top_p": 1.0, + "max_tokens": 150, + "temperature": 0.7, + "presence_penalty": 0.0, + "frequency_penalty": 0.0 + }, + "template_format": "curly" + } +} +``` + + + +```javascript +// Fetch configuration from production environment (default behavior) +const fetchResponse = await fetch('https://cloud.agenta.ai/api/variants/configs/fetch', { + method: 'POST', + headers: { + 'Content-Type': 'application/json', + 'Authorization': 'Bearer YOUR_API_KEY' + }, + body: JSON.stringify({ + application_ref: { + slug: 'my-app-slug', + version: null, + id: null + } + }) +}); + +const config = await fetchResponse.json(); +console.log('Fetched configuration from production:'); +console.log(config); +``` + + + +```bash +# Fetch configuration from production environment (default behavior) +curl -X POST "https://cloud.agenta.ai/api/variants/configs/fetch" \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -d '{ + "application_ref": { + "slug": "my-app-slug", + "version": null, + "id": null + } + }' +``` + + + +:::tip +Agenta provides a helper class `PromptTemplate` to format the configuration and then use it to generate the prompt. 
+```python +from openai import OpenAI +from agenta.sdk.types import PromptTemplate + +# Fetch configuration +config = ag.ConfigManager.get_from_registry( + app_slug="my-app-slug" +) + +# Format the prompt with variables +prompt = PromptTemplate(**config['prompt']).format(topic="AI") + +# Use with OpenAI +client = OpenAI() +response = client.chat.completions.create( + **prompt.to_openai_kwargs() +) + +print(response.choices[0].message.content) +``` +::: + +### Fetching by Variant Reference + + + + +```python +# Fetch configuration by variant +config = ag.ConfigManager.get_from_registry( + app_slug="my-app-slug", + variant_slug="my-variant-slug", + variant_version=2 # Optional: If not provided, fetches the latest version +) + +print("Fetched configuration:") +print(config) +``` + + + +```javascript +// Fetch configuration by variant +const fetchResponse = await fetch('https://cloud.agenta.ai/api/variants/configs/fetch', { + method: 'POST', + headers: { + 'Content-Type': 'application/json', + 'Authorization': 'Bearer YOUR_API_KEY' + }, + body: JSON.stringify({ + variant_ref: { + slug: 'my-variant-slug', + version: 2, + id: null + }, + application_ref: { + slug: 'my-app-slug', + version: null, + id: null + } + }) +}); + +const config = await fetchResponse.json(); +console.log('Fetched configuration:'); +console.log(config); +``` + + + +```bash +# Fetch configuration by variant +curl -X POST "https://cloud.agenta.ai/api/variants/configs/fetch" \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -d '{ + "variant_ref": { + "slug": "my-variant-slug", + "version": 2, + "id": null + }, + "application_ref": { + "slug": "my-app-slug", + "version": null, + "id": null + } + }' +``` + + + +### Fetching by Environment Reference + + + + +```python +# Fetch the latest configuration from the staging environment +config = ag.ConfigManager.get_from_registry( + app_slug="my-app", + environment_slug="staging", + environment_version=1 # Optional: If not provided, fetches the latest version +) + +print("Fetched configuration from staging:") +print(config) +``` + + + +```javascript +// Fetch the latest configuration from the staging environment +const fetchResponse = await fetch('https://cloud.agenta.ai/api/variants/configs/fetch', { + method: 'POST', + headers: { + 'Content-Type': 'application/json', + 'Authorization': 'Bearer YOUR_API_KEY' + }, + body: JSON.stringify({ + environment_ref: { + slug: 'staging', + version: 1, // Optional: omit to fetch latest + id: null + }, + application_ref: { + slug: 'my-app', + version: null, + id: null + } + }) +}); + +const config = await fetchResponse.json(); +console.log('Fetched configuration from staging:'); +console.log(config); +``` + + + +```bash +# Fetch the latest configuration from the staging environment +curl -X POST "https://cloud.agenta.ai/api/variants/configs/fetch" \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -d '{ + "environment_ref": { + "slug": "staging", + "version": 1, + "id": null + }, + "application_ref": { + "slug": "my-app", + "version": null, + "id": null + } + }' +``` + + + +## Response Format + +The API response contains your prompt configuration under `params`: + +```json +{ + "params": { + "prompt": { + "messages": [ + { + "role": "system", + "content": "You are an assistant that provides concise answers" + }, + { + "role": "user", + "content": "Explain {{topic}} in simple terms" + } + ], + "llm_config": { + "model": "gpt-3.5-turbo", + "max_tokens": 150, + "temperature": 
0.7, + "top_p": 1.0, + "frequency_penalty": 0.0, + "presence_penalty": 0.0 + }, + "template_format": "curly" + } + }, + "url": "https://cloud.agenta.ai/services/completion", + "application_ref": { + "slug": "my-app-slug", + "version": null, + "id": "..." + }, + "variant_ref": { + "slug": "my-variant-slug", + "version": 2, + "id": "..." + }, + "environment_ref": { + "slug": "production", + "version": 1, + "id": "..." + } +} +``` + +:::tip Asynchronous Operations in Python SDK +All SDK methods have async counterparts with an `a` prefix: + +```python +async def async_operations(): + # Fetch configuration asynchronously + config = await ag.ConfigManager.aget_from_registry(...) +``` +::: diff --git a/docs/docs/prompt-engineering/prompt-management/03-proxy-calls.mdx b/docs/docs/prompt-engineering/integrating-prompts/03-proxy-calls.mdx similarity index 97% rename from docs/docs/prompt-engineering/prompt-management/03-proxy-calls.mdx rename to docs/docs/prompt-engineering/integrating-prompts/03-proxy-calls.mdx index d0012fd612..2dcdface09 100644 --- a/docs/docs/prompt-engineering/prompt-management/03-proxy-calls.mdx +++ b/docs/docs/prompt-engineering/integrating-prompts/03-proxy-calls.mdx @@ -1,7 +1,8 @@ --- -title: "Proxy LLM Calls" +title: "Invoke Prompt (Proxy LLM Calls)" description: "How to invoke your deployed prompts through Agenta's proxy service with automatic tracing and logging." sidebar_position: 4 +sidebar_label: "Invoke Prompts" --- import Image from "@theme/IdealImage"; diff --git a/docs/docs/prompt-engineering/integrating-prompts/_category_.json b/docs/docs/prompt-engineering/integrating-prompts/_category_.json new file mode 100644 index 0000000000..78650487dd --- /dev/null +++ b/docs/docs/prompt-engineering/integrating-prompts/_category_.json @@ -0,0 +1,5 @@ +{ + "position": 6, + "label": "Integrate with Agenta", + "collapsed": true +} \ No newline at end of file diff --git a/docs/docs/prompt-engineering/managing-prompts-programatically/01-setup.mdx b/docs/docs/prompt-engineering/managing-prompts-programatically/01-setup.mdx new file mode 100644 index 0000000000..c8e9108dd8 --- /dev/null +++ b/docs/docs/prompt-engineering/managing-prompts-programatically/01-setup.mdx @@ -0,0 +1,62 @@ +--- +title: "Setup and Installation" +sidebar_position: 1 +sidebar_label: "Setup and Overview" +description: "Learn how to install and initialize the Agenta SDK for prompt management." +--- + +import Image from "@theme/IdealImage"; +import GoogleColabButton from "@site/src/components/GoogleColabButton"; + +This guide shows you how to set up the Agenta SDK for managing prompts programmatically. + + + Open in Google Colaboratory + + +## Prerequisites + +Before starting, familiarize yourself with how versioning works in Agenta. Details are available on the [concepts page](/concepts/concepts). + +:::info Versioning in Agenta + +{" "} + +Taxonomy of concepts in Agenta + +Agenta uses a Git-like structure for prompt versioning: +- Create multiple branches called **variants** +- Each variant is versioned. +- Deploy specific versions to **environments** (development, staging, production) + +**Typical workflow:** +1. Create a variant (branch) +2. Commit changes to the variant creating a new version +3. 
Deploy the version to an environment + +::: + +## Setup the SDK + +Initialize the SDK before using any operations: + +```python +import os +import agenta as ag + +# Set your API credentials +os.environ["AGENTA_API_KEY"] = "your-api-key-here" +os.environ["AGENTA_HOST"] = "https://cloud.agenta.ai" # only needed if self-hosting + +# Initialize the SDK +ag.init() +``` + +:::info +You can skip this step if you are using the API directly. +::: diff --git a/docs/docs/prompt-engineering/managing-prompts-programatically/02-creating-prompts.mdx b/docs/docs/prompt-engineering/managing-prompts-programatically/02-creating-prompts.mdx new file mode 100644 index 0000000000..0b2c40d39f --- /dev/null +++ b/docs/docs/prompt-engineering/managing-prompts-programatically/02-creating-prompts.mdx @@ -0,0 +1,82 @@ +--- +title: "Create Prompts" +sidebar_position: 1 +sidebar_label: "Create Prompts" +description: "Learn how to create prompts from the Agenta SDK." +--- + +import Image from "@theme/IdealImage"; +import GoogleColabButton from "@site/src/components/GoogleColabButton"; +import Tabs from "@theme/Tabs"; +import TabItem from "@theme/TabItem"; + +This guide shows you how to set up the Agenta SDK for managing prompts programmatically. + + + Open in Google Colaboratory + + + +## Creating a new application + +Prompts are applications in Agenta. To create a new prompt, you need to specify the slug, and the type of the application. + +The type of the application can be `SERVICE:completion`, `SERVICE:chat`, or `CUSTOM`: + +- `SERVICE:completion` is for single-turn prompts. +- `SERVICE:chat` is for multi-turn prompts. +- `CUSTOM` is for custom prompts. + + + + +```python +# Creates an empty application +app = ag.AppManager.create( + app_slug="my-app-slug", + template_key="SERVICE:completion", # we define here the app type + # template_key="SERVICE:chat" # chat prompts + # template_key="CUSTOM" # custom configuration (schema-less, however unless you provide a URI, you can only use the registry but not the playground) +) +``` + + + + +```javascript +const response = await fetch('https://cloud.agenta.ai/api/apps', { + method: 'POST', + headers: { + 'Content-Type': 'application/json', + 'Authorization': 'Bearer YOUR_API_KEY' + }, + body: JSON.stringify({ + app_name: 'my-app-slug', + template_key: 'SERVICE:completion' + }) +}); + +const result = await response.json(); +console.log(result); +``` + + + + +```bash +curl -X POST "https://cloud.agenta.ai/api/apps" \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -d '{ + "app_name": "my-app-slug", + "template_key": "SERVICE:completion" + }' +``` + + + + + +:::warning +The app created until now is empty. You cannot use it from the UI yet. You need to create a variant and commit changes to it to be able to use it (next section). 
+::: \ No newline at end of file diff --git a/docs/docs/prompt-engineering/prompt-management/02-prompt-management-sdk.mdx b/docs/docs/prompt-engineering/managing-prompts-programatically/03-create-and-commit.mdx similarity index 62% rename from docs/docs/prompt-engineering/prompt-management/02-prompt-management-sdk.mdx rename to docs/docs/prompt-engineering/managing-prompts-programatically/03-create-and-commit.mdx index 834e778f49..9678568066 100644 --- a/docs/docs/prompt-engineering/prompt-management/02-prompt-management-sdk.mdx +++ b/docs/docs/prompt-engineering/managing-prompts-programatically/03-create-and-commit.mdx @@ -1,91 +1,27 @@ --- -title: "How to Manage Prompts with the SDK" +title: "Version Prompts" sidebar_position: 2 -sidebar_label: "Manage Prompts with the SDK" -description: "Learn how to create variants, commit changes, deploy to environments, and fetch configurations using the Agenta SDK." +sidebar_label: "Version Prompts" +description: "Learn how to create variants and commit changes using the Agenta SDK." --- -import Image from "@theme/IdealImage"; import GoogleColabButton from "@site/src/components/GoogleColabButton"; +import Tabs from "@theme/Tabs"; +import TabItem from "@theme/TabItem"; - -This guide covers all prompt management operations using the Agenta SDK: creating variants, committing changes, deploying to environments, and fetching configurations. +This guide covers how to create variants and commit changes to them using the Agenta SDK. Open in Google Colaboratory -## Prerequisites - -Before starting, familiarize yourself with how versioning works in Agenta. Details are available on the [concepts page](/concepts/concepts). - -:::info Versioning in Agenta - -{" "} - -Taxonomy of concepts in Agenta - -Agenta uses a Git-like structure for prompt versioning: -- Create multiple branches called **variants** -- Each variant is versioned. -- Deploy specific versions to **environments** (development, staging, production) - -**Typical workflow:** -1. Create a variant (branch) -2. Commit changes to the variant creating a new version -3. Deploy the version to an environment - -::: - -## Setup - -Initialize the SDK before using any operations: - -```python -import os -import agenta as ag - -# Set your API credentials -os.environ["AGENTA_API_KEY"] = "your-api-key-here" -os.environ["AGENTA_HOST"] = "https://cloud.agenta.ai" # only needed if self-hosting - -# Initialize the SDK -ag.init() -``` -## Creating a new application - -You can create a new application from the UI or programmatically. - -### Creating a new application from the UI - -Go to the App Management page and click on the "Create New Prompt" button, then choose the type of application you want to create (chat for multi-turn chat applications or completion for single-turn applications). - -### Creating a new application from the SDK - -```Python -# Creates an empty application -app = ag.AppManager.create( - app_slug="my-app-slug", - template_key="SERVICE:completion", # we define here the app type - # template_key="SERVICE:chat" # chat prompts - # template_key="CUSTOM" # custom configuration (schema-less, however unless you provide a URI, you can only use the registry but not the playground) -) -``` - -:::warning -The app created until now is empty. You cannot use it from the UI yet. You need to create a variant and commit changes to it to be able to use it (next section). 
-::: - ## Creating and Managing Variants ### Create a New Variant Use `VariantManager.create` to create a new variant with initial configuration: + + ```python from agenta.sdk.types import PromptTemplate, Message, ModelConfig @@ -121,6 +57,147 @@ variant = ag.VariantManager.create( ) ``` + + + +```javascript +// Step 1: Add variant configuration (creates empty variant) +const addConfigResponse = await fetch('https://cloud.agenta.ai/api/variants/configs/add', { + method: 'POST', + headers: { + 'Content-Type': 'application/json', + 'Authorization': 'Bearer YOUR_API_KEY' + }, + body: JSON.stringify({ + variant_ref: { + slug: 'my-variant-slug', + version: null, + id: null + }, + application_ref: { + slug: 'my-app-slug', + version: null, + id: null + } + }) +}); + +const config = await addConfigResponse.json(); + +// Step 2: Commit parameters to the variant +const commitResponse = await fetch('https://cloud.agenta.ai/api/variants/configs/commit', { + method: 'POST', + headers: { + 'Content-Type': 'application/json', + 'Authorization': 'Bearer YOUR_API_KEY' + }, + body: JSON.stringify({ + config: { + params: { + prompt: { + messages: [ + { + role: 'system', + content: 'You are an assistant that provides concise answers' + }, + { + role: 'user', + content: 'Explain {{topic}} in simple terms' + } + ], + llm_config: { + model: 'gpt-3.5-turbo', + max_tokens: 150, + temperature: 0.7, + top_p: 1.0, + frequency_penalty: 0.0, + presence_penalty: 0.0 + } + } + }, + variant_ref: { + slug: 'my-variant-slug', + version: null, + id: null + }, + application_ref: { + slug: 'my-app-slug', + version: null, + id: null + } + } + }) +}); + +const result = await commitResponse.json(); +console.log(result); +``` + + + + +```bash +# Step 1: Add variant configuration (creates empty variant) +curl -X POST "https://cloud.agenta.ai/api/variants/configs/add" \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -d '{ + "variant_ref": { + "slug": "my-variant-slug", + "version": null, + "id": null + }, + "application_ref": { + "slug": "my-app-slug", + "version": null, + "id": null + } + }' + +# Step 2: Commit parameters to the variant +curl -X POST "https://cloud.agenta.ai/api/variants/configs/commit" \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -d '{ + "config": { + "params": { + "prompt": { + "messages": [ + { + "role": "system", + "content": "You are an assistant that provides concise answers" + }, + { + "role": "user", + "content": "Explain {{topic}} in simple terms" + } + ], + "llm_config": { + "model": "gpt-3.5-turbo", + "max_tokens": 150, + "temperature": 0.7, + "top_p": 1.0, + "frequency_penalty": 0.0, + "presence_penalty": 0.0 + } + } + }, + "variant_ref": { + "slug": "my-variant-slug", + "version": null, + "id": null + }, + "application_ref": { + "slug": "my-app-slug", + "version": null, + "id": null + } + } + }' +``` + + + :::tip Use `VariantManager.acreate` for async variant creation. @@ -165,6 +242,8 @@ This command will create a new variant and initialize it with the first commit c To save changes to a variant (creating a new version), use the `VariantManager.commit` method with explicit parameters. 
+ + ```python config2=Config( @@ -206,6 +285,107 @@ print(variant) # print("Committed new version of variant (async):") # print(variant) ``` + + + +```javascript +// Commit new version with updated parameters +const commitResponse = await fetch('https://cloud.agenta.ai/api/variants/configs/commit', { + method: 'POST', + headers: { + 'Content-Type': 'application/json', + 'Authorization': 'Bearer YOUR_API_KEY' + }, + body: JSON.stringify({ + config: { + params: { + prompt: { + messages: [ + { + role: 'system', + content: 'You are an assistant that provides VERY concise answers' + }, + { + role: 'user', + content: 'Explain {{topic}} in simple terms' + } + ], + llm_config: { + model: 'anthropic/claude-3-5-sonnet-20240620', + max_tokens: 150, + temperature: 0.7, + top_p: 1.0, + frequency_penalty: 0.0, + presence_penalty: 0.0 + } + } + }, + variant_ref: { + slug: 'my-variant-slug', + version: null, + id: null + }, + application_ref: { + slug: 'my-app-slug', + version: null, + id: null + } + } + }) +}); + +const result = await commitResponse.json(); +console.log('Committed new version of variant:'); +console.log(result); +``` + + + +```bash +# Commit new version with updated parameters +curl -X POST "https://cloud.agenta.ai/api/variants/configs/commit" \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -d '{ + "config": { + "params": { + "prompt": { + "messages": [ + { + "role": "system", + "content": "You are an assistant that provides VERY concise answers" + }, + { + "role": "user", + "content": "Explain {{topic}} in simple terms" + } + ], + "llm_config": { + "model": "anthropic/claude-3-5-sonnet-20240620", + "max_tokens": 150, + "temperature": 0.7, + "top_p": 1.0, + "frequency_penalty": 0.0, + "presence_penalty": 0.0 + } + } + }, + "variant_ref": { + "slug": "my-variant-slug", + "version": null, + "id": null + }, + "application_ref": { + "slug": "my-app-slug", + "version": null, + "id": null + } + } + }' +``` + + + :::tip Use `VariantManager.acommit` for async version commit. @@ -474,6 +654,9 @@ print(config) To delete a variant, use the `VariantManager.delete` method. + + + ```python # Delete a variant ag.VariantManager.delete( @@ -485,6 +668,63 @@ ag.VariantManager.delete( print("Variant deleted successfully.") ``` + + + +```javascript +// Delete a variant +const deleteResponse = await fetch('https://cloud.agenta.ai/api/variants/configs/delete', { + method: 'POST', + headers: { + 'Content-Type': 'application/json', + 'Authorization': 'Bearer YOUR_API_KEY' + }, + body: JSON.stringify({ + variant_ref: { + slug: 'obsolete-variant', + version: null, + id: null + }, + application_ref: { + slug: 'my-app', + version: null, + id: null + } + }) +}); + +if (deleteResponse.status === 204) { + console.log('Variant deleted successfully.'); +} else { + console.error('Failed to delete variant:', deleteResponse.status); +} +``` + + + + +```bash +# Delete a variant +curl -X POST "https://cloud.agenta.ai/api/variants/configs/delete" \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -d '{ + "variant_ref": { + "slug": "obsolete-variant", + "version": null, + "id": null + }, + "application_ref": { + "slug": "my-app", + "version": null, + "id": null + } + }' +``` + + + + :::warning - Deleting a variant removes all versions of the variant. This action is irreversible. @@ -495,6 +735,9 @@ print("Variant deleted successfully.") To list all variants of an application, use the `VariantManager.list` method. 
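Listing variants is also a useful safety net before deletion: since deleting a variant is irreversible, you can review what exists first and only then call the delete endpoint. A minimal sketch that chains the REST list call (the same request shown in the next code blocks) with the delete call above, reusing the illustrative slugs from this page:

```javascript
// Illustrative safety step: review existing variants before an irreversible delete.
const headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Bearer YOUR_API_KEY'
};

// 1) List the application's variants and double-check the slug you are about to remove
const listResponse = await fetch('https://cloud.agenta.ai/api/variants/configs/list', {
  method: 'POST',
  headers,
  body: JSON.stringify({
    application_ref: { slug: 'my-app', version: null, id: null }
  })
});
console.log(await listResponse.json());

// 2) Only then call the delete endpoint shown above (a 204 status means success)
const deleteResponse = await fetch('https://cloud.agenta.ai/api/variants/configs/delete', {
  method: 'POST',
  headers,
  body: JSON.stringify({
    variant_ref: { slug: 'obsolete-variant', version: null, id: null },
    application_ref: { slug: 'my-app', version: null, id: null }
  })
});
console.log(deleteResponse.status === 204 ? 'Variant deleted.' : `Delete failed: ${deleteResponse.status}`);
```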
+ + + ```python # List all variants (syncrhonously) variants = ag.VariantManager.list( @@ -505,6 +748,50 @@ variants = ag.VariantManager.list( print(variants) ``` + + + +```javascript +// List all variants for an application +const listResponse = await fetch('https://cloud.agenta.ai/api/variants/configs/list', { + method: 'POST', + headers: { + 'Content-Type': 'application/json', + 'Authorization': 'Bearer YOUR_API_KEY' + }, + body: JSON.stringify({ + application_ref: { + slug: 'my-app', + version: null, + id: null + } + }) +}); + +const variants = await listResponse.json(); +console.log(variants); +``` + + + + +```bash +# List all variants for an application +curl -X POST "https://cloud.agenta.ai/api/variants/configs/list" \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -d '{ + "application_ref": { + "slug": "my-app", + "version": null, + "id": null + } + }' +``` + + + + **Sample Output:** ```python @@ -601,9 +888,12 @@ print(variants) }] ``` -## Fetching a Variant's history +## Fetching a Variant's History -To list all versions for a variant of an application, use the `VariantManager.list` method. +To list all versions for a variant of an application, use the `VariantManager.history` method. + + + ```python # List all variant versions/history (synchronously) @@ -616,23 +906,73 @@ versions = ag.VariantManager.history( print(versions) ``` + + + +```javascript +// List all versions/history for a specific variant +const historyResponse = await fetch('https://cloud.agenta.ai/api/variants/configs/history', { + method: 'POST', + headers: { + 'Content-Type': 'application/json', + 'Authorization': 'Bearer YOUR_API_KEY' + }, + body: JSON.stringify({ + variant_ref: { + slug: 'variant-slug', + version: null, + id: null + }, + application_ref: { + slug: 'my-app', + version: null, + id: null + } + }) +}); + +const versions = await historyResponse.json(); +console.log(versions); +``` + + + + +```bash +# List all versions/history for a specific variant +curl -X POST "https://cloud.agenta.ai/api/variants/configs/history" \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -d '{ + "variant_ref": { + "slug": "variant-slug", + "version": null, + "id": null + }, + "application_ref": { + "slug": "my-app", + "version": null, + "id": null + } + }' +``` + + + + **Sample Output:** Same as `VariantManager.list` but limited to the history of a specific variant. -## Asynchronous Operations - +:::tip Asynchronous Operations in Python SDK All SDK methods have async counterparts with an `a` prefix: ```python async def async_operations(): # Create variant asynchronously variant = await ag.VariantManager.acreate(...) - + # Commit changes asynchronously updated_variant = await ag.VariantManager.acommit(...) - - # Fetch configuration asynchronously - config = await ag.ConfigManager.aget_from_registry(...) ``` - +::: diff --git a/docs/docs/prompt-engineering/managing-prompts-programatically/04-deploy.mdx b/docs/docs/prompt-engineering/managing-prompts-programatically/04-deploy.mdx new file mode 100644 index 0000000000..72b528f597 --- /dev/null +++ b/docs/docs/prompt-engineering/managing-prompts-programatically/04-deploy.mdx @@ -0,0 +1,211 @@ +--- +title: "Deploy to Environments" +sidebar_position: 4 +sidebar_label: "Deploy to Environments" +description: "Learn how to deploy variants to different environments using the Agenta SDK." 
+---
+
+import GoogleColabButton from "@site/src/components/GoogleColabButton";
+import Tabs from "@theme/Tabs";
+import TabItem from "@theme/TabItem";
+
+This guide shows you how to deploy variants to environments (development, staging, production) using the Agenta SDK.
+
+
+  Open in Google Colaboratory
+
+
+## Deploying to Environments
+
+To deploy a variant to an environment, use the `DeploymentManager.deploy` method with the variant reference and an `environment_slug`, which is the slug of the target environment (`development`, `staging`, or `production`).
+
+
+
+
+```python
+deployment = ag.DeploymentManager.deploy(
+    app_slug="my-app-slug",
+    variant_slug="my-variant-slug",
+    variant_version=None,  # Deploys latest version if not specified
+    environment_slug="staging"  # Options: development, staging, production
+)
+
+print(f"Deployed to {deployment['environment_slug']}")
+```
+
+
+
+```javascript
+// Deploy a variant to an environment
+const deployResponse = await fetch('https://cloud.agenta.ai/api/variants/configs/deploy', {
+  method: 'POST',
+  headers: {
+    'Content-Type': 'application/json',
+    'Authorization': 'Bearer YOUR_API_KEY'
+  },
+  body: JSON.stringify({
+    variant_ref: {
+      slug: 'my-variant-slug',
+      version: null,
+      id: null
+    },
+    environment_ref: {
+      slug: 'staging',
+      version: null,
+      id: null
+    },
+    application_ref: {
+      slug: 'my-app-slug',
+      version: null,
+      id: null
+    }
+  })
+});
+
+const deployment = await deployResponse.json();
+console.log(`Deployed to ${deployment.environment_slug}`);
+```
+
+
+
+```bash
+# Deploy a variant to an environment
+curl -X POST "https://cloud.agenta.ai/api/variants/configs/deploy" \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer YOUR_API_KEY" \
+  -d '{
+    "variant_ref": {
+      "slug": "my-variant-slug",
+      "version": null,
+      "id": null
+    },
+    "environment_ref": {
+      "slug": "staging",
+      "version": null,
+      "id": null
+    },
+    "application_ref": {
+      "slug": "my-app-slug",
+      "version": null,
+      "id": null
+    }
+  }'
+```
+
+
+
+:::warning
+- Deploying a variant without specifying a `variant_version` deploys the latest version.
+- Only predefined environments with slugs `development`, `staging`, and `production` are currently supported.
+::: + +**Sample Output:** + +```python +Deployed variant to environment: +{ + "app_id": "01963413-3d39-7650-80ce-3ad5d688da6c", + "app_slug": "completion", + "variant_id": "01968c11-6f7c-7773-b273-922c5807be7b", + "variant_slug": "my-variant-slug4", + "variant_version": 5, + "environment_id": "01968c14-c35d-7440-bcc8-9def594f017f", + "environment_slug": "staging", + "environment_version": 2, + "committed_at": "2025-05-01T07:26:08.935406+00:00", + "committed_by": "user@agenta.ai", + "committed_by_id": "0196247a-ec9d-7051-8880-d58279570aa1", + "deployed_at": "2025-05-01T13:41:33.149595+00:00", + "deployed_by": "user@agenta.ai", + "deployed_by_id": "0196247a-ec9d-7051-8880-d58279570aa1" +} +``` + +## Rolling Back to a Previous Version + +To rollback to a previous version, you can deploy a specific version of a variant to an environment: + + + + +```python +# Deploy a specific version (rollback) +deployment = ag.DeploymentManager.deploy( + app_slug="my-app-slug", + variant_slug="my-variant-slug", + variant_version=3, # Specify the version you want to rollback to + environment_slug="production" +) + +print(f"Rolled back to version {deployment['variant_version']}") +``` + + + + +```javascript +// Deploy a specific version (rollback) +const deployResponse = await fetch('https://cloud.agenta.ai/api/variants/configs/deploy', { + method: 'POST', + headers: { + 'Content-Type': 'application/json', + 'Authorization': 'Bearer YOUR_API_KEY' + }, + body: JSON.stringify({ + variant_ref: { + slug: 'my-variant-slug', + version: 3, + id: null + }, + environment_ref: { + slug: 'production', + version: null, + id: null + }, + application_ref: { + slug: 'my-app-slug', + version: null, + id: null + } + }) +}); + +const deployment = await deployResponse.json(); +console.log(`Rolled back to version ${deployment.variant_version}`); +``` + + + +```bash +# Deploy a specific version (rollback) +curl -X POST "https://cloud.agenta.ai/api/variants/configs/deploy" \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer YOUR_API_KEY" \ + -d '{ + "variant_ref": { + "slug": "my-variant-slug", + "version": 3, + "id": null + }, + "environment_ref": { + "slug": "production", + "version": null, + "id": null + }, + "application_ref": { + "slug": "my-app-slug", + "version": null, + "id": null + } + }' +``` + + + +This effectively rolls back the environment to the specified version of the variant. + +:::tip +You can view the full history of deployments and versions in the Agenta UI under the Registry and Deployments pages. +::: + diff --git a/docs/docs/prompt-engineering/managing-prompts-programatically/05-fetch-prompts.mdx b/docs/docs/prompt-engineering/managing-prompts-programatically/05-fetch-prompts.mdx new file mode 100644 index 0000000000..0e41a8ffef --- /dev/null +++ b/docs/docs/prompt-engineering/managing-prompts-programatically/05-fetch-prompts.mdx @@ -0,0 +1,10 @@ +--- +title: "Fetch Prompts from Environments" +sidebar_position: 5 +sidebar_label: "Fetch Prompts" +description: "Learn how to fetch prompt configurations from environments using the Agenta SDK and API." 
+--- + +import { Redirect } from '@docusaurus/router'; + + diff --git a/docs/docs/prompt-engineering/managing-prompts-programatically/_category_.json b/docs/docs/prompt-engineering/managing-prompts-programatically/_category_.json new file mode 100644 index 0000000000..3ec07828bb --- /dev/null +++ b/docs/docs/prompt-engineering/managing-prompts-programatically/_category_.json @@ -0,0 +1,5 @@ +{ + "position": 3, + "label": "Manage Prompts with SDK/API", + "collapsed": true +} \ No newline at end of file diff --git a/docs/docs/prompt-engineering/playground/01-using-the-playground.mdx b/docs/docs/prompt-engineering/playground/01-using-playground.mdx similarity index 99% rename from docs/docs/prompt-engineering/playground/01-using-the-playground.mdx rename to docs/docs/prompt-engineering/playground/01-using-playground.mdx index 3c4f2f57f4..5f7a030a81 100644 --- a/docs/docs/prompt-engineering/playground/01-using-the-playground.mdx +++ b/docs/docs/prompt-engineering/playground/01-using-playground.mdx @@ -111,7 +111,7 @@ This parallel testing helps you understand how different prompts and models hand ### Version Control and Deployment -The playground integrates with Agenta's [prompt management system](/prompt-engineering/overview) to track and deploy changes: +The playground integrates with Agenta's [prompt management system](/prompt-engineering/concepts) to track and deploy changes: 1. Click "Commit" to save a new version of your variant 2. Changes remain in development until explicitly deployed diff --git a/docs/docs/prompt-engineering/playground/02-adding-custom-providers.mdx b/docs/docs/prompt-engineering/playground/02-custom-providers.mdx similarity index 100% rename from docs/docs/prompt-engineering/playground/02-adding-custom-providers.mdx rename to docs/docs/prompt-engineering/playground/02-custom-providers.mdx diff --git a/docs/docs/prompt-engineering/playground/_03-tools-function-calling.mdx b/docs/docs/prompt-engineering/playground/_03-tools-function-calling.mdx new file mode 100644 index 0000000000..23f571bc15 --- /dev/null +++ b/docs/docs/prompt-engineering/playground/_03-tools-function-calling.mdx @@ -0,0 +1,17 @@ +--- +title: "Use Tools and Function Calling" +sidebar_position: 3 +sidebar_label: "Tools and Function Calling" +description: "Learn how to use tools and function calling in the Agenta playground." +--- + + + + +This section is under construction. diff --git a/docs/docs/prompt-engineering/playground/_04-structured-output.mdx b/docs/docs/prompt-engineering/playground/_04-structured-output.mdx new file mode 100644 index 0000000000..59a64ee1e1 --- /dev/null +++ b/docs/docs/prompt-engineering/playground/_04-structured-output.mdx @@ -0,0 +1,18 @@ +--- +title: "Work with Structured Output" +sidebar_position: 4 +sidebar_label: "Structured Output" +description: "Learn how to use structured output and JSON schema in the Agenta playground." +--- + + + + +This section is under construction. For basic information on dynamic JSON schema variables, see the [Using the Playground guide](/prompt-engineering/playground/using-playground#dynamic-json-schema-variables). 
diff --git a/docs/docs/prompt-engineering/playground/_05-test-sets.mdx b/docs/docs/prompt-engineering/playground/_05-test-sets.mdx new file mode 100644 index 0000000000..c772b6aa8a --- /dev/null +++ b/docs/docs/prompt-engineering/playground/_05-test-sets.mdx @@ -0,0 +1,18 @@ +--- +title: "Load and Save Test Sets" +sidebar_position: 5 +sidebar_label: "Load and Save Test Sets" +description: "Learn how to load and save test sets in the Agenta playground." +--- + + + + +This section is under construction. For basic information on test sets, see the [Working with Test Sets section](/prompt-engineering/playground/using-playground#working-with-test-sets) in the playground guide. diff --git a/docs/docs/prompt-engineering/playground/_06-images.mdx b/docs/docs/prompt-engineering/playground/_06-images.mdx new file mode 100644 index 0000000000..231f5b22ab --- /dev/null +++ b/docs/docs/prompt-engineering/playground/_06-images.mdx @@ -0,0 +1,18 @@ +--- +title: "Use Images in Prompts" +sidebar_position: 6 +sidebar_label: "Images" +description: "Learn how to use images in prompts with vision-capable models in the Agenta playground." +--- + + + + +This section is under construction. diff --git a/docs/docs/prompt-engineering/playground/_category_.json b/docs/docs/prompt-engineering/playground/_category_.json index 9b8516fb9d..5d11187b61 100644 --- a/docs/docs/prompt-engineering/playground/_category_.json +++ b/docs/docs/prompt-engineering/playground/_category_.json @@ -1,5 +1,5 @@ { - "position": 4, - "label": "Playground", - "collapsed": false + "position": 5, + "label": "Use the Playground", + "collapsed": true } \ No newline at end of file diff --git a/docs/docs/prompt-engineering/prompt-management/01-how-to-integrate-with-agenta.mdx b/docs/docs/prompt-engineering/prompt-management/01-how-to-integrate-with-agenta.mdx deleted file mode 100644 index e76248986f..0000000000 --- a/docs/docs/prompt-engineering/prompt-management/01-how-to-integrate-with-agenta.mdx +++ /dev/null @@ -1,56 +0,0 @@ ---- -title: "Integrating with agenta" -description: "Integrate applications and prompts created in Agenta into your projects." ---- - -import Image from "@theme/IdealImage"; - -Agenta easily integrates with your workflow, allowing you to use the latest version of the deployed prompt in your application. With Agenta, **you can update prompts directly from the web interface without modifying your code** each time. - -Here are the two ways you can use the prompts from Agenta in your code: - -### [1. As a prompt management system](/prompt-engineering/prompt-management/prompt-management-sdk): - -In this approach, prompts are managed and stored in the Agenta backend. You use the Agenta SDK to fetch the latest deployed version of your prompt and use it in your application. - -**Advantages**: - -- Agenta operates outside your application's critical path. -- Allows you to fetch and cache the latest prompt version for zero latency usage. - -**Considerations**: - -- You need to set up the [integration with observability](/observability/quickstart) yourself if you want to trace your calls for debugging and cost tracking. - -A sequence diagram showing how to integrate with Agenta as a prompt management system - - - -### **[2. As a middleware / model proxy](/prompt-engineering/prompt-management/proxy-calls)**: - -In this setup, Agenta provides you with an endpoint that forwards requests to the LLM on your behalf. - -**Advantages**: - -- Simplified deployment. -- Automatic tracing without any changes to your code. 
- -**Considerations**: - -- Adds a slight latency to the response (approximately 0.3 seconds). -- Currently, we don't support streaming for these endpoints. - -Overall, this approach is best suited for applications where latency isn't critical. - -A sequence diagram showing how to integrate with Agenta as    a proxy - diff --git a/docs/docs/reference/sdk/01-configuration-management.mdx b/docs/docs/reference/sdk/01-configuration-management.mdx index 977e3963c6..1fe59a2128 100644 --- a/docs/docs/reference/sdk/01-configuration-management.mdx +++ b/docs/docs/reference/sdk/01-configuration-management.mdx @@ -325,7 +325,7 @@ All configuration data must be nested under the `prompt` key. ``` :::note -You can use the `PromptTemplate` class in the SDK to create and validate a prompt configuration. See [how to commit a variant](/prompt-engineering/prompt-management/prompt-management-sdk#create-a-new-variant). +You can use the `PromptTemplate` class in the SDK to create and validate a prompt configuration. See [how to commit a variant](/prompt-engineering/managing-prompts-programatically/create-and-commit#create-a-new-variant). ::: ## Available Environments diff --git a/docs/docs/tutorials/cookbooks/02-observability_langchain.mdx b/docs/docs/tutorials/cookbooks/02-observability_langchain.mdx index 00299edbea..5a0bc75291 100644 --- a/docs/docs/tutorials/cookbooks/02-observability_langchain.mdx +++ b/docs/docs/tutorials/cookbooks/02-observability_langchain.mdx @@ -1,6 +1,6 @@ --- title: "Tracing and Observability for LangChain with Agenta" -sidebar_label: LangChain +sidebar_label: Tracing for LangChain description: Learn how to instrument LangChain traces with Agenta for enhanced LLM observability. This guide covers setup, configuration, and best practices for monitoring LLM applications using LangChain and OpenAI models. --- @@ -104,7 +104,7 @@ prompt_template = ChatPromptTemplate([ llm = ChatOpenAI(model="gpt-4o-mini") loader = WebBaseLoader( - web_paths=("https://docs.agenta.ai/prompt-management/prompt-management-sdk",), + web_paths=("https://docs.agenta.ai/prompt-engineering/managing-prompts-programatically/create-and-commit",), bs_kwargs=dict( parse_only=bs4.SoupStrainer('article') # Only parse the core ), diff --git a/docs/docs/tutorials/cookbooks/RAG-QA-docs.mdx b/docs/docs/tutorials/cookbooks/RAG-QA-docs.mdx index 4088f94235..61ea9a179c 100644 --- a/docs/docs/tutorials/cookbooks/RAG-QA-docs.mdx +++ b/docs/docs/tutorials/cookbooks/RAG-QA-docs.mdx @@ -18,7 +18,7 @@ At the end, we will have: - A **playground** for testing different embeddings, adjusting top_k values (number of context chunks to include), and experimenting with various prompts and models - **LLM-as-a-judge** and **RAG context relevancy** evaluations for our Q&A application - **Observability** with Agenta to debug and monitor our application -- A **deployment** that we can either [directly invoke](/prompt-engineering/prompt-management/proxy-calls) **or** [fetch the configuration](/reference/sdk/configuration-management#get_from_registry) to run elsewhere +- A **deployment** that we can either [directly invoke](/prompt-engineering/integrating-prompts/proxy-calls) **or** [fetch the configuration](/reference/sdk/configuration-management#get_from_registry) to run elsewhere You can try our playground by creating a free account at [https://cloud.agenta.ai](https://cloud.agenta.ai) and opening the demo. @@ -536,7 +536,7 @@ To ensure our assistant provides accurate and relevant answers, we'll use evalua 1. 
RAG Relevancy Evaluator: Measures how relevant the assistant's answers are with respect to the retrieved context. 2. LLM-as-a-Judge Evaluator: Rates the quality of the assistant's responses. -For the first, we use the RAG Relevancy evaluator as described in [Agenta's evaluation documentation](/evaluation/evaluators/rag-evaluators). +For the first, we use the RAG Relevancy evaluator as described in [Agenta's evaluation documentation](/evaluation/configure-evaluators/rag-evaluators). **Configuration:** @@ -555,7 +555,7 @@ You can use the evaluator playground to configure the evaluator and identify the loading="lazy" /> -We set and test an LLM-as-a-Judge evaluator to rate the quality of the assistant's responses the same way. More details on setting up LLM-as-a-Judge evaluators can be found [here](/evaluation/evaluators/llm-as-a-judge). +We set and test an LLM-as-a-Judge evaluator to rate the quality of the assistant's responses the same way. More details on setting up LLM-as-a-Judge evaluators can be found [here](/evaluation/configure-evaluators/llm-as-a-judge). ## Deploying the assistant @@ -563,7 +563,7 @@ After iterating through various prompts and parameters and evaluating their perf Simply click the `Deploy` button in the playground to accomplish this. -Agenta provides us with [two endpoints](/prompt-engineering/prompt-management/how-to-integrate-with-agenta) to interact with our deployed application: +Agenta provides us with [two endpoints](/prompt-engineering/integrating-prompts/integrating-with-agenta) to interact with our deployed application: - The first allows us to directly invoke the deployed application with the production configuration. - The second allows us to fetch the deployed configuration as a JSON and use it in our self-deployed application. diff --git a/docs/docs/tutorials/sdk/manage-prompts-with-SDK.mdx b/docs/docs/tutorials/sdk/manage-prompts-with-SDK.mdx index e7c7117ad1..b9316454b4 100644 --- a/docs/docs/tutorials/sdk/manage-prompts-with-SDK.mdx +++ b/docs/docs/tutorials/sdk/manage-prompts-with-SDK.mdx @@ -18,7 +18,7 @@ In this tutorial, we'll use the Agenta SDK to create a new prompt, commit change Before we begin, let's quickly review how Agenta versions prompts: Agenta follows a structure similar to **git** for prompt versioning. Instead of having one commit history, it uses **multiple branches (called variants)** where changes can be committed, and **environments** where these changes can be deployed (and used in your application). (You can read more about why we chose this approach [here](/concepts/concepts#motivation)). -You can find more about how prompt versioning works in the [concepts page](/concepts/concepts). +You can find more about how prompt versioning works in the [concepts page](/prompt-engineering/concepts). The workflow for deploying a change to production that we'll follow in this tutorial is: diff --git a/docs/docs/tutorials/videos/creating-test-sets-from-production-data.md b/docs/docs/tutorials/videos/creating-test-sets-from-production-data.md index 2c56ba11f7..3289570b7c 100644 --- a/docs/docs/tutorials/videos/creating-test-sets-from-production-data.md +++ b/docs/docs/tutorials/videos/creating-test-sets-from-production-data.md @@ -82,7 +82,7 @@ Once you have a test set, you can use it in several ways: - Compare your application's output against ground truth answers - Measure performance across different variants -For more information on evaluations, see our [Evaluation documentation](/evaluation/overview). 
+For more information on evaluations, see our [Evaluation documentation](/evaluation/concepts). ## Test Set Best Practices @@ -102,8 +102,8 @@ For more information on evaluations, see our [Evaluation documentation](/evaluat Even with just inputs (no ground truth), you can evaluate your application using: -1. **[Human evaluation](/evaluation/human_evaluation)**: Have people review the outputs for quality -2. **[LLM as a judge](/evaluation/evaluators/llm-as-a-judge)**: Use a prompt that assesses outputs based on criteria like relevance or accuracy +1. **[Human evaluation](/evaluation/human-evaluation/quick-start)**: Have people review the outputs for quality +2. **[LLM as a judge](/evaluation/configure-evaluators/llm-as-a-judge)**: Use a prompt that assesses outputs based on criteria like relevance or accuracy Adding ground truth expands your evaluation options, allowing you to: - Compare outputs against expected answers @@ -112,6 +112,6 @@ Adding ground truth expands your evaluation options, allowing you to: ## Related Resources -- [Creating Test Sets](/evaluation/create-test-sets) -- [Configuring Evaluators](/evaluation/configure-evaluators) -- [Running Evaluations](/evaluation/no-code-evaluation) +- [Creating Test Sets](/evaluation/managing-test-sets/upload-csv) +- [Configuring Evaluators](/evaluation/configure-evaluators/overview) +- [Running Evaluations](/evaluation/evaluation-from-ui/running-evaluations) diff --git a/docs/docusaurus.config.ts b/docs/docusaurus.config.ts index acc6417bfc..c748b139ed 100644 --- a/docs/docusaurus.config.ts +++ b/docs/docusaurus.config.ts @@ -308,7 +308,7 @@ const config: Config = { }, { from: "/prompt-management/overview", - to: "/prompt-engineering/overview", + to: "/prompt-engineering/concepts", }, { from: "/prompt-management/quick-start", @@ -316,27 +316,27 @@ const config: Config = { }, { from: "/prompt-management/prompt-management-sdk", - to: "/prompt-engineering/prompt-management/prompt-management-sdk", + to: "/prompt-engineering/managing-prompts-programatically/create-and-commit", }, { from: "/prompt-management/adding-custom-providers", - to: "/prompt-engineering/playground/adding-custom-providers", + to: "/prompt-engineering/playground/custom-providers", }, { from: "/prompt-management/using-the-playground", - to: "/prompt-engineering/playground/using-the-playground", + to: "/prompt-engineering/playground/using-playground", }, { from: "/prompt-management/integration/how-to-integrate-with-agenta", - to: "/prompt-engineering/prompt-management/how-to-integrate-with-agenta", + to: "/prompt-engineering/integrating-prompts/integrating-with-agenta", }, { from: "/prompt-management/integration/fetch-prompts", - to: "/prompt-engineering/prompt-management/how-to-integrate-with-agenta", + to: "/prompt-engineering/integrating-prompts/fetch-prompt-programatically", }, { from: "/prompt-management/integration/proxy-calls", - to: "/prompt-engineering/prompt-management/proxy-calls", + to: "/prompt-engineering/integrating-prompts/proxy-calls", }, { from: "/self-host/host-locally", @@ -353,6 +353,161 @@ const config: Config = { { from: "/self-host/applying-schema-migration", to: "/self-host/upgrading", + }, + // Prompt Engineering restructure redirects + { + from: "/prompt-engineering/overview", + to: "/prompt-engineering/concepts", + }, + { + from: "/prompt-engineering/prompt-management/how-to-integrate-with-agenta", + to: "/prompt-engineering/integrating-prompts/integrating-with-agenta", + }, + { + from: "/prompt-engineering/prompt-management/prompt-management-sdk", 
+ to: "/prompt-engineering/managing-prompts-programatically/create-and-commit", + }, + { + from: "/prompt-engineering/prompt-management/proxy-calls", + to: "/prompt-engineering/integrating-prompts/proxy-calls", + }, + { + from: "/prompt-engineering/playground/using-the-playground", + to: "/prompt-engineering/playground/using-playground", + }, + { + from: "/prompt-engineering/playground/adding-custom-providers", + to: "/prompt-engineering/playground/custom-providers", + }, + // Evaluation restructure redirects + { + from: "/evaluation/create-test-sets", + to: "/evaluation/managing-test-sets/upload-csv", + }, + { + from: "/evaluation/no-code-evaluation", + to: "/evaluation/evaluation-from-ui/running-evaluations", + }, + { + from: "/evaluation/sdk-evaluation", + to: "/tutorials/sdk/evaluate-with-SDK", + }, + { + from: "/evaluation/configure-evaluators", + to: "/evaluation/configure-evaluators/overview", + }, + { + from: "/evaluation/human_evaluation", + to: "/evaluation/human-evaluation/quick-start", + }, + { + from: "/evaluation/annotate-api", + to: "/observability/trace-with-python-sdk/annotate-traces", + }, + { + from: "/evaluation/evaluators/classification-entiry-extraction", + to: "/evaluation/configure-evaluators/classification-entity-extraction", + }, + { + from: "/evaluation/evaluators/pattern-matching", + to: "/evaluation/configure-evaluators/regex-evaluator", + }, + { + from: "/evaluation/configure-evaluators/pattern-matching", + to: "/evaluation/configure-evaluators/regex-evaluator", + }, + { + from: "/evaluation/evaluators/semantic-similarity", + to: "/evaluation/configure-evaluators/semantic-similarity", + }, + { + from: "/evaluation/evaluators/llm-as-a-judge", + to: "/evaluation/configure-evaluators/llm-as-a-judge", + }, + { + from: "/evaluation/evaluators/rag-evaluators", + to: "/evaluation/configure-evaluators/rag-evaluators", + }, + { + from: "/evaluation/evaluators/custom-evaluator", + to: "/evaluation/configure-evaluators/custom-evaluator", + }, + { + from: "/evaluation/evaluators/webhook-evaluator", + to: "/evaluation/configure-evaluators/webhook-evaluator", + }, + { + from: "/evaluation/quick-start-ui", + to: "/evaluation/evaluation-from-ui/quick-start", + }, + { + from: "/evaluation/quick-start-sdk", + to: "/tutorials/sdk/evaluate-with-SDK", + }, + { + from: "/evaluation/overview", + to: "/evaluation/concepts", + }, + { + from: "/evaluation/evaluation-from-sdk/quick-start", + to: "/tutorials/sdk/evaluate-with-SDK", + }, + { + from: "/evaluation/evaluation-from-sdk/setup-configuration", + to: "/tutorials/sdk/evaluate-with-SDK", + }, + { + from: "/evaluation/evaluation-from-sdk/managing-test-sets", + to: "/tutorials/sdk/evaluate-with-SDK", + }, + { + from: "/evaluation/evaluation-from-sdk/configuring-evaluators", + to: "/tutorials/sdk/evaluate-with-SDK", + }, + { + from: "/evaluation/evaluation-from-sdk/running-evaluations", + to: "/tutorials/sdk/evaluate-with-SDK", + }, + { + from: "/evaluation/evaluation-from-sdk/viewing-results", + to: "/tutorials/sdk/evaluate-with-SDK", + }, + // Observability restructure redirects + { + from: "/observability/observability-sdk", + to: "/observability/trace-with-python-sdk/setup-tracing", + }, + { + from: "/observability/opentelemetry", + to: "/observability/trace-with-opentelemetry/distributed-tracing", + }, + { + from: "/observability/otel-semconv", + to: "/observability/trace-with-opentelemetry/semantic-conventions", + }, + { + from: "/observability/overview", + to: "/observability/concepts", + }, + { + from: 
"/observability/quickstart", + to: "/observability/quickstart-python", + }, + { + from: "/observability/trace-with-opentelemetry/setup-tracing", + to: "/observability/trace-with-opentelemetry/getting-started", + }, + { + from: "/observability/using-the-ui/filtering-traces", + to: "/observability/concepts", + }, + { + from: "/observability/concepts/semantic-conventions", + to: "/observability/trace-with-opentelemetry/semantic-conventions", + }, + { + from: "/reference/api", + to: "/reference/api/category", } ], createRedirects(existingPath) { @@ -387,4 +542,3 @@ const config: Config = { export default async function createConfig() { return config; } - diff --git a/docs/sidebars.ts b/docs/sidebars.ts index 8d345d8e31..88a12d2078 100644 --- a/docs/sidebars.ts +++ b/docs/sidebars.ts @@ -19,40 +19,46 @@ const sidebars: SidebarsConfig = { label: "Prompt Engineering", ...CATEGORY_UTILITIES, items: [{ type: "autogenerated", dirName: "prompt-engineering" }, - {type: "category", - collapsed: true, - collapsible: true, - label: "Tutorials", - items: [ "tutorials/sdk/manage-prompts-with-SDK"]}], + { + type: "category", + collapsed: true, + collapsible: true, + label: "Tutorials", + items: ["tutorials/sdk/manage-prompts-with-SDK"] + }], }, { label: "Evaluation", ...CATEGORY_UTILITIES, items: [{ type: "autogenerated", dirName: "evaluation" }, - {type: "category", - collapsed: false, - collapsible: true, - label: "Tutorials", - items: [ "tutorials/cookbooks/capture-user-feedback", - "tutorials/sdk/evaluate-with-SDK"]}], + { + type: "category", + collapsed: true, + collapsible: true, + label: "Tutorials", + items: ["tutorials/cookbooks/capture-user-feedback", + "tutorials/sdk/evaluate-with-SDK"] + }], }, { label: "Observability", ...CATEGORY_UTILITIES, items: [{ type: "autogenerated", dirName: "observability" }, - {type: "category", - collapsed: true, - collapsible: true, - label: "Tutorials", - items: [ "tutorials/cookbooks/capture-user-feedback", - "tutorials/cookbooks/observability_langchain"]}], + { + type: "category", + collapsed: true, + collapsible: true, + label: "Tutorials", + items: ["tutorials/cookbooks/capture-user-feedback", + "tutorials/cookbooks/observability_langchain"] + }], }, { label: "Custom Workflows", ...CATEGORY_UTILITIES, items: [{ type: "autogenerated", dirName: "custom-workflows" } - ], + ], }, { label: "Concepts", diff --git a/docs/src/components/CARD_ICON_USAGE.md b/docs/src/components/CARD_ICON_USAGE.md new file mode 100644 index 0000000000..fa42c216e0 --- /dev/null +++ b/docs/src/components/CARD_ICON_USAGE.md @@ -0,0 +1,156 @@ +# Custom Card Icon Usage Guide + +This guide explains how to use custom icons with DocCard components in the documentation. + +## Quick Start + +Import the CustomDocCard component in your MDX files: + +```mdx +import CustomDocCard from '@site/src/components/CustomDocCard'; +``` + +## Usage Examples + +### 1. Default Card (Standard Arrow Icon) +```mdx +import DocCard from '@theme/DocCard'; + + +``` + +### 2. Card with Emoji Icon +```mdx +import CustomDocCard from '@site/src/components/CustomDocCard'; + + +``` + +### 3. Card with Image/SVG Icon +```mdx +import CustomDocCard from '@site/src/components/CustomDocCard'; + + +``` + +### 4. 
Card Without Icon +```mdx +import CustomDocCard from '@site/src/components/CustomDocCard'; + + +``` + +## Using with DocCardList + +For auto-generated card lists, you'll need to use a custom wrapper: + +```mdx +import { useCurrentSidebarCategory } from '@docusaurus/theme-common'; +import CustomDocCard from '@site/src/components/CustomDocCard'; + +export function CustomCardList({ icons = {} }) { + const category = useCurrentSidebarCategory(); + + return ( +
+ {category.items.map((item, index) => ( +
+ +
+ ))} +
+ ); +} + + +``` + +## Direct HTML Approach (Alternative) + +If you prefer not to use the component, you can add custom classes directly to regular DocCard: + +### Using data attribute for emoji: +```mdx +
+ +
+``` + +### Using className for no icon: +```mdx +
+ +
+``` + +## Supported Icon Types + +| Type | Method | Example | +|------|--------|---------| +| Emoji | `icon` prop | `icon="🚀"` | +| Unicode | `icon` prop | `icon="★"` | +| SVG File | `imagePath` prop | `imagePath="/img/icon.svg"` | +| PNG/JPG | `imagePath` prop | `imagePath="/img/icon.png"` | +| None | `noIcon` prop | `noIcon={true}` | + +## Icon Best Practices + +1. **Emoji Size**: Emojis are automatically sized at 24px +2. **Image Icons**: Should be square (24x24px recommended) for best results +3. **SVG Icons**: Preferred for crisp rendering at any resolution +4. **Consistency**: Use similar icon styles across related cards +5. **Accessibility**: Icons are decorative - ensure card titles are descriptive + +## CSS Classes Reference + +- `.custom-icon` - Applied when using emoji/text icons +- `.icon-img` - Applied when using image/SVG icons +- `.no-icon` - Applied when hiding the default icon +- `[data-card-icon]` - Attribute used to pass emoji/text content + +## Troubleshooting + +**Icons not showing?** +- Clear browser cache (Ctrl+Shift+R / Cmd+Shift+R) +- Verify image paths are correct (relative to /static folder) +- Check that CSS custom.css has been updated + +**Default arrow still showing?** +- Ensure you're using `CustomDocCard` component or proper class names +- Verify the `noIcon` prop is set to `true` diff --git a/docs/src/components/CustomCardExample.mdx b/docs/src/components/CustomCardExample.mdx new file mode 100644 index 0000000000..fc696bb2e7 --- /dev/null +++ b/docs/src/components/CustomCardExample.mdx @@ -0,0 +1,109 @@ +--- +title: Custom Card Icons Example +description: Example page showing different card icon options +--- + +import CustomDocCard from '@site/src/components/CustomDocCard'; +import DocCard from '@theme/DocCard'; + +# Custom Card Icons Example + +This page demonstrates the different ways to use custom icons with DocCard components. + +## Default Card (Standard Arrow) + + + +## Card with Emoji Icon + + + +## Card with Different Emoji + +
+
+ +
+
+ +
+
+ +## Card Without Icon + + + +## More Examples + +
+
+ +
+
+ +
+
+ +
+
diff --git a/docs/src/components/CustomDocCard.tsx b/docs/src/components/CustomDocCard.tsx new file mode 100644 index 0000000000..a03abbf428 --- /dev/null +++ b/docs/src/components/CustomDocCard.tsx @@ -0,0 +1,67 @@ +import React from 'react'; +import DocCard from '@theme/DocCard'; +import type { PropSidebarItem } from '@docusaurus/plugin-content-docs'; + +interface CustomDocCardProps { + item: PropSidebarItem; + icon?: string; // Emoji or text icon + imagePath?: string; // Path to SVG/image icon + noIcon?: boolean; // Set to true to hide the default icon +} + +/** + * CustomDocCard - A wrapper around Docusaurus DocCard with custom icon support + * + * Usage examples: + * + * 1. With emoji icon: + * + * + * 2. With image/SVG icon: + * + * + * 3. Without icon: + * + * + * 4. Default (standard arrow): + * + */ +export default function CustomDocCard({ + item, + icon, + imagePath, + noIcon +}: CustomDocCardProps) { + const getClassName = () => { + if (noIcon) return 'no-icon'; + if (imagePath) return 'icon-img'; + if (icon) return 'custom-icon'; + return ''; + }; + + const getStyle = () => { + if (imagePath) { + return { + '--card-icon-image': `url(${imagePath})`, + } as React.CSSProperties; + } + return {}; + }; + + // For image icons, we need to apply the background via inline style + const cardProps: any = { + item, + }; + + return ( +
+ +
+ ); +} diff --git a/docs/src/components/GitHubExampleButton.module.css b/docs/src/components/GitHubExampleButton.module.css new file mode 100644 index 0000000000..8ffab5fa25 --- /dev/null +++ b/docs/src/components/GitHubExampleButton.module.css @@ -0,0 +1,72 @@ +.githubButton { + display: table !important; + width: 100% !important; + height: 48px !important; + background-color: var(--ifm-background-surface-color) !important; + color: var(--ifm-color-content) !important; + text-decoration: none !important; + border-radius: 8px !important; + border: 1px solid var(--ifm-color-emphasis-300) !important; + font-weight: 500 !important; + font-size: 16px !important; + cursor: pointer !important; + transition: all 0.2s ease !important; + box-shadow: 0 1px 2px rgba(0, 0, 0, 0.04) !important; + box-sizing: border-box !important; + position: relative !important; +} + +.githubButton:hover { + border-color: var(--ifm-color-primary) !important; + box-shadow: 0 4px 12px rgba(0, 0, 0, 0.08) !important; + transform: translateY(-1px) !important; + text-decoration: none !important; +} + +.githubButton:visited { + color: var(--ifm-color-content) !important; + text-decoration: none !important; +} + +.githubButton:active { + text-decoration: none !important; +} + +.logo { + position: absolute !important; + left: 16px !important; + top: 50% !important; + transform: translateY(-50%) !important; + width: 24px !important; + height: 24px !important; + display: block !important; +} + +.text { + display: block !important; + text-align: center !important; + text-decoration: none !important; + line-height: 48px !important; + margin: 0 !important; + padding: 0 !important; + white-space: nowrap !important; + font-family: inherit !important; + font-size: 16px !important; + font-weight: 500 !important; + color: inherit !important; + width: 100% !important; + height: 48px !important; + position: absolute !important; + top: 0 !important; + left: 0 !important; +} + +.arrow { + position: absolute !important; + right: 16px !important; + top: 50% !important; + transform: translateY(-50%) !important; + width: 16px !important; + height: 16px !important; + display: block !important; +} \ No newline at end of file diff --git a/docs/src/components/GitHubExampleButton.tsx b/docs/src/components/GitHubExampleButton.tsx new file mode 100644 index 0000000000..11e70ca8f0 --- /dev/null +++ b/docs/src/components/GitHubExampleButton.tsx @@ -0,0 +1,51 @@ +import React from 'react'; +import styles from './GitHubExampleButton.module.css'; + +interface GitHubExampleButtonProps { + examplePath: string; + children?: React.ReactNode; +} + +const GitHubExampleButton: React.FC = ({ examplePath, children }) => { + const baseUrl = 'https://github.com/Agenta-AI/agenta/tree/main'; + const githubUrl = `${baseUrl}/${examplePath}`; + + return ( + + ); +}; + +export default GitHubExampleButton; + diff --git a/docs/src/css/custom.css b/docs/src/css/custom.css index c0a8f9b363..2ee23fd496 100644 --- a/docs/src/css/custom.css +++ b/docs/src/css/custom.css @@ -7,15 +7,48 @@ /* You can override the default Infima variables here. 
*/ :root { - --ifm-color-primary: #6b7280; - --ifm-color-primary-dark: hsl(209, 13%, 40%); - --ifm-color-primary-darker: hsl(210, 21%, 28%); - --ifm-color-primary-darkest: hsl(211, 37%, 17%); - --ifm-color-primary-light: hsl(209, 14%, 55%); - --ifm-color-primary-lighter: hsl(210, 18%, 78%); - --ifm-color-primary-lightest: hsl(210, 24%, 87%); - --docusaurus-highlighted-code-line-bg: rgba(0, 0, 0, 0.274); + /* Brand colors - Light mode */ + --brand-main-black: #000; + --brand-heading-black: #242424; + --brand-secondary-text: #676771; + --brand-border-grey: #DCD5D1; + --brand-grey-bg: #413F40; + --brand-white-bg: #FFF; + + /* Primary color mapping */ + --ifm-color-primary: #676771; + --ifm-color-primary-dark: #676771; + --ifm-color-primary-darker: #242424; + --ifm-color-primary-darkest: #242424; + --ifm-color-primary-light: #676771; + --ifm-color-primary-lighter: #DCD5D1; + --ifm-color-primary-lightest: #DCD5D1; + + /* Background colors */ + --ifm-background-color: #FFF; + --ifm-navbar-background-color: #FFF; + + /* Border colors */ + --ifm-toc-border-color: #DCD5D1; + --ifm-color-emphasis-200: #DCD5D1; + + /* Text colors */ + --ifm-font-color-base: #000; + --ifm-heading-color: #242424; + --ifm-link-color: #242424; + + /* Code highlighting */ + --docusaurus-highlighted-code-line-bg: rgba(0, 0, 0, 0.05); + + /* Typography */ --ifm-font-weight-normal: 400; + --ifm-font-weight-semibold: 500; + --ifm-font-weight-bold: 600; + --ifm-heading-font-weight: 600; + --ifm-h1-font-weight: 600; + --ifm-h2-font-weight: 500; + --ifm-h3-font-weight: 500; + --ifm-h4-font-weight: 500; --ifm-navbar-height: 3.9rem; --ifm-link-color: var(--ifm-color-primary-darker); --ifm-heading-color: var(--ifm-color-primary-darkest); @@ -23,58 +56,360 @@ --ifm-heading-h2-font-size: 26px; --ifm-heading-h3-font-size: 20px; --ifm-heading-h4-font-size: 16px; + + /* Spacing scale */ + --spacing-xs: 4px; + --spacing-sm: 8px; + --spacing-md: 16px; + --spacing-lg: 24px; + --spacing-xl: 40px; + + /* Border radius scale */ + --border-radius-sm: 4px; + --border-radius-md: 6px; + --border-radius-lg: 8px; + + /* UI element colors - Light mode */ + --navbar-link-color: var(--brand-secondary-text); + --navbar-link-active-color: var(--brand-heading-black); + --sidebar-link-color: var(--brand-secondary-text); + --sidebar-link-active-color: var(--brand-heading-black); + --toc-link-color: var(--brand-secondary-text); } /* For readability concerns, you should choose a lighter palette in dark mode. 
*/ [data-theme="dark"] { - --ifm-color-primary: #758391; - --ifm-color-primary-dark: #bdc7d1; - --ifm-color-primary-darker: #d6dee6; - --ifm-color-primary-darkest: #eaeff5; - --ifm-color-primary-light: #758391; - --ifm-color-primary-lighter: #586673; - --ifm-color-primary-lightest: #394857; - --docusaurus-highlighted-code-line-bg: rgba(0, 0, 0, 0.3); + /* Brand colors - Dark mode */ + --brand-black-bg: #1E1C1D; + --brand-main-text-dark: #FFF; + --brand-secondary-text-dark: #787777; + + /* Primary color mapping for dark mode */ + --ifm-color-primary: #787777; + --ifm-color-primary-dark: #787777; + --ifm-color-primary-darker: #FFF; + --ifm-color-primary-darkest: #FFF; + --ifm-color-primary-light: #787777; + --ifm-color-primary-lighter: #413F40; + --ifm-color-primary-lightest: #2A2A2A; + + /* Background colors */ + --ifm-background-color: #1E1C1D; + --ifm-navbar-background-color: #1E1C1D; + --ifm-background-surface-color: #1E1C1D; + + /* Border colors */ + --ifm-toc-border-color: #413F40; + --ifm-color-emphasis-200: #413F40; + + /* Text colors */ + --ifm-font-color-base: #D0D0D0; + --ifm-heading-color: #FFF; + --ifm-link-color: #FFF; + + /* Code highlighting */ + --docusaurus-highlighted-code-line-bg: rgba(255, 255, 255, 0.05); + + /* UI element colors - Dark mode */ + --navbar-link-color: #C0C0C0; + --navbar-link-active-color: var(--brand-main-text-dark); + --sidebar-link-color: #C0C0C0; + --sidebar-link-active-color: var(--brand-main-text-dark); + --toc-link-color: #C0C0C0; } *:not(code *):not(pre) { font-family: "Inter", sans-serif !important; } + body { - color: var(--ifm-color-primary-darker); + color: var(--ifm-font-color-base); font-size: 16px; + line-height: 1.6; } + +/* Lighter bold text weight for better readability */ +strong, +b { + font-weight: var(--ifm-font-weight-bold) !important; +} + code *, pre { font-family: "IBM Plex Mono", monospace !important; } -h1 { + +/* Code block typography improvements */ +pre { + font-size: 14px; + line-height: 1.7; + padding: 1.25rem; +} + +pre code { + font-size: 14px; + letter-spacing: -0.01em; +} + +/* Inline code */ +code:not(pre code) { + font-size: 0.9em; + font-weight: 500; + letter-spacing: -0.02em; + padding: 0.2em 0.4em; + background-color: rgba(0, 0, 0, 0.05); + border-radius: 3px; +} + +[data-theme="dark"] code:not(pre code) { + background-color: rgba(255, 255, 255, 0.1); +} + +/* Clean 2D code block styling - remove shadows, add border */ +pre, +div[class*="codeBlock"], +div[class*="prism"] { + box-shadow: none !important; + border: 0.5px solid rgba(203, 203, 203, 0.3) !important; + border-radius: var(--border-radius-md); + background-color: #FAFAFA !important; +} + +[data-theme="dark"] pre, +[data-theme="dark"] div[class*="codeBlock"], +[data-theme="dark"] div[class*="prism"] { + border: 0.5px solid rgba(255, 255, 255, 0.1) !important; + background-color: #1A1A1A !important; +} + +/* Custom refined syntax highlighting - Light mode */ +[data-theme="light"] .token.comment, +[data-theme="light"] .token.prolog, +[data-theme="light"] .token.doctype, +[data-theme="light"] .token.cdata { + color: #6A737D; + font-style: italic; +} + +[data-theme="light"] .token.punctuation { + color: #5E6687; +} + +[data-theme="light"] .token.property, +[data-theme="light"] .token.tag, +[data-theme="light"] .token.boolean, +[data-theme="light"] .token.number, +[data-theme="light"] .token.constant, +[data-theme="light"] .token.symbol, +[data-theme="light"] .token.deleted { + color: #0184BC; +} + +[data-theme="light"] .token.selector, +[data-theme="light"] 
.token.attr-name, +[data-theme="light"] .token.string, +[data-theme="light"] .token.char, +[data-theme="light"] .token.builtin, +[data-theme="light"] .token.inserted { + color: #0C7C59; +} + +[data-theme="light"] .token.operator, +[data-theme="light"] .token.entity, +[data-theme="light"] .token.url, +[data-theme="light"] .language-css .token.string, +[data-theme="light"] .style .token.string { + color: #A626A4; +} + +[data-theme="light"] .token.atrule, +[data-theme="light"] .token.attr-value, +[data-theme="light"] .token.keyword { + color: #D73A49; +} + +[data-theme="light"] .token.function, +[data-theme="light"] .token.class-name { + color: #6F42C1; +} + +[data-theme="light"] .token.regex, +[data-theme="light"] .token.important, +[data-theme="light"] .token.variable { + color: #E36209; +} + +/* Custom refined syntax highlighting - Dark mode */ +[data-theme="dark"] .token.comment, +[data-theme="dark"] .token.prolog, +[data-theme="dark"] .token.doctype, +[data-theme="dark"] .token.cdata { + color: #8B949E; + font-style: italic; +} + +[data-theme="dark"] .token.punctuation { + color: #C9D1D9; +} + +[data-theme="dark"] .token.property, +[data-theme="dark"] .token.tag, +[data-theme="dark"] .token.boolean, +[data-theme="dark"] .token.number, +[data-theme="dark"] .token.constant, +[data-theme="dark"] .token.symbol, +[data-theme="dark"] .token.deleted { + color: #79C0FF; +} + +[data-theme="dark"] .token.selector, +[data-theme="dark"] .token.attr-name, +[data-theme="dark"] .token.string, +[data-theme="dark"] .token.char, +[data-theme="dark"] .token.builtin, +[data-theme="dark"] .token.inserted { + color: #7EE787; +} + +[data-theme="dark"] .token.operator, +[data-theme="dark"] .token.entity, +[data-theme="dark"] .token.url, +[data-theme="dark"] .language-css .token.string, +[data-theme="dark"] .style .token.string { + color: #D2A8FF; +} + +[data-theme="dark"] .token.atrule, +[data-theme="dark"] .token.attr-value, +[data-theme="dark"] .token.keyword { + color: #FF7B72; +} + +[data-theme="dark"] .token.function, +[data-theme="dark"] .token.class-name { + color: #D2A8FF; +} + +[data-theme="dark"] .token.regex, +[data-theme="dark"] .token.important, +[data-theme="dark"] .token.variable { + color: #FFA657; +} + +/* Clean 2D admonition/callout styling - remove left accent and shadows */ +.theme-admonition, +div[class*="admonition"] { + box-shadow: none !important; + border-left: none !important; + border-radius: var(--border-radius-md); +} + +/* Note admonition - blue */ +.theme-admonition.alert--secondary, +.theme-admonition-note { + border: 1px solid rgba(84, 104, 255, 0.4) !important; +} + +[data-theme="dark"] .theme-admonition.alert--secondary, +[data-theme="dark"] .theme-admonition-note { + border: 1px solid rgba(84, 104, 255, 0.5) !important; +} + +/* Info admonition - cyan */ +.theme-admonition.alert--info, +.theme-admonition-info { + border: 1px solid rgba(84, 199, 236, 0.5) !important; +} + +[data-theme="dark"] .theme-admonition.alert--info, +[data-theme="dark"] .theme-admonition-info { + border: 1px solid rgba(84, 199, 236, 0.6) !important; +} + +/* Tip admonition - green */ +.theme-admonition.alert--success, +.theme-admonition-tip { + border: 1px solid rgba(0, 184, 148, 0.5) !important; +} + +[data-theme="dark"] .theme-admonition.alert--success, +[data-theme="dark"] .theme-admonition-tip { + border: 1px solid rgba(0, 184, 148, 0.6) !important; +} + +/* Caution/Warning admonition - orange */ +.theme-admonition.alert--warning, +.theme-admonition-caution, +.theme-admonition-warning { + border: 
1px solid rgba(255, 165, 0, 0.6) !important; +} + +[data-theme="dark"] .theme-admonition.alert--warning, +[data-theme="dark"] .theme-admonition-caution, +[data-theme="dark"] .theme-admonition-warning { + border: 1px solid rgba(255, 165, 0, 0.7) !important; +} + +/* Danger admonition - red */ +.theme-admonition.alert--danger, +.theme-admonition-danger { + border: 1px solid rgba(220, 38, 38, 0.6) !important; +} + +[data-theme="dark"] .theme-admonition.alert--danger, +[data-theme="dark"] .theme-admonition-danger { + border: 1px solid rgba(239, 68, 68, 0.7) !important; +} + +h1:not([class*="openapi"]) { font-size: var(--ifm-heading-h1-font-size) !important; - line-height: 40px; + line-height: 1.25; + font-weight: var(--ifm-h1-font-weight) !important; + letter-spacing: -0.01em; } -h2 { + +h2:not([class*="openapi"]) { font-size: var(--ifm-heading-h2-font-size) !important; - line-height: 34px; + line-height: 1.3; + font-weight: var(--ifm-h2-font-weight) !important; + letter-spacing: -0.01em; } -h3 { + + +h3:not([class*="openapi"]) { font-size: var(--ifm-heading-h3-font-size) !important; - line-height: 28px; + line-height: 1.35; + font-weight: var(--ifm-h3-font-weight) !important; } -h4 { + +h4:not([class*="openapi"]) { font-size: var(--ifm-heading-h4-font-size) !important; - line-height: 28px; + line-height: 1.5; + font-weight: var(--ifm-h4-font-weight) !important; } + /* link underline style */ article a { text-decoration: underline !important; - text-underline-offset: 4px; - text-decoration-thickness: 1px; - text-decoration-color: var(--ifm-color-primary); + text-underline-offset: 3px !important; + text-decoration-thickness: 1px !important; + text-decoration-color: rgba(0, 0, 0, 0.22) !important; + transition: text-decoration-color 200ms ease, text-underline-offset 200ms ease !important; +} + +[data-theme="dark"] article a { + text-decoration-color: rgba(255, 255, 255, 0.25) !important; } + article a:hover { - text-decoration-thickness: 2px !important; - transition-duration: 400ms; + text-decoration-color: rgba(0, 0, 0, 0.45) !important; + text-underline-offset: 4px !important; } + +[data-theme="dark"] article a:hover { + text-decoration-color: rgba(255, 255, 255, 0.55) !important; +} + .cardContainer_node_modules-\@docusaurus-theme-classic-lib-theme-DocCard-styles-module, .title_node_modules-\@docusaurus-theme-classic-lib-theme-BlogPostItem-Header-Title-styles-module a, .cardContainer_fWXF, @@ -83,29 +418,43 @@ article a:hover { text-decoration: none !important; } +/* Remove underline from card container links */ +a[class*="cardContainer"], +a[class*="cardContainer"]:hover { + text-decoration: none !important; +} + /** Navbar **/ + @media (min-width: 1851px) { + .main-wrapper, .navbar__inner { width: 1550px; margin: auto; } } + @media (min-width: 1350px) and (max-width: 1850px) { + .main-wrapper, .navbar__inner { width: 95%; margin: auto; } } + @media (min-width: 1024px) and (max-width: 1350px) { + .main-wrapper, .navbar__inner { width: 95%; margin: auto; } } + @media (max-width: 1024px) { + .main-wrapper, .navbar__inner { width: 100%; @@ -116,77 +465,87 @@ article a:hover { .navbar__link { font-size: 14px; padding: 0px 8px; - color: var(--ifm-color-primary); - font-weight: 400; -} -[data-theme="dark"] .navbar__link { - color: var(--ifm-color-primary-darker); + color: var(--navbar-link-color); + font-weight: var(--ifm-font-weight-normal); + letter-spacing: 0.01em; } + .navbar__link--active { - color: var(--ifm-color-primary-darkest); - font-weight: 500; -} -[data-theme="dark"] 
.navbar__link--active { - color: var(--ifm-color-primary-darkest); + color: var(--navbar-link-active-color); + font-weight: var(--ifm-font-weight-semibold); } + .nav_github_icons, .nav_slack_icons { width: 22px; height: 22px; } + .navbar__item:has(.nav_github_icons), .navbar__item:has(.nav_slack_icons) { display: flex; justify-content: center; align-items: center; } + .theme-icon { color: var(--ifm-color-primary-darkest); } + .navbar__item:has(.nav_slack_icons) { margin-right: 3px; } + .nav_primary_button { background-color: var(--ifm-color-primary-darkest); color: var(--ifm-color-primary-lightest); margin-right: 25px; padding: 8px 16px; } + [data-theme="dark"] .nav_primary_button { background-color: var(--ifm-color-primary-lightest); color: var(--ifm-color-primary-darkest); } + .nav_secondary_button { background-color: transparent; color: var(--ifm-color-primary); margin-left: 10px; padding: 8px 12px; } + [data-theme="dark"] .nav_secondary_button { - color: var(--ifm-color-primary-dark); + color: var(--navbar-link-color); } + .nav_primary_button, .nav_secondary_button { font-size: 14px; - border-radius: 6px; + border-radius: var(--border-radius-md); outline: none; border: none; cursor: pointer; } + @media (max-width: 995px) { + .navbar__item:has(.nav_github_icons), .navbar__item:has(.nav_slack_icons) { display: none; } + .navbar-sidebar .menu__link:has(.nav_github_icons), .navbar-sidebar .menu__link:has(.nav_slack_icons) { position: absolute; top: -48px; left: 172px; } + .navbar-sidebar .menu__link:has(.nav_slack_icons) { left: 215px; } + .navbar-sidebar .menu__link:has(.nav_secondary_button), .navbar-sidebar .menu__link:has(.nav_primary_button) { padding: 0px; @@ -196,15 +555,18 @@ article a:hover { left: 50%; transform: translateX(-50%); } + .navbar-sidebar .menu__link:has(.nav_secondary_button) { bottom: 65px; } + .navbar-sidebar .menu__link:has(.nav_secondary_button):hover, .navbar-sidebar .menu__link:has(.nav_primary_button):hover, .navbar-sidebar .menu__link:has(.nav_github_icons):hover, .navbar-sidebar .menu__link:has(.nav_slack_icons):hover { background: transparent; } + .nav_primary_button, .nav_secondary_button { width: 100%; @@ -217,41 +579,51 @@ article a:hover { [data-theme="dark"] .DocSearch-Button { background-color: #181818; } + [data-theme="dark"] .DocSearch-Button:hover { background-color: #1b1b1d; } + [data-theme="dark"] .DocSearch-Modal { background-color: #1b1b1d; } + [data-theme="dark"] .DocSearch-Footer { background-color: #242526; opacity: 0.9; } + [data-theme="dark"] .DocSearch-Form { background-color: #181818; box-shadow: none; border: 2px solid #3d4144; } + .DocSearch-Button { - border-radius: 6px !important; + border-radius: var(--border-radius-md) !important; padding: 0 2px 0px 8px !important; } + @media (min-width: 1350px) { .DocSearch-Button { width: 250px; } } + @media (max-width: 768px) { .DocSearch-Button { padding: 0 8px 0px 8px !important; } } + .DocSearch-Search-Icon { color: var(--ifm-color-primary) !important; } + .DocSearch-Button-Placeholder { font-size: 14px !important; } + .DocSearch-Button-Key { box-shadow: none !important; background: transparent !important; @@ -265,68 +637,126 @@ article a:hover { margin-right: 40px; } } + @media (min-width: 1024px) and (max-width: 1350px) { .theme-doc-sidebar-container { margin-right: 20px; } } -li.sidebar-section-title > div.menu__list-item-collapsible:hover { + +li.sidebar-section-title>div.menu__list-item-collapsible:hover { background-color: transparent !important; } -li.sidebar-section-title > 
div.menu__list-item-collapsible { + +li.sidebar-section-title>div.menu__list-item-collapsible { margin-top: 25px !important; } -li.sidebar-section-title > div.menu__list-item-collapsible a { - font-weight: 600 !important; + +li.sidebar-section-title>div.menu__list-item-collapsible a { + font-weight: var(--ifm-font-weight-bold) !important; } + .menu__link { font-size: 14px; line-height: 22px; - font-weight: 400; - color: var(--ifm-color-primary); + font-weight: var(--ifm-font-weight-normal); + color: var(--sidebar-link-color); padding: 4px; + letter-spacing: 0.01em; } + .menu__link--active { - font-weight: 500; - color: var(--ifm-color-primary-darkest); + font-weight: var(--ifm-font-weight-semibold); + color: var(--sidebar-link-active-color); } -[data-theme="dark"] .menu__link { - color: var(--ifm-color-primary-darker); + +/* Flexbox layout for all collapsible items */ +.menu__list-item-collapsible { + display: flex !important; + align-items: center !important; + width: 100% !important; + padding: 0 !important; } -[data-theme="dark"] .menu__link--active { - color: var(--ifm-color-primary-darkest); + +/* Make links take available space */ +.menu__list-item-collapsible>.menu__link { + flex: 1 !important; + min-width: 0 !important; + padding-right: 0 !important; } + +/* Position separate caret buttons at the right - adjusted to match built-in carets */ +.menu__list-item-collapsible>.menu__caret { + margin-left: auto !important; + margin-right: 0px !important; + flex-shrink: 0 !important; + padding: 0 !important; +} + +/* For links with built-in caret (using ::after) */ +.menu__link--sublist-caret { + display: flex !important; + justify-content: space-between !important; + align-items: center !important; + padding-right: 4px !important; +} + .menu__link--sublist-caret:after, .menu__caret:before { - margin: 0 10px; + margin-left: auto !important; background: var(--ifm-menu-link-sublist-icon) 50% / 1.4rem 1.4rem; height: 10px; opacity: 0.8; + flex-shrink: 0 !important; } + .menu__list { padding-right: 5px; } + +/* Sidebar container - always hide border */ .docSidebarContainer_node_modules-\@docusaurus-theme-classic-lib-theme-DocRoot-Layout-Sidebar-styles-module, -.docSidebarContainer_YfHR { +.docSidebarContainer_YfHR, +.theme-doc-sidebar-container, +div[class*="docSidebarContainer"] { border-right: none !important; } /**************************** Article section ****************************/ -.container > .row { +.container>.row { justify-content: space-between; } + .markdown img { margin-bottom: 24px; } + +/* Ensure IdealImage wrapper divs get bottom margin */ +.markdown div:has(img.medium-zoom-image) { + margin-bottom: 24px; +} + +/* Target IdealImage wrapper divs - they have background-size in their inline styles */ +.markdown div[style*="background-size"] { + margin-bottom: 24px !important; +} + +/* Override the negative margin on IdealImage img tags */ +.markdown img.medium-zoom-image { + margin-bottom: 0 !important; +} + .theme-doc-markdown h1:first-child, .title_node_modules-\@docusaurus-theme-classic-lib-theme-DocCategoryGeneratedIndexPage-styles-module, .title_kItE { padding-top: 40px; } + .docItemCol_node_modules-docusaurus-theme-openapi-docs-lib-theme-ApiItem-Layout-styles-module { line-height: 24px; } -.breadcrumbs__item--active > span { + +.breadcrumbs__item--active>span { padding: 3px 15px; } @@ -334,21 +764,23 @@ li.sidebar-section-title > div.menu__list-item-collapsible a { .col--3 { --ifm-col-width: calc(2.8 / 12 * 100%); } + @media (max-width: 995px) { + 
.tocCollapsibleButton_node_modules-\@docusaurus-theme-classic-lib-theme-TOCCollapsible-CollapseButton-styles-module, .tocCollapsibleButtonExpanded_MG3E { color: var(--ifm-color-primary); } } + .table-of-contents__link { font-size: 14px; line-height: 20px; - font-weight: 400; - color: var(--ifm-color-primary-light); -} -[data-theme="dark"] .table-of-contents__link { - color: var(--ifm-color-primary-dark); + font-weight: var(--ifm-font-weight-normal); + color: var(--toc-link-color); + letter-spacing: 0.01em; } + .table-of-contents__left-border { border: none !important; } @@ -357,49 +789,55 @@ li.sidebar-section-title > div.menu__list-item-collapsible a { [href="/reference/api/agenta-backend"], .list_node_modules-\@docusaurus-theme-classic-lib-theme-DocCategoryGeneratedIndexPage-styles-module article:first-child, .list_eTzJ article:first-child, -.generatedIndexPage_node_modules-\@docusaurus-theme-classic-lib-theme-DocCategoryGeneratedIndexPage-styles-module - header - p, +.generatedIndexPage_node_modules-\@docusaurus-theme-classic-lib-theme-DocCategoryGeneratedIndexPage-styles-module header p, .generatedIndexPage_vN6x header p { display: none; } -.api-method > .menu__link { + +.api-method>.menu__link { align-items: center; justify-content: start; } -.api-method > .menu__link::before { + +.api-method>.menu__link::before { width: 35px; height: 18px; font-size: 10px; line-height: 18px; text-transform: uppercase; font-weight: 600; - border-radius: 0.25rem; + border-radius: var(--border-radius-sm); margin-right: var(--ifm-spacing-horizontal); text-align: center; flex-shrink: 0; color: white; } -.get > .menu__link::before { + +.get>.menu__link::before { content: "get"; background-color: var(--ifm-color-primary); } -.put > .menu__link::before { + +.put>.menu__link::before { content: "put"; background-color: var(--openapi-code-blue); } -.post > .menu__link::before { + +.post>.menu__link::before { content: "post"; background-color: var(--openapi-code-green); } -.delete > .menu__link::before { + +.delete>.menu__link::before { content: "del"; background-color: var(--openapi-code-red); } -.patch > .menu__link::before { + +.patch>.menu__link::before { content: "patch"; background-color: var(--openapi-code-orange); } + .openapi__method-endpoint-path, .openapi-markdown__details-summary-header-params, .openapi-markdown__details-summary-header-body { @@ -420,6 +858,7 @@ li.sidebar-section-title > div.menu__list-item-collapsible a { margin-right: auto; margin-left: auto; } + .title_node_modules-\@docusaurus-theme-classic-lib-theme-BlogPostItem-Header-Title-styles-module, .title_f1Hy { margin-top: 40px; @@ -428,20 +867,25 @@ li.sidebar-section-title > div.menu__list-item-collapsible a { margin-right: auto; margin-left: auto; } + /* center the changelog main header and hide its date */ .changelog-main header .margin-vert--md { display: none; } + .changelog-main header h1 { text-align: center; } + .blog-post-page .col[class*="col--"] { flex: 0 !important; position: relative; } + .blog-post-page .col--2 { display: none !important; } + .blog-post-page .col { --ifm-col-width: 100% !important; flex: 0 0; @@ -452,64 +896,79 @@ li.sidebar-section-title > div.menu__list-item-collapsible a { margin-right: auto; margin-left: auto; } + .blog-post-page .row { display: block; } + .changelog { width: 80%; margin-left: auto; } + .changelog em { font-style: normal; - font-weight: 500; + font-weight: var(--ifm-font-weight-semibold); position: absolute; left: 15px; margin-top: -42px; - color: var(--ifm-color-primary-darker); -} 
-[data-theme="dark"] .changelog em { - color: var(--ifm-color-primary-light); + color: var(--sidebar-link-active-color); } + .changelog hr { margin-top: 70px; width: 97%; position: absolute; left: 50%; transform: translateX(-50%); - background-color: var(--ifm-color-primary-lightest); + background-color: var(--brand-border-grey); } + +[data-theme="dark"] .changelog hr { + background-color: var(--brand-grey-bg); +} + .changelog h3 { margin-top: 160px; } + .changelog h3:first-child { margin-top: 60px; } + @media (max-width: 995px) { + .title_node_modules-\@docusaurus-theme-classic-lib-theme-BlogPostItem-Header-Title-styles-module, .title_f1Hy { padding-left: 0px; } + .changelog { width: 100%; margin-left: auto; } + .changelog hr { margin-top: 50px; width: 96%; } + .changelog em { font-weight: 500; position: absolute; left: auto; margin-top: -100px !important; } + .changelog h3 { margin-top: 195px; } + .changelog h3:first-child { margin-top: 110px; } } + @media (max-width: 520px) { .changelog hr { margin: 50px auto 28%; @@ -518,12 +977,14 @@ li.sidebar-section-title > div.menu__list-item-collapsible a { left: 0; transform: translateX(0); } + .changelog em { font-weight: 500; position: absolute; left: auto; margin-top: -105px !important; } + .changelog h3 { margin-top: 0px; height: auto; @@ -545,13 +1006,16 @@ li.sidebar-section-title > div.menu__list-item-collapsible a { .medium-zoom-overlay { background-color: #0000009f !important; } + [data-theme="dark"] .medium-zoom-overlay { background-color: #000000b4 !important; } + .medium-zoom-image--opened, .medium-zoom-overlay { z-index: 9999999; } + [data-theme="dark"] .medium-zoom-image--opened { border: 1px solid rgba(255, 255, 255, 0.096); } @@ -612,28 +1076,90 @@ div[class*='scrollbar'] { } [class*="cardTitle"] { - font-size: var(--ifm-heading-h4-font-size) !important; - font-weight: var(--ifm-font-weight-normal) !important; - margin-bottom: 8px; - font-variant-emoji: no-emoji !important; + font-size: var(--ifm-heading-h4-font-size) !important; + font-weight: var(--ifm-font-weight-normal) !important; + margin-bottom: 8px; + font-variant-emoji: no-emoji !important; } [class*="cardDescription"] { font-size: 14px !important; -} +} + a[class*="cardContainer"] { /* --ifm-link-color: var(--ifm-color-emphasis-100); */ /* --ifm-link-hover-color: var(--ifm-color-emphasis-100); */ --ifm-link-hover-decoration: none; box-shadow: 0 0 0 1px rgba(50, 50, 93, 0.01); - border: 1px solid var(--ifm-color-emphasis-200); - border-radius: 4px; + border: 1px solid var(--brand-border-grey); + border-radius: var(--border-radius-sm); transition: all var(--ifm-transition-fast) ease-in-out; transition-property: border, box-shadow; padding: 18px; font-variant-emoji: no-emoji !important; + position: relative; +} + +[data-theme="dark"] a[class*="cardContainer"] { + border: 1px solid var(--brand-grey-bg); +} + +/* Custom icon support for cards using CSS custom properties */ +/* Usage: */ +/* Or for no icon: */ + +/* First, hide the text content of the title that contains the emoji */ +.custom-icon [class*="cardTitle"], +.no-icon [class*="cardTitle"], +.icon-img [class*="cardTitle"] { + font-size: 0 !important; +} + +/* Then restore the text size for actual title text */ +.custom-icon [class*="cardTitle"]::after, +.no-icon [class*="cardTitle"]::after, +.icon-img [class*="cardTitle"]::after { + content: attr(title) !important; + font-size: var(--ifm-heading-h4-font-size) !important; + display: inline !important; +} + +/* Replace the default document emoji with custom icon */ 
+.custom-icon [class*="cardTitle"]::before { + content: var(--card-icon) !important; + font-size: 18px !important; + margin-right: 8px !important; + display: inline-block !important; + line-height: 1 !important; + flex-shrink: 0 !important; + vertical-align: text-bottom !important; + position: relative !important; + top: 2px !important; +} + +/* For no-icon cards, hide the emoji completely by not showing before pseudo */ +.no-icon [class*="cardTitle"]::before { + content: '' !important; + display: none !important; +} + +/* For SVG/image icons - replace the default emoji with background image */ +.icon-img [class*="cardTitle"]::before { + content: '' !important; + width: 18px !important; + height: 18px !important; + margin-right: 8px !important; + display: inline-block !important; + background-size: contain !important; + background-repeat: no-repeat !important; + background-position: center !important; + background-image: var(--card-icon-bg) !important; + flex-shrink: 0 !important; + vertical-align: text-bottom !important; + position: relative !important; + top: 2px !important; } @@ -642,7 +1168,7 @@ a[class*="cardContainer"] { padding: 18px !important; } -details { +details:not([class*="openapi"]) { background-color: var(--background-color) !important; border: 1px solid var(--ifm-color-emphasis-200) !important; --docusaurus-details-decoration-color: grey !important; diff --git a/docs/static/examples/sentiment140_first50.csv b/docs/static/examples/sentiment140_first50.csv new file mode 100644 index 0000000000..c8321a8598 --- /dev/null +++ b/docs/static/examples/sentiment140_first50.csv @@ -0,0 +1,51 @@ +correct_answer,tweet +negative,"@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer. You shoulda got David Carr of Third Day to do it. ;D" +negative,is upset that he can't update his Facebook by texting it... and might cry as a result School today also. Blah! +negative,@Kenichan I dived many times for the ball. Managed to save 50% The rest go out of bounds +negative,my whole body feels itchy and like its on fire +negative,"@nationwideclass no, it's not behaving at all. i'm mad. why am i here? because I can't see you all over there. " +negative,@Kwesidei not the whole crew +negative,Need a hug +negative,"@LOLTrish hey long time no see! Yes.. Rains a bit ,only a bit LOL , I'm fine thanks , how's you ?" +negative,@Tatiana_K nope they didn't have it +negative,@twittera que me muera ? +negative,spring break in plain city... it's snowing +negative,I just re-pierced my ears +negative,@caregiving I couldn't bear to watch it. And I thought the UA loss was embarrassing . . . . . +negative,"@octolinz16 It it counts, idk why I did either. you never talk to me anymore " +negative,"@smarrison i would've been the first, but i didn't have a gun. not really though, zac snyder's just a doucheclown." +negative,@iamjazzyfizzle I wish I got to watch it with you!! I miss you and @iamlilnicki how was the premiere?! +negative,Hollis' death scene will hurt me severely to watch on film wry is directors cut not out now? +negative,about to file taxes +negative,@LettyA ahh ive always wanted to see rent love the soundtrack!! +negative,@FakerPattyPattz Oh dear. Were you drinking out of the forgotten table drinks? 
+negative,@alydesigns i was out most of the day so didn't get much done +negative,"one of my friend called me, and asked to meet with her at Mid Valley today...but i've no time *sigh* " +negative,@angry_barista I baked you a cake but I ated it +negative,this week is not going as i had hoped +negative,blagh class at 8 tomorrow +negative,I hate when I have to call and wake people up +negative,Just going to cry myself to sleep after watching Marley and Me. +negative,im sad now Miss.Lilly +negative,ooooh.... LOL that leslie.... and ok I won't do it again so leslie won't get mad again +negative,Meh... Almost Lover is the exception... this track gets me depressed every time. +negative,some1 hacked my account on aim now i have to make a new one +negative,@alielayus I want to go to promote GEAR AND GROOVE but unfornately no ride there I may b going to the one in Anaheim in May though +negative,thought sleeping in was an option tomorrow but realizing that it now is not. evaluations in the morning and work in the afternoon! +negative,@julieebaby awe i love you too!!!! 1 am here i miss you +negative,@HumpNinja I cry my asian eyes to sleep at night +negative,ok I'm sick and spent an hour sitting in the shower cause I was too sick to stand and held back the puke like a champ. BED now +negative,@cocomix04 ill tell ya the story later not a good day and ill be workin for like three more hours... +negative,@MissXu sorry! bed time came here (GMT+1) http://is.gd/fNge +negative,@fleurylis I don't either. Its depressing. I don't think I even want to know about the kids in suitcases. +negative,Bed. Class 8-12. Work 12-3. Gym 3-5 or 6. Then class 6-10. Another day that's gonna fly by. I miss my girlfriend +negative,really don't feel like getting up today... but got to study to for tomorrows practical exam... +negative,He's the reason for the teardrops on my guitar the only one who has enough of me to break my heart +negative,"Sad, sad, sad. I don't know why but I hate this feeling I wanna sleep and I still can't!" +negative,@JonathanRKnight Awww I soo wish I was there to see you finally comfortable! Im sad that I missed it +negative,Falling asleep. Just heard about that Tracy girl's body being found. How sad My heart breaks for that family. +negative,@Viennah Yay! I'm happy for you with your job! But that also means less time for me and you... +negative,"Just checked my user timeline on my blackberry, it looks like the twanking is still happening Are ppl still having probs w/ BGs and UIDs?" +negative,Oh man...was ironing @jeancjumbe's fave top to wear to a meeting. Burnt it +negative,is strangely sad about LiLo and SamRo breaking up. +negative,@tea oh! i'm so sorry i didn't think about that before retweeting. 
diff --git a/docs/static/images/dark-complete-transparent-CROPPED.png b/docs/static/images/dark-complete-transparent-CROPPED.png index df1ed7e261..bc73ad84e2 100644 Binary files a/docs/static/images/dark-complete-transparent-CROPPED.png and b/docs/static/images/dark-complete-transparent-CROPPED.png differ diff --git a/docs/static/images/evaluation/comparing-evaluations.gif b/docs/static/images/evaluation/comparing-evaluations.gif deleted file mode 100644 index 87cfd4b4f6..0000000000 Binary files a/docs/static/images/evaluation/comparing-evaluations.gif and /dev/null differ diff --git a/docs/static/images/evaluation/comparing-evaluations.png b/docs/static/images/evaluation/comparing-evaluations.png new file mode 100644 index 0000000000..6031a4cd6b Binary files /dev/null and b/docs/static/images/evaluation/comparing-evaluations.png differ diff --git a/docs/static/images/evaluation/comparison-view-configuration.png b/docs/static/images/evaluation/comparison-view-configuration.png new file mode 100644 index 0000000000..ae9c692e9e Binary files /dev/null and b/docs/static/images/evaluation/comparison-view-configuration.png differ diff --git a/docs/static/images/evaluation/comparison-view-drawer.png b/docs/static/images/evaluation/comparison-view-drawer.png new file mode 100644 index 0000000000..3bd4b21c40 Binary files /dev/null and b/docs/static/images/evaluation/comparison-view-drawer.png differ diff --git a/docs/static/images/evaluation/comparison-view-testset.png b/docs/static/images/evaluation/comparison-view-testset.png new file mode 100644 index 0000000000..e45b8eb4f2 Binary files /dev/null and b/docs/static/images/evaluation/comparison-view-testset.png differ diff --git a/docs/static/images/evaluation/configure-evaluators-1.png b/docs/static/images/evaluation/configure-evaluators-1.png index fc61f18144..b202b1e5ec 100644 Binary files a/docs/static/images/evaluation/configure-evaluators-1.png and b/docs/static/images/evaluation/configure-evaluators-1.png differ diff --git a/docs/static/images/evaluation/configure-evaluators-3.png b/docs/static/images/evaluation/configure-evaluators-3.png index 5e36236191..5c8f8001ac 100644 Binary files a/docs/static/images/evaluation/configure-evaluators-3.png and b/docs/static/images/evaluation/configure-evaluators-3.png differ diff --git a/docs/static/images/evaluation/detailed-evaluation-drawer.png b/docs/static/images/evaluation/detailed-evaluation-drawer.png new file mode 100644 index 0000000000..e4c131ca30 Binary files /dev/null and b/docs/static/images/evaluation/detailed-evaluation-drawer.png differ diff --git a/docs/static/images/evaluation/detailed-evaluation-results.png b/docs/static/images/evaluation/detailed-evaluation-results.png index e72e4715bc..0c40be89a2 100644 Binary files a/docs/static/images/evaluation/detailed-evaluation-results.png and b/docs/static/images/evaluation/detailed-evaluation-results.png differ diff --git a/docs/static/images/evaluation/evaluate-sdk.png b/docs/static/images/evaluation/evaluate-sdk.png deleted file mode 100644 index dc91bc0379..0000000000 Binary files a/docs/static/images/evaluation/evaluate-sdk.png and /dev/null differ diff --git a/docs/static/images/evaluation/evaluation-from-ui/01-evaluation-ui-prompt.png b/docs/static/images/evaluation/evaluation-from-ui/01-evaluation-ui-prompt.png new file mode 100644 index 0000000000..151caefb2e Binary files /dev/null and b/docs/static/images/evaluation/evaluation-from-ui/01-evaluation-ui-prompt.png differ diff --git 
a/docs/static/images/evaluation/evaluation-from-ui/02-running-evaluation.png b/docs/static/images/evaluation/evaluation-from-ui/02-running-evaluation.png new file mode 100644 index 0000000000..28ff95d629 Binary files /dev/null and b/docs/static/images/evaluation/evaluation-from-ui/02-running-evaluation.png differ diff --git a/docs/static/images/evaluation/evaluation-from-ui/03-results-overview.png b/docs/static/images/evaluation/evaluation-from-ui/03-results-overview.png new file mode 100644 index 0000000000..21828bab9d Binary files /dev/null and b/docs/static/images/evaluation/evaluation-from-ui/03-results-overview.png differ diff --git a/docs/static/images/evaluation/evaluation-from-ui/04-results-testcase.png b/docs/static/images/evaluation/evaluation-from-ui/04-results-testcase.png new file mode 100644 index 0000000000..7cacec545e Binary files /dev/null and b/docs/static/images/evaluation/evaluation-from-ui/04-results-testcase.png differ diff --git a/docs/static/images/evaluation/evaluation-from-ui/05-results-testcase-drawer.png b/docs/static/images/evaluation/evaluation-from-ui/05-results-testcase-drawer.png new file mode 100644 index 0000000000..21ae52e5b9 Binary files /dev/null and b/docs/static/images/evaluation/evaluation-from-ui/05-results-testcase-drawer.png differ diff --git a/docs/static/images/evaluation/evaluation-from-ui/06-comparison-view.png b/docs/static/images/evaluation/evaluation-from-ui/06-comparison-view.png new file mode 100644 index 0000000000..b3abf9af9b Binary files /dev/null and b/docs/static/images/evaluation/evaluation-from-ui/06-comparison-view.png differ diff --git a/docs/static/images/evaluation/evaluation-prompt-config.png b/docs/static/images/evaluation/evaluation-prompt-config.png new file mode 100644 index 0000000000..6d94eaa57d Binary files /dev/null and b/docs/static/images/evaluation/evaluation-prompt-config.png differ diff --git a/docs/static/images/evaluation/evaluators-inout.png b/docs/static/images/evaluation/evaluators-inout.png deleted file mode 100644 index 1cd8bc76ff..0000000000 Binary files a/docs/static/images/evaluation/evaluators-inout.png and /dev/null differ diff --git a/docs/static/images/evaluation/new-evaluation-modal.png b/docs/static/images/evaluation/new-evaluation-modal.png index cb805118e8..309ce88c90 100644 Binary files a/docs/static/images/evaluation/new-evaluation-modal.png and b/docs/static/images/evaluation/new-evaluation-modal.png differ diff --git a/docs/static/images/evaluation/overview-results.png b/docs/static/images/evaluation/overview-results.png new file mode 100644 index 0000000000..640bb31abc Binary files /dev/null and b/docs/static/images/evaluation/overview-results.png differ diff --git a/docs/static/images/evaluation/start-new-evaluation.png b/docs/static/images/evaluation/start-new-evaluation.png index a26a72a045..5d77da863d 100644 Binary files a/docs/static/images/evaluation/start-new-evaluation.png and b/docs/static/images/evaluation/start-new-evaluation.png differ diff --git a/docs/static/images/light-complete-transparent-CROPPED.png b/docs/static/images/light-complete-transparent-CROPPED.png index 6be2e99e08..de9bbd9aca 100644 Binary files a/docs/static/images/light-complete-transparent-CROPPED.png and b/docs/static/images/light-complete-transparent-CROPPED.png differ diff --git a/docs/static/images/observability/observability-mockup.png b/docs/static/images/observability/observability-mockup.png new file mode 100644 index 0000000000..9df9e7c022 Binary files /dev/null and 
b/docs/static/images/observability/observability-mockup.png differ diff --git a/docs/static/images/observability/observability_quickstart.png b/docs/static/images/observability/observability_quickstart.png new file mode 100644 index 0000000000..128a1dee28 Binary files /dev/null and b/docs/static/images/observability/observability_quickstart.png differ diff --git a/docs/static/images/prompt_management/deploy-api.gif b/docs/static/images/prompt_management/deploy-api.gif index e7a367b9af..9438605d5e 100644 Binary files a/docs/static/images/prompt_management/deploy-api.gif and b/docs/static/images/prompt_management/deploy-api.gif differ diff --git a/examples/jupyter/observability/analytics-api-tutorial.ipynb b/examples/jupyter/observability/analytics-api-tutorial.ipynb new file mode 100644 index 0000000000..d5097dfbf6 --- /dev/null +++ b/examples/jupyter/observability/analytics-api-tutorial.ipynb @@ -0,0 +1,714 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Analytics API - Tutorial\n", + "\n", + "This tutorial demonstrates how to use the Agenta Analytics API to analyze LLM performance metrics. You'll learn how to:\n", + "\n", + "- Retrieve aggregated metrics over time\n", + "- Analyze costs, latency, and token usage\n", + "- Filter analytics by status and other attributes\n", + "- Track error trends and failure rates\n", + "- Compare performance across different time periods\n", + "\n", + "## What You'll Build\n", + "\n", + "We'll create analytics queries that:\n", + "1. Track daily LLM costs and spending trends\n", + "2. Monitor error rates and identify peak error times\n", + "3. Analyze token usage patterns\n", + "4. Compare performance metrics over time\n", + "5. Generate cost reports and visualizations" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "Before using the API, you need your Agenta API key. You can create API keys from the Settings page in your Agenta workspace." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import requests\n", + "import json\n", + "from datetime import datetime, timedelta, timezone\n", + "from getpass import getpass\n", + "\n", + "# Configuration\n", + "AGENTA_HOST = os.getenv(\"AGENTA_HOST\", \"https://cloud.agenta.ai\")\n", + "api_key = os.getenv(\"AGENTA_API_KEY\")\n", + "if not api_key:\n", + " api_key = getpass(\"Enter your Agenta API key: \")\n", + " os.environ[\"AGENTA_API_KEY\"] = api_key\n", + "\n", + "# Setup base configuration\n", + "BASE_URL = f\"{AGENTA_HOST}/api/preview/tracing/spans/analytics\"\n", + "HEADERS = {\n", + " \"Authorization\": f\"ApiKey {api_key}\",\n", + " \"Content-Type\": \"application/json\"\n", + "}\n", + "\n", + "print(\"✅ Setup complete!\")\n", + "print(f\"API endpoint: {BASE_URL}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 1: Get Recent Metrics\n", + "\n", + "Let's start by retrieving metrics for the last 7 days with daily buckets. Each bucket contains aggregated metrics for all traces within that day." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get analytics for last 7 days with daily buckets\n", + "newest = datetime.now(timezone.utc)\n", + "oldest = newest - timedelta(days=7)\n", + "\n", + "payload = {\n", + " \"focus\": \"trace\",\n", + " \"interval\": 1440, # 1440 minutes = daily buckets\n", + " \"windowing\": {\n", + " \"oldest\": oldest.isoformat(),\n", + " \"newest\": newest.isoformat()\n", + " }\n", + "}\n", + "\n", + "response = requests.post(BASE_URL, headers=HEADERS, json=payload)\n", + "data = response.json()\n", + "\n", + "print(f\"📊 Found {data['count']} daily buckets\\n\")\n", + "\n", + "# Show all days with activity\n", + "for bucket in data['buckets']:\n", + " if bucket['total']['count'] > 0:\n", + " date = bucket['timestamp'][:10]\n", + " print(f\"Date: {date}\")\n", + " print(f\" Traces: {bucket['total']['count']}\")\n", + " print(f\" Cost: ${bucket['total']['costs']:.4f}\")\n", + " print(f\" Tokens: {bucket['total']['tokens']:,.0f}\")\n", + " print(f\" Errors: {bucket['errors']['count']}\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 2: Track Daily Costs\n", + "\n", + "Calculate total costs and generate summary statistics over a time period." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get daily metrics for last 30 days\n", + "newest = datetime.now(timezone.utc)\n", + "oldest = newest - timedelta(days=30)\n", + "\n", + "payload = {\n", + " \"focus\": \"trace\",\n", + " \"interval\": 1440, # Daily buckets\n", + " \"windowing\": {\n", + " \"oldest\": oldest.isoformat(),\n", + " \"newest\": newest.isoformat()\n", + " }\n", + "}\n", + "\n", + "response = requests.post(BASE_URL, headers=HEADERS, json=payload)\n", + "data = response.json()\n", + "\n", + "# Calculate totals\n", + "total_traces = sum(b['total']['count'] for b in data['buckets'])\n", + "total_cost = sum(b['total']['costs'] for b in data['buckets'])\n", + "total_tokens = sum(b['total']['tokens'] for b in data['buckets'])\n", + "total_errors = sum(b['errors']['count'] for b in data['buckets'])\n", + "\n", + "print(\"💰 Cost Summary (Last 30 Days)\")\n", + "print(\"=\" * 50)\n", + "print(f\"Total Cost: ${total_cost:.2f}\")\n", + "print(f\"Total Requests: {total_traces:,}\")\n", + "if total_traces > 0:\n", + " print(f\"Average Cost per Request: ${total_cost/total_traces:.6f}\")\n", + " print(f\"Total Tokens: {total_tokens:,.0f}\")\n", + " print(f\"Average Tokens per Request: {total_tokens/total_traces:.1f}\")\n", + " print(f\"Error Rate: {(total_errors/total_traces)*100:.2f}%\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 3: Analyze Error Trends\n", + "\n", + "Monitor error rates over time to identify patterns and peak error times." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get hourly metrics for last 7 days\n", + "newest = datetime.now(timezone.utc)\n", + "oldest = newest - timedelta(days=7)\n", + "\n", + "payload = {\n", + " \"focus\": \"trace\",\n", + " \"interval\": 60, # Hourly buckets\n", + " \"windowing\": {\n", + " \"oldest\": oldest.isoformat(),\n", + " \"newest\": newest.isoformat()\n", + " }\n", + "}\n", + "\n", + "response = requests.post(BASE_URL, headers=HEADERS, json=payload)\n", + "data = response.json()\n", + "\n", + "print(\"🚨 Error Analysis\")\n", + "print(\"=\" * 50)\n", + "\n", + "# Find hours with high error rates\n", + "high_error_periods = []\n", + "for bucket in data['buckets']:\n", + " if bucket['total']['count'] > 0:\n", + " error_rate = (bucket['errors']['count'] / bucket['total']['count']) * 100\n", + " if error_rate > 5: # Flag periods with > 5% errors\n", + " high_error_periods.append({\n", + " 'time': bucket['timestamp'],\n", + " 'error_rate': error_rate,\n", + " 'total': bucket['total']['count'],\n", + " 'errors': bucket['errors']['count']\n", + " })\n", + "\n", + "if high_error_periods:\n", + " print(f\"\\nFound {len(high_error_periods)} periods with high error rates (>5%):\\n\")\n", + " for period in high_error_periods[:10]: # Show top 10\n", + " print(f\" {period['time']}\")\n", + " print(f\" Error Rate: {period['error_rate']:.1f}%\")\n", + " print(f\" Total: {period['total']}, Errors: {period['errors']}\\n\")\n", + "else:\n", + " print(\"✅ No high error rates detected in the last 7 days\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 4: Filter by Status Code\n", + "\n", + "Analyze only successful traces by filtering on status code." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get successful traces only\n", + "newest = datetime.now(timezone.utc)\n", + "oldest = newest - timedelta(days=7)\n", + "\n", + "payload = {\n", + " \"focus\": \"trace\",\n", + " \"interval\": 1440, # Daily buckets\n", + " \"windowing\": {\n", + " \"oldest\": oldest.isoformat(),\n", + " \"newest\": newest.isoformat()\n", + " },\n", + " \"filter\": {\n", + " \"conditions\": [\n", + " {\n", + " \"field\": \"status.code\",\n", + " \"operator\": \"eq\",\n", + " \"value\": \"STATUS_CODE_OK\"\n", + " }\n", + " ]\n", + " }\n", + "}\n", + "\n", + "response = requests.post(BASE_URL, headers=HEADERS, json=payload)\n", + "data = response.json()\n", + "\n", + "# Calculate success metrics\n", + "total_count = sum(b['total']['count'] for b in data['buckets'])\n", + "total_cost = sum(b['total']['costs'] for b in data['buckets'])\n", + "total_duration = sum(b['total']['duration'] for b in data['buckets'])\n", + "\n", + "print(\"✅ Successful Traces (Last 7 Days)\")\n", + "print(\"=\" * 50)\n", + "print(f\"Count: {total_count:,}\")\n", + "print(f\"Total Cost: ${total_cost:.4f}\")\n", + "if total_count > 0:\n", + " print(f\"Avg Duration: {total_duration/total_count:.0f}ms\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 5: Track Token Usage\n", + "\n", + "Monitor token consumption patterns over time." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get daily token usage for last 7 days\n", + "newest = datetime.now(timezone.utc)\n", + "oldest = newest - timedelta(days=7)\n", + "\n", + "payload = {\n", + " \"focus\": \"trace\",\n", + " \"interval\": 1440, # Daily buckets\n", + " \"windowing\": {\n", + " \"oldest\": oldest.isoformat(),\n", + " \"newest\": newest.isoformat()\n", + " }\n", + "}\n", + "\n", + "response = requests.post(BASE_URL, headers=HEADERS, json=payload)\n", + "data = response.json()\n", + "\n", + "print(\"🎯 Token Usage Analysis\")\n", + "print(\"=\" * 50)\n", + "print(\"\\nDaily Token Usage:\\n\")\n", + "\n", + "for bucket in data['buckets']:\n", + " if bucket['total']['count'] > 0:\n", + " date = bucket['timestamp'][:10]\n", + " avg_tokens = bucket['total']['tokens'] / bucket['total']['count']\n", + " print(f\" {date}: {bucket['total']['tokens']:>8,.0f} total ({avg_tokens:>6.0f} avg)\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 6: Analyze Performance\n", + "\n", + "Track latency trends over time to identify performance changes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get hourly performance for last 24 hours\n", + "newest = datetime.now(timezone.utc)\n", + "oldest = newest - timedelta(days=1)\n", + "\n", + "payload = {\n", + " \"focus\": \"trace\",\n", + " \"interval\": 60, # Hourly buckets\n", + " \"windowing\": {\n", + " \"oldest\": oldest.isoformat(),\n", + " \"newest\": newest.isoformat()\n", + " }\n", + "}\n", + "\n", + "response = requests.post(BASE_URL, headers=HEADERS, json=payload)\n", + "data = response.json()\n", + "\n", + "print(\"⚡ Performance Analysis (Last 24 Hours)\")\n", + "print(\"=\" * 50)\n", + "print(\"\\nHourly Average Latency:\\n\")\n", + "\n", + "latencies = []\n", + "for bucket in data['buckets']:\n", + " if bucket['total']['count'] > 0:\n", + " avg_duration = bucket['total']['duration'] / bucket['total']['count']\n", + " latencies.append(avg_duration)\n", + " hour = bucket['timestamp'][11:16] # Extract HH:MM\n", + " print(f\" {hour}: {avg_duration:7.0f}ms\")\n", + "\n", + "if latencies:\n", + " print(f\"\\n📈 Statistics:\")\n", + " print(f\" Min: {min(latencies):.0f}ms\")\n", + " print(f\" Max: {max(latencies):.0f}ms\")\n", + " print(f\" Avg: {sum(latencies)/len(latencies):.0f}ms\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 7: Generate Monthly Cost Report\n", + "\n", + "Create a comprehensive monthly report with cost breakdown and usage statistics." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get monthly metrics\n", + "newest = datetime.now(timezone.utc)\n", + "oldest = newest - timedelta(days=30)\n", + "\n", + "payload = {\n", + " \"focus\": \"trace\",\n", + " \"interval\": 1440, # Daily buckets\n", + " \"windowing\": {\n", + " \"oldest\": oldest.isoformat(),\n", + " \"newest\": newest.isoformat()\n", + " }\n", + "}\n", + "\n", + "response = requests.post(BASE_URL, headers=HEADERS, json=payload)\n", + "data = response.json()\n", + "\n", + "# Calculate totals\n", + "total_traces = sum(b['total']['count'] for b in data['buckets'])\n", + "total_cost = sum(b['total']['costs'] for b in data['buckets'])\n", + "total_tokens = sum(b['total']['tokens'] for b in data['buckets'])\n", + "total_duration = sum(b['total']['duration'] for b in data['buckets'])\n", + "total_errors = sum(b['errors']['count'] for b in data['buckets'])\n", + "\n", + "print(\"📊 MONTHLY COST REPORT\")\n", + "print(\"=\" * 60)\n", + "print(f\"Period: {oldest.strftime('%Y-%m-%d')} to {newest.strftime('%Y-%m-%d')}\")\n", + "print(\"=\" * 60)\n", + "\n", + "print(\"\\n💰 Cost Summary:\")\n", + "print(f\" Total Cost: ${total_cost:.2f}\")\n", + "if total_traces > 0:\n", + " print(f\" Average Cost per Request: ${total_cost/total_traces:.6f}\")\n", + "daily_cost = total_cost / 30\n", + "print(f\" Average Daily Cost: ${daily_cost:.2f}\")\n", + "print(f\" Projected Monthly Cost: ${daily_cost * 30:.2f}\")\n", + "\n", + "print(\"\\n📊 Usage Statistics:\")\n", + "print(f\" Total Requests: {total_traces:,}\")\n", + "successful = total_traces - total_errors\n", + "print(f\" Successful: {successful:,}\")\n", + "print(f\" Failed: {total_errors:,}\")\n", + "if total_traces > 0:\n", + " print(f\" Failure Rate: {(total_errors/total_traces)*100:.2f}%\")\n", + " print(f\" Average Daily Requests: {total_traces/30:.0f}\")\n", + "\n", + "print(\"\\n🎯 Performance Metrics:\")\n", + "if total_traces > 0:\n", + " print(f\" Average Latency: {total_duration/total_traces:.0f}ms\")\n", + "print(f\" Total Tokens: {total_tokens:,.0f}\")\n", + "if total_traces > 0:\n", + " print(f\" Average Tokens per Request: {total_tokens/total_traces:.1f}\")\n", + " print(f\" Average Daily Tokens: {total_tokens/30:,.0f}\")\n", + "\n", + "# Cost per 1K tokens\n", + "if total_tokens > 0:\n", + " cost_per_1k = (total_cost / total_tokens) * 1000\n", + " print(f\" Cost per 1K Tokens: ${cost_per_1k:.4f}\")\n", + "\n", + "# Find most expensive days\n", + "print(\"\\n📅 Top 5 Most Expensive Days:\")\n", + "days_with_data = [(b['timestamp'][:10], b['total']['costs'], b['total']['count']) \n", + " for b in data['buckets'] if b['total']['count'] > 0]\n", + "sorted_days = sorted(days_with_data, key=lambda x: x[1], reverse=True)\n", + "for i, (date, cost, count) in enumerate(sorted_days[:5], 1):\n", + " print(f\" {i}. {date}: ${cost:.4f} ({count} requests)\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 8: Compare Week-over-Week Performance\n", + "\n", + "Analyze how metrics change from one week to the next." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Helper function to get weekly metrics\n", + "def get_weekly_metrics(weeks_ago=0):\n", + " newest = datetime.now(timezone.utc) - timedelta(weeks=weeks_ago)\n", + " oldest = newest - timedelta(days=7)\n", + " \n", + " payload = {\n", + " \"focus\": \"trace\",\n", + " \"interval\": 10080, # Weekly bucket\n", + " \"windowing\": {\n", + " \"oldest\": oldest.isoformat(),\n", + " \"newest\": newest.isoformat()\n", + " }\n", + " }\n", + " \n", + " response = requests.post(BASE_URL, headers=HEADERS, json=payload)\n", + " data = response.json()\n", + " \n", + " if data['buckets']:\n", + " bucket = data['buckets'][0]\n", + " return {\n", + " 'count': bucket['total']['count'],\n", + " 'costs': bucket['total']['costs'],\n", + " 'duration': bucket['total']['duration'],\n", + " 'tokens': bucket['total']['tokens'],\n", + " 'errors': bucket['errors']['count']\n", + " }\n", + " return None\n", + "\n", + "this_week = get_weekly_metrics(0)\n", + "last_week = get_weekly_metrics(1)\n", + "\n", + "def calc_change(current, previous):\n", + " if previous == 0:\n", + " return \"N/A\"\n", + " change = ((current - previous) / previous) * 100\n", + " symbol = \"📈\" if change > 0 else \"📉\" if change < 0 else \"➡️\"\n", + " return f\"{symbol} {change:+.1f}%\"\n", + "\n", + "print(\"📊 Week-over-Week Comparison\")\n", + "print(\"=\" * 60)\n", + "\n", + "if this_week and last_week:\n", + " print(\"\\n💰 Cost:\")\n", + " print(f\" Last Week: ${last_week['costs']:.4f}\")\n", + " print(f\" This Week: ${this_week['costs']:.4f}\")\n", + " print(f\" Change: {calc_change(this_week['costs'], last_week['costs'])}\")\n", + "\n", + " print(\"\\n📊 Volume:\")\n", + " print(f\" Last Week: {last_week['count']:,} requests\")\n", + " print(f\" This Week: {this_week['count']:,} requests\")\n", + " print(f\" Change: {calc_change(this_week['count'], last_week['count'])}\")\n", + "\n", + " print(\"\\n⚡ Performance:\")\n", + " last_avg = last_week['duration'] / last_week['count'] if last_week['count'] > 0 else 0\n", + " this_avg = this_week['duration'] / this_week['count'] if this_week['count'] > 0 else 0\n", + " print(f\" Last Week: {last_avg:.0f}ms\")\n", + " print(f\" This Week: {this_avg:.0f}ms\")\n", + " print(f\" Change: {calc_change(this_avg, last_avg)}\")\n", + "\n", + " print(\"\\n🚨 Error Rate:\")\n", + " last_err_rate = (last_week['errors'] / last_week['count'] * 100) if last_week['count'] > 0 else 0\n", + " this_err_rate = (this_week['errors'] / this_week['count'] * 100) if this_week['count'] > 0 else 0\n", + " print(f\" Last Week: {last_err_rate:.2f}%\")\n", + " print(f\" This Week: {this_err_rate:.2f}%\")\n", + " print(f\" Change: {calc_change(this_err_rate, last_err_rate)}\")\n", + "else:\n", + " print(\"\\n⚠️ Not enough data for comparison\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 9: Create Visualizations\n", + "\n", + "Visualize cost and usage trends using matplotlib." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "try:\n", + " import matplotlib.pyplot as plt\n", + " import matplotlib.dates as mdates\n", + " from datetime import datetime\n", + " \n", + " # Get daily metrics for last 30 days\n", + " newest = datetime.now(timezone.utc)\n", + " oldest = newest - timedelta(days=30)\n", + "\n", + " payload = {\n", + " \"focus\": \"trace\",\n", + " \"interval\": 1440, # Daily buckets\n", + " \"windowing\": {\n", + " \"oldest\": oldest.isoformat(),\n", + " \"newest\": newest.isoformat()\n", + " }\n", + " }\n", + "\n", + " response = requests.post(BASE_URL, headers=HEADERS, json=payload)\n", + " data = response.json()\n", + "\n", + " # Extract dates and metrics\n", + " dates = [datetime.fromisoformat(b['timestamp'].replace('Z', '+00:00')) \n", + " for b in data['buckets'] if b['total']['count'] > 0]\n", + " costs = [b['total']['costs'] for b in data['buckets'] if b['total']['count'] > 0]\n", + " counts = [b['total']['count'] for b in data['buckets'] if b['total']['count'] > 0]\n", + "\n", + " # Create figure with two subplots\n", + " fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))\n", + "\n", + " # Plot 1: Daily Cost\n", + " ax1.plot(dates, costs, marker='o', linewidth=2, markersize=4, color='#2563eb')\n", + " ax1.set_title('Daily LLM Costs (Last 30 Days)', fontsize=14, fontweight='bold')\n", + " ax1.set_ylabel('Cost ($)', fontsize=12)\n", + " ax1.grid(True, alpha=0.3)\n", + " ax1.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d'))\n", + " plt.setp(ax1.xaxis.get_majorticklabels(), rotation=45)\n", + "\n", + " # Plot 2: Daily Request Volume\n", + " ax2.bar(dates, counts, alpha=0.7, color='steelblue')\n", + " ax2.set_title('Daily Request Volume (Last 30 Days)', fontsize=14, fontweight='bold')\n", + " ax2.set_xlabel('Date', fontsize=12)\n", + " ax2.set_ylabel('Requests', fontsize=12)\n", + " ax2.grid(True, alpha=0.3, axis='y')\n", + " ax2.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d'))\n", + " plt.setp(ax2.xaxis.get_majorticklabels(), rotation=45)\n", + "\n", + " plt.tight_layout()\n", + " plt.show()\n", + " \n", + " print(\"✅ Visualizations created successfully!\")\n", + " \n", + "except ImportError:\n", + " print(\"⚠️ matplotlib not installed. Run: pip install matplotlib\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 10: Export Data to DataFrame\n", + "\n", + "Convert analytics data to a pandas DataFrame for further analysis." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "try:\n", + " import pandas as pd\n", + " \n", + " # Get daily metrics for last 30 days\n", + " newest = datetime.now(timezone.utc)\n", + " oldest = newest - timedelta(days=30)\n", + "\n", + " payload = {\n", + " \"focus\": \"trace\",\n", + " \"interval\": 1440, # Daily buckets\n", + " \"windowing\": {\n", + " \"oldest\": oldest.isoformat(),\n", + " \"newest\": newest.isoformat()\n", + " }\n", + " }\n", + "\n", + " response = requests.post(BASE_URL, headers=HEADERS, json=payload)\n", + " data = response.json()\n", + "\n", + " # Convert to DataFrame\n", + " rows = []\n", + " for bucket in data['buckets']:\n", + " if bucket['total']['count'] > 0: # Only include days with data\n", + " rows.append({\n", + " 'timestamp': bucket['timestamp'],\n", + " 'total_count': bucket['total']['count'],\n", + " 'total_cost': bucket['total']['costs'],\n", + " 'total_duration': bucket['total']['duration'],\n", + " 'total_tokens': bucket['total']['tokens'],\n", + " 'error_count': bucket['errors']['count'],\n", + " 'error_duration': bucket['errors']['duration'],\n", + " 'avg_duration': bucket['total']['duration'] / bucket['total']['count'],\n", + " 'avg_cost': bucket['total']['costs'] / bucket['total']['count'],\n", + " 'error_rate': (bucket['errors']['count'] / bucket['total']['count'] * 100)\n", + " })\n", + "\n", + " df = pd.DataFrame(rows)\n", + " df['timestamp'] = pd.to_datetime(df['timestamp'])\n", + "\n", + " print(\"📊 Analytics Data Summary\\n\")\n", + " print(df.describe())\n", + " \n", + " print(\"\\n📅 Recent Days:\")\n", + " print(df.tail(10).to_string())\n", + " \n", + " # Optional: Save to CSV\n", + " # df.to_csv('analytics_export.csv', index=False)\n", + " # print(\"\\n✅ Data exported to analytics_export.csv\")\n", + " \n", + "except ImportError:\n", + " print(\"⚠️ pandas not installed. Run: pip install pandas\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "In this tutorial, you learned how to:\n", + "\n", + "1. ✅ **Retrieve aggregated metrics** using the Analytics API\n", + "2. ✅ **Track daily costs** and generate spending reports\n", + "3. ✅ **Analyze error trends** to identify reliability issues\n", + "4. ✅ **Filter by status code** to analyze successful vs failed traces\n", + "5. ✅ **Track token usage** patterns over time\n", + "6. ✅ **Monitor performance** and latency trends\n", + "7. ✅ **Generate monthly reports** with comprehensive cost breakdowns\n", + "8. ✅ **Compare week-over-week** metrics to identify trends\n", + "9. ✅ **Visualize data** using matplotlib\n", + "10. 
✅ **Export to DataFrame** for further analysis\n", + "\n", + "## Next Steps\n", + "\n", + "- Learn about [Query API](/observability/query-data/query-api) for detailed trace analysis\n", + "- Explore [Using the UI](/observability/using-the-ui/filtering-traces) for visual analytics\n", + "- Check out [Semantic Conventions](/observability/concepts/semantic-conventions) for available metrics\n", + "- Read about [Cost Tracking](/observability/trace-with-python-sdk/track-costs) for automatic cost calculation" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.0" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/examples/jupyter/observability/annotate-traces-tutorial.ipynb b/examples/jupyter/observability/annotate-traces-tutorial.ipynb new file mode 100644 index 0000000000..4fccc69fd3 --- /dev/null +++ b/examples/jupyter/observability/annotate-traces-tutorial.ipynb @@ -0,0 +1,438 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "intro", + "metadata": {}, + "source": [ + "# Annotate Traces Tutorial\n", + "\n", + "Annotations in Agenta let you enrich the traces created by your LLM applications. You can add scores, comments, expected answers and other metrics to help evaluate your application's performance.\n", + "\n", + "In this tutorial, we'll:\n", + "1. Set up the Agenta SDK and create a traced LLM application\n", + "2. Run the application to generate traces\n", + "3. Add annotations to those traces programmatically\n", + "4. Query and view the annotations\n", + "\n", + "## What You Can Do With Annotations\n", + "\n", + "- Collect user feedback on LLM responses\n", + "- Run custom evaluation workflows\n", + "- Measure application performance in real-time" + ] + }, + { + "cell_type": "markdown", + "id": "install", + "metadata": {}, + "source": [ + "## Step 1: Install Required Packages\n", + "\n", + "First, install the Agenta SDK, OpenAI, and the OpenTelemetry instrumentor:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "install-deps", + "metadata": {}, + "outputs": [], + "source": [ + "%pip install -U agenta openai opentelemetry-instrumentation-openai requests" + ] + }, + { + "cell_type": "markdown", + "id": "setup", + "metadata": {}, + "source": [ + "## Step 2: Configure Environment Variables\n", + "\n", + "To start tracing your application and adding annotations, you'll need an API key:\n", + "\n", + "1. Visit the Agenta API Keys page under settings\n", + "2. 
Click on **Create New API Key** and follow the prompts\n", + "\n", + "Then set your environment variables:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "env-setup", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# Set your API keys here\n", + "os.environ[\"AGENTA_API_KEY\"] = \"\"\n", + "os.environ[\"AGENTA_HOST\"] = \"https://cloud.agenta.ai\" # Change for self-hosted\n", + "os.environ[\"OPENAI_API_KEY\"] = \"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "init-sdk", + "metadata": {}, + "outputs": [], + "source": [ + "import agenta as ag\n", + "from getpass import getpass\n", + "\n", + "# Initialize the SDK with your API key\n", + "api_key = os.getenv(\"AGENTA_API_KEY\")\n", + "if not api_key:\n", + " os.environ[\"AGENTA_API_KEY\"] = getpass(\"Enter your Agenta API key: \")\n", + "\n", + "openai_api_key = os.getenv(\"OPENAI_API_KEY\")\n", + "if not openai_api_key:\n", + " os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter your OpenAI API key: \")\n", + "\n", + "# Initialize Agenta\n", + "ag.init()" + ] + }, + { + "cell_type": "markdown", + "id": "instrument", + "metadata": {}, + "source": [ + "## Step 3: Create and Instrument an LLM Application\n", + "\n", + "Let's create a simple LLM application that we can trace and annotate:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "setup-openai", + "metadata": {}, + "outputs": [], + "source": [ + "import openai\n", + "from opentelemetry.instrumentation.openai import OpenAIInstrumentor\n", + "\n", + "# Instrument OpenAI to automatically capture traces\n", + "OpenAIInstrumentor().instrument()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "create-function", + "metadata": {}, + "outputs": [], + "source": [ + "@ag.instrument()\n", + "def answer_question(question: str) -> tuple[str, str, str]:\n", + " \"\"\"A simple question-answering function that we'll trace and annotate.\n", + " \n", + " Returns:\n", + " Tuple of (answer, trace_id, span_id)\n", + " \"\"\"\n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-3.5-turbo\",\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": \"You are a helpful assistant that answers questions concisely.\"},\n", + " {\"role\": \"user\", \"content\": question},\n", + " ],\n", + " )\n", + " \n", + " # Automatically get the trace_id and span_id from the current span\n", + " link = ag.tracing.build_invocation_link()\n", + " \n", + " return response.choices[0].message.content, link.trace_id, link.span_id" + ] + }, + { + "cell_type": "markdown", + "id": "generate-trace", + "metadata": {}, + "source": [ + "## Step 4: Generate a Trace\n", + "\n", + "Let's run our function to generate a trace. The function will automatically capture the trace_id and span_id using `ag.tracing.build_invocation_link()`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "run-function", + "metadata": {}, + "outputs": [], + "source": [ + "# Run the function to create a trace and get the IDs automatically\n", + "question = \"What is the capital of France?\"\n", + "result, trace_id, span_id = answer_question(question)\n", + "\n", + "print(f\"Question: {question}\")\n", + "print(f\"Answer: {result}\")\n", + "print(f\"\\n✅ Trace captured!\")\n", + "print(f\"Trace ID: {trace_id}\")\n", + "print(f\"Span ID: {span_id}\")" + ] + }, + { + "cell_type": "markdown", + "id": "create-annotation", + "metadata": {}, + "source": [ + "## Step 5: Create an Annotation\n", + "\n", + "Now let's add an annotation to the trace we just created. We'll use the trace_id and span_id that were automatically captured." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "annotate-trace", + "metadata": {}, + "outputs": [], + "source": "import requests\n\nbase_url = os.environ.get(\"AGENTA_HOST\", \"https://cloud.agenta.ai\")\napi_key = os.environ[\"AGENTA_API_KEY\"]\n\nheaders = {\n \"Content-Type\": \"application/json\",\n \"Authorization\": f\"ApiKey {api_key}\"\n}\n\n# Create an annotation with a score and reasoning\nannotation_data = {\n \"annotation\": {\n \"data\": {\n \"outputs\": {\n \"score\": 90,\n \"normalized_score\": 0.9,\n \"reasoning\": \"The answer is correct and concise\",\n \"expected_answer\": \"The capital of France is Paris\"\n }\n },\n \"references\": {\n \"evaluator\": {\n \"slug\": \"accuracy_evaluator\"\n }\n },\n \"links\": {\n \"invocation\": {\n \"trace_id\": trace_id,\n \"span_id\": span_id\n }\n },\n \"metadata\": {\n \"annotator\": \"tutorial_user\",\n \"timestamp\": \"2025-10-30T00:00:00Z\"\n }\n }\n}\n\n# Make the API request (note the trailing slash!)\nresponse = requests.post(\n f\"{base_url}/api/preview/annotations/\",\n headers=headers,\n json=annotation_data\n)\n\n# Process the response\nif response.status_code == 200:\n print(\"✅ Annotation created successfully!\")\n annotation_response = response.json()\n print(f\"\\nAnnotation ID: {annotation_response['annotation']['trace_id']}\")\n print(f\"Span ID: {annotation_response['annotation']['span_id']}\")\n print(f\"\\nAnnotation data:\")\n print(annotation_response)\nelse:\n print(f\"❌ Error: {response.status_code}\")\n print(response.text)" + }, + { + "cell_type": "markdown", + "id": "multiple-annotations", + "metadata": {}, + "source": [ + "## Step 6: Create Additional Annotations\n", + "\n", + "You can add multiple annotations to the same trace from different evaluators:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "create-second-annotation", + "metadata": {}, + "outputs": [], + "source": "# Create another annotation for quality assessment\nquality_annotation = {\n \"annotation\": {\n \"data\": {\n \"outputs\": {\n \"score\": 85,\n \"reasoning\": \"Response is helpful and well-formatted\",\n \"labels\": [\"Helpful\", \"Accurate\", \"Concise\"]\n }\n },\n \"references\": {\n \"evaluator\": {\n \"slug\": \"quality_evaluator\"\n }\n },\n \"links\": {\n \"invocation\": {\n \"trace_id\": trace_id,\n \"span_id\": span_id\n }\n }\n }\n}\n\nresponse = requests.post(\n f\"{base_url}/api/preview/annotations/\", # Note the trailing slash!\n headers=headers,\n json=quality_annotation\n)\n\nif response.status_code == 200:\n print(\"✅ Quality annotation created successfully!\")\nelse:\n print(f\"❌ Error: {response.status_code}\")\n print(response.text)" + }, + { + "cell_type": "markdown", + "id": 
"query-annotations", + "metadata": {}, + "source": [ + "## Step 7: Query Annotations\n", + "\n", + "Now let's query all annotations for our invocation:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "query-by-invocation", + "metadata": {}, + "outputs": [], + "source": [ + "# Query all annotations for the invocation\n", + "query_data = {\n", + " \"annotation\": {\n", + " \"links\": {\n", + " \"invocation\": {\n", + " \"trace_id\": trace_id,\n", + " \"span_id\": span_id\n", + " }\n", + " }\n", + " }\n", + "}\n", + "\n", + "response = requests.post(\n", + " f\"{base_url}/api/preview/annotations/query\",\n", + " headers=headers,\n", + " json=query_data\n", + ")\n", + "\n", + "if response.status_code == 200:\n", + " print(\"✅ Annotations retrieved successfully!\")\n", + " annotations = response.json()\n", + " print(f\"\\nFound {len(annotations.get('annotations', []))} annotation(s)\")\n", + " print(\"\\nAnnotations:\")\n", + " for idx, ann in enumerate(annotations.get('annotations', []), 1):\n", + " print(f\"\\n--- Annotation {idx} ---\")\n", + " print(f\"Evaluator: {ann['references']['evaluator']['slug']}\")\n", + " print(f\"Data: {ann['data']}\")\n", + "else:\n", + " print(f\"❌ Error: {response.status_code}\")\n", + " print(response.text)" + ] + }, + { + "cell_type": "markdown", + "id": "view-ui", + "metadata": {}, + "source": [ + "## Step 8: View Annotations in the UI\n", + "\n", + "You can see all annotations for a trace in the Agenta UI:\n", + "\n", + "1. Log in to your Agenta dashboard\n", + "2. Navigate to the **Observability** section\n", + "3. Find your trace\n", + "4. Check the **Annotations** tab to see detailed information\n", + "\n", + "The right sidebar will show average metrics for each evaluator." + ] + }, + { + "cell_type": "markdown", + "id": "understanding-structure", + "metadata": {}, + "source": [ + "## Understanding Automatic Trace Capture\n", + "\n", + "The `ag.tracing.build_invocation_link()` function is a helper that automatically:\n", + "1. Gets the current span context from the active trace\n", + "2. Formats the trace_id and span_id as hex strings\n", + "3. Returns a Link object with both IDs ready to use\n", + "\n", + "This is much more convenient than manually querying the UI for trace IDs!\n", + "\n", + "**Alternative Method:**\n", + "You can also use `ag.tracing.get_span_context()` if you need more control:\n", + "\n", + "```python\n", + "span_ctx = ag.tracing.get_span_context()\n", + "trace_id = f\"{span_ctx.trace_id:032x}\" # Format as hexadecimal\n", + "span_id = f\"{span_ctx.span_id:016x}\" # Format as hexadecimal\n", + "```\n", + "\n", + "## Understanding Annotation Structure\n", + "\n", + "An annotation has four main parts:\n", + "\n", + "1. **Data**: The actual evaluation content (scores, comments)\n", + "2. **References**: Which evaluator to use (will be created automatically if it doesn't exist)\n", + "3. **Links**: Which trace and span you're annotating\n", + "4. 
**Metadata** (optional): Any extra information you want to include\n", + "\n", + "### Annotation Data Examples\n", + "\n", + "You can include various types of data in your annotations:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "annotation-examples", + "metadata": {}, + "outputs": [], + "source": [ + "# Example 1: Simple score\n", + "simple_annotation = {\n", + " \"outputs\": {\n", + " \"score\": 3\n", + " }\n", + "}\n", + "\n", + "# Example 2: Score with explanation\n", + "detailed_annotation = {\n", + " \"outputs\": {\n", + " \"score\": 3,\n", + " \"comment\": \"The response is not grounded\"\n", + " }\n", + "}\n", + "\n", + "# Example 3: Multiple metrics with reference information\n", + "comprehensive_annotation = {\n", + " \"outputs\": {\n", + " \"score\": 3,\n", + " \"normalized_score\": 0.5,\n", + " \"comment\": \"The response is not grounded\",\n", + " \"expected_answer\": \"The capital of France is Paris\",\n", + " \"labels\": [\"factual\", \"concise\"]\n", + " }\n", + "}\n", + "\n", + "print(\"Annotation data can include:\")\n", + "print(\"- Numbers (scores, ratings)\")\n", + "print(\"- Categories (labels, classifications)\")\n", + "print(\"- Text (comments, reasoning)\")\n", + "print(\"- Booleans (true/false values)\")" + ] + }, + { + "cell_type": "markdown", + "id": "cleanup", + "metadata": {}, + "source": [ + "## Optional: Remove an Annotation\n", + "\n", + "If you need to remove an annotation, you can delete it by its trace_id and span_id:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "delete-annotation", + "metadata": {}, + "outputs": [], + "source": [ + "# Uncomment and replace with your annotation's trace_id and span_id to delete\n", + "# annotation_trace_id = \"your_annotation_trace_id\"\n", + "# annotation_span_id = \"your_annotation_span_id\"\n", + "\n", + "# response = requests.delete(\n", + "# f\"{base_url}/api/preview/annotations/{annotation_trace_id}/{annotation_span_id}\",\n", + "# headers=headers\n", + "# )\n", + "\n", + "# if response.status_code == 200:\n", + "# print(\"✅ Annotation deleted successfully\")\n", + "# else:\n", + "# print(f\"❌ Error: {response.status_code}\")\n", + "# print(response.text)" + ] + }, + { + "cell_type": "markdown", + "id": "summary", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "In this tutorial, we've covered:\n", + "\n", + "1. ✅ Setting up the Agenta SDK and instrumenting an LLM application\n", + "2. ✅ Generating traces by running the application\n", + "3. ✅ Creating annotations with scores, reasoning, and metadata\n", + "4. ✅ Adding multiple annotations from different evaluators\n", + "5. ✅ Querying annotations programmatically\n", + "6. 
✅ Understanding annotation structure and capabilities\n", + "\n", + "## Next Steps\n", + "\n", + "Now that you know how to annotate traces, you can:\n", + "\n", + "- Integrate annotation creation into your evaluation workflows\n", + "- Build custom evaluators that automatically annotate traces\n", + "- Use annotations to track user feedback in production\n", + "- Analyze annotation data to improve your LLM applications" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/examples/jupyter/observability/query-data-api-tutorial.ipynb b/examples/jupyter/observability/query-data-api-tutorial.ipynb new file mode 100644 index 0000000000..09ad0a7ecc --- /dev/null +++ b/examples/jupyter/observability/query-data-api-tutorial.ipynb @@ -0,0 +1,643 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Query Data API - Tutorial\n", + "\n", + "This tutorial shows you how to use the Agenta Query Data API to retrieve and analyze your LLM traces. You'll learn how to:\n", + "\n", + "- Set up the API client with authentication\n", + "- Query spans and traces with filters\n", + "- Filter by attributes, time ranges, and status codes\n", + "- Use advanced filters with logical operators\n", + "- Analyze trace data to calculate costs and latencies\n", + "\n", + "## What You'll Build\n", + "\n", + "We'll create scripts that:\n", + "1. Query recent traces from your applications\n", + "2. Filter traces by type, status, and custom attributes\n", + "3. Analyze cost and performance metrics\n", + "4. Find problematic traces (errors, slow responses)\n", + "5. Export trace data for further analysis" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Install Dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pip install -U requests pandas" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": "## Setup\n\nBefore using the API, you need your Agenta API key. You can create API keys from the Settings page in your Agenta workspace." 
+ }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ[\"AGENTA_HOST\"] = \"https://cloud.agenta.ai\" # Default value, change for self-hosted\n", + "os.environ[\"AGENTA_API_KEY\"] = \"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import requests\n", + "from getpass import getpass\n", + "from datetime import datetime, timedelta, timezone\n", + "import json\n", + "\n", + "# Get API credentials\n", + "AGENTA_HOST = os.getenv(\"AGENTA_HOST\", \"https://cloud.agenta.ai\")\n", + "api_key = os.getenv(\"AGENTA_API_KEY\")\n", + "if not api_key:\n", + " api_key = getpass(\"Enter your Agenta API key: \")\n", + " os.environ[\"AGENTA_API_KEY\"] = api_key\n", + "\n", + "# Setup base configuration\n", + "BASE_URL = f\"{AGENTA_HOST}/api/preview/tracing/spans/query\"\n", + "HEADERS = {\n", + " \"Authorization\": f\"ApiKey {api_key}\",\n", + " \"Content-Type\": \"application/json\"\n", + "}\n", + "\n", + "print(f\"\u2713 Connected to {AGENTA_HOST}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 1: Query Recent Traces\n", + "\n", + "Let's start by querying traces from the last 7 days. We'll use the `focus=trace` parameter to get complete trace trees instead of individual spans." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Query traces from the last 7 days\n", + "now = datetime.now(timezone.utc)\n", + "week_ago = now - timedelta(days=7)\n", + "\n", + "query = {\n", + " \"focus\": \"trace\",\n", + " \"oldest\": week_ago.isoformat(),\n", + " \"newest\": now.isoformat(),\n", + " \"limit\": 5\n", + "}\n", + "\n", + "response = requests.post(BASE_URL, headers=HEADERS, json=query)\n", + "data = response.json()\n", + "\n", + "print(f\"Found {data['count']} traces\")\n", + "print(f\"Trace IDs: {list(data.get('traces', {}).keys())}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 2: Query Spans with Filters\n", + "\n", + "Now let's query individual spans and filter by type. We'll look for LLM spans specifically." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Query LLM spans\n", + "query = {\n", + " \"focus\": \"span\",\n", + " \"limit\": 10,\n", + " \"filter\": {\n", + " \"operator\": \"and\",\n", + " \"conditions\": [\n", + " {\n", + " \"field\": \"attributes\",\n", + " \"key\": \"ag.type.span\",\n", + " \"operator\": \"is\",\n", + " \"value\": \"llm\"\n", + " }\n", + " ]\n", + " }\n", + "}\n", + "\n", + "response = requests.post(BASE_URL, headers=HEADERS, json=query)\n", + "data = response.json()\n", + "\n", + "print(f\"Found {data['count']} LLM spans\")\n", + "\n", + "# Display first span details\n", + "if data.get('spans'):\n", + " span = data['spans'][0]\n", + " print(f\"\\nFirst span:\")\n", + " print(f\" Name: {span.get('span_name')}\")\n", + " print(f\" Status: {span.get('status_code')}\")\n", + " print(f\" Start: {span.get('start_time')}\")\n", + " print(f\" End: {span.get('end_time')}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 3: Filter by Status Code\n", + "\n", + "Let's find traces that encountered errors. This is useful for debugging and monitoring application health." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Find error traces\n", + "query = {\n", + " \"focus\": \"trace\",\n", + " \"limit\": 10,\n", + " \"filter\": {\n", + " \"operator\": \"and\",\n", + " \"conditions\": [\n", + " {\n", + " \"field\": \"status_code\",\n", + " \"operator\": \"is\",\n", + " \"value\": \"STATUS_CODE_ERROR\"\n", + " }\n", + " ]\n", + " }\n", + "}\n", + "\n", + "response = requests.post(BASE_URL, headers=HEADERS, json=query)\n", + "data = response.json()\n", + "\n", + "print(f\"Found {data['count']} traces with errors\")\n", + "\n", + "if data.get('traces'):\n", + " for trace_id in list(data['traces'].keys())[:3]:\n", + " print(f\"\\nError trace: {trace_id}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 4: Advanced Filtering with Multiple Conditions\n", + "\n", + "Let's use multiple filters to find specific traces. We'll look for successful LLM calls that took longer than 2 seconds." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Find slow but successful LLM calls\n", + "query = {\n", + " \"focus\": \"span\",\n", + " \"limit\": 10,\n", + " \"filter\": {\n", + " \"operator\": \"and\",\n", + " \"conditions\": [\n", + " {\n", + " \"field\": \"attributes\",\n", + " \"key\": \"ag.type.span\",\n", + " \"operator\": \"is\",\n", + " \"value\": \"llm\"\n", + " },\n", + " {\n", + " \"field\": \"status_code\",\n", + " \"operator\": \"is_not\",\n", + " \"value\": \"STATUS_CODE_ERROR\"\n", + " },\n", + " {\n", + " \"field\": \"attributes\",\n", + " \"key\": \"ag.metrics.unit.duration\",\n", + " \"operator\": \"gt\",\n", + " \"value\": 2000 # milliseconds\n", + " }\n", + " ]\n", + " }\n", + "}\n", + "\n", + "response = requests.post(BASE_URL, headers=HEADERS, json=query)\n", + "data = response.json()\n", + "\n", + "print(f\"Found {data['count']} slow LLM spans (>2s)\")\n", + "\n", + "if data.get('spans'):\n", + " for span in data['spans'][:3]:\n", + " duration = span.get('attributes', {}).get('ag', {}).get('metrics', {}).get('duration', {}).get('cumulative', 'N/A')\n", + " print(f\" {span.get('span_name')}: {duration}ms\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 5: Nested Logical Operators\n", + "\n", + "Let's create a more complex query using nested logical operators. We'll find spans that are either errors OR slow responses." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Find problematic spans (errors OR slow)\n", + "query = {\n", + " \"focus\": \"span\",\n", + " \"limit\": 10,\n", + " \"filter\": {\n", + " \"operator\": \"and\",\n", + " \"conditions\": [\n", + " {\n", + " \"field\": \"attributes\",\n", + " \"key\": \"ag.type.span\",\n", + " \"operator\": \"is\",\n", + " \"value\": \"llm\"\n", + " },\n", + " {\n", + " \"operator\": \"or\",\n", + " \"conditions\": [\n", + " {\n", + " \"field\": \"status_code\",\n", + " \"value\": \"STATUS_CODE_ERROR\"\n", + " },\n", + " {\n", + " \"field\": \"attributes\",\n", + " \"key\": \"ag.metrics.unit.duration\",\n", + " \"operator\": \"gt\",\n", + " \"value\": 5000\n", + " }\n", + " ]\n", + " }\n", + " ]\n", + " }\n", + "}\n", + "\n", + "response = requests.post(BASE_URL, headers=HEADERS, json=query)\n", + "data = response.json()\n", + "\n", + "print(f\"Found {data['count']} problematic spans\")\n", + "\n", + "if data.get('spans'):\n", + " for span in data['spans'][:5]:\n", + " status = span.get('status_code')\n", + " duration = span.get('attributes', {}).get('ag', {}).get('metrics', {}).get('duration', {}).get('cumulative', 'N/A')\n", + " print(f\" {span.get('span_name')}: Status={status}, Duration={duration}ms\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 6: Analyze Cost and Token Usage\n", + "\n", + "Let's query LLM spans and analyze their costs and token usage. This helps you understand your LLM spending." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Query LLM spans with cost tracking\n", + "query = {\n", + " \"focus\": \"span\",\n", + " \"limit\": 50,\n", + " \"filter\": {\n", + " \"operator\": \"and\",\n", + " \"conditions\": [\n", + " {\n", + " \"field\": \"attributes\",\n", + " \"key\": \"ag.type.span\",\n", + " \"operator\": \"is\",\n", + " \"value\": \"llm\"\n", + " },\n", + " {\n", + " \"field\": \"attributes\",\n", + " \"key\": \"ag.metrics.unit.cost\",\n", + " \"operator\": \"exists\"\n", + " }\n", + " ]\n", + " }\n", + "}\n", + "\n", + "response = requests.post(BASE_URL, headers=HEADERS, json=query)\n", + "data = response.json()\n", + "\n", + "print(f\"Analyzing {data['count']} LLM spans with cost data\\n\")\n", + "\n", + "if data.get('spans'):\n", + " total_cost = 0\n", + " total_tokens = 0\n", + " total_duration = 0\n", + " \n", + " for span in data['spans']:\n", + " metrics = span.get('attributes', {}).get('ag', {}).get('metrics', {})\n", + " \n", + " # Extract cost\n", + " cost = metrics.get('costs', {}).get('cumulative', {}).get('total', 0)\n", + " total_cost += cost\n", + " \n", + " # Extract tokens\n", + " tokens = metrics.get('tokens', {}).get('cumulative', {}).get('total', 0)\n", + " total_tokens += tokens\n", + " \n", + " # Extract duration\n", + " duration = metrics.get('duration', {}).get('cumulative', 0)\n", + " total_duration += duration\n", + " \n", + " print(f\"Summary:\")\n", + " print(f\" Total Cost: ${total_cost:.4f}\")\n", + " print(f\" Total Tokens: {total_tokens:,}\")\n", + " print(f\" Average Cost per Span: ${(total_cost/len(data['spans'])):.4f}\")\n", + " print(f\" Average Tokens per Span: {int(total_tokens/len(data['spans']))}\")\n", + " print(f\" Average Duration: {int(total_duration/len(data['spans']))}ms\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 7: Filter by Span Name Pattern\n", + "\n", + "Let's use 
string matching operators to find specific types of operations. We'll search for OpenAI-related spans." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Find OpenAI spans using pattern matching\n", + "query = {\n", + " \"focus\": \"span\",\n", + " \"limit\": 10,\n", + " \"filter\": {\n", + " \"operator\": \"and\",\n", + " \"conditions\": [\n", + " {\n", + " \"field\": \"span_name\",\n", + " \"operator\": \"contains\",\n", + " \"value\": \"openai\"\n", + " }\n", + " ]\n", + " }\n", + "}\n", + "\n", + "response = requests.post(BASE_URL, headers=HEADERS, json=query)\n", + "data = response.json()\n", + "\n", + "print(f\"Found {data['count']} OpenAI spans\")\n", + "\n", + "if data.get('spans'):\n", + " span_names = set(span.get('span_name') for span in data['spans'])\n", + " print(f\"\\nUnique span names:\")\n", + " for name in span_names:\n", + " print(f\" - {name}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 8: Export Trace Data to DataFrame\n", + "\n", + "Let's export our trace data to a pandas DataFrame for further analysis and visualization." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "\n", + "# Query recent spans\n", + "query = {\n", + " \"focus\": \"span\",\n", + " \"limit\": 100,\n", + " \"filter\": {\n", + " \"operator\": \"and\",\n", + " \"conditions\": [\n", + " {\n", + " \"field\": \"attributes\",\n", + " \"key\": \"ag.type.span\",\n", + " \"operator\": \"is\",\n", + " \"value\": \"llm\"\n", + " }\n", + " ]\n", + " }\n", + "}\n", + "\n", + "response = requests.post(BASE_URL, headers=HEADERS, json=query)\n", + "data = response.json()\n", + "\n", + "# Convert to DataFrame\n", + "records = []\n", + "for span in data.get('spans', []):\n", + " metrics = span.get('attributes', {}).get('ag', {}).get('metrics', {})\n", + " \n", + " record = {\n", + " 'trace_id': span.get('trace_id'),\n", + " 'span_id': span.get('span_id'),\n", + " 'span_name': span.get('span_name'),\n", + " 'status': span.get('status_code'),\n", + " 'start_time': span.get('start_time'),\n", + " 'end_time': span.get('end_time'),\n", + " 'duration_ms': metrics.get('duration', {}).get('cumulative', 0),\n", + " 'cost': metrics.get('costs', {}).get('cumulative', {}).get('total', 0),\n", + " 'total_tokens': metrics.get('tokens', {}).get('cumulative', {}).get('total', 0),\n", + " 'prompt_tokens': metrics.get('tokens', {}).get('cumulative', {}).get('prompt', 0),\n", + " 'completion_tokens': metrics.get('tokens', {}).get('cumulative', {}).get('completion', 0),\n", + " }\n", + " records.append(record)\n", + "\n", + "df = pd.DataFrame(records)\n", + "\n", + "print(f\"Created DataFrame with {len(df)} rows\\n\")\n", + "print(\"First 5 rows:\")\n", + "print(df.head())\n", + "\n", + "print(\"\\nBasic statistics:\")\n", + "print(df[['duration_ms', 'cost', 'total_tokens']].describe())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 9: Time-Based Analysis\n", + "\n", + "Let's analyze how your costs and latencies change over time." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Convert timestamps to datetime\n", + "df['start_time'] = pd.to_datetime(df['start_time'])\n", + "df['end_time'] = pd.to_datetime(df['end_time'])\n", + "\n", + "# Group by hour\n", + "df['hour'] = df['start_time'].dt.floor('H')\n", + "hourly_stats = df.groupby('hour').agg({\n", + " 'span_id': 'count',\n", + " 'duration_ms': 'mean',\n", + " 'cost': 'sum',\n", + " 'total_tokens': 'sum'\n", + "}).rename(columns={'span_id': 'num_calls'})\n", + "\n", + "print(\"Hourly statistics:\")\n", + "print(hourly_stats)\n", + "\n", + "print(f\"\\nPeak usage hour: {hourly_stats['num_calls'].idxmax()}\")\n", + "print(f\"Highest cost hour: {hourly_stats['cost'].idxmax()} (${hourly_stats['cost'].max():.4f})\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 10: Filter by Time Range\n", + "\n", + "Let's query traces from a specific time window. This is useful for analyzing specific incidents or time periods." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Query last 24 hours\n", + "now = datetime.now(timezone.utc)\n", + "yesterday = now - timedelta(days=1)\n", + "\n", + "query = {\n", + " \"focus\": \"span\",\n", + " \"oldest\": yesterday.isoformat(),\n", + " \"newest\": now.isoformat(),\n", + " \"limit\": 100,\n", + " \"filter\": {\n", + " \"operator\": \"and\",\n", + " \"conditions\": [\n", + " {\n", + " \"field\": \"attributes\",\n", + " \"key\": \"ag.type.span\",\n", + " \"operator\": \"is\",\n", + " \"value\": \"llm\"\n", + " }\n", + " ]\n", + " }\n", + "}\n", + "\n", + "response = requests.post(BASE_URL, headers=HEADERS, json=query)\n", + "data = response.json()\n", + "\n", + "print(f\"Last 24 hours: {data['count']} LLM spans\")\n", + "\n", + "if data.get('spans'):\n", + " # Calculate totals\n", + " total_cost = sum(\n", + " span.get('attributes', {}).get('ag', {}).get('metrics', {}).get('costs', {}).get('cumulative', {}).get('total', 0)\n", + " for span in data['spans']\n", + " )\n", + " \n", + " error_count = sum(\n", + " 1 for span in data['spans']\n", + " if span.get('status_code') == 'STATUS_CODE_ERROR'\n", + " )\n", + " \n", + " print(f\"\\nSummary for last 24 hours:\")\n", + " print(f\" Total Cost: ${total_cost:.4f}\")\n", + " print(f\" Error Rate: {(error_count/len(data['spans'])*100):.2f}%\")\n", + " print(f\" Success Rate: {((len(data['spans'])-error_count)/len(data['spans'])*100):.2f}%\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "In this tutorial, you learned how to:\n", + "\n", + "1. \u2713 Set up the Agenta Query Data API client\n", + "2. \u2713 Query traces and spans with filters\n", + "3. \u2713 Filter by attributes, status codes, and time ranges\n", + "4. \u2713 Use advanced filters with logical operators\n", + "5. \u2713 Analyze cost and performance metrics\n", + "6. \u2713 Export trace data to pandas DataFrames\n", + "7. 
\u2713 Perform time-based analysis\n", + "\n", + "## Next Steps\n", + "\n", + "- Learn about the [Analytics Data API](/observability/query-data/analytics-data) for aggregated metrics\n", + "- Explore [filtering in the UI](/observability/using-the-ui/filtering-traces) for visual query building\n", + "- Check out [trace annotations](/observability/trace-with-python-sdk/annotate-traces) for adding feedback data\n", + "- Read the complete [API reference](/reference/api) for all available endpoints" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.0" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} \ No newline at end of file diff --git a/examples/jupyter/observability/quickstart.ipynb b/examples/jupyter/observability/quickstart.ipynb new file mode 100644 index 0000000000..a16891c606 --- /dev/null +++ b/examples/jupyter/observability/quickstart.ipynb @@ -0,0 +1,328 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "a1b2c3d4", + "metadata": {}, + "source": [ + "# Quick Start: Observability with Agenta\n", + "\n", + "Agenta enables you to capture all inputs, outputs, and metadata from your LLM applications, **whether they're hosted within Agenta or running in your own environment**.\n", + "\n", + "This guide will walk you through setting up observability for an OpenAI application running locally.\n", + "\n", + "**Note:** If you create an application through the Agenta UI, tracing is enabled by default. No additional setup is required—simply go to the observability view to see all your requests." 
+ ] + }, + { + "cell_type": "markdown", + "id": "b2c3d4e5", + "metadata": {}, + "source": [ + "## Step 1: Install Required Packages\n", + "\n", + "First, install the Agenta SDK, OpenAI, and the OpenTelemetry instrumentor for OpenAI:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "c3d4e5f6", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: agenta in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (0.51.6)\n", + "Collecting agenta\n", + " Downloading agenta-0.59.6-py3-none-any.whl.metadata (31 kB)\n", + "Requirement already satisfied: openai in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (1.107.1)\n", + "Collecting openai\n", + " Downloading openai-2.6.1-py3-none-any.whl.metadata (29 kB)\n", + "Requirement already satisfied: opentelemetry-instrumentation-openai in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (0.46.2)\n", + "Collecting opentelemetry-instrumentation-openai\n", + " Downloading opentelemetry_instrumentation_openai-0.47.5-py3-none-any.whl.metadata (2.2 kB)\n", + "Requirement already satisfied: decorator<6.0.0,>=5.2.1 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from agenta) (5.2.1)\n", + "Requirement already satisfied: fastapi<0.117.0,>=0.116.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from agenta) (0.116.1)\n", + "Requirement already satisfied: google-auth<3,>=2.23 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from agenta) (2.40.3)\n", + "Requirement already satisfied: h11<0.17.0,>=0.16.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from agenta) (0.16.0)\n", + "Requirement already satisfied: httpx<0.29.0,>=0.28.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from agenta) (0.28.1)\n", + "Requirement already satisfied: huggingface-hub<0.31.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from agenta) (0.30.2)\n", + "Requirement already satisfied: importlib-metadata<9.0,>=8.0.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from agenta) (8.7.0)\n", + "Requirement already satisfied: jinja2<4.0.0,>=3.1.6 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from agenta) (3.1.6)\n", + "Collecting litellm==1.78.7 (from agenta)\n", + " Downloading litellm-1.78.7-py3-none-any.whl.metadata (42 kB)\n", + "Collecting openai\n", + " Downloading openai-1.109.1-py3-none-any.whl.metadata (29 kB)\n", + "Requirement already satisfied: opentelemetry-api<2.0.0,>=1.27.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from agenta) (1.36.0)\n", + "Requirement already satisfied: opentelemetry-exporter-otlp-proto-http<2.0.0,>=1.27.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from agenta) (1.36.0)\n", + "Requirement already satisfied: opentelemetry-instrumentation>=0.56b0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from agenta) (0.57b0)\n", + "Requirement already satisfied: opentelemetry-sdk<2.0.0,>=1.27.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from agenta) (1.36.0)\n", + "Requirement already satisfied: pydantic<3,>=2 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from agenta) (2.11.7)\n", + "Requirement already satisfied: python-dotenv<2.0.0,>=1.0.0 in 
/home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from agenta) (1.1.1)\n", + "Requirement already satisfied: pyyaml<7.0.0,>=6.0.2 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from agenta) (6.0.2)\n", + "Requirement already satisfied: starlette<0.48.0,>=0.47.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from agenta) (0.47.3)\n", + "Requirement already satisfied: structlog<26.0.0,>=25.2.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from agenta) (25.4.0)\n", + "Requirement already satisfied: tiktoken==0.11.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from agenta) (0.11.0)\n", + "Requirement already satisfied: toml<0.11.0,>=0.10.2 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from agenta) (0.10.2)\n", + "Requirement already satisfied: aiohttp>=3.10 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from litellm==1.78.7->agenta) (3.12.15)\n", + "Requirement already satisfied: click in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from litellm==1.78.7->agenta) (8.2.1)\n", + "Collecting fastuuid>=0.13.0 (from litellm==1.78.7->agenta)\n", + " Downloading fastuuid-0.14.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.1 kB)\n", + "Requirement already satisfied: jsonschema<5.0.0,>=4.22.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from litellm==1.78.7->agenta) (4.25.1)\n", + "Requirement already satisfied: tokenizers in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from litellm==1.78.7->agenta) (0.22.0)\n", + "Requirement already satisfied: regex>=2022.1.18 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from tiktoken==0.11.0->agenta) (2025.9.1)\n", + "Requirement already satisfied: requests>=2.26.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from tiktoken==0.11.0->agenta) (2.32.5)\n", + "Requirement already satisfied: anyio<5,>=3.5.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from openai) (4.10.0)\n", + "Requirement already satisfied: distro<2,>=1.7.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from openai) (1.9.0)\n", + "Requirement already satisfied: jiter<1,>=0.4.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from openai) (0.10.0)\n", + "Requirement already satisfied: sniffio in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from openai) (1.3.1)\n", + "Requirement already satisfied: tqdm>4 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from openai) (4.67.1)\n", + "Requirement already satisfied: typing-extensions<5,>=4.11 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from openai) (4.15.0)\n", + "Requirement already satisfied: idna>=2.8 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from anyio<5,>=3.5.0->openai) (3.10)\n", + "Requirement already satisfied: cachetools<6.0,>=2.0.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from google-auth<3,>=2.23->agenta) (5.5.2)\n", + "Requirement already satisfied: pyasn1-modules>=0.2.1 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from google-auth<3,>=2.23->agenta) (0.4.2)\n", + "Requirement already satisfied: rsa<5,>=3.1.4 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from google-auth<3,>=2.23->agenta) 
(4.9.1)\n", + "Requirement already satisfied: certifi in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from httpx<0.29.0,>=0.28.0->agenta) (2025.8.3)\n", + "Requirement already satisfied: httpcore==1.* in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from httpx<0.29.0,>=0.28.0->agenta) (1.0.9)\n", + "Requirement already satisfied: filelock in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from huggingface-hub<0.31.0->agenta) (3.19.1)\n", + "Requirement already satisfied: fsspec>=2023.5.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from huggingface-hub<0.31.0->agenta) (2025.7.0)\n", + "Requirement already satisfied: packaging>=20.9 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from huggingface-hub<0.31.0->agenta) (25.0)\n", + "Requirement already satisfied: zipp>=3.20 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from importlib-metadata<9.0,>=8.0.0->agenta) (3.23.0)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from jinja2<4.0.0,>=3.1.6->agenta) (3.0.2)\n", + "Requirement already satisfied: attrs>=22.2.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from jsonschema<5.0.0,>=4.22.0->litellm==1.78.7->agenta) (25.3.0)\n", + "Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from jsonschema<5.0.0,>=4.22.0->litellm==1.78.7->agenta) (2025.4.1)\n", + "Requirement already satisfied: referencing>=0.28.4 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from jsonschema<5.0.0,>=4.22.0->litellm==1.78.7->agenta) (0.36.2)\n", + "Requirement already satisfied: rpds-py>=0.7.1 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from jsonschema<5.0.0,>=4.22.0->litellm==1.78.7->agenta) (0.27.1)\n", + "Requirement already satisfied: googleapis-common-protos~=1.52 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from opentelemetry-exporter-otlp-proto-http<2.0.0,>=1.27.0->agenta) (1.70.0)\n", + "Requirement already satisfied: opentelemetry-exporter-otlp-proto-common==1.36.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from opentelemetry-exporter-otlp-proto-http<2.0.0,>=1.27.0->agenta) (1.36.0)\n", + "Requirement already satisfied: opentelemetry-proto==1.36.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from opentelemetry-exporter-otlp-proto-http<2.0.0,>=1.27.0->agenta) (1.36.0)\n", + "Requirement already satisfied: protobuf<7.0,>=5.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from opentelemetry-proto==1.36.0->opentelemetry-exporter-otlp-proto-http<2.0.0,>=1.27.0->agenta) (6.32.0)\n", + "Requirement already satisfied: opentelemetry-semantic-conventions==0.57b0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from opentelemetry-sdk<2.0.0,>=1.27.0->agenta) (0.57b0)\n", + "Requirement already satisfied: annotated-types>=0.6.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from pydantic<3,>=2->agenta) (0.7.0)\n", + "Requirement already satisfied: pydantic-core==2.33.2 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from pydantic<3,>=2->agenta) (2.33.2)\n", + "Requirement already satisfied: typing-inspection>=0.4.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages 
(from pydantic<3,>=2->agenta) (0.4.1)\n", + "Requirement already satisfied: charset_normalizer<4,>=2 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from requests>=2.26.0->tiktoken==0.11.0->agenta) (3.4.3)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from requests>=2.26.0->tiktoken==0.11.0->agenta) (2.5.0)\n", + "Requirement already satisfied: pyasn1>=0.1.3 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from rsa<5,>=3.1.4->google-auth<3,>=2.23->agenta) (0.6.1)\n", + "Requirement already satisfied: opentelemetry-semantic-conventions-ai<0.5.0,>=0.4.13 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from opentelemetry-instrumentation-openai) (0.4.13)\n", + "Requirement already satisfied: aiohappyeyeballs>=2.5.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from aiohttp>=3.10->litellm==1.78.7->agenta) (2.6.1)\n", + "Requirement already satisfied: aiosignal>=1.4.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from aiohttp>=3.10->litellm==1.78.7->agenta) (1.4.0)\n", + "Requirement already satisfied: frozenlist>=1.1.1 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from aiohttp>=3.10->litellm==1.78.7->agenta) (1.7.0)\n", + "Requirement already satisfied: multidict<7.0,>=4.5 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from aiohttp>=3.10->litellm==1.78.7->agenta) (6.6.4)\n", + "Requirement already satisfied: propcache>=0.2.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from aiohttp>=3.10->litellm==1.78.7->agenta) (0.3.2)\n", + "Requirement already satisfied: yarl<2.0,>=1.17.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from aiohttp>=3.10->litellm==1.78.7->agenta) (1.20.1)\n", + "Requirement already satisfied: wrapt<2.0.0,>=1.0.0 in /home/mahmoud/code/agenta_cloud/.venv/lib/python3.12/site-packages (from opentelemetry-instrumentation>=0.56b0->agenta) (1.17.3)\n", + "Downloading agenta-0.59.6-py3-none-any.whl (339 kB)\n", + "Downloading litellm-1.78.7-py3-none-any.whl (9.9 MB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m9.9/9.9 MB\u001b[0m \u001b[31m80.7 MB/s\u001b[0m \u001b[33m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading openai-1.109.1-py3-none-any.whl (948 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m948.6/948.6 kB\u001b[0m \u001b[31m26.3 MB/s\u001b[0m \u001b[33m0:00:00\u001b[0m\n", + "\u001b[?25hDownloading opentelemetry_instrumentation_openai-0.47.5-py3-none-any.whl (35 kB)\n", + "Downloading fastuuid-0.14.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (278 kB)\n", + "Installing collected packages: fastuuid, openai, litellm, opentelemetry-instrumentation-openai, agenta\n", + "\u001b[2K Attempting uninstall: openai\n", + "\u001b[2K Found existing installation: openai 1.107.1\n", + "\u001b[2K Uninstalling openai-1.107.1:\n", + "\u001b[2K Successfully uninstalled openai-1.107.1\n", + "\u001b[2K Attempting uninstall: litellm[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1/5\u001b[0m [openai]\n", + "\u001b[2K Found existing installation: litellm 1.76.0━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1/5\u001b[0m [openai]\n", + "\u001b[2K Uninstalling litellm-1.76.0:m╺\u001b[0m\u001b[90m━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2/5\u001b[0m [litellm]\n", + "\u001b[2K Successfully uninstalled 
litellm-1.76.0━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2/5\u001b[0m [litellm]\n", + "\u001b[2K Attempting uninstall: opentelemetry-instrumentation-openai━━━━━━\u001b[0m \u001b[32m2/5\u001b[0m [litellm]\n", + "\u001b[2K Found existing installation: opentelemetry-instrumentation-openai 0.46.2[0m [litellm]\n", + "\u001b[2K Uninstalling opentelemetry-instrumentation-openai-0.46.2:━\u001b[0m \u001b[32m2/5\u001b[0m [litellm]\n", + "\u001b[2K Successfully uninstalled opentelemetry-instrumentation-openai-0.46.25\u001b[0m [litellm]\n", + "\u001b[2K Attempting uninstall: agenta\u001b[0m\u001b[90m━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2/5\u001b[0m [litellm]\n", + "\u001b[2K Found existing installation: agenta 0.51.6━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2/5\u001b[0m [litellm]\n", + "\u001b[2K Uninstalling agenta-0.51.6:[0m\u001b[90m━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2/5\u001b[0m [litellm]\n", + "\u001b[2K Successfully uninstalled agenta-0.51.6━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2/5\u001b[0m [litellm]\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m5/5\u001b[0m [agenta]2m4/5\u001b[0m [agenta]\n", + "\u001b[1A\u001b[2KSuccessfully installed agenta-0.59.6 fastuuid-0.14.0 litellm-1.78.7 openai-1.109.1 opentelemetry-instrumentation-openai-0.47.5\n", + "Note: you may need to restart the kernel to use updated packages.\n" + ] + } + ], + "source": [ + "pip install -U agenta openai opentelemetry-instrumentation-openai" + ] + }, + { + "cell_type": "markdown", + "id": "d4e5f6g7", + "metadata": {}, + "source": [ + "## Step 2: Configure Environment Variables\n", + "\n", + "To start tracing your application, you'll need an API key:\n", + "\n", + "1. Visit the Agenta API Keys page under settings.\n", + "2. Click on **Create New API Key** and follow the prompts.\n", + "\n", + "Then set your environment variables:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e5f6g7h8", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# Set your API key here\n", + "os.environ[\"AGENTA_API_KEY\"] = \"\"\n", + "os.environ[\"AGENTA_HOST\"] = \"https://cloud.agenta.ai\" # Change for self-hosted\n", + "os.environ[\"OPENAI_API_KEY\"] = \"\"" + ] + }, + { + "cell_type": "markdown", + "id": "f6g7h8i9", + "metadata": {}, + "source": [ + "## Step 3: Instrument Your Application\n", + "\n", + "Below is a sample script to instrument an OpenAI application:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "g7h8i9j0", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2025-10-28T10:24:18.411Z \u001b[38;5;70m[INFO.]\u001b[0m Agenta - SDK version: 0.59.6 \u001b[38;5;245m[agenta.sdk.agenta_init]\u001b[0m \n", + "2025-10-28T10:24:18.411Z \u001b[38;5;70m[INFO.]\u001b[0m Agenta - Host: https://cloud.agenta.ai \u001b[38;5;245m[agenta.sdk.agenta_init]\u001b[0m \n", + "2025-10-28T10:24:18.412Z \u001b[38;5;70m[INFO.]\u001b[0m Agenta - OLTP URL: https://cloud.agenta.ai/api/otlp/v1/traces \u001b[38;5;245m[agenta.sdk.tracing.tracing]\u001b[0m \n" + ] + } + ], + "source": [ + "import agenta as ag\n", + "from opentelemetry.instrumentation.openai import OpenAIInstrumentor\n", + "import openai\n", + "\n", + "# Initialize Agenta\n", + "ag.init()\n", + "\n", + "# Instrument OpenAI\n", + "OpenAIInstrumentor().instrument()" + ] + }, + { + "cell_type": "markdown", + "id": "h8i9j0k1", + "metadata": {}, + "source": [ + "## Step 4: Create an Instrumented Function\n", + "\n", + "Decorate your function with 
`@ag.instrument()` to enable tracing:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "i9j0k1l2", + "metadata": {}, + "outputs": [], + "source": [ + "@ag.instrument()\n", + "def generate():\n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-3.5-turbo\",\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n", + " {\"role\": \"user\", \"content\": \"Write a short story about AI Engineering.\"},\n", + " ],\n", + " )\n", + " return response.choices[0].message.content" + ] + }, + { + "cell_type": "markdown", + "id": "j0k1l2m3", + "metadata": {}, + "source": [ + "## Step 5: Run Your Application\n", + "\n", + "Call your instrumented function to generate a trace:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "k1l2m3n4", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "In a future where artificial intelligence had revolutionized society, a young engineer named Maya was fascinated by the endless possibilities of AI technology. She dedicated her life to mastering the complexities of AI engineering, pushing the boundaries of what was thought possible.\n", + "\n", + "Maya worked tirelessly, constantly experimenting and innovating to create AI systems that could change the world for the better. Her creations helped streamline businesses, improve healthcare outcomes, and enhance everyday tasks for people around the globe.\n", + "\n", + "However, Maya soon realized that not all AI technology was being used ethically. Some companies exploited AI for profit, using it to manipulate data or invade people's privacy. Troubled by this misuse of technology, Maya made it her mission to advocate for responsible AI engineering practices.\n", + "\n", + "Through her dedication and leadership, Maya became a voice for ethical AI engineering, pushing for regulations and guidelines to ensure that AI technology was used for the greater good of humanity. Her passion inspired others in the field to prioritize ethics and social responsibility in their work.\n", + "\n", + "As the world grappled with the emerging challenges of AI technology, Maya stood at the forefront, a shining example of how innovation and integrity could go hand in hand in the exciting world of AI engineering.\n" + ] + } + ], + "source": [ + "if __name__ == \"__main__\":\n", + " result = generate()\n", + " print(result)" + ] + }, + { + "cell_type": "markdown", + "id": "l2m3n4o5", + "metadata": {}, + "source": [ + "## Step 6: View Traces in the Agenta UI\n", + "\n", + "After running your application, you can view the captured traces in Agenta:\n", + "\n", + "1. Log in to your Agenta dashboard.\n", + "2. Navigate to the **Observability** section.\n", + "3. 
You'll see a list of traces corresponding to your application's requests.\n", + "\n", + "Each trace will show you the inputs, outputs, and metadata from your LLM application, including:\n", + "- Function execution time\n", + "- OpenAI API calls and responses\n", + "- Token usage\n", + "- Any errors or exceptions" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/examples/jupyter/observability/trace-with-python-sdk-tutorial.ipynb b/examples/jupyter/observability/trace-with-python-sdk-tutorial.ipynb new file mode 100644 index 0000000000..d40d9dc9af --- /dev/null +++ b/examples/jupyter/observability/trace-with-python-sdk-tutorial.ipynb @@ -0,0 +1,524 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Trace with Python SDK - Tutorial\n", + "\n", + "This tutorial demonstrates how to use the Agenta Python SDK to trace your LLM applications. You'll learn how to:\n", + "\n", + "- Set up tracing with the Agenta SDK\n", + "- Instrument functions and OpenAI calls automatically\n", + "- Start and end spans manually to capture internals\n", + "- Reference prompt versions in your traces\n", + "- Redact sensitive data from traces\n", + "\n", + "## What You'll Build\n", + "\n", + "We'll create a simple LLM application that:\n", + "1. Uses OpenAI auto-instrumentation to trace API calls\n", + "2. Instruments custom functions to capture workflow steps\n", + "3. Stores internal data like retrieved context\n", + "4. Links traces to deployed prompt versions\n", + "5. Redacts sensitive information from traces" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Install Dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pip install -U agenta openai opentelemetry-instrumentation-openai" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "Before using the SDK, we need to initialize it with your API keys. The SDK requires:\n", + "- **Agenta API Key**: For sending traces to Agenta\n", + "- **OpenAI API Key**: For making LLM calls\n", + "\n", + "You can get your Agenta API key from the [API Keys page](https://cloud.agenta.ai/settings?tab=apiKeys)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ[\"AGENTA_HOST\"] = \"https://cloud.agenta.ai/\" # Default value, change for self-hosted\n", + "os.environ[\"AGENTA_API_KEY\"] = \"\"\n", + "os.environ[\"OPENAI_API_KEY\"] = \"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import agenta as ag\n", + "from getpass import getpass\n", + "\n", + "# Initialize the SDK with your API key\n", + "api_key = os.getenv(\"AGENTA_API_KEY\")\n", + "if not api_key:\n", + " os.environ[\"AGENTA_API_KEY\"] = getpass(\"Enter your Agenta API key: \")\n", + "\n", + "openai_api_key = os.getenv(\"OPENAI_API_KEY\")\n", + "if not openai_api_key:\n", + " os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter your OpenAI API key: \")\n", + "\n", + "# Initialize the SDK\n", + "ag.init()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 1: Setup Tracing with OpenAI Auto-Instrumentation\n", + "\n", + "The Agenta SDK provides two powerful mechanisms for tracing:\n", + "\n", + "1. **Auto-instrumentation**: Automatically traces third-party libraries like OpenAI\n", + "2. **Function decorators**: Manually instrument your custom functions\n", + "\n", + "Let's start by setting up OpenAI auto-instrumentation, which will capture all OpenAI API calls automatically." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from opentelemetry.instrumentation.openai import OpenAIInstrumentor\n", + "import openai\n", + "\n", + "# Instrument OpenAI to automatically trace all API calls\n", + "OpenAIInstrumentor().instrument()\n", + "\n", + "print(\"OpenAI auto-instrumentation enabled!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 2: Instrument Functions\n", + "\n", + "Now let's create a simple function and instrument it using the `@ag.instrument()` decorator. This will create a span for the function and automatically capture its inputs and outputs.\n", + "\n", + "The decorator accepts a `spankind` parameter to categorize the span. Available types include: `agent`, `chain`, `workflow`, `tool`, `embedding`, `query`, `completion`, `chat`, `rerank`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@ag.instrument(spankind=\"workflow\")\n", + "def generate_story(topic: str):\n", + " \"\"\"Generate a short story about the given topic.\"\"\"\n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-3.5-turbo\",\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": \"You are a creative storyteller.\"},\n", + " {\"role\": \"user\", \"content\": f\"Write a short story about {topic}.\"},\n", + " ],\n", + " )\n", + " return response.choices[0].message.content\n", + "\n", + "# Test the instrumented function\n", + "story = generate_story(\"AI Engineering\")\n", + "print(\"Generated story:\")\n", + "print(story)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 3: Starting Spans and Storing Internals\n", + "\n", + "Sometimes you need to capture intermediate data that isn't part of the function's inputs or outputs. 
The SDK provides two methods:\n", + "\n", + "- `ag.tracing.store_meta()`: Add metadata to a span (saved under `ag.meta`)\n", + "- `ag.tracing.store_internals()`: Store internal data (saved under `ag.data.internals`)\n", + "\n", + "Internals are especially useful because they:\n", + "1. Are searchable using plain text queries\n", + "2. Appear in the overview tab of the observability drawer\n", + "\n", + "Let's create a RAG (Retrieval-Augmented Generation) example that captures the retrieved context:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@ag.instrument(spankind=\"tool\")\n", + "def retrieve_context(query: str):\n", + " \"\"\"Simulate retrieving context from a knowledge base.\"\"\"\n", + " # In a real application, this would query a vector database\n", + " context = [\n", + " \"Agenta is an open-source LLM developer platform.\",\n", + " \"Agenta provides tools for prompt management, evaluation, and observability.\",\n", + " \"The Agenta SDK supports tracing with OpenTelemetry.\",\n", + " ]\n", + " \n", + " # Store metadata about the retrieval\n", + " ag.tracing.store_meta({\n", + " \"retrieval_method\": \"vector_search\",\n", + " \"num_results\": len(context)\n", + " })\n", + " \n", + " return context\n", + "\n", + "@ag.instrument(spankind=\"workflow\")\n", + "def rag_workflow(query: str):\n", + " \"\"\"Answer a question using retrieved context.\"\"\"\n", + " # Retrieve context\n", + " context = retrieve_context(query)\n", + " \n", + " # Store the retrieved context as internals\n", + " # This makes it visible in the UI and searchable\n", + " ag.tracing.store_internals({\"retrieved_context\": context})\n", + " \n", + " # Generate answer using context\n", + " context_str = \"\\n\".join(context)\n", + " prompt = f\"Answer the following question based on the context:\\n\\nContext:\\n{context_str}\\n\\nQuestion: {query}\"\n", + " \n", + " response = openai.chat.completions.create(\n", + " model=\"gpt-3.5-turbo\",\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": \"You are a helpful assistant. Answer questions based only on the provided context.\"},\n", + " {\"role\": \"user\", \"content\": prompt},\n", + " ],\n", + " )\n", + " \n", + " return response.choices[0].message.content\n", + "\n", + "# Test the RAG workflow\n", + "answer = rag_workflow(\"What is Agenta?\")\n", + "print(\"Answer:\")\n", + "print(answer)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 4: Reference Prompt Versions\n", + "\n", + "One of Agenta's powerful features is linking traces to specific prompt versions. This allows you to:\n", + "- Filter traces by application, variant, or environment\n", + "- Compare performance across different variants\n", + "- Track production behavior\n", + "\n", + "Let's create a prompt using the SDK, deploy it, and then reference it in our traces." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create and Deploy a Prompt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": "from agenta.sdk.types import PromptTemplate, Message, ModelConfig\nfrom pydantic import BaseModel\n\n# Define the prompt configuration\nclass Config(BaseModel):\n prompt: PromptTemplate\n\nconfig = Config(\n prompt=PromptTemplate(\n messages=[\n Message(role=\"system\", content=\"You are a helpful assistant that explains topics clearly.\"),\n Message(role=\"user\", content=\"Explain {{topic}} in simple terms.\"),\n ],\n llm_config=ModelConfig(\n model=\"gpt-3.5-turbo\",\n max_tokens=200,\n temperature=0.7,\n top_p=1.0,\n frequency_penalty=0.0,\n presence_penalty=0.0,\n ),\n template_format=\"curly\"\n )\n)\n\n# Create an application and variant\napp = ag.AppManager.create(\n app_slug=\"topic-explainer-traced\",\n template_key=\"SERVICE:completion\",\n)\n\nprint(f\"Created application: {app.app_name}\")\n\n# Create a variant with the prompt configuration\nvariant = ag.VariantManager.create(\n parameters=config.model_dump(),\n app_slug=\"topic-explainer-traced\",\n variant_slug=\"production-variant\"\n)\n\nprint(f\"Created variant: {variant.variant_slug} (version {variant.variant_version})\")\n\n# Deploy to production environment\ndeployment = ag.DeploymentManager.deploy(\n app_slug=\"topic-explainer-traced\",\n variant_slug=\"production-variant\",\n environment_slug=\"production\",\n)\n\nprint(f\"Deployed to {deployment.environment_slug} environment\")" + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Reference the Prompt in Traces\n", + "\n", + "Now we'll create a function that uses the deployed prompt and links its traces to the application and environment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@ag.instrument(spankind=\"workflow\")\n", + "def explain_topic_with_prompt(topic: str):\n", + " \"\"\"Explain a topic using the deployed prompt configuration.\"\"\"\n", + " \n", + " # Fetch the prompt configuration from production\n", + " prompt_config = ag.ConfigManager.get_from_registry(\n", + " app_slug=\"topic-explainer-traced\",\n", + " environment_slug=\"production\"\n", + " )\n", + " \n", + " # Format the prompt with the topic\n", + " prompt_template = PromptTemplate(**prompt_config[\"prompt\"])\n", + " formatted_prompt = prompt_template.format(topic=topic)\n", + " \n", + " # Make the OpenAI call\n", + " response = openai.chat.completions.create(\n", + " **formatted_prompt.to_openai_kwargs()\n", + " )\n", + " \n", + " # Link this trace to the application and environment\n", + " ag.tracing.store_refs({\n", + " \"application.slug\": \"topic-explainer-traced\",\n", + " \"variant.slug\": \"production-variant\",\n", + " \"environment.slug\": \"production\",\n", + " })\n", + " \n", + " return response.choices[0].message.content\n", + "\n", + "# Test the function\n", + "explanation = explain_topic_with_prompt(\"machine learning\")\n", + "print(\"Explanation:\")\n", + "print(explanation)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part 5: Redact Sensitive Data\n", + "\n", + "When working with production data, you often need to exclude sensitive information from traces. The Agenta SDK provides several ways to redact data:\n", + "\n", + "1. **Simple redaction**: Ignore all inputs/outputs\n", + "2. 
**Selective redaction**: Ignore specific fields\n", + "3. **Custom redaction**: Use a callback function for fine-grained control\n", + "4. **Global redaction**: Apply rules across all instrumented functions\n", + "\n", + "Let's explore these different approaches." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Simple Redaction: Ignore All Inputs/Outputs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@ag.instrument(\n", + " spankind=\"workflow\",\n", + " ignore_inputs=True,\n", + " ignore_outputs=True\n", + ")\n", + "def process_sensitive_data(user_email: str, credit_card: str):\n", + " \"\"\"Process sensitive data without logging inputs/outputs.\"\"\"\n", + " # The function inputs and outputs won't be captured in the trace\n", + " result = f\"Processed data for {user_email}\"\n", + " return result\n", + "\n", + "# This trace will not contain inputs or outputs\n", + "result = process_sensitive_data(\"user@example.com\", \"4111-1111-1111-1111\")\n", + "print(result)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Selective Redaction: Ignore Specific Fields" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@ag.instrument(\n", + " spankind=\"workflow\",\n", + " ignore_inputs=[\"api_key\", \"password\"],\n", + " ignore_outputs=[\"internal_token\"]\n", + ")\n", + "def authenticate_user(username: str, password: str, api_key: str):\n", + " \"\"\"Authenticate a user (password and api_key will be redacted).\"\"\"\n", + " # Simulate authentication\n", + " return {\n", + " \"username\": username,\n", + " \"authenticated\": True,\n", + " \"internal_token\": \"secret-token-12345\", # This will be redacted\n", + " }\n", + "\n", + "# The trace will show username but not password or api_key\n", + "auth_result = authenticate_user(\"john_doe\", \"secret123\", \"sk-abc123\")\n", + "print(f\"Authenticated: {auth_result['authenticated']}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Custom Redaction: Use a Callback Function" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import re\n", + "\n", + "def redact_pii(name: str, field: str, data: dict):\n", + " \"\"\"Custom redaction function that removes PII.\"\"\"\n", + " if field == \"inputs\":\n", + " # Redact email addresses\n", + " if \"email\" in data:\n", + " data[\"email\"] = \"[REDACTED]\"\n", + " # Redact phone numbers\n", + " if \"phone\" in data:\n", + " data[\"phone\"] = \"[REDACTED]\"\n", + " \n", + " if field == \"outputs\":\n", + " # Redact any credit card patterns\n", + " if isinstance(data, dict):\n", + " for key, value in data.items():\n", + " if isinstance(value, str):\n", + " # Simple credit card pattern\n", + " data[key] = re.sub(r'\\d{4}[-\\s]?\\d{4}[-\\s]?\\d{4}[-\\s]?\\d{4}', '[CARD-REDACTED]', value)\n", + " \n", + " return data\n", + "\n", + "@ag.instrument(\n", + " spankind=\"workflow\",\n", + " redact=redact_pii,\n", + " redact_on_error=False # Don't apply redaction if it raises an error\n", + ")\n", + "def process_customer_order(name: str, email: str, phone: str, card_number: str):\n", + " \"\"\"Process a customer order with PII redaction.\"\"\"\n", + " return {\n", + " \"status\": \"processed\",\n", + " \"customer\": name,\n", + " \"payment_info\": f\"Charged card ending in {card_number[-4:]}\",\n", + " \"full_card\": card_number # This 
will be redacted\n", + " }\n", + "\n", + "# Test with sample data\n", + "order = process_customer_order(\n", + " name=\"Jane Smith\",\n", + " email=\"jane@example.com\", # Will be redacted\n", + " phone=\"555-1234\", # Will be redacted\n", + " card_number=\"4111-1111-1111-1111\" # Will be redacted in output\n", + ")\n", + "print(f\"Order status: {order['status']}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Global Redaction: Apply Rules Across All Functions\n", + "\n", + "For organization-wide policies, you can set up global redaction rules during initialization. Note: Since we already called `ag.init()`, this is just for demonstration. In a real application, you would set this during the initial setup." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Example of global redaction setup (would be done during ag.init())\n", + "from typing import Dict, Any\n", + "\n", + "def global_redact_function(name: str, field: str, data: Dict[str, Any]):\n", + " \"\"\"Global redaction that applies to all instrumented functions.\"\"\"\n", + " # Remove any field containing 'api_key' or 'secret'\n", + " if isinstance(data, dict):\n", + " keys_to_redact = [k for k in data.keys() if 'api_key' in k.lower() or 'secret' in k.lower()]\n", + " for key in keys_to_redact:\n", + " data[key] = \"[REDACTED]\"\n", + " \n", + " return data\n", + "\n", + "# In production, you would initialize like this:\n", + "# ag.init(\n", + "# redact=global_redact_function,\n", + "# redact_on_error=True\n", + "# )\n", + "\n", + "print(\"Global redaction would be configured during ag.init()\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "In this tutorial, you learned how to:\n", + "\n", + "1. ✅ **Set up tracing** with the Agenta SDK and OpenAI auto-instrumentation\n", + "2. ✅ **Instrument functions** using the `@ag.instrument()` decorator\n", + "3. ✅ **Store internals and metadata** to capture intermediate data in your workflows\n", + "4. ✅ **Reference prompt versions** by creating, deploying, and linking traces to applications\n", + "5. 
✅ **Redact sensitive data** using multiple approaches for privacy protection\n", + "\n", + "## Next Steps\n", + "\n", + "- Explore [distributed tracing](/observability/trace-with-python-sdk/distributed-tracing) for multi-service applications\n", + "- Learn about [cost tracking](/observability/trace-with-python-sdk/track-costs) to monitor LLM expenses\n", + "- Understand [trace annotations](/observability/trace-with-python-sdk/annotate-traces) for collecting feedback\n", + "- Check out the [Agenta UI guide](/observability/using-the-ui/filtering-traces) for filtering and analyzing traces" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.0" + }, + "colab": { + "provenance": [] + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/examples/jupyter/observability_langchain.ipynb b/examples/jupyter/observability_langchain.ipynb index c41c77c996..0fa0e070e0 100644 --- a/examples/jupyter/observability_langchain.ipynb +++ b/examples/jupyter/observability_langchain.ipynb @@ -184,7 +184,7 @@ "\n", "loader = WebBaseLoader(\n", " web_paths=(\n", - " \"https://docs.agenta.ai/prompt-engineering/prompt-management/prompt-management-sdk\",\n", + " \"https://docs.agenta.ai/prompt-engineering/managing-prompts-programatically/create-and-commit\",\n", " ),\n", " bs_kwargs=dict(parse_only=bs4.SoupStrainer(\"article\")), # Only parse the core\n", ")\n", diff --git a/examples/node/observability-opentelemetry/.gitignore b/examples/node/observability-opentelemetry/.gitignore new file mode 100644 index 0000000000..2cd0f44931 --- /dev/null +++ b/examples/node/observability-opentelemetry/.gitignore @@ -0,0 +1,4 @@ +node_modules/ +.env +*.log + diff --git a/examples/node/observability-opentelemetry/README.md b/examples/node/observability-opentelemetry/README.md new file mode 100644 index 0000000000..9d61b4984f --- /dev/null +++ b/examples/node/observability-opentelemetry/README.md @@ -0,0 +1,68 @@ +# OpenTelemetry Quick Start Example + +This example demonstrates how to instrument a Node.js application with OpenTelemetry and send traces to Agenta. + +## Prerequisites + +- Node.js 18+ installed +- Agenta API key ([get one here](https://cloud.agenta.ai)) +- OpenAI API key + +## Version Compatibility + +This example uses the latest stable versions: +- **OpenAI SDK** (`latest` - v6.x) +- **OpenInference Instrumentation** (`latest` - v3.x) +- **OpenInference Semantic Conventions** (`latest` - v2.x) + +## Setup + +1. Install dependencies: +```bash +npm install +``` + +2. Create a `.env` file with your credentials: +```bash +cp .env.example .env +# Edit .env and add your API keys +``` + +3. Run the example: +```bash +npm start +``` + +## What's Happening? + +1. **instrumentation.js** - Configures OpenTelemetry to: + - Send traces to Agenta via OTLP + - Automatically instrument OpenAI calls using OpenInference + - Use SimpleSpanProcessor for immediate export (ideal for short scripts) + +2. **app.js** - A simple application that: + - Creates a manual span using Agenta's semantic conventions + - Calls OpenAI's chat completion API (auto-instrumented) + - Demonstrates proper use of `ag.data.inputs`, `ag.data.outputs`, and `ag.data.internals` + +3. 
All traces are sent to Agenta where you can: + - View the complete trace timeline + - See inputs/outputs with proper formatting + - Monitor costs and latency + - Debug issues + +## Semantic Conventions + +This example follows Agenta's semantic conventions for proper trace display: + +- **`ag.type.node`** - Defines the operation type (workflow, task, tool, etc.) +- **`ag.data.inputs`** - Stores input parameters as JSON +- **`ag.data.outputs`** - Stores output results as JSON +- **`ag.data.internals`** - Stores intermediate values and metadata + +See [SEMANTIC_CONVENTIONS.md](./SEMANTIC_CONVENTIONS.md) for detailed documentation. + +## View Your Traces + +After running the example, log in to [Agenta](https://cloud.agenta.ai) and navigate to the Observability section to see your traces! + diff --git a/examples/node/observability-opentelemetry/app.js b/examples/node/observability-opentelemetry/app.js new file mode 100644 index 0000000000..b3dce7e794 --- /dev/null +++ b/examples/node/observability-opentelemetry/app.js @@ -0,0 +1,80 @@ +// app.js +import OpenAI from "openai"; +import { trace } from "@opentelemetry/api"; + +const openai = new OpenAI({ + apiKey: process.env.OPENAI_API_KEY, +}); + +const tracer = trace.getTracer("test-app", "1.0.0"); + +async function generate() { + // Create a manual span using Agenta's semantic conventions + // This demonstrates how to manually instrument functions with proper attributes + return tracer.startActiveSpan("generate", async (span) => { + try { + // Define the messages for the chat completion + const messages = [ + { role: "system", content: "You are a helpful assistant." }, + { role: "user", content: "Write a short story about AI Engineering." }, + ]; + + // Agenta Semantic Convention: ag.type.node + // Defines the type of operation (workflow, task, tool, etc.) 
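+      // Note: OpenTelemetry attribute values must be primitives (or arrays of
+      // primitives), so the structured ag.data.* payloads below are serialized
+      // with JSON.stringify rather than attached as raw objects.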
+ span.setAttribute("ag.type.node", "workflow"); + + // Agenta Semantic Convention: ag.data.inputs + // Stores the input parameters as JSON + span.setAttribute("ag.data.inputs", JSON.stringify({ + messages: messages, + model: "gpt-3.5-turbo" + })); + + const response = await openai.chat.completions.create({ + model: "gpt-3.5-turbo", + messages: messages, + }); + + const content = response.choices[0].message.content; + + // Agenta Semantic Convention: ag.data.internals + // Stores intermediate values and metadata (optional) + span.setAttribute("ag.data.internals", JSON.stringify({ + response_length: content.length + })); + + // Agenta Semantic Convention: ag.data.outputs + // Stores the output results as JSON + span.setAttribute("ag.data.outputs", JSON.stringify({ + content: content + })); + + return content; + } finally { + span.end(); + } + }); +} + +async function main() { + try { + const result = await generate(); + console.log("\n" + result); + + console.log("\n⏳ Flushing traces..."); + // Ensure traces are flushed before exit + const tracerProvider = trace.getTracerProvider(); + if (tracerProvider && typeof tracerProvider.forceFlush === 'function') { + await tracerProvider.forceFlush(); + } + // Extra wait to ensure export completes + await new Promise(resolve => setTimeout(resolve, 1000)); + console.log("✅ Done!"); + } catch (error) { + console.error("❌ Error:", error.message); + process.exit(1); + } +} + +main(); + diff --git a/examples/node/observability-opentelemetry/instrumentation.js b/examples/node/observability-opentelemetry/instrumentation.js new file mode 100644 index 0000000000..4921de9bf7 --- /dev/null +++ b/examples/node/observability-opentelemetry/instrumentation.js @@ -0,0 +1,74 @@ +// instrumentation.js +import { registerInstrumentations } from "@opentelemetry/instrumentation"; +import { OpenAIInstrumentation } from "@arizeai/openinference-instrumentation-openai"; +import { diag, DiagConsoleLogger, DiagLogLevel } from "@opentelemetry/api"; +import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-proto"; +import { Resource } from "@opentelemetry/resources"; +import { BatchSpanProcessor, SimpleSpanProcessor } from "@opentelemetry/sdk-trace-base"; +import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node"; +import { ATTR_SERVICE_NAME } from "@opentelemetry/semantic-conventions"; +import { SEMRESATTRS_PROJECT_NAME } from "@arizeai/openinference-semantic-conventions"; +import OpenAI from "openai"; + +// For troubleshooting, set the log level to DiagLogLevel.DEBUG +diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.INFO); + +// Get Agenta configuration from environment variables +const AGENTA_HOST = process.env.AGENTA_HOST || "https://cloud.agenta.ai"; +const AGENTA_API_KEY = process.env.AGENTA_API_KEY; + +if (!AGENTA_API_KEY) { + console.error("❌ AGENTA_API_KEY environment variable is required"); + process.exit(1); +} + +// Configure the OTLP exporter to send traces to Agenta +const otlpExporter = new OTLPTraceExporter({ + url: `${AGENTA_HOST}/api/otlp/v1/traces`, + headers: { + Authorization: `ApiKey ${AGENTA_API_KEY}`, + }, + timeoutMillis: 5000, // 5 second timeout +}); + +// Add logging to the exporter for debugging +const originalExport = otlpExporter.export.bind(otlpExporter); +otlpExporter.export = function (spans, resultCallback) { + console.log(`📤 Exporting ${spans.length} span(s)...`); + originalExport(spans, (result) => { + if (result.code === 0) { + console.log('✅ Spans exported successfully'); + } else { + console.error('❌ Export 
failed:', result.error); + } + resultCallback(result); + }); +}; + +// Create and configure the tracer provider +const tracerProvider = new NodeTracerProvider({ + resource: new Resource({ + [ATTR_SERVICE_NAME]: "openai-quickstart", + // Project name in Agenta, defaults to "default" + [SEMRESATTRS_PROJECT_NAME]: "openai-quickstart", + }), +}); + +// Use SimpleSpanProcessor for immediate export (better for short-lived scripts) +// For long-running services, use: new BatchSpanProcessor(otlpExporter) +tracerProvider.addSpanProcessor(new SimpleSpanProcessor(otlpExporter)); + +// Register the tracer provider +tracerProvider.register(); + +// Register OpenAI instrumentation with manual instrumentation +// This is required for OpenInference to properly instrument the OpenAI client +const instrumentation = new OpenAIInstrumentation(); +instrumentation.manuallyInstrument(OpenAI); + +registerInstrumentations({ + instrumentations: [instrumentation], +}); + +console.log("✅ OpenTelemetry instrumentation initialized"); + diff --git a/examples/node/observability-opentelemetry/package-lock.json b/examples/node/observability-opentelemetry/package-lock.json new file mode 100644 index 0000000000..c9c0ef0ae5 --- /dev/null +++ b/examples/node/observability-opentelemetry/package-lock.json @@ -0,0 +1,935 @@ +{ + "name": "agenta-opentelemetry-quickstart", + "version": "1.0.0", + "lockfileVersion": 3, + "requires": true, + "packages": { + "": { + "name": "agenta-opentelemetry-quickstart", + "version": "1.0.0", + "dependencies": { + "@arizeai/openinference-instrumentation-openai": "^3.2.3", + "@arizeai/openinference-semantic-conventions": "^2.1.2", + "@opentelemetry/api": "^1.9.0", + "@opentelemetry/exporter-trace-otlp-proto": "^0.54.0", + "@opentelemetry/instrumentation": "^0.54.0", + "@opentelemetry/resources": "^1.28.0", + "@opentelemetry/sdk-trace-base": "^1.28.0", + "@opentelemetry/sdk-trace-node": "^1.28.0", + "@opentelemetry/semantic-conventions": "^1.28.0", + "openai": "^6.7.0" + }, + "devDependencies": { + "@types/node": "^22.0.0", + "typescript": "^5.7.0" + } + }, + "node_modules/@arizeai/openinference-core": { + "version": "1.0.7", + "resolved": "https://registry.npmjs.org/@arizeai/openinference-core/-/openinference-core-1.0.7.tgz", + "integrity": "sha512-O9WYkrHNh/0mGTV+T9SWC3tkxVrT16gBrFiByG3aukBsqdOfSzoRj6QINk+Oi+VEDNIoQUzVQPFh81/gEL/thA==", + "license": "Apache-2.0", + "dependencies": { + "@arizeai/openinference-semantic-conventions": "2.1.2", + "@opentelemetry/api": "^1.9.0", + "@opentelemetry/core": "^1.25.1" + } + }, + "node_modules/@arizeai/openinference-instrumentation-openai": { + "version": "3.2.3", + "resolved": "https://registry.npmjs.org/@arizeai/openinference-instrumentation-openai/-/openinference-instrumentation-openai-3.2.3.tgz", + "integrity": "sha512-4GiUyUIafNkAf7milnsS4Xjh0rZwjPvhHu+xkcj2MQ19a0Teu6PcFDa7DCt7YV/xw2o+wcxeaLfVURmGGLAHOg==", + "license": "Apache-2.0", + "dependencies": { + "@arizeai/openinference-core": "1.0.7", + "@arizeai/openinference-semantic-conventions": "2.1.2", + "@opentelemetry/api": "^1.9.0", + "@opentelemetry/core": "^1.25.1", + "@opentelemetry/instrumentation": "^0.46.0" + } + }, + "node_modules/@arizeai/openinference-instrumentation-openai/node_modules/@opentelemetry/instrumentation": { + "version": "0.46.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/instrumentation/-/instrumentation-0.46.0.tgz", + "integrity": "sha512-a9TijXZZbk0vI5TGLZl+0kxyFfrXHhX6Svtz7Pp2/VBlCSKrazuULEyoJQrOknJyFWNMEmbbJgOciHCCpQcisw==", + "license": "Apache-2.0", + 
"dependencies": { + "@types/shimmer": "^1.0.2", + "import-in-the-middle": "1.7.1", + "require-in-the-middle": "^7.1.1", + "semver": "^7.5.2", + "shimmer": "^1.2.1" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": "^1.3.0" + } + }, + "node_modules/@arizeai/openinference-instrumentation-openai/node_modules/import-in-the-middle": { + "version": "1.7.1", + "resolved": "https://registry.npmjs.org/import-in-the-middle/-/import-in-the-middle-1.7.1.tgz", + "integrity": "sha512-1LrZPDtW+atAxH42S6288qyDFNQ2YCty+2mxEPRtfazH6Z5QwkaBSTS2ods7hnVJioF6rkRfNoA6A/MstpFXLg==", + "license": "Apache-2.0", + "dependencies": { + "acorn": "^8.8.2", + "acorn-import-assertions": "^1.9.0", + "cjs-module-lexer": "^1.2.2", + "module-details-from-path": "^1.0.3" + } + }, + "node_modules/@arizeai/openinference-semantic-conventions": { + "version": "2.1.2", + "resolved": "https://registry.npmjs.org/@arizeai/openinference-semantic-conventions/-/openinference-semantic-conventions-2.1.2.tgz", + "integrity": "sha512-u7UeuU9bJ1LxzHk0MPWb+1ZcotCcJwPnKDXi7Rl2cPs1pWMFg9Ogq7zzYZX+sDcibD2AEa1U+ElyOD8DwZc9gw==", + "license": "Apache-2.0" + }, + "node_modules/@opentelemetry/api": { + "version": "1.9.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/api/-/api-1.9.0.tgz", + "integrity": "sha512-3giAOQvZiH5F9bMlMiv8+GSPMeqg0dbaeo58/0SlA9sxSqZhnUtxzX9/2FzyhS9sWQf5S0GJE0AKBrFqjpeYcg==", + "license": "Apache-2.0", + "engines": { + "node": ">=8.0.0" + } + }, + "node_modules/@opentelemetry/api-logs": { + "version": "0.54.2", + "resolved": "https://registry.npmjs.org/@opentelemetry/api-logs/-/api-logs-0.54.2.tgz", + "integrity": "sha512-4MTVwwmLgUh5QrJnZpYo6YRO5IBLAggf2h8gWDblwRagDStY13aEvt7gGk3jewrMaPlHiF83fENhIx0HO97/cQ==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/api": "^1.3.0" + }, + "engines": { + "node": ">=14" + } + }, + "node_modules/@opentelemetry/context-async-hooks": { + "version": "1.30.1", + "resolved": "https://registry.npmjs.org/@opentelemetry/context-async-hooks/-/context-async-hooks-1.30.1.tgz", + "integrity": "sha512-s5vvxXPVdjqS3kTLKMeBMvop9hbWkwzBpu+mUO2M7sZtlkyDJGwFe33wRKnbaYDo8ExRVBIIdwIGrqpxHuKttA==", + "license": "Apache-2.0", + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.0.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/core": { + "version": "1.27.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/core/-/core-1.27.0.tgz", + "integrity": "sha512-yQPKnK5e+76XuiqUH/gKyS8wv/7qITd5ln56QkBTf3uggr0VkXOXfcaAuG330UfdYu83wsyoBwqwxigpIG+Jkg==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/semantic-conventions": "1.27.0" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.0.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/core/node_modules/@opentelemetry/semantic-conventions": { + "version": "1.27.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/semantic-conventions/-/semantic-conventions-1.27.0.tgz", + "integrity": "sha512-sAay1RrB+ONOem0OZanAR1ZI/k7yDpnOQSQmTMuGImUQb2y8EbSaCJ94FQluM74xoU03vlb2d2U90hZluL6nQg==", + "license": "Apache-2.0", + "engines": { + "node": ">=14" + } + }, + "node_modules/@opentelemetry/exporter-trace-otlp-proto": { + "version": "0.54.2", + "resolved": "https://registry.npmjs.org/@opentelemetry/exporter-trace-otlp-proto/-/exporter-trace-otlp-proto-0.54.2.tgz", + "integrity": "sha512-XSmm1N2wAhoWDXP1q/N6kpLebWaxl6VIADv4WA5QWKHLRpF3gLz5NAWNJBR8ygsvv8jQcrwnXgwfnJ18H3v1fg==", + "license": "Apache-2.0", 
+ "dependencies": { + "@opentelemetry/core": "1.27.0", + "@opentelemetry/otlp-exporter-base": "0.54.2", + "@opentelemetry/otlp-transformer": "0.54.2", + "@opentelemetry/resources": "1.27.0", + "@opentelemetry/sdk-trace-base": "1.27.0" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": "^1.3.0" + } + }, + "node_modules/@opentelemetry/exporter-trace-otlp-proto/node_modules/@opentelemetry/resources": { + "version": "1.27.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/resources/-/resources-1.27.0.tgz", + "integrity": "sha512-jOwt2VJ/lUD5BLc+PMNymDrUCpm5PKi1E9oSVYAvz01U/VdndGmrtV3DU1pG4AwlYhJRHbHfOUIlpBeXCPw6QQ==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/core": "1.27.0", + "@opentelemetry/semantic-conventions": "1.27.0" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.0.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/exporter-trace-otlp-proto/node_modules/@opentelemetry/sdk-trace-base": { + "version": "1.27.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/sdk-trace-base/-/sdk-trace-base-1.27.0.tgz", + "integrity": "sha512-btz6XTQzwsyJjombpeqCX6LhiMQYpzt2pIYNPnw0IPO/3AhT6yjnf8Mnv3ZC2A4eRYOjqrg+bfaXg9XHDRJDWQ==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/core": "1.27.0", + "@opentelemetry/resources": "1.27.0", + "@opentelemetry/semantic-conventions": "1.27.0" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.0.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/exporter-trace-otlp-proto/node_modules/@opentelemetry/semantic-conventions": { + "version": "1.27.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/semantic-conventions/-/semantic-conventions-1.27.0.tgz", + "integrity": "sha512-sAay1RrB+ONOem0OZanAR1ZI/k7yDpnOQSQmTMuGImUQb2y8EbSaCJ94FQluM74xoU03vlb2d2U90hZluL6nQg==", + "license": "Apache-2.0", + "engines": { + "node": ">=14" + } + }, + "node_modules/@opentelemetry/instrumentation": { + "version": "0.54.2", + "resolved": "https://registry.npmjs.org/@opentelemetry/instrumentation/-/instrumentation-0.54.2.tgz", + "integrity": "sha512-go6zpOVoZVztT9r1aPd79Fr3OWiD4N24bCPJsIKkBses8oyFo12F/Ew3UBTdIu6hsW4HC4MVEJygG6TEyJI/lg==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/api-logs": "0.54.2", + "@types/shimmer": "^1.2.0", + "import-in-the-middle": "^1.8.1", + "require-in-the-middle": "^7.1.1", + "semver": "^7.5.2", + "shimmer": "^1.2.1" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": "^1.3.0" + } + }, + "node_modules/@opentelemetry/otlp-exporter-base": { + "version": "0.54.2", + "resolved": "https://registry.npmjs.org/@opentelemetry/otlp-exporter-base/-/otlp-exporter-base-0.54.2.tgz", + "integrity": "sha512-NrNyxu6R/bGAwanhz1HI0aJWKR6xUED4TjCH4iWMlAfyRukGbI9Kt/Akd2sYLwRKNhfS+sKetKGCUQPMDyYYMA==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/core": "1.27.0", + "@opentelemetry/otlp-transformer": "0.54.2" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": "^1.3.0" + } + }, + "node_modules/@opentelemetry/otlp-transformer": { + "version": "0.54.2", + "resolved": "https://registry.npmjs.org/@opentelemetry/otlp-transformer/-/otlp-transformer-0.54.2.tgz", + "integrity": "sha512-2tIjahJlMRRUz0A2SeE+qBkeBXBFkSjR0wqJ08kuOqaL8HNGan5iZf+A8cfrfmZzPUuMKCyY9I+okzFuFs6gKQ==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/api-logs": "0.54.2", + 
"@opentelemetry/core": "1.27.0", + "@opentelemetry/resources": "1.27.0", + "@opentelemetry/sdk-logs": "0.54.2", + "@opentelemetry/sdk-metrics": "1.27.0", + "@opentelemetry/sdk-trace-base": "1.27.0", + "protobufjs": "^7.3.0" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": "^1.3.0" + } + }, + "node_modules/@opentelemetry/otlp-transformer/node_modules/@opentelemetry/resources": { + "version": "1.27.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/resources/-/resources-1.27.0.tgz", + "integrity": "sha512-jOwt2VJ/lUD5BLc+PMNymDrUCpm5PKi1E9oSVYAvz01U/VdndGmrtV3DU1pG4AwlYhJRHbHfOUIlpBeXCPw6QQ==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/core": "1.27.0", + "@opentelemetry/semantic-conventions": "1.27.0" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.0.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/otlp-transformer/node_modules/@opentelemetry/sdk-trace-base": { + "version": "1.27.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/sdk-trace-base/-/sdk-trace-base-1.27.0.tgz", + "integrity": "sha512-btz6XTQzwsyJjombpeqCX6LhiMQYpzt2pIYNPnw0IPO/3AhT6yjnf8Mnv3ZC2A4eRYOjqrg+bfaXg9XHDRJDWQ==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/core": "1.27.0", + "@opentelemetry/resources": "1.27.0", + "@opentelemetry/semantic-conventions": "1.27.0" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.0.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/otlp-transformer/node_modules/@opentelemetry/semantic-conventions": { + "version": "1.27.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/semantic-conventions/-/semantic-conventions-1.27.0.tgz", + "integrity": "sha512-sAay1RrB+ONOem0OZanAR1ZI/k7yDpnOQSQmTMuGImUQb2y8EbSaCJ94FQluM74xoU03vlb2d2U90hZluL6nQg==", + "license": "Apache-2.0", + "engines": { + "node": ">=14" + } + }, + "node_modules/@opentelemetry/propagator-b3": { + "version": "1.30.1", + "resolved": "https://registry.npmjs.org/@opentelemetry/propagator-b3/-/propagator-b3-1.30.1.tgz", + "integrity": "sha512-oATwWWDIJzybAZ4pO76ATN5N6FFbOA1otibAVlS8v90B4S1wClnhRUk7K+2CHAwN1JKYuj4jh/lpCEG5BAqFuQ==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/core": "1.30.1" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.0.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/propagator-b3/node_modules/@opentelemetry/core": { + "version": "1.30.1", + "resolved": "https://registry.npmjs.org/@opentelemetry/core/-/core-1.30.1.tgz", + "integrity": "sha512-OOCM2C/QIURhJMuKaekP3TRBxBKxG/TWWA0TL2J6nXUtDnuCtccy49LUJF8xPFXMX+0LMcxFpCo8M9cGY1W6rQ==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/semantic-conventions": "1.28.0" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.0.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/propagator-b3/node_modules/@opentelemetry/semantic-conventions": { + "version": "1.28.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/semantic-conventions/-/semantic-conventions-1.28.0.tgz", + "integrity": "sha512-lp4qAiMTD4sNWW4DbKLBkfiMZ4jbAboJIGOQr5DvciMRI494OapieI9qiODpOt0XBr1LjIDy1xAGAnVs5supTA==", + "license": "Apache-2.0", + "engines": { + "node": ">=14" + } + }, + "node_modules/@opentelemetry/propagator-jaeger": { + "version": "1.30.1", + "resolved": "https://registry.npmjs.org/@opentelemetry/propagator-jaeger/-/propagator-jaeger-1.30.1.tgz", + 
"integrity": "sha512-Pj/BfnYEKIOImirH76M4hDaBSx6HyZ2CXUqk+Kj02m6BB80c/yo4BdWkn/1gDFfU+YPY+bPR2U0DKBfdxCKwmg==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/core": "1.30.1" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.0.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/propagator-jaeger/node_modules/@opentelemetry/core": { + "version": "1.30.1", + "resolved": "https://registry.npmjs.org/@opentelemetry/core/-/core-1.30.1.tgz", + "integrity": "sha512-OOCM2C/QIURhJMuKaekP3TRBxBKxG/TWWA0TL2J6nXUtDnuCtccy49LUJF8xPFXMX+0LMcxFpCo8M9cGY1W6rQ==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/semantic-conventions": "1.28.0" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.0.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/propagator-jaeger/node_modules/@opentelemetry/semantic-conventions": { + "version": "1.28.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/semantic-conventions/-/semantic-conventions-1.28.0.tgz", + "integrity": "sha512-lp4qAiMTD4sNWW4DbKLBkfiMZ4jbAboJIGOQr5DvciMRI494OapieI9qiODpOt0XBr1LjIDy1xAGAnVs5supTA==", + "license": "Apache-2.0", + "engines": { + "node": ">=14" + } + }, + "node_modules/@opentelemetry/resources": { + "version": "1.30.1", + "resolved": "https://registry.npmjs.org/@opentelemetry/resources/-/resources-1.30.1.tgz", + "integrity": "sha512-5UxZqiAgLYGFjS4s9qm5mBVo433u+dSPUFWVWXmLAD4wB65oMCoXaJP1KJa9DIYYMeHu3z4BZcStG3LC593cWA==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/core": "1.30.1", + "@opentelemetry/semantic-conventions": "1.28.0" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.0.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/resources/node_modules/@opentelemetry/core": { + "version": "1.30.1", + "resolved": "https://registry.npmjs.org/@opentelemetry/core/-/core-1.30.1.tgz", + "integrity": "sha512-OOCM2C/QIURhJMuKaekP3TRBxBKxG/TWWA0TL2J6nXUtDnuCtccy49LUJF8xPFXMX+0LMcxFpCo8M9cGY1W6rQ==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/semantic-conventions": "1.28.0" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.0.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/resources/node_modules/@opentelemetry/semantic-conventions": { + "version": "1.28.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/semantic-conventions/-/semantic-conventions-1.28.0.tgz", + "integrity": "sha512-lp4qAiMTD4sNWW4DbKLBkfiMZ4jbAboJIGOQr5DvciMRI494OapieI9qiODpOt0XBr1LjIDy1xAGAnVs5supTA==", + "license": "Apache-2.0", + "engines": { + "node": ">=14" + } + }, + "node_modules/@opentelemetry/sdk-logs": { + "version": "0.54.2", + "resolved": "https://registry.npmjs.org/@opentelemetry/sdk-logs/-/sdk-logs-0.54.2.tgz", + "integrity": "sha512-yIbYqDLS/AtBbPjCjh6eSToGNRMqW2VR8RrKEy+G+J7dFG7pKoptTH5T+XlKPleP9NY8JZYIpgJBlI+Osi0rFw==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/api-logs": "0.54.2", + "@opentelemetry/core": "1.27.0", + "@opentelemetry/resources": "1.27.0" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.4.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/sdk-logs/node_modules/@opentelemetry/resources": { + "version": "1.27.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/resources/-/resources-1.27.0.tgz", + "integrity": 
"sha512-jOwt2VJ/lUD5BLc+PMNymDrUCpm5PKi1E9oSVYAvz01U/VdndGmrtV3DU1pG4AwlYhJRHbHfOUIlpBeXCPw6QQ==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/core": "1.27.0", + "@opentelemetry/semantic-conventions": "1.27.0" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.0.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/sdk-logs/node_modules/@opentelemetry/semantic-conventions": { + "version": "1.27.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/semantic-conventions/-/semantic-conventions-1.27.0.tgz", + "integrity": "sha512-sAay1RrB+ONOem0OZanAR1ZI/k7yDpnOQSQmTMuGImUQb2y8EbSaCJ94FQluM74xoU03vlb2d2U90hZluL6nQg==", + "license": "Apache-2.0", + "engines": { + "node": ">=14" + } + }, + "node_modules/@opentelemetry/sdk-metrics": { + "version": "1.27.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/sdk-metrics/-/sdk-metrics-1.27.0.tgz", + "integrity": "sha512-JzWgzlutoXCydhHWIbLg+r76m+m3ncqvkCcsswXAQ4gqKS+LOHKhq+t6fx1zNytvLuaOUBur7EvWxECc4jPQKg==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/core": "1.27.0", + "@opentelemetry/resources": "1.27.0" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.3.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/sdk-metrics/node_modules/@opentelemetry/resources": { + "version": "1.27.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/resources/-/resources-1.27.0.tgz", + "integrity": "sha512-jOwt2VJ/lUD5BLc+PMNymDrUCpm5PKi1E9oSVYAvz01U/VdndGmrtV3DU1pG4AwlYhJRHbHfOUIlpBeXCPw6QQ==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/core": "1.27.0", + "@opentelemetry/semantic-conventions": "1.27.0" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.0.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/sdk-metrics/node_modules/@opentelemetry/semantic-conventions": { + "version": "1.27.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/semantic-conventions/-/semantic-conventions-1.27.0.tgz", + "integrity": "sha512-sAay1RrB+ONOem0OZanAR1ZI/k7yDpnOQSQmTMuGImUQb2y8EbSaCJ94FQluM74xoU03vlb2d2U90hZluL6nQg==", + "license": "Apache-2.0", + "engines": { + "node": ">=14" + } + }, + "node_modules/@opentelemetry/sdk-trace-base": { + "version": "1.30.1", + "resolved": "https://registry.npmjs.org/@opentelemetry/sdk-trace-base/-/sdk-trace-base-1.30.1.tgz", + "integrity": "sha512-jVPgBbH1gCy2Lb7X0AVQ8XAfgg0pJ4nvl8/IiQA6nxOsPvS+0zMJaFSs2ltXe0J6C8dqjcnpyqINDJmU30+uOg==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/core": "1.30.1", + "@opentelemetry/resources": "1.30.1", + "@opentelemetry/semantic-conventions": "1.28.0" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.0.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/sdk-trace-base/node_modules/@opentelemetry/core": { + "version": "1.30.1", + "resolved": "https://registry.npmjs.org/@opentelemetry/core/-/core-1.30.1.tgz", + "integrity": "sha512-OOCM2C/QIURhJMuKaekP3TRBxBKxG/TWWA0TL2J6nXUtDnuCtccy49LUJF8xPFXMX+0LMcxFpCo8M9cGY1W6rQ==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/semantic-conventions": "1.28.0" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.0.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/sdk-trace-base/node_modules/@opentelemetry/semantic-conventions": { + "version": "1.28.0", + "resolved": 
"https://registry.npmjs.org/@opentelemetry/semantic-conventions/-/semantic-conventions-1.28.0.tgz", + "integrity": "sha512-lp4qAiMTD4sNWW4DbKLBkfiMZ4jbAboJIGOQr5DvciMRI494OapieI9qiODpOt0XBr1LjIDy1xAGAnVs5supTA==", + "license": "Apache-2.0", + "engines": { + "node": ">=14" + } + }, + "node_modules/@opentelemetry/sdk-trace-node": { + "version": "1.30.1", + "resolved": "https://registry.npmjs.org/@opentelemetry/sdk-trace-node/-/sdk-trace-node-1.30.1.tgz", + "integrity": "sha512-cBjYOINt1JxXdpw1e5MlHmFRc5fgj4GW/86vsKFxJCJ8AL4PdVtYH41gWwl4qd4uQjqEL1oJVrXkSy5cnduAnQ==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/context-async-hooks": "1.30.1", + "@opentelemetry/core": "1.30.1", + "@opentelemetry/propagator-b3": "1.30.1", + "@opentelemetry/propagator-jaeger": "1.30.1", + "@opentelemetry/sdk-trace-base": "1.30.1", + "semver": "^7.5.2" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.0.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/sdk-trace-node/node_modules/@opentelemetry/core": { + "version": "1.30.1", + "resolved": "https://registry.npmjs.org/@opentelemetry/core/-/core-1.30.1.tgz", + "integrity": "sha512-OOCM2C/QIURhJMuKaekP3TRBxBKxG/TWWA0TL2J6nXUtDnuCtccy49LUJF8xPFXMX+0LMcxFpCo8M9cGY1W6rQ==", + "license": "Apache-2.0", + "dependencies": { + "@opentelemetry/semantic-conventions": "1.28.0" + }, + "engines": { + "node": ">=14" + }, + "peerDependencies": { + "@opentelemetry/api": ">=1.0.0 <1.10.0" + } + }, + "node_modules/@opentelemetry/sdk-trace-node/node_modules/@opentelemetry/semantic-conventions": { + "version": "1.28.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/semantic-conventions/-/semantic-conventions-1.28.0.tgz", + "integrity": "sha512-lp4qAiMTD4sNWW4DbKLBkfiMZ4jbAboJIGOQr5DvciMRI494OapieI9qiODpOt0XBr1LjIDy1xAGAnVs5supTA==", + "license": "Apache-2.0", + "engines": { + "node": ">=14" + } + }, + "node_modules/@opentelemetry/semantic-conventions": { + "version": "1.37.0", + "resolved": "https://registry.npmjs.org/@opentelemetry/semantic-conventions/-/semantic-conventions-1.37.0.tgz", + "integrity": "sha512-JD6DerIKdJGmRp4jQyX5FlrQjA4tjOw1cvfsPAZXfOOEErMUHjPcPSICS+6WnM0nB0efSFARh0KAZss+bvExOA==", + "license": "Apache-2.0", + "engines": { + "node": ">=14" + } + }, + "node_modules/@protobufjs/aspromise": { + "version": "1.1.2", + "resolved": "https://registry.npmjs.org/@protobufjs/aspromise/-/aspromise-1.1.2.tgz", + "integrity": "sha512-j+gKExEuLmKwvz3OgROXtrJ2UG2x8Ch2YZUxahh+s1F2HZ+wAceUNLkvy6zKCPVRkU++ZWQrdxsUeQXmcg4uoQ==", + "license": "BSD-3-Clause" + }, + "node_modules/@protobufjs/base64": { + "version": "1.1.2", + "resolved": "https://registry.npmjs.org/@protobufjs/base64/-/base64-1.1.2.tgz", + "integrity": "sha512-AZkcAA5vnN/v4PDqKyMR5lx7hZttPDgClv83E//FMNhR2TMcLUhfRUBHCmSl0oi9zMgDDqRUJkSxO3wm85+XLg==", + "license": "BSD-3-Clause" + }, + "node_modules/@protobufjs/codegen": { + "version": "2.0.4", + "resolved": "https://registry.npmjs.org/@protobufjs/codegen/-/codegen-2.0.4.tgz", + "integrity": "sha512-YyFaikqM5sH0ziFZCN3xDC7zeGaB/d0IUb9CATugHWbd1FRFwWwt4ld4OYMPWu5a3Xe01mGAULCdqhMlPl29Jg==", + "license": "BSD-3-Clause" + }, + "node_modules/@protobufjs/eventemitter": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/@protobufjs/eventemitter/-/eventemitter-1.1.0.tgz", + "integrity": "sha512-j9ednRT81vYJ9OfVuXG6ERSTdEL1xVsNgqpkxMsbIabzSo3goCjDIveeGv5d03om39ML71RdmrGNjG5SReBP/Q==", + "license": "BSD-3-Clause" + }, + "node_modules/@protobufjs/fetch": { + "version": "1.1.0", + "resolved": 
"https://registry.npmjs.org/@protobufjs/fetch/-/fetch-1.1.0.tgz", + "integrity": "sha512-lljVXpqXebpsijW71PZaCYeIcE5on1w5DlQy5WH6GLbFryLUrBD4932W/E2BSpfRJWseIL4v/KPgBFxDOIdKpQ==", + "license": "BSD-3-Clause", + "dependencies": { + "@protobufjs/aspromise": "^1.1.1", + "@protobufjs/inquire": "^1.1.0" + } + }, + "node_modules/@protobufjs/float": { + "version": "1.0.2", + "resolved": "https://registry.npmjs.org/@protobufjs/float/-/float-1.0.2.tgz", + "integrity": "sha512-Ddb+kVXlXst9d+R9PfTIxh1EdNkgoRe5tOX6t01f1lYWOvJnSPDBlG241QLzcyPdoNTsblLUdujGSE4RzrTZGQ==", + "license": "BSD-3-Clause" + }, + "node_modules/@protobufjs/inquire": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/@protobufjs/inquire/-/inquire-1.1.0.tgz", + "integrity": "sha512-kdSefcPdruJiFMVSbn801t4vFK7KB/5gd2fYvrxhuJYg8ILrmn9SKSX2tZdV6V+ksulWqS7aXjBcRXl3wHoD9Q==", + "license": "BSD-3-Clause" + }, + "node_modules/@protobufjs/path": { + "version": "1.1.2", + "resolved": "https://registry.npmjs.org/@protobufjs/path/-/path-1.1.2.tgz", + "integrity": "sha512-6JOcJ5Tm08dOHAbdR3GrvP+yUUfkjG5ePsHYczMFLq3ZmMkAD98cDgcT2iA1lJ9NVwFd4tH/iSSoe44YWkltEA==", + "license": "BSD-3-Clause" + }, + "node_modules/@protobufjs/pool": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/@protobufjs/pool/-/pool-1.1.0.tgz", + "integrity": "sha512-0kELaGSIDBKvcgS4zkjz1PeddatrjYcmMWOlAuAPwAeccUrPHdUqo/J6LiymHHEiJT5NrF1UVwxY14f+fy4WQw==", + "license": "BSD-3-Clause" + }, + "node_modules/@protobufjs/utf8": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/@protobufjs/utf8/-/utf8-1.1.0.tgz", + "integrity": "sha512-Vvn3zZrhQZkkBE8LSuW3em98c0FwgO4nxzv6OdSxPKJIEKY2bGbHn+mhGIPerzI4twdxaP8/0+06HBpwf345Lw==", + "license": "BSD-3-Clause" + }, + "node_modules/@types/node": { + "version": "22.18.12", + "resolved": "https://registry.npmjs.org/@types/node/-/node-22.18.12.tgz", + "integrity": "sha512-BICHQ67iqxQGFSzfCFTT7MRQ5XcBjG5aeKh5Ok38UBbPe5fxTyE+aHFxwVrGyr8GNlqFMLKD1D3P2K/1ks8tog==", + "license": "MIT", + "dependencies": { + "undici-types": "~6.21.0" + } + }, + "node_modules/@types/shimmer": { + "version": "1.2.0", + "resolved": "https://registry.npmjs.org/@types/shimmer/-/shimmer-1.2.0.tgz", + "integrity": "sha512-UE7oxhQLLd9gub6JKIAhDq06T0F6FnztwMNRvYgjeQSBeMc1ZG/tA47EwfduvkuQS8apbkM/lpLpWsaCeYsXVg==", + "license": "MIT" + }, + "node_modules/acorn": { + "version": "8.15.0", + "resolved": "https://registry.npmjs.org/acorn/-/acorn-8.15.0.tgz", + "integrity": "sha512-NZyJarBfL7nWwIq+FDL6Zp/yHEhePMNnnJ0y3qfieCrmNvYct8uvtiV41UvlSe6apAfk0fY1FbWx+NwfmpvtTg==", + "license": "MIT", + "bin": { + "acorn": "bin/acorn" + }, + "engines": { + "node": ">=0.4.0" + } + }, + "node_modules/acorn-import-assertions": { + "version": "1.9.0", + "resolved": "https://registry.npmjs.org/acorn-import-assertions/-/acorn-import-assertions-1.9.0.tgz", + "integrity": "sha512-cmMwop9x+8KFhxvKrKfPYmN6/pKTYYHBqLa0DfvVZcKMJWNyWLnaqND7dx/qn66R7ewM1UX5XMaDVP5wlVTaVA==", + "deprecated": "package has been renamed to acorn-import-attributes", + "license": "MIT", + "peerDependencies": { + "acorn": "^8" + } + }, + "node_modules/acorn-import-attributes": { + "version": "1.9.5", + "resolved": "https://registry.npmjs.org/acorn-import-attributes/-/acorn-import-attributes-1.9.5.tgz", + "integrity": "sha512-n02Vykv5uA3eHGM/Z2dQrcD56kL8TyDb2p1+0P83PClMnC/nc+anbQRhIOWnSq4Ke/KvDPrY3C9hDtC/A3eHnQ==", + "license": "MIT", + "peerDependencies": { + "acorn": "^8" + } + }, + "node_modules/cjs-module-lexer": { + "version": "1.4.3", + "resolved": 
"https://registry.npmjs.org/cjs-module-lexer/-/cjs-module-lexer-1.4.3.tgz", + "integrity": "sha512-9z8TZaGM1pfswYeXrUpzPrkx8UnWYdhJclsiYMm6x/w5+nN+8Tf/LnAgfLGQCm59qAOxU8WwHEq2vNwF6i4j+Q==", + "license": "MIT" + }, + "node_modules/debug": { + "version": "4.4.3", + "resolved": "https://registry.npmjs.org/debug/-/debug-4.4.3.tgz", + "integrity": "sha512-RGwwWnwQvkVfavKVt22FGLw+xYSdzARwm0ru6DhTVA3umU5hZc28V3kO4stgYryrTlLpuvgI9GiijltAjNbcqA==", + "license": "MIT", + "dependencies": { + "ms": "^2.1.3" + }, + "engines": { + "node": ">=6.0" + }, + "peerDependenciesMeta": { + "supports-color": { + "optional": true + } + } + }, + "node_modules/function-bind": { + "version": "1.1.2", + "resolved": "https://registry.npmjs.org/function-bind/-/function-bind-1.1.2.tgz", + "integrity": "sha512-7XHNxH7qX9xG5mIwxkhumTox/MIRNcOgDrxWsMt2pAr23WHp6MrRlN7FBSFpCpr+oVO0F744iUgR82nJMfG2SA==", + "license": "MIT", + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/hasown": { + "version": "2.0.2", + "resolved": "https://registry.npmjs.org/hasown/-/hasown-2.0.2.tgz", + "integrity": "sha512-0hJU9SCPvmMzIBdZFqNPXWa6dqh7WdH0cII9y+CyS8rG3nL48Bclra9HmKhVVUHyPWNH5Y7xDwAB7bfgSjkUMQ==", + "license": "MIT", + "dependencies": { + "function-bind": "^1.1.2" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/import-in-the-middle": { + "version": "1.15.0", + "resolved": "https://registry.npmjs.org/import-in-the-middle/-/import-in-the-middle-1.15.0.tgz", + "integrity": "sha512-bpQy+CrsRmYmoPMAE/0G33iwRqwW4ouqdRg8jgbH3aKuCtOc8lxgmYXg2dMM92CRiGP660EtBcymH/eVUpCSaA==", + "license": "Apache-2.0", + "dependencies": { + "acorn": "^8.14.0", + "acorn-import-attributes": "^1.9.5", + "cjs-module-lexer": "^1.2.2", + "module-details-from-path": "^1.0.3" + } + }, + "node_modules/is-core-module": { + "version": "2.16.1", + "resolved": "https://registry.npmjs.org/is-core-module/-/is-core-module-2.16.1.tgz", + "integrity": "sha512-UfoeMA6fIJ8wTYFEUjelnaGI67v6+N7qXJEvQuIGa99l4xsCruSYOVSQ0uPANn4dAzm8lkYPaKLrrijLq7x23w==", + "license": "MIT", + "dependencies": { + "hasown": "^2.0.2" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/long": { + "version": "5.3.2", + "resolved": "https://registry.npmjs.org/long/-/long-5.3.2.tgz", + "integrity": "sha512-mNAgZ1GmyNhD7AuqnTG3/VQ26o760+ZYBPKjPvugO8+nLbYfX6TVpJPseBvopbdY+qpZ/lKUnmEc1LeZYS3QAA==", + "license": "Apache-2.0" + }, + "node_modules/module-details-from-path": { + "version": "1.0.4", + "resolved": "https://registry.npmjs.org/module-details-from-path/-/module-details-from-path-1.0.4.tgz", + "integrity": "sha512-EGWKgxALGMgzvxYF1UyGTy0HXX/2vHLkw6+NvDKW2jypWbHpjQuj4UMcqQWXHERJhVGKikolT06G3bcKe4fi7w==", + "license": "MIT" + }, + "node_modules/ms": { + "version": "2.1.3", + "resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz", + "integrity": "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA==", + "license": "MIT" + }, + "node_modules/openai": { + "version": "6.7.0", + "resolved": "https://registry.npmjs.org/openai/-/openai-6.7.0.tgz", + "integrity": "sha512-mgSQXa3O/UXTbA8qFzoa7aydbXBJR5dbLQXCRapAOtoNT+v69sLdKMZzgiakpqhclRnhPggPAXoniVGn2kMY2A==", + "license": "Apache-2.0", + "bin": { + "openai": "bin/cli" + }, + "peerDependencies": { + "ws": "^8.18.0", + "zod": "^3.25 || ^4.0" + }, + "peerDependenciesMeta": { + "ws": { + "optional": true + }, + "zod": { + "optional": true + } + } + }, + "node_modules/path-parse": { + "version": 
"1.0.7", + "resolved": "https://registry.npmjs.org/path-parse/-/path-parse-1.0.7.tgz", + "integrity": "sha512-LDJzPVEEEPR+y48z93A0Ed0yXb8pAByGWo/k5YYdYgpY2/2EsOsksJrq7lOHxryrVOn1ejG6oAp8ahvOIQD8sw==", + "license": "MIT" + }, + "node_modules/protobufjs": { + "version": "7.5.4", + "resolved": "https://registry.npmjs.org/protobufjs/-/protobufjs-7.5.4.tgz", + "integrity": "sha512-CvexbZtbov6jW2eXAvLukXjXUW1TzFaivC46BpWc/3BpcCysb5Vffu+B3XHMm8lVEuy2Mm4XGex8hBSg1yapPg==", + "hasInstallScript": true, + "license": "BSD-3-Clause", + "dependencies": { + "@protobufjs/aspromise": "^1.1.2", + "@protobufjs/base64": "^1.1.2", + "@protobufjs/codegen": "^2.0.4", + "@protobufjs/eventemitter": "^1.1.0", + "@protobufjs/fetch": "^1.1.0", + "@protobufjs/float": "^1.0.2", + "@protobufjs/inquire": "^1.1.0", + "@protobufjs/path": "^1.1.2", + "@protobufjs/pool": "^1.1.0", + "@protobufjs/utf8": "^1.1.0", + "@types/node": ">=13.7.0", + "long": "^5.0.0" + }, + "engines": { + "node": ">=12.0.0" + } + }, + "node_modules/require-in-the-middle": { + "version": "7.5.2", + "resolved": "https://registry.npmjs.org/require-in-the-middle/-/require-in-the-middle-7.5.2.tgz", + "integrity": "sha512-gAZ+kLqBdHarXB64XpAe2VCjB7rIRv+mU8tfRWziHRJ5umKsIHN2tLLv6EtMw7WCdP19S0ERVMldNvxYCHnhSQ==", + "license": "MIT", + "dependencies": { + "debug": "^4.3.5", + "module-details-from-path": "^1.0.3", + "resolve": "^1.22.8" + }, + "engines": { + "node": ">=8.6.0" + } + }, + "node_modules/resolve": { + "version": "1.22.11", + "resolved": "https://registry.npmjs.org/resolve/-/resolve-1.22.11.tgz", + "integrity": "sha512-RfqAvLnMl313r7c9oclB1HhUEAezcpLjz95wFH4LVuhk9JF/r22qmVP9AMmOU4vMX7Q8pN8jwNg/CSpdFnMjTQ==", + "license": "MIT", + "dependencies": { + "is-core-module": "^2.16.1", + "path-parse": "^1.0.7", + "supports-preserve-symlinks-flag": "^1.0.0" + }, + "bin": { + "resolve": "bin/resolve" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/semver": { + "version": "7.7.3", + "resolved": "https://registry.npmjs.org/semver/-/semver-7.7.3.tgz", + "integrity": "sha512-SdsKMrI9TdgjdweUSR9MweHA4EJ8YxHn8DFaDisvhVlUOe4BF1tLD7GAj0lIqWVl+dPb/rExr0Btby5loQm20Q==", + "license": "ISC", + "bin": { + "semver": "bin/semver.js" + }, + "engines": { + "node": ">=10" + } + }, + "node_modules/shimmer": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/shimmer/-/shimmer-1.2.1.tgz", + "integrity": "sha512-sQTKC1Re/rM6XyFM6fIAGHRPVGvyXfgzIDvzoq608vM+jeyVD0Tu1E6Np0Kc2zAIFWIj963V2800iF/9LPieQw==", + "license": "BSD-2-Clause" + }, + "node_modules/supports-preserve-symlinks-flag": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/supports-preserve-symlinks-flag/-/supports-preserve-symlinks-flag-1.0.0.tgz", + "integrity": "sha512-ot0WnXS9fgdkgIcePe6RHNk1WA8+muPa6cSjeR3V8K27q9BB1rTE3R1p7Hv0z1ZyAc8s6Vvv8DIyWf681MAt0w==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/typescript": { + "version": "5.9.3", + "resolved": "https://registry.npmjs.org/typescript/-/typescript-5.9.3.tgz", + "integrity": "sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw==", + "dev": true, + "license": "Apache-2.0", + "bin": { + "tsc": "bin/tsc", + "tsserver": "bin/tsserver" + }, + "engines": { + "node": ">=14.17" + } + }, + "node_modules/undici-types": { + "version": "6.21.0", + "resolved": 
"https://registry.npmjs.org/undici-types/-/undici-types-6.21.0.tgz", + "integrity": "sha512-iwDZqg0QAGrg9Rav5H4n0M64c3mkR59cJ6wQp+7C4nI0gsmExaedaYLNO44eT4AtBBwjbTiGPMlt2Md0T9H9JQ==", + "license": "MIT" + } + } +} diff --git a/examples/node/observability-opentelemetry/package.json b/examples/node/observability-opentelemetry/package.json new file mode 100644 index 0000000000..bc74d98dae --- /dev/null +++ b/examples/node/observability-opentelemetry/package.json @@ -0,0 +1,25 @@ +{ + "name": "agenta-opentelemetry-quickstart", + "version": "1.0.0", + "description": "Quick start example for using OpenTelemetry with Agenta", + "type": "module", + "scripts": { + "start": "node --import ./instrumentation.js app.js" + }, + "dependencies": { + "@arizeai/openinference-instrumentation-openai": "^3.2.3", + "@arizeai/openinference-semantic-conventions": "^2.1.2", + "@opentelemetry/api": "^1.9.0", + "@opentelemetry/exporter-trace-otlp-proto": "^0.54.0", + "@opentelemetry/instrumentation": "^0.54.0", + "@opentelemetry/resources": "^1.28.0", + "@opentelemetry/sdk-trace-base": "^1.28.0", + "@opentelemetry/sdk-trace-node": "^1.28.0", + "@opentelemetry/semantic-conventions": "^1.28.0", + "openai": "^6.7.0" + }, + "devDependencies": { + "@types/node": "^22.0.0", + "typescript": "^5.7.0" + } +} diff --git a/examples/node/observability-opentelemetry/test.sh b/examples/node/observability-opentelemetry/test.sh new file mode 100755 index 0000000000..56fb36d592 --- /dev/null +++ b/examples/node/observability-opentelemetry/test.sh @@ -0,0 +1,29 @@ +#!/bin/bash +# Test script for OpenTelemetry example + +# Check if required environment variables are set +if [ -z "$AGENTA_API_KEY" ]; then + echo "❌ Error: AGENTA_API_KEY is not set" + echo " Set it with: export AGENTA_API_KEY='your_key_here'" + exit 1 +fi + +if [ -z "$OPENAI_API_KEY" ]; then + echo "❌ Error: OPENAI_API_KEY is not set" + echo " Set it with: export OPENAI_API_KEY='your_key_here'" + exit 1 +fi + +# Set default AGENTA_HOST if not provided +if [ -z "$AGENTA_HOST" ]; then + export AGENTA_HOST="https://cloud.staging.agenta.ai" + echo "ℹ️ Using default AGENTA_HOST: $AGENTA_HOST" +fi + +echo "✅ Environment variables configured" +echo "" +echo "🚀 Running example..." +echo "" + +npm start + diff --git a/sdk/README.md b/sdk/README.md index 4424f046d2..822ce18c0a 100644 --- a/sdk/README.md +++ b/sdk/README.md @@ -83,11 +83,11 @@ Agenta is a platform for building production-grade LLM applications. It helps ** Collaborate with Subject Matter Experts (SMEs) on prompt engineering and make sure nothing breaks in production. 
- **Interactive Playground**: Compare prompts side by side against your test cases -- **Multi-Model Support**: Experiment with 50+ LLM models or [bring-your-own models](https://docs.agenta.ai/prompt-engineering/playground/adding-custom-providers?utm_source=github&utm_medium=referral&utm_campaign=readme) +- **Multi-Model Support**: Experiment with 50+ LLM models or [bring-your-own models](https://docs.agenta.ai/prompt-engineering/playground/custom-providers?utm_source=github&utm_medium=referral&utm_campaign=readme) - **Version Control**: Version prompts and configurations with branching and environments - **Complex Configurations**: Enable SMEs to collaborate on [complex configuration schemas](https://docs.agenta.ai/custom-workflows/overview?utm_source=github&utm_medium=referral&utm_campaign=readme) beyond simple prompts -[Explore prompt management →](https://docs.agenta.ai/prompt-engineering/overview?utm_source=github&utm_medium=referral&utm_campaign=readme) +[Explore prompt management →](https://docs.agenta.ai/prompt-engineering/concepts?utm_source=github&utm_medium=referral&utm_campaign=readme) ### 📊 Evaluation & Testing Evaluate your LLM applications systematically with both human and automated feedback. diff --git a/sdk/pyproject.toml b/sdk/pyproject.toml index 513d95a1a3..5aa52439e8 100644 --- a/sdk/pyproject.toml +++ b/sdk/pyproject.toml @@ -1,6 +1,6 @@ [tool.poetry] name = "agenta" -version = "0.59.11" +version = "0.59.12" description = "The SDK for agenta is an open-source LLMOps platform." readme = "README.md" authors = [ diff --git a/web/ee/package.json b/web/ee/package.json index f7599d7d1d..75562561fe 100644 --- a/web/ee/package.json +++ b/web/ee/package.json @@ -1,6 +1,6 @@ { "name": "@agenta/ee", - "version": "0.59.11", + "version": "0.59.12", "private": true, "engines": { "node": ">=18" diff --git a/web/oss/package.json b/web/oss/package.json index ae1c659ae2..4eca96aa3d 100644 --- a/web/oss/package.json +++ b/web/oss/package.json @@ -1,6 +1,6 @@ { "name": "@agenta/oss", - "version": "0.59.11", + "version": "0.59.12", "private": true, "engines": { "node": ">=18" diff --git a/web/package.json b/web/package.json index ce1ef87cde..a107e3ca2d 100644 --- a/web/package.json +++ b/web/package.json @@ -1,6 +1,6 @@ { "name": "agenta-web", - "version": "0.59.11", + "version": "0.59.12", "workspaces": [ "ee", "oss",