From 9a54d96dcc49738029a59ed2e43b632f1f37a1df Mon Sep 17 00:00:00 2001
From: daisyfaithauma
Date: Wed, 19 Feb 2025 18:33:48 +0000
Subject: [PATCH 01/22] Guardrails docs

---
 .../docs/ai-gateway/guardrails/index.mdx      | 17 +++++++
 .../guardrails/set-up-guardrail.mdx           | 48 +++++++++++++++++++
 2 files changed, 65 insertions(+)
 create mode 100644 src/content/docs/ai-gateway/guardrails/index.mdx
 create mode 100644 src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx

diff --git a/src/content/docs/ai-gateway/guardrails/index.mdx b/src/content/docs/ai-gateway/guardrails/index.mdx
new file mode 100644
index 000000000000000..ced111a0ee28cec
--- /dev/null
+++ b/src/content/docs/ai-gateway/guardrails/index.mdx
@@ -0,0 +1,17 @@
+---
+title: Guardrails in AI Gateway
+pcx_content_type: navigation
+order: 1
+sidebar:
+  order: 8
+  group:
+    badge: Beta
+---
+
+Guardrails in AI Gateway help you deploy AI applications safely by intercepting and evaluating both user prompts and model responses for harmful content. Acting as a proxy between your application and model providers (such as OpenAI, Anthropic, DeepSeek, and others), Guardrails ensures a consistent and secure experience across your entire AI ecosystem.
+
+Guardrails proactively monitor interactions between users and AI models, allowing you to:
+
+Enhance safety: Protect users by detecting and mitigating harmful content.
+Improve compliance: Meet evolving regulatory standards.
+Reduce costs: Prevent unnecessary processing by blocking harmful requests early.
diff --git a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx
new file mode 100644
index 000000000000000..8a71f4b61d44dfe
--- /dev/null
+++ b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx
@@ -0,0 +1,48 @@
+---
+pcx_content_type: how-to
+title: How Guardrails works
+sidebar:
+  order: 3
+---
+
+AI Gateway inspects all interactions in real time by evaluating content against predefined safety parameters. Here's a breakdown of the process:
+
+1. **Intercepting interactions:**
+   AI Gateway sits between the user and the AI model, intercepting every prompt and response.
+
+2. **Evaluating content:**
+
+   - **User prompts:** When a user sends a prompt, AI Gateway checks it against safety parameters (for example violence, hate, or sexual content). Based on your configuration, the system can either flag the prompt or block it before it reaches the AI model.
+   - **Model responses:** After processing, the AI model's response is inspected. If hazardous content is detected, it can be flagged or blocked before being delivered to the user.
+
+3. **Model-specific behavior:**
+
+   - **Text generation models:** Both prompts and responses are evaluated.
+   - **Embedding models:** Only the prompt is evaluated, and the response is passed directly back to the user.
+   - **Catalogued models:** If the model type is identifiable, only the prompt is evaluated; the response bypasses Guardrails and is delivered directly.
+
+4. **Real-time observability:**
+   Detailed logs provide visibility into user queries and model outputs, allowing you to monitor interactions continuously and adjust safety parameters as needed.
+
+## Configuration
+
+Within AI Gateway settings, you can tailor the Guardrails feature to your requirements:
+
+- **Guardrails:** Enable or disable content moderation.
+- **Evaluation scope:** Choose to analyse user prompts, model responses, or both.
+- **Hazard categories:** Define specific categories (such as violence, hate, or sexual content) to monitor, and set actions for each category (ignore, flag, or block). + +## Leveraging Llama Guard on Workers AI + +Guardrails is powered by [**Llama Guard**](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/), Meta’s open-source content moderation tool designed for real-time safety monitoring. AI Gateway uses the [**Llama Guard 3 8B model**](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/), hosted on [**Workers AI**](/workers-ai/) to drive its safety features. This model is continuously updated to adapt to emerging safety challenges. + +## Additional considerations + +- **Workers AI usage:** + Enabling Guardrails incurs usage on Workers AI. Monitor your usage through the Workers AI Dashboard. + +- **Latency impact:** + Evaluating both the request and the response introduces extra latency. Factor this into your deployment planning. + +- **Model availability:** + If the underlying model is unavailable, requests that are flagged will proceed; however, requests set to be blocked will result in an error. From befd14a12dd784aa590eac2b39f3e53307fee0ba Mon Sep 17 00:00:00 2001 From: daisyfaithauma Date: Fri, 21 Feb 2025 17:27:51 +0000 Subject: [PATCH 02/22] minor fixes --- .../docs/ai-gateway/guardrails/index.mdx | 10 +- .../guardrails/set-up-guardrail.mdx | 107 ++++++++++++++---- 2 files changed, 88 insertions(+), 29 deletions(-) diff --git a/src/content/docs/ai-gateway/guardrails/index.mdx b/src/content/docs/ai-gateway/guardrails/index.mdx index ced111a0ee28cec..810584f1bed3210 100644 --- a/src/content/docs/ai-gateway/guardrails/index.mdx +++ b/src/content/docs/ai-gateway/guardrails/index.mdx @@ -1,5 +1,5 @@ --- -title: Guardrails in AI Gateway +title: Guardrails pcx_content_type: navigation order: 1 sidebar: @@ -8,10 +8,10 @@ sidebar: badge: Beta --- -Guardrails in AI Gateway help you deploy AI applications safely by intercepting and evaluating both user prompts and model responses for harmful content. Acting as a proxy between your application and model providers (such as OpenAI, Anthropic, DeepSeek, and others), Guardrails ensures a consistent and secure experience across your entire AI ecosystem. +Guardrails help you deploy AI applications safely by intercepting and evaluating both user prompts and model responses for harmful content. Acting as a proxy between your application and [model providers](/ai-gateway/providers)(such as OpenAI, Anthropic, DeepSeek, and others), Guardrails ensures a consistent and secure experience across your entire AI ecosystem. Guardrails proactively monitor interactions between users and AI models, allowing you to: -Enhance safety: Protect users by detecting and mitigating harmful content. -Improve compliance: Meet evolving regulatory standards. -Reduce costs: Prevent unnecessary processing by blocking harmful requests early. +- Protect users by detecting and mitigating harmful content. +- Meet compliance requirements by aligning with evolving regulatory standards. +- Optimize costs by preventing unnecessary processing of harmful requests early. 
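To make the proxy arrangement described in the new index page concrete, here is a minimal sketch of routing an OpenAI-style request through an AI Gateway endpoint so the gateway sits between the application and the provider. It assumes the common `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}` URL scheme; the account and gateway IDs, the model name, and the error handling are illustrative placeholders rather than a prescribed integration.

```ts
// Sketch: an OpenAI-style chat request routed through an AI Gateway
// endpoint so that Guardrails can inspect the prompt and the response.
// ACCOUNT_ID and GATEWAY_ID are hypothetical placeholder values.
const ACCOUNT_ID = "your-account-id";
const GATEWAY_ID = "your-gateway-id";

export async function askThroughGateway(prompt: string): Promise<string> {
  // The application talks to the gateway URL instead of the provider's
  // API directly, which is what lets Guardrails sit in the middle.
  const response = await fetch(
    `https://gateway.ai.cloudflare.com/v1/${ACCOUNT_ID}/${GATEWAY_ID}/openai/chat/completions`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: prompt }],
      }),
    },
  );

  if (!response.ok) {
    // A prompt or response that trips a blocking Guardrails rule is
    // expected to surface here as a non-2xx error from the gateway.
    throw new Error(`Gateway refused the request: ${response.status}`);
  }

  const data = (await response.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0].message.content;
}
```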
diff --git a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx index 8a71f4b61d44dfe..e99971d556f6306 100644 --- a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx +++ b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx @@ -5,44 +5,103 @@ sidebar: order: 3 --- -AI Gateway inspects all interactions in real time by evaluating content against predefined safety parameters. Here’s a breakdown of the process: +AI Gateway inspects all interactions in real time by evaluating content against predefined safety parameters. Below a breakdown of the process: -1. **Intercepting interactions:** - AI Gateway sits between the user and the AI model, intercepting every prompt and response. +1. Intercepting interactions: + AI Gateway proxies requests and responses, sitting between the user and the AI model. -2. **Evaluating content:** +2. Inspecting content: - - **User prompts:** When a user sends a prompt, AI Gateway checks it against safety parameters (for example violence, hate, or sexual content). Based on your configuration, the system can either flag the prompt or block it before it reaches the AI model. - - **Model responses:** After processing, the AI model’s response is inspected. If hazardous content is detected, it can be flagged or blocked before being delivered to the user. + - User prompts: AI Gateway checks prompts against safety parameters (for example, violence, hate, or sexual content). Based on your settings, prompts can be flagged or blocked before reaching the model. + - Model responses: Once processed, the AI model response is inspected. If hazardous content is detected, it can be flagged or blocked before being delivered to the user. -3. **Model-specific behavior:** +3. Applying actions: + Depending on your configuration, flagged content is logged for review, while blocked content is prevented from proceeding. - - **Text generation models:** Both prompts and responses are evaluated. - - **Embedding models:** Only the prompt is evaluated, and the response is passed directly back to the user. - - **Catalogued models:** If the model type is identifiable, only the prompt is evaluated; the response bypasses Guardrails and is delivered directly. +## Supported model types -4. **Real-time observability:** - Detailed logs provide visibility into user queries and model outputs, allowing you to monitor interactions continuously and adjust safety parameters as needed. +Guardrails determines the type of AI model being used and applies safety checks accordingly: + +- Text generation models: Both prompts and responses are evaluated. +- Embedding models: Only the prompt is evaluated, and the response is passed directly back to the user. +- Unknown models: If the model type cannot be determined, prompts are evaluated, but responses bypass Guardrails. + +If Guardrails cannot access the underlying model, requests set to "block" will result in an error, while flagged requests will proceed. + +## Configuration + +Within AI Gateway settings, you can customize Guardrails: + +- Enable or disable content moderation. +- Choose evaluation scope: Analyze user prompts, model responses, or both. +- Define hazard categories: Select categories like violence, hate, or sexual content and assign actions (ignore, flag, or block). 
+ +## Workers AI and Guardrails + +Guardrails currently uses [Llama Guard 3 8B](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/) on [Workers AI](/workers-ai/) to perform content evaluations. The underlying model may be updated in the future, and we will reflect those changes within Guardrails. + +Since Guardrails runs on Workers AI, enabling it incurs usage on Workers AI. You can monitor usage through the Workers AI Dashboard. + +## Additional considerations + +- Latency impact: Enabling Guardrails adds some latency. Consider this when balancing safety and speed. + +:::note + +Llama Guard is provided as-is without any representations, warranties, or guarantees. Any rules or examples contained in blogs, developer docs, or other reference materials are provided for informational purposes only. You acknowledge and understand that you are responsible for the results and outcomes of your use of AI Gateway. + +## ::: + +pcx_content_type: how-to +title: How Guardrails works +sidebar: +order: 3 + +--- + +AI Gateway inspects all interactions in real time by evaluating content against predefined safety parameters. Below a breakdown of the process: + +1. Intercepting interactions: + AI Gateway proxies requests and responses, sitting between the user and the AI model. + +2. Inspecting content: + + - User prompts: AI Gateway checks prompts against safety parameters (for example, violence, hate, or sexual content). Based on your settings, prompts can be flagged or blocked before reaching the model. + - Model responses: Once processed, the AI model response is inspected. If hazardous content is detected, it can be flagged or blocked before being delivered to the user. + +3. Applying actions: + Depending on your configuration, flagged content is logged for review, while blocked content is prevented from proceeding. + +## Supported model types + +Guardrails determines the type of AI model being used and applies safety checks accordingly: + +- Text generation models: Both prompts and responses are evaluated. +- Embedding models: Only the prompt is evaluated, and the response is passed directly back to the user. +- Unknown models: If the model type cannot be determined, prompts are evaluated, but responses bypass Guardrails. + +If Guardrails cannot access the underlying model, requests set to "block" will result in an error, while flagged requests will proceed. ## Configuration -Within AI Gateway settings, you can tailor the Guardrails feature to your requirements: +Within AI Gateway settings, you can customize Guardrails: -- **Guardrails:** Enable or disable content moderation. -- **Evaluation scope:** Choose to analyse user prompts, model responses, or both. -- **Hazard categories:** Define specific categories (such as violence, hate, or sexual content) to monitor, and set actions for each category (ignore, flag, or block). +- Enable or disable content moderation. +- Choose evaluation scope: Analyze user prompts, model responses, or both. +- Define hazard categories: Select categories like violence, hate, or sexual content and assign actions (ignore, flag, or block). -## Leveraging Llama Guard on Workers AI +## Workers AI and Guardrails -Guardrails is powered by [**Llama Guard**](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/), Meta’s open-source content moderation tool designed for real-time safety monitoring. 
AI Gateway uses the [**Llama Guard 3 8B model**](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/), hosted on [**Workers AI**](/workers-ai/) to drive its safety features. This model is continuously updated to adapt to emerging safety challenges. +Guardrails currently uses [Llama Guard 3 8B](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/) on [Workers AI](/workers-ai/) to perform content evaluations. The underlying model may be updated in the future, and we will reflect those changes within Guardrails. + +Since Guardrails runs on Workers AI, enabling it incurs usage on Workers AI. You can monitor usage through the Workers AI Dashboard. ## Additional considerations -- **Workers AI usage:** - Enabling Guardrails incurs usage on Workers AI. Monitor your usage through the Workers AI Dashboard. +- Latency impact: Enabling Guardrails adds some latency. Consider this when balancing safety and speed. + +:::note -- **Latency impact:** - Evaluating both the request and the response introduces extra latency. Factor this into your deployment planning. +Llama Guard is provided as-is without any representations, warranties, or guarantees. Any rules or examples contained in blogs, developer docs, or other reference materials are provided for informational purposes only. You acknowledge and understand that you are responsible for the results and outcomes of your use of AI Gateway. -- **Model availability:** - If the underlying model is unavailable, requests that are flagged will proceed; however, requests set to be blocked will result in an error. +::: From 04ee35741b3c63990531c4eb2c7ff7a6adb9f93f Mon Sep 17 00:00:00 2001 From: Kathy <153706637+kathayl@users.noreply.github.com> Date: Mon, 24 Feb 2025 09:36:46 -0800 Subject: [PATCH 03/22] Update index.mdx updated wording --- src/content/docs/ai-gateway/guardrails/index.mdx | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/src/content/docs/ai-gateway/guardrails/index.mdx b/src/content/docs/ai-gateway/guardrails/index.mdx index 810584f1bed3210..504831d5474e736 100644 --- a/src/content/docs/ai-gateway/guardrails/index.mdx +++ b/src/content/docs/ai-gateway/guardrails/index.mdx @@ -8,10 +8,11 @@ sidebar: badge: Beta --- -Guardrails help you deploy AI applications safely by intercepting and evaluating both user prompts and model responses for harmful content. Acting as a proxy between your application and [model providers](/ai-gateway/providers)(such as OpenAI, Anthropic, DeepSeek, and others), Guardrails ensures a consistent and secure experience across your entire AI ecosystem. +Guardrails help you deploy AI applications safely by intercepting and evaluating both user prompts and model responses for harmful content. Acting as a proxy between your application and [model providers](/ai-gateway/providers) (such as OpenAI, Anthropic, DeepSeek, and others), AI Gateway's Guardrails ensure a consistent and secure experience across your entire AI ecosystem. -Guardrails proactively monitor interactions between users and AI models, allowing you to: +Guardrails proactively monitor interactions between users and AI models, giving you: -- Protect users by detecting and mitigating harmful content. -- Meet compliance requirements by aligning with evolving regulatory standards. -- Optimize costs by preventing unnecessary processing of harmful requests early. 
+- **Consistent moderation**: Uniform moderation layer that works across models and providers. +- **Enhanced safety and user trust**: Proactively protect users from harmful or inappropriate interactions. +- **Flexibility and control over allowed content**: Specify which categories to monitor and choose between flagging or outright blocking +- **Auditing and compliance capabilities**: Stay ahead of evolving regulatory requirements with logs of user prompts, model responses, and enforced guardrails.Protect users by detecting and mitigating harmful content. From 842d0d3b2667a471fd356d005c6af9e36baaab79 Mon Sep 17 00:00:00 2001 From: Kathy <153706637+kathayl@users.noreply.github.com> Date: Mon, 24 Feb 2025 10:52:47 -0800 Subject: [PATCH 04/22] Update index.mdx deleted extra sentence --- src/content/docs/ai-gateway/guardrails/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/content/docs/ai-gateway/guardrails/index.mdx b/src/content/docs/ai-gateway/guardrails/index.mdx index 504831d5474e736..7a36a780b960334 100644 --- a/src/content/docs/ai-gateway/guardrails/index.mdx +++ b/src/content/docs/ai-gateway/guardrails/index.mdx @@ -15,4 +15,4 @@ Guardrails proactively monitor interactions between users and AI models, giving - **Consistent moderation**: Uniform moderation layer that works across models and providers. - **Enhanced safety and user trust**: Proactively protect users from harmful or inappropriate interactions. - **Flexibility and control over allowed content**: Specify which categories to monitor and choose between flagging or outright blocking -- **Auditing and compliance capabilities**: Stay ahead of evolving regulatory requirements with logs of user prompts, model responses, and enforced guardrails.Protect users by detecting and mitigating harmful content. +- **Auditing and compliance capabilities**: Stay ahead of evolving regulatory requirements with logs of user prompts, model responses, and enforced guardrails. From 0f907b4939bb3bf2acb62a4d679768eb4f4e7c2c Mon Sep 17 00:00:00 2001 From: Kathy <153706637+kathayl@users.noreply.github.com> Date: Mon, 24 Feb 2025 13:48:24 -0800 Subject: [PATCH 05/22] Update set-up-guardrail.mdx updated wording in supported model types section --- .../docs/ai-gateway/guardrails/set-up-guardrail.mdx | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx index e99971d556f6306..2c51a0d39300faf 100644 --- a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx +++ b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx @@ -20,13 +20,11 @@ AI Gateway inspects all interactions in real time by evaluating content against ## Supported model types -Guardrails determines the type of AI model being used and applies safety checks accordingly: - -- Text generation models: Both prompts and responses are evaluated. -- Embedding models: Only the prompt is evaluated, and the response is passed directly back to the user. -- Unknown models: If the model type cannot be determined, prompts are evaluated, but responses bypass Guardrails. +AI Gateway's Guardrails detects the type of AI model being used and applies safety checks accordingly: -If Guardrails cannot access the underlying model, requests set to "block" will result in an error, while flagged requests will proceed. +- **Text generation models**: Both prompts and responses are evaluated. 
+- **Embedding models**: Only the prompt is evaluated, as the response consists of numerical embeddings, which are not meaningful for moderation. +- **Unknown models**: If the model type cannot be determined, only the prompt is evaluated, while the response bypass Guardrails. ## Configuration From 86d8f6b981b5d8fe5e4b533f841324613d5f1adc Mon Sep 17 00:00:00 2001 From: daisyfaithauma Date: Tue, 25 Feb 2025 12:00:39 +0000 Subject: [PATCH 06/22] Update src/content/docs/ai-gateway/guardrails/index.mdx Co-authored-by: Kody Jackson --- src/content/docs/ai-gateway/guardrails/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/content/docs/ai-gateway/guardrails/index.mdx b/src/content/docs/ai-gateway/guardrails/index.mdx index 7a36a780b960334..233953b909a9402 100644 --- a/src/content/docs/ai-gateway/guardrails/index.mdx +++ b/src/content/docs/ai-gateway/guardrails/index.mdx @@ -8,7 +8,7 @@ sidebar: badge: Beta --- -Guardrails help you deploy AI applications safely by intercepting and evaluating both user prompts and model responses for harmful content. Acting as a proxy between your application and [model providers](/ai-gateway/providers) (such as OpenAI, Anthropic, DeepSeek, and others), AI Gateway's Guardrails ensure a consistent and secure experience across your entire AI ecosystem. +Guardrails help you deploy AI applications safely by intercepting and evaluating both user prompts and model responses for harmful content. Acting as a proxy between your application and [model providers](/ai-gateway/providers/) (such as OpenAI, Anthropic, DeepSeek, and others), AI Gateway's Guardrails ensure a consistent and secure experience across your entire AI ecosystem. Guardrails proactively monitor interactions between users and AI models, giving you: From 451be8f6b8cd1b8c1bbc510f921b2975c95aa324 Mon Sep 17 00:00:00 2001 From: daisyfaithauma Date: Tue, 25 Feb 2025 12:06:18 +0000 Subject: [PATCH 07/22] removed duplicate --- .../guardrails/set-up-guardrail.mdx | 67 ------------------- 1 file changed, 67 deletions(-) diff --git a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx index 2c51a0d39300faf..9ed0b221e1bfb84 100644 --- a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx +++ b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx @@ -5,19 +5,6 @@ sidebar: order: 3 --- -AI Gateway inspects all interactions in real time by evaluating content against predefined safety parameters. Below a breakdown of the process: - -1. Intercepting interactions: - AI Gateway proxies requests and responses, sitting between the user and the AI model. - -2. Inspecting content: - - - User prompts: AI Gateway checks prompts against safety parameters (for example, violence, hate, or sexual content). Based on your settings, prompts can be flagged or blocked before reaching the model. - - Model responses: Once processed, the AI model response is inspected. If hazardous content is detected, it can be flagged or blocked before being delivered to the user. - -3. Applying actions: - Depending on your configuration, flagged content is logged for review, while blocked content is prevented from proceeding. - ## Supported model types AI Gateway's Guardrails detects the type of AI model being used and applies safety checks accordingly: @@ -48,58 +35,4 @@ Since Guardrails runs on Workers AI, enabling it incurs usage on Workers AI. You Llama Guard is provided as-is without any representations, warranties, or guarantees. 
Any rules or examples contained in blogs, developer docs, or other reference materials are provided for informational purposes only. You acknowledge and understand that you are responsible for the results and outcomes of your use of AI Gateway. -## ::: - -pcx_content_type: how-to -title: How Guardrails works -sidebar: -order: 3 - ---- - -AI Gateway inspects all interactions in real time by evaluating content against predefined safety parameters. Below a breakdown of the process: - -1. Intercepting interactions: - AI Gateway proxies requests and responses, sitting between the user and the AI model. - -2. Inspecting content: - - - User prompts: AI Gateway checks prompts against safety parameters (for example, violence, hate, or sexual content). Based on your settings, prompts can be flagged or blocked before reaching the model. - - Model responses: Once processed, the AI model response is inspected. If hazardous content is detected, it can be flagged or blocked before being delivered to the user. - -3. Applying actions: - Depending on your configuration, flagged content is logged for review, while blocked content is prevented from proceeding. - -## Supported model types - -Guardrails determines the type of AI model being used and applies safety checks accordingly: - -- Text generation models: Both prompts and responses are evaluated. -- Embedding models: Only the prompt is evaluated, and the response is passed directly back to the user. -- Unknown models: If the model type cannot be determined, prompts are evaluated, but responses bypass Guardrails. - -If Guardrails cannot access the underlying model, requests set to "block" will result in an error, while flagged requests will proceed. - -## Configuration - -Within AI Gateway settings, you can customize Guardrails: - -- Enable or disable content moderation. -- Choose evaluation scope: Analyze user prompts, model responses, or both. -- Define hazard categories: Select categories like violence, hate, or sexual content and assign actions (ignore, flag, or block). - -## Workers AI and Guardrails - -Guardrails currently uses [Llama Guard 3 8B](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/) on [Workers AI](/workers-ai/) to perform content evaluations. The underlying model may be updated in the future, and we will reflect those changes within Guardrails. - -Since Guardrails runs on Workers AI, enabling it incurs usage on Workers AI. You can monitor usage through the Workers AI Dashboard. - -## Additional considerations - -- Latency impact: Enabling Guardrails adds some latency. Consider this when balancing safety and speed. - -:::note - -Llama Guard is provided as-is without any representations, warranties, or guarantees. Any rules or examples contained in blogs, developer docs, or other reference materials are provided for informational purposes only. You acknowledge and understand that you are responsible for the results and outcomes of your use of AI Gateway. 
- ::: From ba3ab973aad3939be0c6801bdae4ac336c6c9316 Mon Sep 17 00:00:00 2001 From: daisyfaithauma Date: Tue, 25 Feb 2025 12:15:37 +0000 Subject: [PATCH 08/22] moved details --- src/content/docs/ai-gateway/guardrails/index.mdx | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/src/content/docs/ai-gateway/guardrails/index.mdx b/src/content/docs/ai-gateway/guardrails/index.mdx index 233953b909a9402..5582475f78e08bc 100644 --- a/src/content/docs/ai-gateway/guardrails/index.mdx +++ b/src/content/docs/ai-gateway/guardrails/index.mdx @@ -16,3 +16,18 @@ Guardrails proactively monitor interactions between users and AI models, giving - **Enhanced safety and user trust**: Proactively protect users from harmful or inappropriate interactions. - **Flexibility and control over allowed content**: Specify which categories to monitor and choose between flagging or outright blocking - **Auditing and compliance capabilities**: Stay ahead of evolving regulatory requirements with logs of user prompts, model responses, and enforced guardrails. + +## How Guardrails work + +AI Gateway inspects all interactions in real time by evaluating content against predefined safety parameters. Below a breakdown of the process: + +1. Intercepting interactions: + AI Gateway proxies requests and responses, sitting between the user and the AI model. + +2. Inspecting content: + + - User prompts: AI Gateway checks prompts against safety parameters (for example, violence, hate, or sexual content). Based on your settings, prompts can be flagged or blocked before reaching the model. + - Model responses: Once processed, the AI model response is inspected. If hazardous content is detected, it can be flagged or blocked before being delivered to the user. + +3. Applying actions: + Depending on your configuration, flagged content is logged for review, while blocked content is prevented from proceeding. From c1378e96f734a245e73890d801dd8ae6b01658f1 Mon Sep 17 00:00:00 2001 From: Kathy <153706637+kathayl@users.noreply.github.com> Date: Tue, 25 Feb 2025 09:59:20 -0800 Subject: [PATCH 09/22] Update set-up-guardrail.mdx 3 changes 1) moved configuration first, before supported model types 2) moved note about llamaguard to llamaguard section 3) added link to workers ai dashboard --- .../guardrails/set-up-guardrail.mdx | 27 ++++++++++--------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx index 9ed0b221e1bfb84..26acd6c8371e160 100644 --- a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx +++ b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx @@ -5,14 +5,6 @@ sidebar: order: 3 --- -## Supported model types - -AI Gateway's Guardrails detects the type of AI model being used and applies safety checks accordingly: - -- **Text generation models**: Both prompts and responses are evaluated. -- **Embedding models**: Only the prompt is evaluated, as the response consists of numerical embeddings, which are not meaningful for moderation. -- **Unknown models**: If the model type cannot be determined, only the prompt is evaluated, while the response bypass Guardrails. - ## Configuration Within AI Gateway settings, you can customize Guardrails: @@ -21,18 +13,27 @@ Within AI Gateway settings, you can customize Guardrails: - Choose evaluation scope: Analyze user prompts, model responses, or both. 
- Define hazard categories: Select categories like violence, hate, or sexual content and assign actions (ignore, flag, or block). -## Workers AI and Guardrails +## Supported model types -Guardrails currently uses [Llama Guard 3 8B](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/) on [Workers AI](/workers-ai/) to perform content evaluations. The underlying model may be updated in the future, and we will reflect those changes within Guardrails. +AI Gateway's Guardrails detects the type of AI model being used and applies safety checks accordingly: -Since Guardrails runs on Workers AI, enabling it incurs usage on Workers AI. You can monitor usage through the Workers AI Dashboard. +- **Text generation models**: Both prompts and responses are evaluated. +- **Embedding models**: Only the prompt is evaluated, as the response consists of numerical embeddings, which are not meaningful for moderation. +- **Unknown models**: If the model type cannot be determined, only the prompt is evaluated, while the response bypass Guardrails. -## Additional considerations +## Workers AI and Guardrails -- Latency impact: Enabling Guardrails adds some latency. Consider this when balancing safety and speed. +Guardrails currently uses [Llama Guard 3 8B](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/) on [Workers AI](/workers-ai/) to perform content evaluations. The underlying model may be updated in the future, and we will reflect those changes within Guardrails. + +Since Guardrails runs on Workers AI, enabling it incurs usage on Workers AI. You can monitor usage through the [Workers AI Dashboard](https://dash.cloudflare.com/?to=/:account/ai/workers-ai). :::note + Llama Guard is provided as-is without any representations, warranties, or guarantees. Any rules or examples contained in blogs, developer docs, or other reference materials are provided for informational purposes only. You acknowledge and understand that you are responsible for the results and outcomes of your use of AI Gateway. ::: + +## Additional considerations + +- Latency impact: Enabling Guardrails adds some latency. Consider this when balancing safety and speed. From 728c906c075b31adeb9d4c832d434ec38ec5da8c Mon Sep 17 00:00:00 2001 From: daisyfaithauma Date: Tue, 25 Feb 2025 18:09:25 +0000 Subject: [PATCH 10/22] changes to docs --- .../guardrails/set-up-guardrail.mdx | 8 ------- .../guardrails/supported-model-types.mdx | 14 ++++++++++++ .../guardrails/usage-considerations.mdx | 22 +++++++++++++++++++ 3 files changed, 36 insertions(+), 8 deletions(-) create mode 100644 src/content/docs/ai-gateway/guardrails/supported-model-types.mdx create mode 100644 src/content/docs/ai-gateway/guardrails/usage-considerations.mdx diff --git a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx index 9ed0b221e1bfb84..26ffbefec4f393f 100644 --- a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx +++ b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx @@ -5,14 +5,6 @@ sidebar: order: 3 --- -## Supported model types - -AI Gateway's Guardrails detects the type of AI model being used and applies safety checks accordingly: - -- **Text generation models**: Both prompts and responses are evaluated. -- **Embedding models**: Only the prompt is evaluated, as the response consists of numerical embeddings, which are not meaningful for moderation. 
-- **Unknown models**: If the model type cannot be determined, only the prompt is evaluated, while the response bypass Guardrails.
diff --git a/src/content/docs/ai-gateway/guardrails/supported-model-types.mdx b/src/content/docs/ai-gateway/guardrails/supported-model-types.mdx
new file mode 100644
index 000000000000000..b8270da1a7cf0a1
--- /dev/null
+++ b/src/content/docs/ai-gateway/guardrails/supported-model-types.mdx
@@ -0,0 +1,14 @@
+---
+pcx_content_type: reference
+title: Supported model types
+sidebar:
+  order: 3
+---
+
+## Supported model types
+
+AI Gateway's Guardrails detects the type of AI model being used and applies safety checks accordingly:
+
+- **Text generation models**: Both prompts and responses are evaluated.
+- **Embedding models**: Only the prompt is evaluated, as the response consists of numerical embeddings, which are not meaningful for moderation.
+- **Unknown models**: If the model type cannot be determined, only the prompt is evaluated, while the response bypasses Guardrails.
diff --git a/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx b/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx
new file mode 100644
index 000000000000000..1ad6d6df6afb5df
--- /dev/null
+++ b/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx
@@ -0,0 +1,22 @@
+---
+pcx_content_type: reference
+title: Supported model types
+sidebar:
+  order: 4
+---
+
+## Workers AI and Guardrails
+
+Guardrails currently uses [Llama Guard 3 8B](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/) on [Workers AI](/workers-ai/) to perform content evaluations. The underlying model may be updated in the future, and we will reflect those changes within Guardrails.
+
+Since Guardrails runs on Workers AI, enabling it incurs usage on Workers AI. You can monitor usage through the Workers AI Dashboard.
+
+## Additional considerations
+
+- Latency impact: Enabling Guardrails adds some latency. Consider this when balancing safety and speed.
+
+:::note
+
+Llama Guard is provided as-is without any representations, warranties, or guarantees. Any rules or examples contained in blogs, developer docs, or other reference materials are provided for informational purposes only. You acknowledge and understand that you are responsible for the results and outcomes of your use of AI Gateway.
+
+:::

From 766a73645dff236e09f03217e492c22f4bac473e Mon Sep 17 00:00:00 2001
From: daisyfaithauma
Date: Tue, 25 Feb 2025 12:06:18 +0000
Subject: [PATCH 11/22] Merged

---
 .../guardrails/set-up-guardrail.mdx           | 25 -------------------------
 1 file changed, 25 deletions(-)

diff --git a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx
index 26acd6c8371e160..c92653977a5af30 100644
--- a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx
+++ b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx
@@ -12,28 +12,3 @@ Within AI Gateway settings, you can customize Guardrails:
 - Enable or disable content moderation.
 - Choose evaluation scope: Analyze user prompts, model responses, or both.
 - Define hazard categories: Select categories like violence, hate, or sexual content and assign actions (ignore, flag, or block).
- -## Supported model types - -AI Gateway's Guardrails detects the type of AI model being used and applies safety checks accordingly: - -- **Text generation models**: Both prompts and responses are evaluated. -- **Embedding models**: Only the prompt is evaluated, as the response consists of numerical embeddings, which are not meaningful for moderation. -- **Unknown models**: If the model type cannot be determined, only the prompt is evaluated, while the response bypass Guardrails. - -## Workers AI and Guardrails - -Guardrails currently uses [Llama Guard 3 8B](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/) on [Workers AI](/workers-ai/) to perform content evaluations. The underlying model may be updated in the future, and we will reflect those changes within Guardrails. - -Since Guardrails runs on Workers AI, enabling it incurs usage on Workers AI. You can monitor usage through the [Workers AI Dashboard](https://dash.cloudflare.com/?to=/:account/ai/workers-ai). - -:::note - - -Llama Guard is provided as-is without any representations, warranties, or guarantees. Any rules or examples contained in blogs, developer docs, or other reference materials are provided for informational purposes only. You acknowledge and understand that you are responsible for the results and outcomes of your use of AI Gateway. - -::: - -## Additional considerations - -- Latency impact: Enabling Guardrails adds some latency. Consider this when balancing safety and speed. From 2a6fcdb8e3182ad55f65104b4e4b2170052b8d36 Mon Sep 17 00:00:00 2001 From: daisyfaithauma Date: Tue, 25 Feb 2025 18:30:00 +0000 Subject: [PATCH 12/22] title --- src/content/docs/ai-gateway/guardrails/usage-considerations.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx b/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx index 1ad6d6df6afb5df..279e74849196051 100644 --- a/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx +++ b/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx @@ -1,6 +1,6 @@ --- pcx_content_type: reference -title: Supported model types +title: Usage considerations sidebar: order: 4 --- From 1f051a4e5830709954da115675137a4f177a5576 Mon Sep 17 00:00:00 2001 From: daisyfaithauma Date: Tue, 25 Feb 2025 18:30:47 +0000 Subject: [PATCH 13/22] title --- src/content/docs/ai-gateway/guardrails/usage-considerations.mdx | 2 -- 1 file changed, 2 deletions(-) diff --git a/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx b/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx index 279e74849196051..af269dd35ee6aaf 100644 --- a/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx +++ b/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx @@ -5,8 +5,6 @@ sidebar: order: 4 --- -## Workers AI and Guardrails - Guardrails currently uses [Llama Guard 3 8B](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/) on [Workers AI](/workers-ai/) to perform content evaluations. The underlying model may be updated in the future, and we will reflect those changes within Guardrails. Since Guardrails runs on Workers AI, enabling it incurs usage on Workers AI. You can monitor usage through the Workers AI Dashboard. 
From 21f09561f1068c7016a1be11f165cfbcb5f88f79 Mon Sep 17 00:00:00 2001 From: daisyfaithauma Date: Tue, 25 Feb 2025 19:09:06 +0000 Subject: [PATCH 14/22] add setup details --- .../guardrails/set-up-guardrail.mdx | 23 ++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx index c92653977a5af30..74877c840d02168 100644 --- a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx +++ b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx @@ -1,6 +1,6 @@ --- pcx_content_type: how-to -title: How Guardrails works +title: Setting up Guardrails sidebar: order: 3 --- @@ -12,3 +12,24 @@ Within AI Gateway settings, you can customize Guardrails: - Enable or disable content moderation. - Choose evaluation scope: Analyze user prompts, model responses, or both. - Define hazard categories: Select categories like violence, hate, or sexual content and assign actions (ignore, flag, or block). + +This tutorial will guide you through the process of setting up and customizing Guardrails in your AI Gateway using the Cloudflare dashboard. + +## 1. Log in to the dashboard + +1. Log into the [Cloudflare dashboard](https://dash.cloudflare.com/) and select your account. +2. Go to AI > AI Gateway. + +## 2. Access the Settings Tab + +In the AI Gateway section, click on the Settings tab. +Confirm that Guardrails is enabled. + +## 3. Set Security Hazard on Prompt or Response + +Within the Guardrails settings, you can choose where to apply security hazards: + +- On Prompt: Guardrails will evaluate and transform incoming prompts based on your security policies. +- On Response: Guardrails will inspect the model's responses to ensure they meet your content and formatting guidelines. + +Select the option that best fits your use case. You can modify this setting at any time according to your needs. From e05221e656f9d5554326a41cef108459c8b73061 Mon Sep 17 00:00:00 2001 From: daisyfaithauma Date: Tue, 25 Feb 2025 19:10:57 +0000 Subject: [PATCH 15/22] spelling --- .../docs/ai-gateway/guardrails/set-up-guardrail.mdx | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx index 74877c840d02168..ff8cbae798768e1 100644 --- a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx +++ b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx @@ -20,16 +20,16 @@ This tutorial will guide you through the process of setting up and customizing G 1. Log into the [Cloudflare dashboard](https://dash.cloudflare.com/) and select your account. 2. Go to AI > AI Gateway. -## 2. Access the Settings Tab +## 2. Access the Settings tab In the AI Gateway section, click on the Settings tab. Confirm that Guardrails is enabled. -## 3. Set Security Hazard on Prompt or Response +## 3. Set security hazard on prompt or response Within the Guardrails settings, you can choose where to apply security hazards: -- On Prompt: Guardrails will evaluate and transform incoming prompts based on your security policies. -- On Response: Guardrails will inspect the model's responses to ensure they meet your content and formatting guidelines. +- On prompt: Guardrails will evaluate and transform incoming prompts based on your security policies. +- On response: Guardrails will inspect the model's responses to ensure they meet your content and formatting guidelines. 
Select the option that best fits your use case. You can modify this setting at any time according to your needs. From cc073bcceea16f6499f74b29179d210c8b6a9b72 Mon Sep 17 00:00:00 2001 From: daisyfaithauma Date: Tue, 25 Feb 2025 22:20:06 +0000 Subject: [PATCH 16/22] Update set-up-guardrail.mdx Removed heading --- src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx | 2 -- 1 file changed, 2 deletions(-) diff --git a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx index ff8cbae798768e1..e5b2ea369fedd3c 100644 --- a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx +++ b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx @@ -5,8 +5,6 @@ sidebar: order: 3 --- -## Configuration - Within AI Gateway settings, you can customize Guardrails: - Enable or disable content moderation. From bf61f4bc2160d4d4fa851e1812884a466c083eea Mon Sep 17 00:00:00 2001 From: daisyfaithauma Date: Tue, 25 Feb 2025 22:20:42 +0000 Subject: [PATCH 17/22] Update supported-model-types.mdx Removed heading --- .../docs/ai-gateway/guardrails/supported-model-types.mdx | 2 -- 1 file changed, 2 deletions(-) diff --git a/src/content/docs/ai-gateway/guardrails/supported-model-types.mdx b/src/content/docs/ai-gateway/guardrails/supported-model-types.mdx index b8270da1a7cf0a1..a23e11b8ece1202 100644 --- a/src/content/docs/ai-gateway/guardrails/supported-model-types.mdx +++ b/src/content/docs/ai-gateway/guardrails/supported-model-types.mdx @@ -5,8 +5,6 @@ sidebar: order: 3 --- -## Supported model types - AI Gateway's Guardrails detects the type of AI model being used and applies safety checks accordingly: - **Text generation models**: Both prompts and responses are evaluated. From 22eeb30cbdb50412d8eda21f86b8c2ba1c8e4995 Mon Sep 17 00:00:00 2001 From: Kathy <153706637+kathayl@users.noreply.github.com> Date: Tue, 25 Feb 2025 15:28:59 -0800 Subject: [PATCH 18/22] Update usage-considerations.mdx added info on what happens if Workers AI is down --- src/content/docs/ai-gateway/guardrails/usage-considerations.mdx | 1 + 1 file changed, 1 insertion(+) diff --git a/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx b/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx index af269dd35ee6aaf..19422b4d181acd6 100644 --- a/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx +++ b/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx @@ -11,6 +11,7 @@ Since Guardrails runs on Workers AI, enabling it incurs usage on Workers AI. You ## Additional considerations +- Model availability: If at least one hazard category is set to `block`, but AI Gateway is unable to receive a response from Workers AI, the request will be blocked. - Latency impact: Enabling Guardrails adds some latency. Consider this when balancing safety and speed. 
:::note From add251e2dcc9ea53ebebfee00c5564a417247f12 Mon Sep 17 00:00:00 2001 From: daisyfaithauma Date: Tue, 25 Feb 2025 23:38:53 +0000 Subject: [PATCH 19/22] Update set-up-guardrail.mdx changed tab --- src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx index e5b2ea369fedd3c..907879f41a84cb3 100644 --- a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx +++ b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx @@ -18,10 +18,9 @@ This tutorial will guide you through the process of setting up and customizing G 1. Log into the [Cloudflare dashboard](https://dash.cloudflare.com/) and select your account. 2. Go to AI > AI Gateway. -## 2. Access the Settings tab +## 2. Access the Guardrails tab -In the AI Gateway section, click on the Settings tab. -Confirm that Guardrails is enabled. +Confirm that Guardrails is enabled from the toggle. ## 3. Set security hazard on prompt or response From f2fb2cbde56eb6e6ed4aa4f5c21477f72251e797 Mon Sep 17 00:00:00 2001 From: Kody Jackson Date: Wed, 26 Feb 2025 07:26:45 -0600 Subject: [PATCH 20/22] Apply suggestions from code review Co-authored-by: Maddy <130055405+Maddy-Cloudflare@users.noreply.github.com> --- src/content/docs/ai-gateway/guardrails/index.mdx | 7 +++---- .../docs/ai-gateway/guardrails/set-up-guardrail.mdx | 10 +++++----- 2 files changed, 8 insertions(+), 9 deletions(-) diff --git a/src/content/docs/ai-gateway/guardrails/index.mdx b/src/content/docs/ai-gateway/guardrails/index.mdx index 5582475f78e08bc..91db4c8678474a4 100644 --- a/src/content/docs/ai-gateway/guardrails/index.mdx +++ b/src/content/docs/ai-gateway/guardrails/index.mdx @@ -14,18 +14,17 @@ Guardrails proactively monitor interactions between users and AI models, giving - **Consistent moderation**: Uniform moderation layer that works across models and providers. - **Enhanced safety and user trust**: Proactively protect users from harmful or inappropriate interactions. -- **Flexibility and control over allowed content**: Specify which categories to monitor and choose between flagging or outright blocking -- **Auditing and compliance capabilities**: Stay ahead of evolving regulatory requirements with logs of user prompts, model responses, and enforced guardrails. +- **Flexibility and control over allowed content**: Specify which categories to monitor and choose between flagging or outright blocking. +- **Auditing and compliance capabilities**: Receive updates on evolving regulatory requirements with logs of user prompts, model responses, and enforced guardrails. ## How Guardrails work -AI Gateway inspects all interactions in real time by evaluating content against predefined safety parameters. Below a breakdown of the process: +AI Gateway inspects all interactions in real time by evaluating content against predefined safety parameters. Guardrails work by: 1. Intercepting interactions: AI Gateway proxies requests and responses, sitting between the user and the AI model. 2. Inspecting content: - - User prompts: AI Gateway checks prompts against safety parameters (for example, violence, hate, or sexual content). Based on your settings, prompts can be flagged or blocked before reaching the model. - Model responses: Once processed, the AI model response is inspected. If hazardous content is detected, it can be flagged or blocked before being delivered to the user. 
diff --git a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx index 907879f41a84cb3..95e6538ac329be4 100644 --- a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx +++ b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx @@ -1,11 +1,11 @@ --- pcx_content_type: how-to -title: Setting up Guardrails +title: Set up Guardrails sidebar: order: 3 --- -Within AI Gateway settings, you can customize Guardrails: +Within AI Gateway settings, you can customize Guardrails to: - Enable or disable content moderation. - Choose evaluation scope: Analyze user prompts, model responses, or both. @@ -16,7 +16,7 @@ This tutorial will guide you through the process of setting up and customizing G ## 1. Log in to the dashboard 1. Log into the [Cloudflare dashboard](https://dash.cloudflare.com/) and select your account. -2. Go to AI > AI Gateway. +2. Go to **AI** > **AI Gateway**. ## 2. Access the Guardrails tab @@ -26,7 +26,7 @@ Confirm that Guardrails is enabled from the toggle. Within the Guardrails settings, you can choose where to apply security hazards: -- On prompt: Guardrails will evaluate and transform incoming prompts based on your security policies. -- On response: Guardrails will inspect the model's responses to ensure they meet your content and formatting guidelines. +- **On prompt**: Guardrails will evaluate and transform incoming prompts based on your security policies. +- **On response**: Guardrails will inspect the model's responses to ensure they meet your content and formatting guidelines. Select the option that best fits your use case. You can modify this setting at any time according to your needs. From 748a6f6f4c85879e4025fdb89382a903c67b559d Mon Sep 17 00:00:00 2001 From: kodster28 Date: Wed, 26 Feb 2025 07:34:18 -0600 Subject: [PATCH 21/22] Update UI instructions --- .../guardrails/set-up-guardrail.mdx | 31 ++++++------------- 1 file changed, 9 insertions(+), 22 deletions(-) diff --git a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx index 95e6538ac329be4..5461c96cde06ead 100644 --- a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx +++ b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx @@ -5,28 +5,15 @@ sidebar: order: 3 --- -Within AI Gateway settings, you can customize Guardrails to: - -- Enable or disable content moderation. -- Choose evaluation scope: Analyze user prompts, model responses, or both. -- Define hazard categories: Select categories like violence, hate, or sexual content and assign actions (ignore, flag, or block). - -This tutorial will guide you through the process of setting up and customizing Guardrails in your AI Gateway using the Cloudflare dashboard. - -## 1. Log in to the dashboard +Add Guardrails to any gateway to start evaluating and potentially modifying responses. 1. Log into the [Cloudflare dashboard](https://dash.cloudflare.com/) and select your account. 2. Go to **AI** > **AI Gateway**. - -## 2. Access the Guardrails tab - -Confirm that Guardrails is enabled from the toggle. - -## 3. Set security hazard on prompt or response - -Within the Guardrails settings, you can choose where to apply security hazards: - -- **On prompt**: Guardrails will evaluate and transform incoming prompts based on your security policies. -- **On response**: Guardrails will inspect the model's responses to ensure they meet your content and formatting guidelines. 
 Select the option that best fits your use case. You can modify this setting at any time according to your needs.

From 748a6f6f4c85879e4025fdb89382a903c67b559d Mon Sep 17 00:00:00 2001
From: kodster28
Date: Wed, 26 Feb 2025 07:34:18 -0600
Subject: [PATCH 21/22] Update UI instructions

---
 .../guardrails/set-up-guardrail.mdx           | 31 ++++++------------------
 1 file changed, 9 insertions(+), 22 deletions(-)

diff --git a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx
index 95e6538ac329be4..5461c96cde06ead 100644
--- a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx
+++ b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx
@@ -5,28 +5,15 @@ sidebar:
   order: 3
 ---

-Within AI Gateway settings, you can customize Guardrails to:
-
-- Enable or disable content moderation.
-- Choose evaluation scope: Analyze user prompts, model responses, or both.
-- Define hazard categories: Select categories like violence, hate, or sexual content and assign actions (ignore, flag, or block).
-
-This tutorial will guide you through the process of setting up and customizing Guardrails in your AI Gateway using the Cloudflare dashboard.
-
-## 1. Log in to the dashboard
+Add Guardrails to any gateway to start evaluating and potentially modifying responses.

 1. Log into the [Cloudflare dashboard](https://dash.cloudflare.com/) and select your account.
 2. Go to **AI** > **AI Gateway**.
-
-## 2. Access the Guardrails tab
-
-Confirm that Guardrails is enabled from the toggle.
-
-## 3. Set security hazard on prompt or response
-
-Within the Guardrails settings, you can choose where to apply security hazards:
-
-- **On prompt**: Guardrails will evaluate and transform incoming prompts based on your security policies.
-- **On response**: Guardrails will inspect the model's responses to ensure they meet your content and formatting guidelines.
-
-Select the option that best fits your use case. You can modify this setting at any time according to your needs.
+3. Select a gateway.
+4. Go to **Guardrails**.
+5. Switch the toggle to **On**.
+6. To customize categories, select **Change** > **Configure specific categories**.
+7. Update your choices for how Guardrails works on specific prompts or responses (**Flag**, **Ignore**, **Block**).
+   - For **Prompts**: Guardrails will evaluate and transform incoming prompts based on your security policies.
+   - For **Responses**: Guardrails will inspect the model's responses to ensure they meet your content and formatting guidelines.
+8. Select **Save**.

From 8a73f7f312659a0af9394dfd953909687f64204d Mon Sep 17 00:00:00 2001
From: kodster28
Date: Wed, 26 Feb 2025 07:38:34 -0600
Subject: [PATCH 22/22] Added note

---
 src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx
index 5461c96cde06ead..217cfc50e61c331 100644
--- a/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx
+++ b/src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx
@@ -17,3 +17,7 @@
    - For **Prompts**: Guardrails will evaluate and transform incoming prompts based on your security policies.
    - For **Responses**: Guardrails will inspect the model's responses to ensure they meet your content and formatting guidelines.
 8. Select **Save**.
+
+:::note
+For additional details about how to implement Guardrails, refer to [Usage considerations](/ai-gateway/guardrails/usage-considerations/).
+:::
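To close the loop on the behavior these pages document, here is a hedged sketch of how a caller might distinguish the two Guardrails outcomes: flagged interactions complete normally and are logged, while blocked ones surface as an error. The gateway URL, payload shape, and error body are assumptions for illustration; only the flag-versus-block semantics come from the pages above.

```ts
// Sketch: handling the two Guardrails outcomes described in the
// usage-considerations page. Flagged requests still return a normal
// model response (the flag is recorded in gateway logs), while blocked
// requests are expected to surface as an error status from the gateway.
export async function completeWithGuardrails(
  gatewayUrl: string, // hypothetical Guardrails-enabled gateway endpoint
  payload: unknown,
): Promise<unknown> {
  const res = await fetch(gatewayUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });

  if (!res.ok) {
    // Blocked prompts or responses arrive as a non-2xx status; keep the
    // body in the error so operators can review why it was refused.
    const detail = await res.text();
    throw new Error(`Blocked by gateway (${res.status}): ${detail}`);
  }

  // A flagged interaction is indistinguishable from a clean one here:
  // it completes normally and is surfaced for review in the logs.
  return res.json();
}
```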