Commit befd14a

minor fixes
1 parent 9a54d96 commit befd14a

2 files changed: +88 −29 lines changed
Lines changed: 5 additions & 5 deletions

@@ -1,5 +1,5 @@
 ---
-title: Guardrails in AI Gateway
+title: Guardrails
 pcx_content_type: navigation
 order: 1
 sidebar:
@@ -8,10 +8,10 @@ sidebar:
   badge: Beta
 ---
 
-Guardrails in AI Gateway help you deploy AI applications safely by intercepting and evaluating both user prompts and model responses for harmful content. Acting as a proxy between your application and model providers (such as OpenAI, Anthropic, DeepSeek, and others), Guardrails ensures a consistent and secure experience across your entire AI ecosystem.
+Guardrails help you deploy AI applications safely by intercepting and evaluating both user prompts and model responses for harmful content. Acting as a proxy between your application and [model providers](/ai-gateway/providers) (such as OpenAI, Anthropic, DeepSeek, and others), Guardrails ensures a consistent and secure experience across your entire AI ecosystem.
 
 Guardrails proactively monitor interactions between users and AI models, allowing you to:
 
-Enhance safety: Protect users by detecting and mitigating harmful content.
-Improve compliance: Meet evolving regulatory standards.
-Reduce costs: Prevent unnecessary processing by blocking harmful requests early.
+- Protect users by detecting and mitigating harmful content.
+- Meet compliance requirements by aligning with evolving regulatory standards.
+- Optimize costs by preventing unnecessary processing of harmful requests early.
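The overview above describes AI Gateway sitting between your application and model providers. As a minimal sketch of what that looks like from the client side, the snippet below builds a gateway base URL that a client would use in place of the provider's own endpoint, so traffic passes through the gateway where Guardrails can inspect it. The URL pattern is the common AI Gateway shape, and the account and gateway IDs are placeholders, not values from this commit:

```python
# Illustrative sketch: route provider requests through an AI Gateway
# endpoint instead of calling the provider directly, so Guardrails can
# inspect prompts and responses. Account/gateway IDs are placeholders.

def gateway_url(account_id: str, gateway_id: str, provider: str) -> str:
    """Build a gateway base URL that proxies requests to a model provider."""
    return f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}"

# A client would point its SDK's base URL here rather than at the
# provider's API host (e.g. api.openai.com).
url = gateway_url("ACCOUNT_ID", "my-gateway", "openai")
print(url)
```

Swapping the base URL is the only client-side change implied by the proxy model: the request body and provider credentials stay the same.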

src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx

Lines changed: 83 additions & 24 deletions
@@ -5,44 +5,103 @@ sidebar:
 order: 3
 ---
 
-AI Gateway inspects all interactions in real time by evaluating content against predefined safety parameters. Here’s a breakdown of the process:
+AI Gateway inspects all interactions in real time by evaluating content against predefined safety parameters. Below is a breakdown of the process:
 
-1. **Intercepting interactions:**
-   AI Gateway sits between the user and the AI model, intercepting every prompt and response.
+1. Intercepting interactions:
+   AI Gateway proxies requests and responses, sitting between the user and the AI model.
 
-2. **Evaluating content:**
+2. Inspecting content:
 
-   - **User prompts:** When a user sends a prompt, AI Gateway checks it against safety parameters (for example violence, hate, or sexual content). Based on your configuration, the system can either flag the prompt or block it before it reaches the AI model.
-   - **Model responses:** After processing, the AI model’s response is inspected. If hazardous content is detected, it can be flagged or blocked before being delivered to the user.
+   - User prompts: AI Gateway checks prompts against safety parameters (for example, violence, hate, or sexual content). Based on your settings, prompts can be flagged or blocked before reaching the model.
+   - Model responses: Once processed, the AI model response is inspected. If hazardous content is detected, it can be flagged or blocked before being delivered to the user.
 
-3. **Model-specific behavior:**
+3. Applying actions:
+   Depending on your configuration, flagged content is logged for review, while blocked content is prevented from proceeding.
 
-   - **Text generation models:** Both prompts and responses are evaluated.
-   - **Embedding models:** Only the prompt is evaluated, and the response is passed directly back to the user.
-   - **Catalogued models:** If the model type is identifiable, only the prompt is evaluated; the response bypasses Guardrails and is delivered directly.
+## Supported model types
 
-4. **Real-time observability:**
-   Detailed logs provide visibility into user queries and model outputs, allowing you to monitor interactions continuously and adjust safety parameters as needed.
+Guardrails determines the type of AI model being used and applies safety checks accordingly:
+
+- Text generation models: Both prompts and responses are evaluated.
+- Embedding models: Only the prompt is evaluated, and the response is passed directly back to the user.
+- Unknown models: If the model type cannot be determined, prompts are evaluated, but responses bypass Guardrails.
+
+If Guardrails cannot access the underlying model, requests set to "block" will result in an error, while flagged requests will proceed.
 
 ## Configuration
 
-Within AI Gateway settings, you can tailor the Guardrails feature to your requirements:
+Within AI Gateway settings, you can customize Guardrails:
 
-- **Guardrails:** Enable or disable content moderation.
-- **Evaluation scope:** Choose to analyse user prompts, model responses, or both.
-- **Hazard categories:** Define specific categories (such as violence, hate, or sexual content) to monitor, and set actions for each category (ignore, flag, or block).
+- Enable or disable content moderation.
+- Choose evaluation scope: Analyze user prompts, model responses, or both.
+- Define hazard categories: Select categories like violence, hate, or sexual content and assign actions (ignore, flag, or block).
 
-## Leveraging Llama Guard on Workers AI
+## Workers AI and Guardrails
 
-Guardrails is powered by [**Llama Guard**](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/), Meta’s open-source content moderation tool designed for real-time safety monitoring. AI Gateway uses the [**Llama Guard 3 8B model**](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/), hosted on [**Workers AI**](/workers-ai/) to drive its safety features. This model is continuously updated to adapt to emerging safety challenges.
+Guardrails currently uses [Llama Guard 3 8B](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/) on [Workers AI](/workers-ai/) to perform content evaluations. The underlying model may be updated in the future, and we will reflect those changes within Guardrails.
+
+Since Guardrails runs on Workers AI, enabling it incurs usage on Workers AI. You can monitor usage through the Workers AI Dashboard.
 
 ## Additional considerations
 
-- **Workers AI usage:**
-  Enabling Guardrails incurs usage on Workers AI. Monitor your usage through the Workers AI Dashboard.
+- Latency impact: Enabling Guardrails adds some latency. Consider this when balancing safety and speed.
 
-- **Latency impact:**
-  Evaluating both the request and the response introduces extra latency. Factor this into your deployment planning.
+:::note
+
+Llama Guard is provided as-is without any representations, warranties, or guarantees. Any rules or examples contained in blogs, developer docs, or other reference materials are provided for informational purposes only. You acknowledge and understand that you are responsible for the results and outcomes of your use of AI Gateway.
 
-- **Model availability:**
-  If the underlying model is unavailable, requests that are flagged will proceed; however, requests set to be blocked will result in an error.
+:::
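The evaluation flow added in set-up-guardrail.mdx (per-category actions of ignore, flag, or block; flagged content is logged and proceeds; blocked content never proceeds) can be sketched as a small decision function. This is an illustrative simulation of the documented behavior, not Cloudflare's implementation: the `evaluate` stub and its toy blocklist stand in for the real Llama Guard model, and all names here are hypothetical:

```python
# Hypothetical sketch of the Guardrails decision flow described above.
# `evaluate` stands in for the real safety model (Llama Guard); here it
# just matches a toy blocklist, purely for illustration.

from dataclasses import dataclass, field

@dataclass
class GuardrailConfig:
    # Action per hazard category: "ignore", "flag", or "block".
    actions: dict[str, str] = field(default_factory=dict)

def evaluate(text: str) -> list[str]:
    """Toy stand-in for the safety model: return detected hazard categories."""
    toy_blocklist = {"violence": ["attack"], "hate": ["slur"]}
    return [cat for cat, words in toy_blocklist.items()
            if any(w in text.lower() for w in words)]

def apply_guardrails(text: str, config: GuardrailConfig) -> tuple[str, list[str]]:
    """Return ("pass" | "flag" | "block", matched categories)."""
    flagged: list[str] = []
    for category in evaluate(text):
        action = config.actions.get(category, "ignore")
        if action == "block":
            return ("block", [category])   # blocked content never proceeds
        if action == "flag":
            flagged.append(category)       # flagged content is logged, proceeds
    return ("flag" if flagged else "pass", flagged)

config = GuardrailConfig(actions={"violence": "block", "hate": "flag"})
print(apply_guardrails("launch the attack", config))  # ('block', ['violence'])
print(apply_guardrails("hello world", config))        # ('pass', [])
```

Per the docs, a "block" outcome surfaces to the caller as an error response, while a "flag" outcome only affects logging; a real integration would run this check on the prompt, and on the response too for text generation models.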
