[AIG]Guardrails docs #20098

@@ -0,0 +1,18 @@
---
title: Guardrails
pcx_content_type: navigation
order: 1
sidebar:
  order: 8
  group:
    badge: Beta
---

Guardrails help you deploy AI applications safely by intercepting and evaluating both user prompts and model responses for harmful content. Acting as a proxy between your application and [model providers](/ai-gateway/providers) (such as OpenAI, Anthropic, DeepSeek, and others), AI Gateway's Guardrails ensure a consistent and secure experience across your entire AI ecosystem.
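
As a minimal sketch of what acting as a proxy means in practice: pointing an OpenAI-compatible request at your gateway URL is enough for Guardrails to see both the prompt and the response. The account ID, gateway ID, and model name below are placeholders, not values taken from this PR:

```ts
// Minimal sketch: route an OpenAI chat completion through an AI Gateway
// endpoint so Guardrails can evaluate the prompt and the model's response.
// ACCOUNT_ID, GATEWAY_ID, and the API key are placeholders for your own values.
const ACCOUNT_ID = "your_account_id";
const GATEWAY_ID = "your_gateway_id";

const response = await fetch(
  `https://gateway.ai.cloudflare.com/v1/${ACCOUNT_ID}/${GATEWAY_ID}/openai/chat/completions`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: "Hello, world!" }],
    }),
  },
);

console.log(await response.json());
```

No other application changes are needed: because the gateway sits on the request path, moderation applies uniformly regardless of which provider the gateway forwards to.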

Guardrails proactively monitor interactions between users and AI models, giving you:

- **Consistent moderation**: Uniform moderation layer that works across models and providers.
- **Enhanced safety and user trust**: Proactively protect users from harmful or inappropriate interactions.
- **Flexibility and control over allowed content**: Specify which categories to monitor and choose between flagging or blocking content outright.
- **Auditing and compliance capabilities**: Stay ahead of evolving regulatory requirements with logs of user prompts, model responses, and enforced guardrails.

@@ -0,0 +1,107 @@
---
pcx_content_type: how-to
title: How Guardrails works
sidebar:
  order: 3
---

AI Gateway inspects all interactions in real time by evaluating content against predefined safety parameters. Below is a breakdown of the process:
|
||
|
|
||
| 1. Intercepting interactions: | ||
daisyfaithauma marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| AI Gateway proxies requests and responses, sitting between the user and the AI model. | ||
|
|
||
| 2. Inspecting content: | ||
|
|
||
| - User prompts: AI Gateway checks prompts against safety parameters (for example, violence, hate, or sexual content). Based on your settings, prompts can be flagged or blocked before reaching the model. | ||
| - Model responses: Once processed, the AI model response is inspected. If hazardous content is detected, it can be flagged or blocked before being delivered to the user. | ||
|
|
||
| 3. Applying actions: | ||
| Depending on your configuration, flagged content is logged for review, while blocked content is prevented from proceeding. | ||
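
Putting the three steps together from the application's point of view: the client only ever talks to the gateway, and a blocked prompt or response surfaces as a failed request. The exact status code and error body AI Gateway returns for blocked content are not specified here, so the check below is illustrative rather than authoritative:

```ts
// Illustrative sketch: the gateway sits between the client and the model
// (step 1), evaluates the prompt and the response (step 2), and either lets
// the interaction through or stops it (step 3). The exact error shape for
// blocked content may differ; inspect real gateway responses to match it.
const GATEWAY_URL =
  "https://gateway.ai.cloudflare.com/v1/your_account_id/your_gateway_id/openai/chat/completions";

async function askThroughGateway(prompt: string): Promise<string> {
  const res = await fetch(GATEWAY_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
    }),
  });

  if (!res.ok) {
    // Blocked content is prevented from proceeding, so it reaches the
    // client as an error rather than as a completion.
    throw new Error(`Request failed (possibly blocked): ${await res.text()}`);
  }

  const data = await res.json();
  // Flagged (but not blocked) content still arrives here; it is logged
  // on the gateway side for review.
  return data.choices[0].message.content;
}
```

Note the asymmetry between the two actions: "flag" is invisible to the caller and only shows up in logs, while "block" changes the control flow of your application.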

## Supported model types

Guardrails determines the type of AI model being used and applies safety checks accordingly:

- Text generation models: Both prompts and responses are evaluated.
- Embedding models: Only the prompt is evaluated, and the response is passed directly back to the user (see the example below).
- Unknown models: If the model type cannot be determined, prompts are evaluated, but responses bypass Guardrails.

If Guardrails cannot access the underlying model, requests set to "block" will result in an error, while flagged requests will proceed.
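
As an example of the per-model-type behavior above, an embeddings call through the gateway only has its input text evaluated, since the response is numeric vectors with nothing to inspect. The endpoint path mirrors the chat example, and the IDs and model name are placeholders:

```ts
// Sketch: for embedding models, Guardrails evaluates only the input text;
// the vectors in the response are passed straight back to the caller.
const res = await fetch(
  "https://gateway.ai.cloudflare.com/v1/your_account_id/your_gateway_id/openai/embeddings",
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "text-embedding-3-small",
      input: "Text whose prompt-side safety is still checked",
    }),
  },
);

const { data } = await res.json();
console.log(data[0].embedding.length); // dimensionality of the returned vector
```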

## Configuration

Within AI Gateway settings, you can customize Guardrails:
- Enable or disable content moderation.
- Choose evaluation scope: Analyze user prompts, model responses, or both.
- Define hazard categories: Select categories like violence, hate, or sexual content and assign actions (ignore, flag, or block); the sketch below models these choices.
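
Guardrails is configured in the dashboard rather than in code, but the available choices can be modeled as a small type. Everything below is a hypothetical illustration of the ignore/flag/block options, not an actual AI Gateway API:

```ts
// Hypothetical illustration only: Guardrails is configured in the AI Gateway
// dashboard, not through this interface. The type simply models the options
// described in the list above.
type GuardrailAction = "ignore" | "flag" | "block";

interface GuardrailsSettings {
  enabled: boolean; // enable or disable content moderation
  evaluatePrompts: boolean; // evaluation scope: user prompts
  evaluateResponses: boolean; // evaluation scope: model responses
  categories: Record<string, GuardrailAction>; // hazard category -> action
}

const example: GuardrailsSettings = {
  enabled: true,
  evaluatePrompts: true,
  evaluateResponses: true,
  categories: {
    violence: "block",
    hate: "block",
    sexual: "flag",
  },
};

console.log(example);
```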
|
|
||
| ## Workers AI and Guardrails | ||
daisyfaithauma marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| Guardrails currently uses [Llama Guard 3 8B](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/) on [Workers AI](/workers-ai/) to perform content evaluations. The underlying model may be updated in the future, and we will reflect those changes within Guardrails. | ||
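
To get a feel for what the evaluation layer does, here is a sketch of calling a Llama Guard model directly on Workers AI from a Worker. It assumes an `AI` binding in your Wrangler configuration, and the model ID and response shape are assumptions; verify both against the Workers AI model catalog:

```ts
// Sketch of invoking Llama Guard directly on Workers AI. The `AI` binding,
// model ID, and response shape are assumptions to verify against the
// Workers AI model catalog before relying on them.
export default {
  async fetch(request: Request, env: { AI: Ai }): Promise<Response> {
    const verdict = await env.AI.run("@cf/meta/llama-guard-3-8b", {
      messages: [{ role: "user", content: "How do I bake a chocolate cake?" }],
    });
    // The model returns a safety assessment for the conversation, which is
    // the kind of signal Guardrails acts on when flagging or blocking.
    return Response.json(verdict);
  },
};
```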

Since Guardrails runs on Workers AI, enabling it incurs usage on Workers AI. You can monitor usage through the Workers AI Dashboard.

## Additional considerations

- Latency impact: Enabling Guardrails adds some latency, since each evaluated interaction involves an additional model inference. Consider this when balancing safety and speed; the sketch below shows one way to measure the difference.
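
A rough way to gauge the overhead is to time an identical request against two gateways, one with Guardrails enabled and one without. The URLs and key are placeholders:

```ts
// Rough latency comparison: send the same request through a Guardrails-enabled
// gateway and a plain one, and compare wall-clock times. URLs are placeholders.
async function timeRequest(url: string): Promise<number> {
  const start = performance.now();
  await fetch(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: "ping" }],
    }),
  });
  return performance.now() - start;
}

const withGuardrails = await timeRequest(
  "https://gateway.ai.cloudflare.com/v1/acct/gw-guarded/openai/chat/completions",
);
const withoutGuardrails = await timeRequest(
  "https://gateway.ai.cloudflare.com/v1/acct/gw-plain/openai/chat/completions",
);
console.log(`Guardrails overhead: ~${(withGuardrails - withoutGuardrails).toFixed(0)} ms`);
```

A single sample is noisy; average over several runs before drawing conclusions.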

:::note

Llama Guard is provided as-is without any representations, warranties, or guarantees. Any rules or examples contained in blogs, developer docs, or other reference materials are provided for informational purposes only. You acknowledge and understand that you are responsible for the results and outcomes of your use of AI Gateway.

:::