[AIG]Guardrails docs #20098
Merged
Changes from all commits (23 commits):
9a54d96 Guardails docs (daisyfaithauma)
befd14a minor fixes (daisyfaithauma)
04ee357 Update index.mdx (kathayl)
842d0d3 Update index.mdx (kathayl)
0f907b4 Update set-up-guardrail.mdx (kathayl)
86d8f6b Update src/content/docs/ai-gateway/guardrails/index.mdx (daisyfaithauma)
451be8f removed duplicate (daisyfaithauma)
ba3ab97 moved details (daisyfaithauma)
c1378e9 Update set-up-guardrail.mdx (kathayl)
728c906 changes to docs (daisyfaithauma)
f0b8527 Merge branch 'aig-guardrails' of https://github.com/cloudflare/cloudf… (daisyfaithauma)
766a736 Merged (daisyfaithauma)
2a6fcdb title (daisyfaithauma)
1f051a4 title (daisyfaithauma)
21f0956 add setup details (daisyfaithauma)
e05221e spelling (daisyfaithauma)
cc073bc Update set-up-guardrail.mdx (daisyfaithauma)
bf61f4b Update supported-model-types.mdx (daisyfaithauma)
22eeb30 Update usage-considerations.mdx (kathayl)
add251e Update set-up-guardrail.mdx (daisyfaithauma)
f2fb2cb Apply suggestions from code review (kodster28)
748a6f6 Update UI instructions (kodster28)
8a73f7f Added note (kodster28)
src/content/docs/ai-gateway/guardrails/index.mdx (32 additions, 0 deletions)

---
title: Guardrails
pcx_content_type: navigation
order: 1
sidebar:
  order: 8
  group:
    badge: Beta
---

Guardrails help you deploy AI applications safely by intercepting and evaluating both user prompts and model responses for harmful content. Acting as a proxy between your application and [model providers](/ai-gateway/providers/) (such as OpenAI, Anthropic, DeepSeek, and others), AI Gateway's Guardrails ensure a consistent and secure experience across your entire AI ecosystem.

Guardrails proactively monitor interactions between users and AI models, giving you:

- **Consistent moderation**: Uniform moderation layer that works across models and providers.
- **Enhanced safety and user trust**: Proactively protect users from harmful or inappropriate interactions.
- **Flexibility and control over allowed content**: Specify which categories to monitor and choose between flagging or outright blocking.
- **Auditing and compliance capabilities**: Receive updates on evolving regulatory requirements with logs of user prompts, model responses, and enforced guardrails.

## How Guardrails work

AI Gateway inspects all interactions in real time by evaluating content against predefined safety parameters. Guardrails work by:

1. Intercepting interactions:
   AI Gateway proxies requests and responses, sitting between the user and the AI model.

2. Inspecting content:
   - User prompts: AI Gateway checks prompts against safety parameters (for example, violence, hate, or sexual content). Based on your settings, prompts can be flagged or blocked before reaching the model.
   - Model responses: Once processed, the AI model response is inspected. If hazardous content is detected, it can be flagged or blocked before being delivered to the user.

3. Applying actions:
   Depending on your configuration, flagged content is logged for review, while blocked content is prevented from proceeding.
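To make the proxy flow described in this file concrete, here is a minimal sketch of calling an OpenAI model through an AI Gateway endpoint so that Guardrails can evaluate the prompt and the response. The account ID, gateway ID, API key, model name, and the error-handling shape for blocked content are placeholders and assumptions, not values taken from this PR.

```ts
// Minimal sketch: send a chat completion through AI Gateway so Guardrails can
// evaluate both the prompt and the model response. IDs and key are placeholders.
const ACCOUNT_ID = "<your-account-id>";
const GATEWAY_ID = "<your-gateway-id>";
const OPENAI_API_KEY = "<your-openai-api-key>";

const gatewayUrl = `https://gateway.ai.cloudflare.com/v1/${ACCOUNT_ID}/${GATEWAY_ID}/openai/chat/completions`;

async function askThroughGateway(prompt: string): Promise<string> {
  const res = await fetch(gatewayUrl, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
    }),
  });

  // If a category is configured to block, flagged content never reaches the
  // caller; the gateway responds with an error instead. The exact status code
  // and error body are not specified in this PR, so this check is a guess.
  if (!res.ok) {
    throw new Error(`Blocked or failed: ${res.status} ${await res.text()}`);
  }

  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0].message.content;
}
```

The same pattern should apply to other supported providers by swapping the provider segment of the gateway URL, though the provider-specific request body will differ.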
src/content/docs/ai-gateway/guardrails/set-up-guardrail.mdx (23 additions, 0 deletions)
---
pcx_content_type: how-to
title: Set up Guardrails
sidebar:
  order: 3
---

Add Guardrails to any gateway to start evaluating and potentially modifying responses.

1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/) and select your account.
2. Go to **AI** > **AI Gateway**.
3. Select a gateway.
4. Go to **Guardrails**.
5. Switch the toggle to **On**.
6. To customize categories, select **Change** > **Configure specific categories**.
7. Update your choices for how Guardrails works on specific prompts or responses (**Flag**, **Ignore**, **Block**).
   - For **Prompts**: Guardrails will evaluate and transform incoming prompts based on your security policies.
   - For **Responses**: Guardrails will inspect the model's responses to ensure they meet your content and formatting guidelines.
8. Select **Save**.

:::note[Header]
For additional details about how to implement Guardrails, refer to [Usage considerations](/ai-gateway/guardrails/usage-considerations/).
:::
src/content/docs/ai-gateway/guardrails/supported-model-types.mdx (12 additions, 0 deletions)
---
pcx_content_type: reference
title: Supported model types
sidebar:
  order: 3
---

AI Gateway's Guardrails detects the type of AI model being used and applies safety checks accordingly:

- **Text generation models**: Both prompts and responses are evaluated.
- **Embedding models**: Only the prompt is evaluated, as the response consists of numerical embeddings, which are not meaningful for moderation.
- **Unknown models**: If the model type cannot be determined, only the prompt is evaluated, while the response bypasses Guardrails.
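Restating the list above as a tiny sketch, with a hypothetical helper name rather than AI Gateway source code: which parts of an interaction are evaluated depends on the detected model type.

```ts
// Sketch of the evaluation rules above; the function name is hypothetical.
type ModelType = "text-generation" | "embedding" | "unknown";

function partsEvaluated(modelType: ModelType): { prompt: boolean; response: boolean } {
  switch (modelType) {
    case "text-generation":
      return { prompt: true, response: true };  // both sides are checked
    case "embedding":
      return { prompt: true, response: false }; // numeric embeddings are not moderated
    case "unknown":
      return { prompt: true, response: false }; // response bypasses Guardrails
  }
}
```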
src/content/docs/ai-gateway/guardrails/usage-considerations.mdx (21 additions, 0 deletions)
---
pcx_content_type: reference
title: Usage considerations
sidebar:
  order: 4
---

Guardrails currently uses [Llama Guard 3 8B](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/) on [Workers AI](/workers-ai/) to perform content evaluations. The underlying model may be updated in the future, and we will reflect those changes within Guardrails.

Since Guardrails runs on Workers AI, enabling it incurs usage on Workers AI. You can monitor usage through the Workers AI Dashboard.

## Additional considerations

- Model availability: If at least one hazard category is set to `block`, but AI Gateway is unable to receive a response from Workers AI, the request will be blocked.
- Latency impact: Enabling Guardrails adds some latency. Consider this when balancing safety and speed.

:::note

Llama Guard is provided as-is without any representations, warranties, or guarantees. Any rules or examples contained in blogs, developer docs, or other reference materials are provided for informational purposes only. You acknowledge and understand that you are responsible for the results and outcomes of your use of AI Gateway.

:::
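One way to read the model-availability rule above, as a hedged sketch: blocking is fail-closed when the Workers AI evaluation cannot be obtained. What happens when no category is set to block is an assumption here, not something this file states.

```ts
// Sketch of the fail-closed rule: if any category is set to "block" and the
// Workers AI evaluation cannot be obtained, the request is blocked.
// The "allow" branch for configurations with no blocking category is an assumption.
type Action = "flag" | "ignore" | "block";

function onEvaluationFailure(categories: Record<string, Action>): "block" | "allow" {
  const anyBlocking = Object.values(categories).some((a) => a === "block");
  return anyBlocking ? "block" : "allow";
}

// Example: { violence: "block", hate: "flag" } with an unreachable evaluator
// results in the request being blocked.
console.log(onEvaluationFailure({ violence: "block", hate: "flag" })); // "block"
```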