---
pcx_content_type: how-to
title: How Guardrails works
sidebar:
  order: 3
---

AI Gateway inspects all interactions in real time by evaluating content against predefined safety parameters. Here’s a breakdown of the process:

1. **Intercepting interactions:**
   AI Gateway sits between the user and the AI model, intercepting every prompt and response.

2. **Evaluating content:**

   - **User prompts:** When a user sends a prompt, AI Gateway checks it against safety parameters (for example, violence, hate, or sexual content). Based on your configuration, the system can either flag the prompt or block it before it reaches the AI model.
   - **Model responses:** After processing, the AI model’s response is inspected. If hazardous content is detected, it can be flagged or blocked before being delivered to the user.

3. **Model-specific behavior:**

   - **Text generation models:** Both prompts and responses are evaluated.
   - **Embedding models:** Only the prompt is evaluated; the response is passed directly back to the user.
   - **Catalogued models:** If the model type is identifiable, only the prompt is evaluated; the response bypasses Guardrails and is delivered directly.

4. **Real-time observability:**
   Detailed logs provide visibility into user queries and model outputs, allowing you to monitor interactions continuously and adjust safety parameters as needed.
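The evaluation flow above can be pictured as a small decision function. The sketch below is purely illustrative: the `moderate` helper, category names, and return shapes are assumptions for the sake of the example, not AI Gateway's actual API.

```python
# Illustrative sketch of the Guardrails evaluation flow.
# moderate() stands in for the real moderation-model call;
# category names and actions are assumptions, not AI Gateway's API.

def moderate(text: str, policy: dict) -> list:
    """Pretend moderation: returns the policy categories whose name
    appears in the text. A real deployment would call a moderation
    model such as Llama Guard instead."""
    return [cat for cat in policy if cat in text.lower()]

def handle_request(prompt: str, policy: dict, model, is_text_generation=True):
    # Steps 1-2: intercept the prompt and evaluate it before the model sees it.
    for category in moderate(prompt, policy):
        if policy[category] == "block":
            return {"blocked": True, "reason": category}
        # "flag" is recorded in logs, but the request still proceeds.

    response = model(prompt)

    # Step 3: embedding and catalogued models skip response evaluation.
    if is_text_generation:
        for category in moderate(response, policy):
            if policy[category] == "block":
                return {"blocked": True, "reason": category}

    return {"blocked": False, "response": response}
```

Note that a flagged category never stops the request in this sketch; only a category whose action is `"block"` short-circuits the flow, matching the flag/block distinction described above.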

## Configuration

Within AI Gateway settings, you can tailor the Guardrails feature to your requirements:

- **Guardrails:** Enable or disable content moderation.
- **Evaluation scope:** Choose to analyze user prompts, model responses, or both.
- **Hazard categories:** Define specific categories (such as violence, hate, or sexual content) to monitor, and set an action for each category (ignore, flag, or block).
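One way to picture the three settings above is as a small configuration object. The field names below are invented for illustration and do not correspond to AI Gateway's actual configuration schema.

```python
from dataclasses import dataclass, field

# Hypothetical shape for the three Guardrails settings described above.
# Field names are invented for illustration, not AI Gateway's schema.

@dataclass
class GuardrailsConfig:
    enabled: bool = True            # Guardrails on/off
    evaluate_prompts: bool = True   # evaluation scope
    evaluate_responses: bool = True
    # category -> "ignore" | "flag" | "block"
    categories: dict = field(default_factory=lambda: {
        "violence": "block",
        "hate": "flag",
        "sexual_content": "ignore",
    })

    def action_for(self, category: str) -> str:
        # Categories you have not configured are not monitored.
        return self.categories.get(category, "ignore")
```

The per-category mapping is the key idea: each hazard category carries its own action, so you can block one category while only flagging another.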

## Leveraging Llama Guard on Workers AI

Guardrails is powered by [**Llama Guard**](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/), Meta’s open-source content moderation tool designed for real-time safety monitoring. AI Gateway uses the [**Llama Guard 3 8B model**](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/), hosted on [**Workers AI**](/workers-ai/), to drive its safety features. This model is continuously updated to adapt to emerging safety challenges.

## Additional considerations

- **Workers AI usage:**
  Enabling Guardrails incurs usage on Workers AI. Monitor your usage through the Workers AI dashboard.

- **Latency impact:**
  Evaluating both the request and the response introduces extra latency. Factor this into your deployment planning.

- **Model availability:**
  If the underlying model is unavailable, requests that are flagged will proceed; however, requests set to be blocked will result in an error.
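The availability rule in the last bullet amounts to fail-open for `flag` and fail-closed for `block`. A minimal sketch of that logic, with all names (`ModerationUnavailable`, `GuardrailsError`, `evaluate_with_fallback`) invented for illustration:

```python
# Sketch of the model-availability rule: when the moderation model
# cannot be reached, "flag" fails open (the request proceeds) while
# "block" fails closed (the request errors). All names are illustrative.

class ModerationUnavailable(Exception):
    """Raised when the moderation model cannot be reached."""

class GuardrailsError(Exception):
    """Returned to the caller when a block-mode check cannot run."""

def evaluate_with_fallback(text, action, moderate):
    try:
        return moderate(text)  # normal path: list of violated categories
    except ModerationUnavailable:
        if action == "block":
            # Fail closed: safety cannot be verified, so refuse the request.
            raise GuardrailsError("moderation model unavailable")
        # Fail open: flag-only traffic proceeds unchecked.
        return []
```

This asymmetry is deliberate: `block` expresses a hard safety requirement, so an unverifiable request is treated as unsafe, while `flag` is observational and should not take your application down with the moderation model.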