From 910771996795a77a33b8d7601c8887bd70ec6396 Mon Sep 17 00:00:00 2001
From: Kathy <153706637+kathayl@users.noreply.github.com>
Date: Mon, 3 Mar 2025 12:02:30 -0800
Subject: [PATCH 1/2] Update usage-considerations.mdx

add info on expected latency from enabling guardrails
---
 src/content/docs/ai-gateway/guardrails/usage-considerations.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx b/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx
index 08446a4fad359e..1a117e39a2ccfd 100644
--- a/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx
+++ b/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx
@@ -12,7 +12,7 @@ Since Guardrails runs on Workers AI, enabling it incurs usage on Workers AI. You
 
 ## Additional considerations
 
 - Model availability: If at least one hazard category is set to `block`, but AI Gateway is unable to receive a response from Workers AI, the request will be blocked. Conversely, if a hazard category is set to `flag` and AI Gateway cannot obtain a response from Workers AI, the request will proceed without evaluation. This approach prioritizes availability, allowing requests to continue even when content evaluation is not possible.
-- Latency impact: Enabling Guardrails adds some latency. Consider this when balancing safety and speed.
+- Latency impact: Enabling Guardrails introduces additional latency to requests. Typically, evaluations using Llama Guard 3 8B on Workers AI add approximately 500 milliseconds per request. However, larger requests may experience increased latency, though this increase is not linear. Consider this when balancing safety and performance.
 
 :::note

From a7299149f4ac8d0785e18cdc13a043aca45acbe4 Mon Sep 17 00:00:00 2001
From: Kathy <153706637+kathayl@users.noreply.github.com>
Date: Mon, 3 Mar 2025 17:32:18 -0800
Subject: [PATCH 2/2] Update usage-considerations.mdx

---
 .../docs/ai-gateway/guardrails/usage-considerations.mdx | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx b/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx
index 1a117e39a2ccfd..f1510e8172491f 100644
--- a/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx
+++ b/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx
@@ -11,8 +11,11 @@ Since Guardrails runs on Workers AI, enabling it incurs usage on Workers AI. You
 
 ## Additional considerations
 
-- Model availability: If at least one hazard category is set to `block`, but AI Gateway is unable to receive a response from Workers AI, the request will be blocked. Conversely, if a hazard category is set to `flag` and AI Gateway cannot obtain a response from Workers AI, the request will proceed without evaluation. This approach prioritizes availability, allowing requests to continue even when content evaluation is not possible.
-- Latency impact: Enabling Guardrails introduces additional latency to requests. Typically, evaluations using Llama Guard 3 8B on Workers AI add approximately 500 milliseconds per request. However, larger requests may experience increased latency, though this increase is not linear. Consider this when balancing safety and performance.
+- **Model availability**: If at least one hazard category is set to `block`, but AI Gateway is unable to receive a response from Workers AI, the request will be blocked. Conversely, if a hazard category is set to `flag` and AI Gateway cannot obtain a response from Workers AI, the request will proceed without evaluation. This approach prioritizes availability, allowing requests to continue even when content evaluation is not possible.
+- **Latency impact**: Enabling Guardrails introduces additional latency to requests. Typically, evaluations using Llama Guard 3 8B on Workers AI add approximately 500 milliseconds per request. However, larger requests may experience increased latency, though this increase is not linear. Consider this when balancing safety and performance.
+- **Handling long content**: When evaluating long prompts or responses, Guardrails automatically segments the content into smaller chunks, processing each through separate Guardrail requests. This approach ensures comprehensive moderation but may result in increased latency for longer inputs.
+- **Supported languages**: Llama Guard 3 8B supports content safety classification in the following languages: English, French, German, Hindi, Italian, Portuguese, Spanish, and Thai.
+
 :::note
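The fail-closed/fail-open behavior the model-availability bullet describes can be sketched in TypeScript. This is an illustrative model of the documented behavior, not AI Gateway's actual implementation; all type and function names here (`GuardrailConfig`, `shouldProceed`) are invented for the sketch.

```typescript
// Hazard categories can be configured to either block or flag content.
type HazardAction = "block" | "flag";

interface GuardrailConfig {
  categories: Record<string, HazardAction>;
}

// `evaluation === null` models "AI Gateway could not get a response
// from Workers AI". Per the docs: if any category is set to "block",
// the request fails closed (blocked); if only "flag" is used, the
// request fails open (proceeds without evaluation).
function shouldProceed(
  config: GuardrailConfig,
  evaluation: { safe: boolean } | null,
): boolean {
  if (evaluation === null) {
    return !Object.values(config.categories).includes("block");
  }
  // When an evaluation is available, proceed only if content is safe.
  return evaluation.safe;
}
```

For example, a gateway with `{ violence: "block" }` rejects requests when the safety model is unreachable, while `{ violence: "flag" }` lets them through unevaluated.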
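The long-content bullet says Guardrails splits long inputs into chunks and evaluates each with a separate request, which is why latency grows with input size. A minimal sketch of that pattern, assuming a hypothetical chunk size and an injected per-chunk evaluator (neither is a documented AI Gateway value or API):

```typescript
// Hypothetical chunk size; the real segmentation boundary is not documented.
const MAX_CHUNK_CHARS = 4000;

// Split content into fixed-size segments for separate evaluation.
function chunkContent(text: string, maxLen: number = MAX_CHUNK_CHARS): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxLen) {
    chunks.push(text.slice(i, i + maxLen));
  }
  return chunks;
}

// Each chunk is evaluated by its own request (`evaluateChunk` returns
// true for safe content); the whole input is safe only if every chunk is.
async function evaluateLong(
  text: string,
  evaluateChunk: (chunk: string) => Promise<boolean>,
): Promise<boolean> {
  const results = await Promise.all(chunkContent(text).map(evaluateChunk));
  return results.every((safe) => safe);
}
```

Evaluating chunks concurrently, as above, bounds the added latency closer to one round trip rather than one per chunk, which is one way the docs' "not linear" latency growth could come about.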