Update usage-considerations.mdx

kathayl · web-flow · commit 910771996795 · 2025-03-03T12:02:30.000-08:00
add info on expected latency from enabling guardrails
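
For reviewers, the fallback behavior described in the "Model availability" bullet that this hunk sits next to can be sketched roughly as follows. This is only an illustration of the documented rule, not AI Gateway's actual implementation: the function name `decide_without_evaluation` and the hazard category names in the usage example are hypothetical, and the only assumption is that each category is configured as either `block` or `flag`.

```python
from typing import Mapping


def decide_without_evaluation(hazard_actions: Mapping[str, str]) -> str:
    """Return what happens to a request when no Guardrails evaluation is available.

    hazard_actions maps a hazard category name (hypothetical names here)
    to its configured action, "block" or "flag".
    """
    # If any category is set to "block", the request is blocked when
    # Workers AI cannot return an evaluation.
    if any(action == "block" for action in hazard_actions.values()):
        return "block"
    # Otherwise (only "flag" categories), the request proceeds unevaluated.
    return "proceed"


if __name__ == "__main__":
    # Hypothetical category names, for illustration only.
    print(decide_without_evaluation({"category_a": "block", "category_b": "flag"}))
    print(decide_without_evaluation({"category_b": "flag"}))
```

Under this reading, a single `block` category is enough to make the gateway reject requests during a Workers AI outage, which is worth weighing alongside the latency figures added below.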
diff --git a/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx b/src/content/docs/ai-gateway/guardrails/usage-considerations.mdx
@@ -12,7 +12,7 @@ Since Guardrails runs on Workers AI, enabling it incurs usage on Workers AI. You
 ## Additional considerations
 
 - Model availability: If at least one hazard category is set to `block`, but AI Gateway is unable to receive a response from Workers AI, the request will be blocked. Conversely, if a hazard category is set to `flag` and AI Gateway cannot obtain a response from Workers AI, the request will proceed without evaluation. This approach prioritizes availability, allowing requests to continue even when content evaluation is not possible.
-- Latency impact: Enabling Guardrails adds some latency. Consider this when balancing safety and speed.
+- Latency impact: Enabling Guardrails adds latency to requests. Evaluations using Llama Guard 3 8B on Workers AI typically add approximately 500 milliseconds per request. Larger requests may take longer, though the increase is not linear. Consider this when balancing safety and performance.
 
 :::note