- **Total token limits**: Set overall usage quotas per user/tenant
- **Time-based windows**: Configure limits per second, minute, or hour (see the sketch below)
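
As a minimal sketch, token-based limits can be expressed by recording each request's token usage as dynamic metadata on the `AIGatewayRoute` and rate limiting on that cost with an Envoy Gateway `BackendTrafficPolicy`. The resource names, the `x-user-id` header, and the exact field layout here are assumptions; verify them against the API version you run.

```yaml
# Sketch: record token usage per request (assumed field names).
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: ai-route                    # placeholder name
spec:
  # Expose each request's total token count as dynamic metadata.
  llmRequestCosts:
    - metadataKey: llm_total_token
      type: TotalToken
  # ... routing rules omitted for brevity
---
# Sketch: enforce a per-user hourly token budget via a global rate limit.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: token-budget                # placeholder name
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: ai-gateway              # placeholder Gateway
  rateLimit:
    type: Global
    global:
      rules:
        - clientSelectors:
            - headers:
                - name: x-user-id   # example per-user key
                  type: Distinct
          limit:
            requests: 100000        # counted as tokens via the cost below
            unit: Hour
          cost:
            response:
              from: Metadata
              metadata:
                namespace: io.envoy.ai_gateway   # assumed metadata namespace
                key: llm_total_token
```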
### 3. **Model/Provider Failover**
Ensure high availability with automatic failover mechanisms (a configuration sketch follows this list):
- Detect unhealthy backends and route traffic to healthy instances
- Support for active-passive and active-active failover strategies
- Graceful degradation when primary models are unavailable
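
As an illustrative sketch only: one way to express active-passive failover is to list multiple backends on a route and prefer one over the other. The `priority` field below is an assumption about the API shape and may differ across versions; the backend names and model value are placeholders.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: failover-route              # placeholder name
spec:
  # Gateway attachment omitted for brevity.
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model   # model-based routing header
              value: gpt-4o-mini    # placeholder model
      backendRefs:
        - name: openai-primary      # placeholder backend
          priority: 0               # assumed field: preferred backend
        - name: azure-fallback      # placeholder backend
          priority: 1               # assumed field: used when primary is unhealthy
```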
### 4. **Traffic Splitting & Canary Testing**
Deploy new models safely with progressive rollout capabilities:
- **A/B Testing**: Split traffic between model versions to compare performance
- **Canary Deployments**: Gradually shift traffic to new models (e.g., 5% → 25% → 50% → 100%)
- **Shadow Traffic**: Send duplicate requests to new models without affecting production
- **Weight-based routing**: Fine-tune traffic distribution across model variants (see the sketch below)
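
A minimal canary sketch under the same assumptions as above: traffic for one model is split by `weight` across a stable backend and a canary backend. Names are placeholders; shift the weights (5 → 25 → 50 → 100) as confidence in the new model grows.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: canary-route                # placeholder name
spec:
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: my-model       # placeholder model
      backendRefs:
        - name: model-stable        # placeholder backend
          weight: 95                # stable version keeps 95% of traffic
        - name: model-canary        # placeholder backend
          weight: 5                 # canary starts at 5%
```

Shadow traffic differs from this split: it duplicates requests to the new model rather than dividing them, so canary responses never reach users.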
### 5. **LLM Observability & Monitoring**
Gain deep insights into your LLM infrastructure:
- **Request/Response Metrics**: Track latency, throughput, token usage, and error rates
- **Model Performance**: Monitor accuracy, quality scores, and user satisfaction
- **Cost Analytics**: Analyze spending patterns across models and providers
- **Distributed Tracing**: End-to-end visibility with OpenTelemetry integration
- **Custom Dashboards**: Visualize metrics in Prometheus, Grafana, or your preferred monitoring stack (a sample scrape configuration follows)
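
As one hedged example, a Prometheus scrape job targeting the gateway's Envoy pods might look like the following. The pod label selector is an assumption about your deployment; `/stats/prometheus` is Envoy's standard Prometheus stats endpoint, but check how and where metrics are exposed in your setup.

```yaml
# Hypothetical scrape job; verify labels and ports against your deployment.
scrape_configs:
  - job_name: envoy-ai-gateway
    metrics_path: /stats/prometheus   # Envoy's Prometheus stats endpoint
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only the Envoy proxy pods (label selector is an assumption).
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        action: keep
        regex: envoy
```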
## Supported LLM Providers
| Provider Name | API Schema Config on [AIServiceBackend](https://aigateway.envoyproxy.io/docs/api/#aiservicebackendspec) | Upstream Authentication Config on [BackendSecurityPolicy](https://aigateway.envoyproxy.io/docs/api/#backendsecuritypolicyspec) | Status | Note |
|---|---|---|---|---|
|[Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference)|`{"name":"AzureOpenAI","version":"2025-01-01-preview"}` or `{"name":"OpenAI", "version": "openai/v1"}`|[Azure Credentials](https://aigateway.envoyproxy.io/docs/api/#backendsecuritypolicyazurecredentials) or [Azure API Key](https://aigateway.envoyproxy.io/docs/api/#backendsecuritypolicyazureapikey)| ✅ ||
|[Google Gemini on AI Studio](https://ai.google.dev/gemini-api/docs/openai)|`{"name":"OpenAI","version":"v1beta/openai"}`|[API Key](https://aigateway.envoyproxy.io/docs/api/#backendsecuritypolicyapikey)| ✅ | Only the OpenAI-compatible endpoint |
|[Anthropic on GCP Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/claude)|`{"name":"GCPAnthropic", "version":"vertex-2023-10-16"}`|[GCP Credentials](https://aigateway.envoyproxy.io/docs/api/#backendsecuritypolicygcpcredentials)| ✅ | Supports both the native Anthropic Messages endpoint and the OpenAI-compatible endpoint |
|[DeepInfra](https://deepinfra.com/docs/inference)|`{"name":"OpenAI","version":"v1/openai"}`|[API Key](https://aigateway.envoyproxy.io/docs/api/#backendsecuritypolicyapikey)| ✅ | Only the OpenAI-compatible endpoint |
|[Anthropic](https://docs.claude.com/en/home)|`{"name":"Anthropic"}`|[Anthropic API Key](https://aigateway.envoyproxy.io/docs/api/#backendsecuritypolicyanthropicapikey)| ✅ | Supports only the native Anthropic Messages endpoint |
| Self-hosted models |`{"name":"OpenAI","version":"v1"}`| N/A | ✅ | The schema depends on the API spoken by the self-hosted server; for example, [vLLM](https://docs.vllm.ai/en/v0.8.3/serving/openai_compatible_server.html) speaks the OpenAI format. API key auth can also be configured. A configuration example follows the table. |
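
For concreteness, here is a hedged sketch pairing the two resources from the table for a self-hosted OpenAI-compatible server. The `schema` value comes straight from the table's last row; the resource names, backend reference, port, and Secret are placeholders, and the `BackendSecurityPolicy` field layout may vary across versions.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: vllm-backend              # placeholder name
spec:
  schema:
    name: OpenAI                  # from the table: self-hosted, OpenAI-compatible
    version: v1
  backendRef:
    name: vllm-service            # placeholder Service reference
    kind: Service
    port: 8000                    # placeholder port
---
# Optional API key auth, as noted in the table (assumed field names).
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: vllm-api-key
spec:
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: vllm-backend
  type: APIKey
  apiKey:
    secretRef:
      name: vllm-api-key-secret   # placeholder Secret holding the key
```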
## Prerequisites
Before starting, ensure you have the following tools installed: