Description
Summary
The intelligent-routing.xml policy fragment is registered in APIM (stacks/apim/main.tf) and documented in params/apim/README.md as "Multi-backend routing (future)", but is never included in any policy template.
Gap
Currently each tenant has exactly one backend per service type - no failover. If the shared AI Foundry Hub throttles or becomes unavailable, there is no fallback route. The intelligent-routing fragment already implements priority-based backend selection with throttle awareness and random load balancing within priority tiers.
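The selection behavior described above can be illustrated with a minimal Python sketch. This is not the content of intelligent-routing.xml, which is not shown in this issue; the `Backend` class, `select_backend`, `mark_throttled`, and the backend names are hypothetical and only model priority tiers, throttle awareness, and random choice within a tier.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class Backend:
    name: str                      # hypothetical backend identifier
    priority: int                  # lower number = preferred tier
    throttled_until: float = 0.0   # epoch seconds; 0.0 means not throttled


def select_backend(backends, now=None, rng=random):
    """Pick a random backend from the best-priority tier that is not throttled."""
    now = time.time() if now is None else now
    available = [b for b in backends if b.throttled_until <= now]
    if not available:
        return None  # every route is throttled; the caller decides how to respond
    best = min(b.priority for b in available)
    tier = [b for b in available if b.priority == best]
    return rng.choice(tier)


def mark_throttled(backend, retry_after_seconds, now=None):
    """Record a throttle response so the backend is skipped until the window passes."""
    now = time.time() if now is None else now
    backend.throttled_until = now + retry_after_seconds


# Assumed example: one PTU backend with two consumption backends as spillover.
backends = [
    Backend("ptu-primary", priority=1),
    Backend("consumption-a", priority=2),
    Backend("consumption-b", priority=2),
]
```

With this model, traffic stays on `ptu-primary` until it is marked throttled, then spreads randomly across the two priority-2 backends until the retry window expires.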
Proposed Implementation
- Introduce a `routes` configuration in tenant tfvars to define multiple backends with priority/weight for OpenAI (and optionally other services)
- Include the `intelligent-routing` fragment in the per-tenant OpenAI path routing when multiple routes are configured
- Create corresponding backend pool resources in Terraform for load-balanced backends
- Update integration tests to verify failover behavior
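As a rough sketch of the first two steps, the Python below models a possible shape for the `routes` setting and the rule for when the fragment would be included. The field names (`name`, `priority`, `weight`), the example values, and both helper functions are assumptions for illustration; the actual tfvars schema has not been defined yet.

```python
def validate_routes(routes):
    """Return a list of problems found in a hypothetical per-tenant routes list."""
    errors = []
    seen = set()
    for i, route in enumerate(routes):
        name = route.get("name")
        if not name:
            errors.append(f"routes[{i}]: missing name")
        elif name in seen:
            errors.append(f"routes[{i}]: duplicate name {name!r}")
        else:
            seen.add(name)
        priority = route.get("priority")
        if not isinstance(priority, int) or priority < 1:
            errors.append(f"routes[{i}]: priority must be an integer >= 1")
        weight = route.get("weight", 1)  # assumed default when weight is omitted
        if not isinstance(weight, int) or weight < 1:
            errors.append(f"routes[{i}]: weight must be an integer >= 1")
    return errors


def needs_intelligent_routing(routes):
    """Include the fragment only when a tenant configures more than one route."""
    return len(routes) > 1


# Assumed example mirroring what a tenant's routes setting might contain.
example_routes = [
    {"name": "openai-ptu", "priority": 1, "weight": 1},
    {"name": "openai-consumption", "priority": 2, "weight": 1},
]
```

Tenants with a single route would keep the current single-backend policy, so existing deployments are unaffected until they opt in.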
When to Implement
This becomes critical when:
- Multi-region deployment is planned
- PTU + consumption spillover is needed
- High-availability SLA requirements increase
Severity
MEDIUM - The foundation exists but is not wired into any policy template. Its value increases significantly with multi-region or PTU + consumption spillover plans.
Context
Identified during APIM multi-tenancy and AI gateway policy gap analysis (Feb 2026).