APIM AI Gateway: Wire intelligent-routing fragment for multi-backend failover #99

@mishraomp

Summary

The intelligent-routing.xml policy fragment is registered in APIM (stacks/apim/main.tf) and documented in params/apim/README.md as "Multi-backend routing (future)", but no policy template includes it, so it never runs.

Gap

Currently each tenant has exactly one backend per service type, so there is no failover. If the shared AI Foundry Hub throttles requests or becomes unavailable, there is no fallback route. The intelligent-routing fragment already implements priority-based backend selection with throttle awareness and random load balancing within priority tiers.
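For readers unfamiliar with the selection behavior described above, here is a minimal Python sketch of priority-based selection with throttle awareness and a random pick within a tier. It is not the fragment's actual policy code: the `Backend` record, its field names, and the `select_backend` / `mark_throttled` helpers are assumptions made for illustration only.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical backend record; field names are assumptions."""
    name: str
    priority: int                  # lower number = preferred tier
    throttled_until: float = 0.0   # epoch seconds; 0.0 means not throttled


def select_backend(backends, now=None, rng=random):
    """Pick a random backend from the best-priority tier that is not throttled."""
    now = time.time() if now is None else now
    available = [b for b in backends if b.throttled_until <= now]
    if not available:
        return None  # every backend is throttled; the caller decides what to do
    best = min(b.priority for b in available)
    tier = [b for b in available if b.priority == best]
    return rng.choice(tier)


def mark_throttled(backend, retry_after_seconds, now=None):
    """Record a throttle response (e.g. HTTP 429) so the backend is skipped for a while."""
    now = time.time() if now is None else now
    backend.throttled_until = now + retry_after_seconds
```

With one priority-1 backend and two priority-2 backends, traffic would stay on the priority-1 backend until it is marked throttled, then spread randomly across the two priority-2 backends until the throttle window expires.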

Proposed Implementation

  1. Introduce a routes configuration in tenant tfvars to define multiple backends with priority/weight for OpenAI (and optionally other services)
  2. Include the intelligent-routing fragment in the per-tenant OpenAI path routing when multiple routes are configured
  3. Create corresponding backend pool resources in Terraform for load-balanced backends
  4. Update integration tests to verify failover behavior
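As a rough illustration of steps 1 and 2, the sketch below validates a tenant's route list and decides whether the fragment should be wired in at all. The route keys (`name`, `priority`, `weight`), the `plan_routing` function, and the shape of its return value are all hypothetical; the real tfvars schema is still to be designed.

```python
from collections import defaultdict


def plan_routing(routes):
    """Validate a tenant's route list and decide whether failover routing applies.

    `routes` is a list of dicts with hypothetical keys: name, priority, weight.
    """
    if not routes:
        raise ValueError("at least one route is required")
    names = [r["name"] for r in routes]
    if len(names) != len(set(names)):
        raise ValueError("route names must be unique")
    tiers = defaultdict(list)
    for r in routes:
        if r["priority"] < 1 or r.get("weight", 1) < 1:
            raise ValueError(f"invalid priority/weight for {r['name']}")
        tiers[r["priority"]].append(r["name"])
    return {
        # Only wire the fragment when there is something to fail over to.
        "use_intelligent_routing": len(routes) > 1,
        # Backends grouped by priority tier, lowest (preferred) tier first.
        "tiers": {p: sorted(n) for p, n in sorted(tiers.items())},
    }
```

A tenant with a single route would keep today's behavior, while a tenant with several routes would get the fragment plus one backend group per priority tier.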

When to Implement

This becomes critical when:

  • Multi-region deployment is planned
  • PTU + consumption spillover is needed
  • High-availability SLA requirements increase

Severity

MEDIUM - Foundation exists but isn't wired. Value increases significantly with multi-region or PTU+consumption spillover plans.

Context

Identified during APIM multi-tenancy and AI gateway policy gap analysis (Feb 2026).

Metadata

Labels: enhancement (New feature or request)
Status: Backlog