Description
Problem
Currently, routing rules can only determine which routing group a request goes to; they cannot access backend health status. When all backends in the selected routing group are unhealthy, the gateway fails with "Number of active backends found zero", even if healthy backends exist in other routing groups.
This creates a gap for users who want to implement high availability across routing groups while maintaining workload isolation during normal operation.
Use Case
We have two Trino clusters configured in separate routing groups for workload isolation:
• online - for low-latency operational queries
• analytics - for heavy analytical workloads
We route queries based on the X-Trino-Client-Tags header. When both clusters are healthy, workload isolation works perfectly. However, when the online cluster becomes unavailable, we want queries tagged with "online" to fail over to the analytics cluster rather than fail entirely.
Proposed Solutions
- Add configuration syntax for declaring fallback backends for each defined backend (requires a schema change).
- Allow each routing rule definition to be marked as optional, so that evaluation proceeds to the next rule if the selected routing group has no healthy backends (requires a schema change).
- Expose backend health to MVEL expressions during rule evaluation. This is the least invasive change, and it lets users define more sophisticated routing logic that accounts for a backend being temporarily unavailable.
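To make the desired behavior concrete, here is a minimal sketch of the selection logic that any of the options above would need to produce. It is not gateway code: the class `HealthAwareRouting`, the methods `preferenceFor` and `selectGroup`, and the map of healthy-backend counts per routing group are all hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not part of the gateway: models health-aware
// routing group selection for the use case described above.
class HealthAwareRouting {

    // Assumed mapping from client tag to routing groups in preference order.
    // "online" queries prefer the online group and fall back to analytics.
    static List<String> preferenceFor(String clientTag) {
        if ("online".equals(clientTag)) {
            return List.of("online", "analytics");
        }
        return List.of("analytics");
    }

    // Returns the first preferred group with at least one healthy backend,
    // or an empty Optional when every candidate group is unhealthy.
    static Optional<String> selectGroup(List<String> preferredGroups,
                                        Map<String, Integer> healthyBackends) {
        for (String group : preferredGroups) {
            if (healthyBackends.getOrDefault(group, 0) > 0) {
                return Optional.of(group);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The online cluster is down; analytics has two healthy backends.
        Map<String, Integer> healthyBackends = Map.of("online", 0, "analytics", 2);
        Optional<String> group = selectGroup(preferenceFor("online"), healthyBackends);
        System.out.println(group.orElse("no healthy backend")); // prints "analytics"
    }
}
```

With both clusters healthy, the same call returns "online", so workload isolation is preserved during normal operation and the fallback only applies when the preferred group has no healthy backends.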