Ability to fall back to another healthy backend when the preferred one is unhealthy #902

@aleksanderlech

Description

Problem

Currently, routing rules can only determine which routing group a request should go to, but they cannot access backend health status. When all backends in the selected routing group are unhealthy, the gateway fails with "Number of active backends found zero" - even if other healthy backends exist in different routing groups.

This creates a gap for users who want to implement high availability across routing groups while maintaining workload isolation during normal operation.

Use Case

We have two Trino clusters configured in separate routing groups for workload isolation:
• online - for low-latency operational queries
• analytics - for heavy analytical workloads

We route queries based on the X-Trino-Client-Tags header. When both clusters are healthy, workload isolation works perfectly. However, when the online cluster becomes unavailable, we want queries tagged with "online" to fail over to the analytics cluster rather than fail entirely.
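
To make the desired behavior concrete, here is a minimal Python sketch of the selection logic. It is only a model of the use case, not gateway code: the function names, the `FALLBACKS` map, and the choice of "analytics" as the default group for untagged queries are all assumptions made for illustration. Only the error message is taken from the current behavior described above.

```python
from typing import Dict, Iterable

# Hypothetical fallback map: "online" may spill over to "analytics".
# "analytics" has no fallback, so heavy queries never land on "online".
FALLBACKS: Dict[str, str] = {"online": "analytics"}


def group_for_tags(client_tags: Iterable[str], default: str = "analytics") -> str:
    """Pick the preferred routing group from X-Trino-Client-Tags values.

    The default group for untagged queries is an assumption of this sketch.
    """
    return "online" if "online" in set(client_tags) else default


def resolve_group(client_tags: Iterable[str], healthy_backends: Dict[str, int]) -> str:
    """Return the preferred group, or its fallback if the preferred group
    has no healthy backend. healthy_backends maps group name to a count."""
    group = group_for_tags(client_tags)
    if healthy_backends.get(group, 0) > 0:
        return group
    fallback = FALLBACKS.get(group)
    if fallback is not None and healthy_backends.get(fallback, 0) > 0:
        return fallback
    # Mirrors the failure reported today when no backend can be used.
    raise RuntimeError("Number of active backends found zero")
```

With both groups healthy, `resolve_group(["online"], {"online": 1, "analytics": 1})` keeps the query on "online". With `{"online": 0, "analytics": 1}` the same call resolves to "analytics" instead of failing.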

Proposed Solutions

  1. Add configuration syntax for declaring fallback backends for each defined backend (requires a schema change).
  2. Allow each routing rule definition to be marked as optional, so that evaluation continues with the next rule if the rule's target fails (requires a schema change).
  3. Expose backend health to MVEL expressions during rule evaluation. This is the least invasive change, yet it lets users define more sophisticated routing logic that accounts for a backend being temporarily unavailable.
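
As a rough illustration of option 3, the sketch below models rule conditions as Python callables that receive a per-group health map alongside the request. It is not MVEL and not the gateway's rule engine: `Request`, `Rule`, `evaluate`, the shape of the health map, and the "first matching rule wins" ordering are all hypothetical stand-ins chosen to show the idea.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass(frozen=True)
class Request:
    """Hypothetical request model: only the client tags matter here."""
    client_tags: Set[str]


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: the condition sees the request and group health."""
    name: str
    condition: Callable[[Request, Dict[str, bool]], bool]
    routing_group: str


def evaluate(rules: List[Rule], request: Request,
             group_healthy: Dict[str, bool]) -> Optional[str]:
    """Return the group of the first matching rule (an assumed ordering),
    or None when no rule matches."""
    for rule in rules:
        if rule.condition(request, group_healthy):
            return rule.routing_group
    return None


RULES = [
    # Prefer "online" while it has a healthy backend.
    Rule("online-primary",
         lambda r, h: "online" in r.client_tags and h.get("online", False),
         "online"),
    # Otherwise send "online" queries to "analytics" if it is healthy.
    Rule("online-fallback",
         lambda r, h: "online" in r.client_tags and h.get("analytics", False),
         "analytics"),
]
```

The point of the sketch is that no schema change is needed in this option: the fallback is expressed entirely in the rule conditions once health is visible to them.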
