rfc for Egress Gateway #144

Open
maksymvavilov wants to merge 1 commit into Kuadrant:main from maksymvavilov:egress-gw-rfc
@maksymvavilov
Contributor

RFC for providing support for Egress Gateway

@maksymvavilov moved this to In Progress in Kuadrant on Mar 9, 2026
@maksymvavilov force-pushed the egress-gw-rfc branch 4 times, most recently from c2b32ec to f73920c, on March 11, 2026 06:08
@maksymvavilov marked this pull request as ready for review on March 11, 2026 09:46
@maksymvavilov moved this from In Progress to Ready For Review in Kuadrant on Mar 11, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Maskym Vavilov <mvavilov@redhat.com>

# Summary

Enable Kuadrant's data plane policies (AuthPolicy, RateLimitPolicy) to operate on egress traffic flowing through an Istio egress gateway. The egress gateway infrastructure (connectivity, TLS, routing) is provided by Istio and configured by the user. Kuadrant's role is to ensure its policy set works in the egress context, provide documentation and examples, and lay the groundwork for future egress-specific capabilities.
Review comment (Collaborator):

DNSPolicy should also be included in the policy set for egress.

- **Compliance and auditing**: Controlling and auditing which workloads can reach which external services
- **AI workloads**: External inference providers (OpenAI, Anthropic), remote MCP servers, and multi-cluster model serving all require egress gateway capabilities

The immediate ask comes from the RHOAI team, which needs rate limiting and authentication support for outbound traffic to external AI inference providers. Projects like MCP Gateway already handle egress connectivity (ServiceEntry, DestinationRule, HTTPRoute) but explicitly delegate auth and rate limiting to Kuadrant.
Review comment (Collaborator):

I think we can remove this; it isn't an upstream concern.

1. Ensure the Kuadrant data plane policies (AuthPolicy, RateLimitPolicy) work with OSSM/Istio for egress
2. Support egress with the MCP Gateway (expected to work already — validate and document)
3. Provide examples and documentation for using AuthPolicy for token exchange and auth token management in the egress context (particularly for AI inference use cases)
4. Document the egress gateway setup including secure connectivity via ServiceEntry and DestinationRule (configured by the end user, not by Kuadrant)
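As a concrete shape for goal 1, a RateLimitPolicy on the egress path would attach to an HTTPRoute served by the egress gateway, the same way it does for ingress. This is a minimal sketch assuming the `kuadrant.io/v1` API; the HTTPRoute `openai-egress`, the namespace `egress-gateway`, and the limit values are hypothetical and only illustrative.

```yaml
apiVersion: kuadrant.io/v1
kind: RateLimitPolicy
metadata:
  name: openai-egress-limit
  namespace: egress-gateway      # hypothetical namespace holding the egress Gateway and its routes
spec:
  # Target the (hypothetical) HTTPRoute that forwards traffic to api.openai.com
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: openai-egress
  limits:
    # Illustrative cap on outbound calls through this route
    openai-per-minute:
      rates:
        - limit: 60
          window: 1m
```

Nothing in the policy itself is egress-specific; whether it is enforced the same way on an egress gateway is what goal 1 needs to validate.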
Review comment (Collaborator):

Leverage DNSPolicy to direct traffic to the egress gateway.

Review comment (Member):

Yes, I think we are missing something here: one of the goals needs to be about configuring DNS routing (pod -> egress gateway -> external service).

Applications should call external services using real hostnames (api.openai.com) without knowing about the egress gateway:

  1. A DNSPolicy attached to the egress Gateway (using existing DNSPolicy functionality) creates DNSRecord CRs in a DNS zone
  2. The zone can be hosted by any provider: Kuadrant CoreDNS, AWS Route53, Google Cloud DNS, Azure DNS, etc.
  3. DNS records map external hostnames to the gateway IP: api.openai.com → 10.96.x.x
  4. Cluster DNS forwards queries for those specific domains to the zone's DNS server
  5. The application resolves the hostname → gateway IP → traffic flows through the gateway transparently
```text
Application: curl http://api.openai.com
    ↓ DNS query
Cluster DNS (forwards api.openai.com queries to DNS zone)
    ↓
DNS Zone (any provider: CoreDNS, Route53, GCP, Azure)
    ↓ returns: 10.96.x.x (gateway IP)
Application: HTTP to 10.96.x.x (Host: api.openai.com)
    ↓
Egress Gateway (HTTPRoute matches hostname, AuthPolicy injects creds)
    ↓
api.openai.com (receives request with credentials)
```
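Step 1 of this flow uses DNSPolicy unchanged. A minimal sketch, assuming the `kuadrant.io/v1` DNSPolicy API, a hypothetical egress Gateway named `egress-gateway` with a listener for `api.openai.com`, and a hypothetical provider credentials Secret named `dns-provider-credentials` (for CoreDNS, Route53, GCP, or Azure):

```yaml
apiVersion: kuadrant.io/v1
kind: DNSPolicy
metadata:
  name: egress-dns
  namespace: egress-gateway      # policy targets are local, so this must be the Gateway's namespace
spec:
  # Records are derived from the targeted Gateway's listener hostnames and addresses
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: egress-gateway
  # Secret holding credentials for the DNS zone's provider (hypothetical name)
  providerRefs:
    - name: dns-provider-credentials
```

Whether the resulting DNSRecords carry the gateway's ClusterIP as their targets is what the first testing item would need to confirm.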

I think we should add an investigation task for DNS-based routing alongside Task 0a (OSSM) and Task 0b (Credential Injection). This validates how applications can transparently route to the egress gateway using external hostnames.

What Needs Testing

1. DNSPolicy creates records for the egress gateway

  • Attach a DNSPolicy to the egress Gateway (existing DNSPolicy functionality, no code changes)
  • Verify that DNSRecords are created with the gateway ClusterIP as targets
  • Test with multiple providers: Kuadrant CoreDNS, Route53, etc.

2. Cluster DNS forwarding configuration

  • Kubernetes: Configure the kube-system/coredns ConfigMap with a forward directive for the specific domains
  • OpenShift: Configure dns.operator.openshift.io/default with a forwardPlugin
  • Validate that pods resolve the external hostname to the gateway IP

3. End-to-end flow

  • The application uses the external hostname directly (api.openai.com)
  • DNS resolves the hostname to the gateway IP
  • The HTTPRoute matches the external hostname
  • AuthPolicy injects credentials
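For the Kubernetes case in item 2, forwarding can be expressed as a dedicated server block in the CoreDNS Corefile. This sketch assumes the DNS zone's server is reachable at a hypothetical address `10.96.0.53`; it is a fragment to merge into the cluster's existing ConfigMap, not a drop-in replacement for it.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    # Keep the cluster's existing ".:53 { ... }" server block here unchanged;
    # it is omitted from this sketch for brevity.

    # Send only queries for api.openai.com to the zone's DNS server
    # (10.96.0.53 is a hypothetical address for that server).
    api.openai.com:53 {
        errors
        cache 30
        forward . 10.96.0.53
    }
```

Because only the listed domain gets its own server block, every other lookup keeps going through the default block.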

Scope

Stage 1 (investigation):

  • Use the existing DNSPolicy attached to the egress Gateway (no new "egress mode" is needed; DNSPolicy works as-is on any Gateway)
  • Manual cluster DNS forwarding configuration (document the pattern)
  • Validate on Kubernetes and OpenShift
  • Test with different DNS providers
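On OpenShift the CoreDNS configuration is owned by the DNS operator, so the same manual forwarding would be set on the `default` DNS operator resource instead of the ConfigMap. A sketch, again assuming a hypothetical zone server at `10.96.0.53`:

```yaml
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
  name: default
spec:
  servers:
    - name: egress-zone          # hypothetical name for this forwarding entry
      zones:
        - api.openai.com         # only this domain is forwarded
      forwardPlugin:
        policy: Random
        upstreams:
          - 10.96.0.53           # hypothetical address of the zone's DNS server
```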

Out of scope:

  • No new DNSPolicy "egress mode" required
  • Automated cluster DNS forwarding configuration (could be a future enhancement, but not part of this egress work)


1. **No mesh requirement** — The solution must not require workloads to be in a service mesh. No sidecar injection. Workloads reach the egress gateway via its IP address or DNS
2. **Egress connectivity is user-configured** — Kuadrant does not create or manage ServiceEntry, DestinationRule, or egress gateway deployments. These are configured by the user via Istio APIs. Kuadrant operates on traffic already flowing through the gateway. **Risk:** Some Istio-based Gateway API implementations (e.g., Red Hat OpenShift Cluster Ingress Controller) may reserve Istio APIs (ServiceEntry, DestinationRule) for infrastructure use only, making them unavailable to end users. On such platforms, the egress gateway setup described here may not be supported. This constraint limits initial egress support to environments where Istio APIs are fully accessible (e.g., OSSM deployed via the Sail Operator)
3. **Traffic routing to the egress gateway is assumed** — For Stage 1, we assume traffic is already flowing through the egress gateway (via explicit gateway IP, DNS configuration, or network policy). How workloads discover and reach the egress gateway is the user's responsibility. Stage 2 explores using DNSPolicy with CoreDNS to automate this (see [Future Work](#dns-for-egress))
Review comment (Collaborator):

By "Stage 2", do you mean something beyond this work? I am reading this as the scope for the egress gateway work within the next release, so DNS being Stage 2 here concerns me.

Review comment (Member, @mikenairn, Mar 20, 2026):

I think part of this work is to help users get routing working from the pod to the external service; see https://github.com/Kuadrant/architecture/pull/144/files#r2964831280. This seems misleading, since explaining how to do it should be a main part of the work.


**1. Gateway** — Defines the egress gateway listeners:
```yaml
apiVersion: gateway.networking.k8s.io/v1
# (remaining manifest not shown in this view)
```

Review comment (Collaborator):

When MetalLB provisions the LoadBalancer IP address, this is fine. But we may also want to specify a ClusterIP Service for the gateway, for example to avoid an AWS load balancer being created.

```yaml
# (excerpt; surrounding manifest not shown in this view)
sni: api.example.com
```
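On the ClusterIP point: with Istio's Gateway API integration, the type of the generated Service can be set with the `networking.istio.io/service-type` annotation, which avoids provisioning a cloud load balancer. The sketch below pairs such a Gateway with an HTTPRoute that forwards to an external host declared by a ServiceEntry, using Istio's `Hostname` backend kind. The names, namespace, and `api.openai.com` host are assumptions for illustration, and TLS on port 443 is assumed to be originated by a DestinationRule.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: egress-gateway
  namespace: egress-gateway
  annotations:
    # Ask Istio to create a ClusterIP Service instead of a LoadBalancer
    networking.istio.io/service-type: ClusterIP
spec:
  gatewayClassName: istio
  listeners:
    - name: openai-http
      hostname: api.openai.com
      port: 80
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: Same
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai-egress
  namespace: egress-gateway
spec:
  parentRefs:
    - name: egress-gateway
  hostnames:
    - api.openai.com
  rules:
    - backendRefs:
        # Istio-specific backend kind that refers to a ServiceEntry host
        - group: networking.istio.io
          kind: Hostname
          name: api.openai.com
          port: 443
```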

For mTLS to external services (where the gateway presents a client certificate), use `mode: MUTUAL` with `clientCertificate`, `privateKey`, and `caCertificates` fields.
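A sketch of the mTLS variant, assuming the `networking.istio.io/v1` API (Istio 1.22 or later) and certificate files already mounted into the egress gateway pod at hypothetical paths; `api.example.com` is the placeholder host used above.

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: api-example-com-mtls
  namespace: egress-gateway
spec:
  host: api.example.com
  trafficPolicy:
    tls:
      # Originate TLS and present a client certificate to the external service
      mode: MUTUAL
      clientCertificate: /etc/egress-certs/client.pem     # hypothetical path
      privateKey: /etc/egress-certs/client-key.pem        # hypothetical path
      caCertificates: /etc/egress-certs/ca.pem            # hypothetical path
      sni: api.example.com
```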
Review comment (Collaborator):

We can probably remove this, as it isn't something we are implementing.


ServiceEntry is chosen over the upstream wg-ai-gateway XBackendDestination CRD because XBackendDestination is still in early alpha — the API is not finalized and there is no production track record. ServiceEntry is battle-tested and widely deployed.
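For comparison, the ServiceEntry that declares an external host is short. A minimal sketch, assuming the `networking.istio.io/v1` API and the hypothetical host `api.openai.com`; the port layout broadly follows Istio's egress TLS origination examples, where a DestinationRule originates TLS on port 443.

```yaml
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: api-openai-com
  namespace: egress-gateway
spec:
  hosts:
    - api.openai.com
  location: MESH_EXTERNAL   # the host lives outside the mesh
  resolution: DNS           # resolve the host via DNS when connecting
  ports:
    - number: 443
      name: https
      protocol: HTTPS
```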

## OSSM
Review comment (Collaborator):

We can probably move this to an alternatives / considered / future work section.


## OSSM

OpenShift Service Mesh (OSSM) is Red Hat's supported distribution of Istio on OpenShift, deployed via the Sail Operator. "Support OSSM/Istio" means ensuring Kuadrant's policies work on this specific distribution. In practice OSSM is Istio with Red Hat packaging, so the implementation work is the same — but validation must happen on OSSM specifically.
Review comment (Collaborator):

I don't think we need this section about OSSM in the design doc.

# Egress Gateway Support for Kuadrant
Review comment (Member, @mikenairn, Mar 20, 2026):

I think we should add something early on that explains that this is Istio configuration we are talking about, not some Egress Gateway type.

Example:

In this RFC, "egress gateway" refers to the Istio pattern of using a standard Gateway API Gateway resource configured for outbound traffic to external services, with routing defined via Istio's ServiceEntry and DestinationRule CRDs. This is not a separate Gateway type or Kuadrant-managed resource — it is user-configured infrastructure that Kuadrant policies attach to.


The full stack — MCP Gateway for routing, Kuadrant for policy enforcement — needs end-to-end validation on egress (Task 4).

## Stage 2: Future Work
Review comment (Member):

I'm not sure there's a need for Stage 1 and Stage 2. From what I can tell, Stage 2 is work you are saying we aren't going to do as part of this proposal. If that is correct, could all of this be condensed into a "Future X" section, with updates to the Non-Goals depending on what ends up in it?


Labels: None yet

Projects: Status: Ready For Review

4 participants