Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Maskym Vavilov <mvavilov@redhat.com>
| # Summary | ||
| Enable Kuadrant's data plane policies (AuthPolicy, RateLimitPolicy) to operate on egress traffic flowing through an Istio egress gateway. The egress gateway infrastructure (connectivity, TLS, routing) is provided by Istio and configured by the user. Kuadrant's role is to ensure its policy set works in the egress context, provide documentation and examples, and lay the groundwork for future egress-specific capabilities. |
DNSPolicy should also be included in the policy set for egress
| - **Compliance and auditing**: Controlling and auditing which workloads can reach which external services | ||
| - **AI workloads**: External inference providers (OpenAI, Anthropic), remote MCP servers, and multi-cluster model serving all require egress gateway capabilities | ||
| The immediate ask comes from the RHOAI team, which needs rate limiting and authentication support for outbound traffic to external AI inference providers. Projects like MCP Gateway already handle egress connectivity (ServiceEntry, DestinationRule, HTTPRoute) but explicitly delegate auth and rate limiting to Kuadrant. |
I think we can remove this; it isn't an upstream concern.
| 1. Ensure the Kuadrant data plane policies (AuthPolicy, RateLimitPolicy) work with OSSM/Istio for egress | ||
| 2. Support egress with the MCP Gateway (expected to work already — validate and document) | ||
| 3. Provide examples and documentation for using AuthPolicy for token exchange and auth token management in the egress context (particularly for AI inference use cases) | ||
| 4. Document the egress gateway setup including secure connectivity via ServiceEntry and DestinationRule (configured by the end user, not by Kuadrant) |
Leverage DNSPolicy to direct traffic to the Egress gateway
Yes, I think we are missing something here. One of the goals needs to be about configuring the DNS routing: pod -> egress gateway -> external service.
Applications should call external services using real hostnames (`api.openai.com`) without knowing about the egress gateway:
- A DNSPolicy attached to the egress Gateway (using existing DNSPolicy functionality) creates `DNSRecord` CRs in a DNS zone
- The zone can be any provider: Kuadrant CoreDNS, AWS Route53, Google Cloud DNS, Azure DNS, etc.
- DNS records map external hostnames to the gateway IP: `api.openai.com → 10.96.x.x`
- Cluster DNS forwards queries for those specific domains to the zone's DNS server
- The application resolves hostname → gateway IP, and traffic flows through the gateway transparently
Application: curl http://api.openai.com
↓ DNS query
Cluster DNS (forwards api.openai.com queries to DNS zone)
↓
DNS Zone (any provider: CoreDNS, Route53, GCP, Azure)
↓ returns: 10.96.x.x (gateway IP)
Application: HTTP to 10.96.x.x (Host: api.openai.com)
↓
Egress Gateway (HTTPRoute matches hostname, AuthPolicy injects creds)
↓
api.openai.com (receives request with credentials)
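For illustration, the DNSPolicy step in the flow above could look roughly like the following. This is a sketch under stated assumptions, not a tested configuration: the names `egress-dns`, `egress-gateway`, `gateway-system`, and `dns-provider-credentials` are hypothetical, and it assumes the egress Gateway has a listener whose hostname is `api.openai.com`. Whether the resulting DNSRecord actually targets the gateway's ClusterIP is what the investigation would need to confirm.

```yaml
# Sketch only: all names below are hypothetical placeholders.
apiVersion: kuadrant.io/v1
kind: DNSPolicy
metadata:
  name: egress-dns
  namespace: gateway-system
spec:
  # Attach to the egress Gateway the same way DNSPolicy attaches to any Gateway.
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: egress-gateway
  # Secret holding credentials for the DNS provider that hosts the zone
  # (CoreDNS, Route53, Google Cloud DNS, Azure DNS, ...).
  providerRefs:
    - name: dns-provider-credentials
```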
I think we should add an investigation task for DNS-based routing alongside Task 0a (OSSM) and Task 0b (Credential Injection). This validates how applications can transparently route to the egress gateway using external hostnames.
What Needs Testing
1. DNSPolicy creates records for egress gateway
- Attach DNSPolicy to egress Gateway (existing DNSPolicy functionality, no code changes)
- Verify DNSRecords created with gateway ClusterIP as targets
- Test with multiple providers: Kuadrant CoreDNS, Route53, etc.
2. Cluster DNS forwarding configuration
- Kubernetes: Configure the `kube-system/coredns` ConfigMap with a `forward` directive for specific domains
- OpenShift: Configure `dns.operator.openshift.io/default` with `forwardPlugin`
- Validate pods resolve external hostname to gateway IP
3. End-to-end flow
- Application uses external hostname directly (`api.openai.com`)
- DNS resolves to gateway IP
- HTTPRoute matches external hostname
- AuthPolicy injects credentials
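As a rough sketch of the forwarding configuration in item 2, the two variants might look like this. It assumes the DNS zone is served by a DNS server reachable at the hypothetical address `10.96.0.53`; the server name `egress-zone` is also a placeholder. The ConfigMap shows only the extra server block and would need to be merged into the cluster's existing Corefile rather than applied as-is.

```yaml
# Kubernetes: extra CoreDNS server block for the external hostname.
# Assumption: 10.96.0.53 is a placeholder for the zone's DNS server.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    api.openai.com:53 {
        errors
        cache 30
        forward . 10.96.0.53
    }
    # The cluster's existing default ".:53 { ... }" server block must be
    # kept alongside this one; it is omitted from this sketch.
---
# OpenShift: per-zone forwarding via the DNS operator.
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
  name: default
spec:
  servers:
    - name: egress-zone          # hypothetical name
      zones:
        - api.openai.com
      forwardPlugin:
        upstreams:
          - 10.96.0.53           # placeholder for the zone's DNS server
```

With either variant in place, a lookup of `api.openai.com` from inside a pod should return the gateway IP if the forwarding works as intended.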
Scope
Stage 1 (investigation):
- Use existing DNSPolicy attached to egress Gateway (no new "egress mode" needed - DNSPolicy just works on any Gateway)
- Manual cluster DNS forwarding configuration (document the pattern)
- Validate on Kubernetes and OpenShift
- Test with different DNS providers
Out of scope:
- No new DNSPolicy "egress mode" required
- Automated cluster DNS forwarding configuration (could be a future enhancement, but not part of this egress work)
| 1. **No mesh requirement** — The solution must not require workloads to be in a service mesh. No sidecar injection. Workloads reach the egress gateway via its IP address or DNS | ||
| 2. **Egress connectivity is user-configured** — Kuadrant does not create or manage ServiceEntry, DestinationRule, or egress gateway deployments. These are configured by the user via Istio APIs. Kuadrant operates on traffic already flowing through the gateway. **Risk:** Some Istio-based Gateway API implementations (e.g., Red Hat OpenShift Cluster Ingress Controller) may reserve Istio APIs (ServiceEntry, DestinationRule) for infrastructure use only, making them unavailable to end users. On such platforms, the egress gateway setup described here may not be supported. This constraint limits initial egress support to environments where Istio APIs are fully accessible (e.g., OSSM deployed via the Sail Operator) | ||
| 3. **Traffic routing to the egress gateway is assumed** — For Stage 1, we assume traffic is already flowing through the egress gateway (via explicit gateway IP, DNS configuration, or network policy). How workloads discover and reach the egress gateway is the user's responsibility. Stage 2 explores using DNSPolicy with CoreDNS to automate this (see [Future Work](#dns-for-egress)) |
By Stage 2, do you mean something beyond this work? I am reading this as the scope for egress gateway work within the next release, so DNS being Stage 2 here concerns me.
I think part of this work is to help users get routing working from pod to external service (see https://github.com/Kuadrant/architecture/pull/144/files#r2964831280). This seems misleading, since explaining how to do it should be a main part of the work.
| **1. Gateway** — Defines the egress gateway listeners: | ||
| ```yaml | ||
| apiVersion: gateway.networking.k8s.io/v1 |
When using MetalLB to provision the LB IP address, this is fine. But we may also want to show the gateway using a ClusterIP, to avoid, for example, an AWS LB being created.
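A minimal sketch of the ClusterIP variant, assuming Istio's automated gateway deployment and its `networking.istio.io/service-type` annotation; the Gateway name, namespace, and listener details are hypothetical and would need to match the example in the RFC:

```yaml
# Sketch only: name, namespace, and listener are hypothetical.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: egress-gateway
  namespace: gateway-system
  annotations:
    # Ask Istio to create a ClusterIP Service instead of the default
    # LoadBalancer, so no cloud load balancer is provisioned.
    networking.istio.io/service-type: ClusterIP
spec:
  gatewayClassName: istio
  listeners:
    - name: http
      hostname: api.example.com
      port: 80
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: All
```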
| sni: api.example.com | ||
| ``` | ||
| For mTLS to external services (where the gateway presents a client certificate), use `mode: MUTUAL` with `clientCertificate`, `privateKey`, and `caCertificates` fields. |
We can probably remove this, as it isn't something we are implementing.
| ServiceEntry is chosen over the upstream wg-ai-gateway XBackendDestination CRD because XBackendDestination is still in early alpha — the API is not finalized and there is no production track record. ServiceEntry is battle-tested and widely deployed. | ||
| ## OSSM |
We can probably move this to an alternatives considered / future work section.
| ## OSSM | ||
| OpenShift Service Mesh (OSSM) is Red Hat's supported distribution of Istio on OpenShift, deployed via the Sail Operator. "Support OSSM/Istio" means ensuring Kuadrant's policies work on this specific distribution. In practice OSSM is Istio with Red Hat packaging, so the implementation work is the same — but validation must happen on OSSM specifically. |
I don't think we need this about OSSM in the design doc
| @@ -0,0 +1,373 @@ | |||
| # Egress Gateway Support for Kuadrant | |||
I think we should add something early on that explains that this is Istio configuration we are talking about and not some Egress Gateway type.
Example:
In this RFC, "egress gateway" refers to the Istio pattern of using a standard Gateway API Gateway resource configured for outbound traffic to external services, with routing defined via Istio's ServiceEntry and DestinationRule CRDs. This is not a separate Gateway type or Kuadrant-managed resource — it is user-configured infrastructure that Kuadrant policies attach to.
| The full stack — MCP Gateway for routing, Kuadrant for policy enforcement — needs end-to-end validation on egress (Task 4). | ||
| ## Stage 2: Future Work |
I'm not sure there's a need for Stage 1 and Stage 2. From what I can tell, Stage 2 is work you are saying we aren't going to do as part of this proposal. If that is correct, could all of this be condensed into a "Future X" section, with updates to the Non Goals depending on what ends up in it?
RFC for providing support for Egress Gateway