# First pass at a schema for Backend plus musings on scope (#20)
```diff
@@ -166,15 +166,14 @@ spec:
   type: FQDN
   fqdn:
     hostname: api.openai.com
   port: 443
   tls:
     mode: Terminate | Passthrough | Mutual
     sni: api.openai.com
     caBundleRef:
       name: vendor-ca
     # clientCertificateRef: # if MUTUAL
     #   name: egress-client-cert
   # possible extension semantics, for illustration purposes only.
   ports:
```
> **Member:** I am not sure how the gateway selects which port to target if the backend is referred to by a …
```diff
   - number: 443
     protocol: TLS
     tls:
       mode: SIMPLE | MUTUAL | PASSTHROUGH | PLATFORM_PROVIDED | INSECURE_DISABLE
       sni: api.openai.com
       caBundleRef:
         name: vendor-ca
```
> **Member:** This is not related, but I wonder where the caBundle is located; just a name is not enough.
>
> Would we be using ClusterTrustBundle (which is cluster-scoped, so …
```diff
   extensions:
   - name: inject-credentials
     type: gateway.networking.k8s.io/CredentialInjector:v1
@@ -187,14 +186,11 @@ spec:
       namespace: platform-secrets
```
#### TLS Policy

The example above inlines a basic TLS configuration directly on the `Backend` resource. This is intentional. Gateway API’s existing `BackendTLSPolicy` is designed around Service-based backends only and may end up being too restrictive for our needs. More specifically, using it for egress today would require representing each external FQDN as a synthetic Service, which this proposal aims to avoid. Furthermore, one could argue that an inlined TLS policy provides a simpler UX, especially in egress use cases. As the `Backend` resource shape stabilizes, we SHOULD evaluate whether `BackendTLSPolicy` can be reused, extended, or aligned for external egress use cases.
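To illustrate the synthetic-Service problem, here is a rough sketch of what reaching a single external FQDN with today's `BackendTLSPolicy` (v1alpha3 field shapes) might look like. The ExternalName Service exists only to give the policy something to target, which is exactly the workaround this proposal wants to avoid; support for this combination is implementation-dependent, and the names are invented for illustration:

```yaml
# Synthetic Service, needed only so BackendTLSPolicy has a target.
apiVersion: v1
kind: Service
metadata:
  name: openai-api
  namespace: egress-system
spec:
  type: ExternalName
  externalName: api.openai.com
---
apiVersion: gateway.networking.k8s.io/v1alpha3
kind: BackendTLSPolicy
metadata:
  name: openai-api-tls
  namespace: egress-system
spec:
  targetRefs:
  - group: ""
    kind: Service
    name: openai-api
  validation:
    hostname: api.openai.com
    caCertificateRefs:
    - group: ""
      kind: ConfigMap
      name: vendor-ca
```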
#### Backend Extensions
Those topics are covered in the separate **[Payload Processing proposal](../7-p…)**.

Examples in this document are illustrative only.
#### Scope and Persona Ownership
> **Author:** @howardjohn @mikemorris Feel free to chime in with any thoughts on the scoping story.
>
> **Author:** Hmm, maybe this isn't an issue since, at this point, `Backend` is only referenced via an xRoute? But I still wonder how the admin sets policy for a particular FQDN if any app owner can create a `Backend`.
>
> **Contributor:** Having both a global and a local `Backend` makes sense to me. If we go this route, the main question would be around resolving conflicts. What seems correct is for a globally scoped `Backend` to take precedence. This avoids the problem of needing to ensure that a global policy, which may be required for compliance, isn't silently overridden. If we go this route we'd need to set a status condition on the namespaced `Backend` to indicate that it's being overridden.
>
> **Author:** +1. Global backends should take precedence over local ones, and we should report that in status. This is slowly becoming my preferred option the more that I think about it.
>
> I think my preferred approach for this would be having both namespace-scoped and cluster-scoped options for a frontend, but keeping `Backend` as a single namespaced resource, and treating cluster/global-scoped definitions as a "last hop" rather than an "override". I'll try to illustrate how I'm defining that difference below:
>
> *Override*: In the override model, a ClusterServiceEntry for …
>
> *"Last hop"*: In the last hop model, a ClusterServiceEntry for …
>
> Thinking this through further, there might be use cases for each model, similar to the overrides vs defaults behavior described in [GEP-2649](https://gateway-api.sigs.k8s.io/geps/gep-2649/?h=override#hierarchy), and maybe this behavior should be configurable? Ref #20 (comment) for further exploration on explicitly routing through an egress Gateway.
While the namespaced ownership semantics of Kubernetes `Service`s are well-defined, the story for our proposed `Backend` resource is less clear, specifically for FQDN destinations. The fundamental question at issue is: who "owns" the destination, and what is the appropriate scope for defining it? There are two basic options:
- **Namespaced Backends**: Each namespace defines its own `Backend` resources for the external destinations it needs to reach. This model aligns with existing Kubernetes patterns, where resources are scoped to the namespace of the consuming workload. While this model allows __service owners__ to manage their own backends independently, it may lead to duplication if multiple namespaces need to reach the same external service. Furthermore, it may complicate cross-namespace policy enforcement if, for example, the egress gateway is in a central namespace (e.g. "egress-system") and multiple, disparate namespaces define conflicting `Backend` resources for the same FQDN. In this case, the gateway implementation would have to apply different policy depending on the source namespace of the request, which could get combinatorially expensive. It also removes any ability for the cluster admin to centrally manage and audit egress destinations or apply a default set of policies for all egress traffic to that destination.
- **Cluster-scoped Backends**: `Backend` resources are defined at the cluster scope, allowing a single definition per external destination. This model aligns with the idea that __platform operators__ or __cluster admins__ are responsible for managing egress destinations and their associated policies. It simplifies policy enforcement at the gateway level, as there is a single source of truth for each destination. However, it may limit the flexibility of service owners to define custom backends or policies for their specific needs.
Realistically, both models have merit and are widely used across many gateway/mesh implementations. Prior art from the Network Policy subproject (i.e. `AdminNetworkPolicy` vs `NetworkPolicy`) suggests that both cluster-scoped and namespaced resources can coexist to serve different personas and use cases. We should consider:

1. Whether `Backend` should be namespaced or cluster-scoped.
2. Whether we should define both namespaced and cluster-scoped variants of `Backend` (e.g. `GlobalBackend` or `ClusterWideBackend`) to serve different personas (service owners vs platform operators).
> **Member** (on lines +212 to +213): +1 for namespace scope.
>
> Let's centralize this discussion in #20 (comment)?
Experience from implementations (e.g. this [discussion on Istio's ServiceEntry resource](https://docs.google.com/document/d/1uDWoWxHyMCE4oUc-nTJPfVoQyikZP-UMp-_BrAA-PQE/edit?tab=t.0)) and user feedback will be critical for informing this decision.
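To make option 2 concrete, here is a rough sketch of how a namespaced `Backend` and a hypothetical cluster-scoped variant might coexist for the same FQDN. The `ClusterBackend` kind and the `apiVersion` shown are placeholders, not proposed names, and which definition wins depends on the override vs "last hop" model discussed above:

```yaml
# Hypothetical cluster-scoped definition owned by the platform team,
# carrying the compliance-mandated TLS policy. Illustrative only.
apiVersion: gateway.networking.x-k8s.io/v1alpha1  # placeholder
kind: ClusterBackend
metadata:
  name: openai-api
spec:
  destination:
    type: FQDN
    fqdn:
      hostname: api.openai.com
    ports:
    - number: 443
      protocol: TLS
      tls:
        mode: SIMPLE
        sni: api.openai.com
        caBundleRef:
        - name: vendor-ca
---
# Namespaced definition owned by an app team for the same FQDN; under
# the override model, the cluster-scoped TLS policy above would win and
# a status condition on this resource would say so.
apiVersion: gateway.networking.x-k8s.io/v1alpha1  # placeholder
kind: Backend
metadata:
  name: openai-api
  namespace: team-a
spec:
  destination:
    type: FQDN
    fqdn:
      hostname: api.openai.com
    ports:
    - number: 443
      protocol: TLS
```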
#### Schema Definition
```go
// +genclient
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// Backend is the Schema for the backends API.
type Backend struct {
	metav1.TypeMeta `json:",inline"`

	// metadata is the standard object metadata.
	// +optional
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// spec defines the desired state of Backend.
	// +required
	Spec BackendSpec `json:"spec"`

	// status defines the observed state of Backend.
	// +optional
	Status BackendStatus `json:"status,omitempty"`
}

// BackendSpec defines the desired state of Backend.
type BackendSpec struct {
	// destination defines the backend destination to route traffic to.
	// +required
	Destination BackendDestination `json:"destination"`

	// extensions defines optional extension processors that can be
	// applied to this backend.
	// +optional
	Extensions []BackendExtension `json:"extensions,omitempty"`
```
> **Member:** Do we have a clear set of examples for how these extensions would be used? If not, can we omit them until we do?
>
> One example from above is `CredentialInjector`; these could look quite similar to the `HTTPRouteFilter`.
```go
}

// TODO: Do we need the destination field or can we inline this all
// in spec?
// +kubebuilder:validation:ExactlyOneOf=fqdn;service;ip
type BackendDestination struct {
	// +required
	Type BackendType `json:"type"`
	// +optional
	Ports []BackendPort `json:"ports,omitempty"`
	// +optional
	FQDN *FQDNBackend `json:"fqdn,omitempty"`
	// Service *ServiceBackend `json:"service,omitempty"`
	// IP *IPBackend `json:"ip,omitempty"`
}

// BackendType defines the type of the Backend destination.
// +kubebuilder:validation:Enum=FQDN;IP;Service
type BackendType string

const (
	// BackendTypeFQDN represents a fully qualified domain name.
	BackendTypeFQDN BackendType = "FQDN"
	// BackendTypeIP represents an IP address.
	BackendTypeIP BackendType = "IP"
	// BackendTypeService represents an in-cluster Kubernetes Service.
	BackendTypeService BackendType = "Service"
)

type BackendPort struct {
	// Number defines the port number of the backend.
	// +required
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=65535
	Number uint32 `json:"number"`

	// Protocol defines the protocol of the backend.
	// +required
	// +kubebuilder:validation:MaxLength=256
	Protocol BackendProtocol `json:"protocol"`

	// TLS defines the TLS configuration for the backend.
	// +optional
	TLS *BackendTLS `json:"tls,omitempty"`
```
> What are the semantics of a policy-attached `BackendTLSPolicy` and this inline config co-existing?
>
> **Author:** For now at least (pre-GEP), I'd say `BackendTLSPolicy` is not allowed to have `Backend` as a targetRef, so we can defer the decision until we get a better sense of `Backend` semantics (e.g. scoping). I have a bias towards inlining, so my ideal would probably be for the inline policy to take precedence if defined.
>
> **Member:** I think a decision to choose anything other than `BackendTLSPolicy` here requires significantly more discussion and detail in this proposal. If the goal is just to copy and inline the `BackendTLSPolicy` types, that might make sense, but there are other benefits of a policy here, such as the ability to reuse config across different backends.
```go
	// +optional
	ProtocolOptions *BackendProtocolOptions `json:"protocolOptions,omitempty"`
}

// BackendProtocol defines the protocol for backend communication.
// +kubebuilder:validation:Enum=HTTP;HTTPS;GRPC;TCP;TLS;MCP
type BackendProtocol string

const (
	BackendProtocolHTTP  BackendProtocol = "HTTP"
	BackendProtocolHTTPS BackendProtocol = "HTTPS"
	BackendProtocolGRPC  BackendProtocol = "GRPC"
	BackendProtocolTCP   BackendProtocol = "TCP"
	BackendProtocolTLS   BackendProtocol = "TLS"
	BackendProtocolMCP   BackendProtocol = "MCP"
)

type BackendTLS struct {
	// Mode defines the TLS mode for the backend.
	// +required
	Mode BackendTLSMode `json:"mode"`

	// SNI defines the server name indication to present to the upstream backend.
	// +optional
	SNI string `json:"sni,omitempty"`

	// CaBundleRef defines the reference to the CA bundle for validating the
	// backend's certificate. Defaults to system CAs if not specified.
	// +optional
	CaBundleRef []ObjectReference `json:"caBundleRef,omitempty"`

	// InsecureSkipVerify, when true, disables verification of the backend's
	// certificate chain and hostname.
	// +optional
	InsecureSkipVerify *bool `json:"insecureSkipVerify,omitempty"`

	// ClientCertificateRef defines the reference to the client certificate for
	// mutual TLS. Only used if mode is MUTUAL.
	// +optional
	ClientCertificateRef *SecretObjectReference `json:"clientCertificateRef,omitempty"`

	// SubjectAltNames lists the subject alternative names to match against the
	// backend's certificate.
	// +optional
	SubjectAltNames []string `json:"subjectAltNames,omitempty"`
}

// BackendTLSMode defines the TLS mode for backend connections.
// +kubebuilder:validation:Enum=SIMPLE;MUTUAL;PASSTHROUGH;PLATFORM_PROVIDED;INSECURE_DISABLE
type BackendTLSMode string

const (
	// BackendTLSModeSIMPLE enables TLS with simple server certificate verification.
	BackendTLSModeSIMPLE BackendTLSMode = "SIMPLE"
	// BackendTLSModeMUTUAL enables mutual TLS.
	BackendTLSModeMUTUAL BackendTLSMode = "MUTUAL"
	// BackendTLSModePASSTHROUGH does not terminate TLS; SNI is used to route.
	BackendTLSModePASSTHROUGH BackendTLSMode = "PASSTHROUGH"
	// BackendTLSModePLATFORM_PROVIDED uses the implementation's built-in TLS
	// (e.g. service-mesh-powered mTLS).
	BackendTLSModePLATFORM_PROVIDED BackendTLSMode = "PLATFORM_PROVIDED"
	// BackendTLSModeINSECURE_DISABLE disables TLS.
	BackendTLSModeINSECURE_DISABLE BackendTLSMode = "INSECURE_DISABLE"
)

// +kubebuilder:validation:ExactlyOneOf=mcp
type BackendProtocolOptions struct {
	// +optional
	MCP *MCPProtocolOptions `json:"mcp,omitempty"`
}

type MCPProtocolOptions struct {
	// Version is the MCP protocol version. MUST be one of V2|V3.
	// +optional
	// +kubebuilder:validation:MaxLength=256
	Version string `json:"version,omitempty"`
```
> Right now, the format is …
>
> **Author:** Sorry, you caught some of the AI autocomplete I missed. Yeah, we should definitely do that.
```go
	// Path is the URL path for MCP traffic. Default is /mcp.
	// +optional
	// +kubebuilder:default:=/mcp
	Path string `json:"path,omitempty"`
}

// FQDNBackend describes a backend that exists outside of the cluster.
// Hostnames must not be cluster.local domains or otherwise refer to
// Kubernetes services within a cluster. Implementations must report
// violations of this requirement in status.
type FQDNBackend struct {
	// Hostname of the backend service. Examples: "api.example.com"
	// +required
	Hostname string `json:"hostname"`
}

type BackendExtension struct {
	// +required
	Name string `json:"name"`
	// +required
	Type string `json:"type"`
	// TODO: How does this work practically? Can we leverage Kubernetes
	// unstructured types here?
```
> **Contributor:** An unstructured type makes sense, but my assumption is that we should require a schema for every extension type, even when there's no CRD. The schemas can be stored in, or linked from, a ConfigMap; then the configs can be verified by a webhook or the controller. It lets us have our cake and eat it too in terms of having config validation without requiring a CRD for each extension. It also has the knock-on advantage of advertising all of the available extension types.
>
> **Author:** Yeah, I definitely think there must be some sort of schema, but I'm wondering if, from an implementation perspective, it makes sense to force folks to use Kubernetes schemas specifically (which I think is required if we rely on unstructured).
```go
	// Would implementations have to define a schema for their extensions
	// (even if they aren't CRDs)? Maybe that's a good thing?
	Config any `json:"config,omitempty"`
}
```
|
Comment on lines
+375
to
+377
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think this works in a k8s API? |
||
| } | ||
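As one possible answer to the TODO, here is a minimal sketch of the conventional Kubernetes pattern for an opaque config field, assuming validation happens out of band as discussed above. This is an illustration, not a settled design; `apiextensionsv1.JSON` is the standard catch-all type for arbitrary JSON in CRD-based APIs:

```go
import (
	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
)

type BackendExtension struct {
	// +required
	Name string `json:"name"`
	// +required
	Type string `json:"type"`
	// Config holds extension-specific configuration as arbitrary JSON.
	// Validation against a per-Type schema would happen out of band
	// (e.g. in the controller or a webhook), per the discussion above.
	// +optional
	// +kubebuilder:pruning:PreserveUnknownFields
	Config *apiextensionsv1.JSON `json:"config,omitempty"`
}
```

An alternative with similar trade-offs is `runtime.RawExtension` from `k8s.io/apimachinery`, which is commonly used when the payload is itself a Kubernetes-style object.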
```go
// BackendStatus defines the observed state of Backend.
type BackendStatus struct {
	// For Kubernetes API conventions, see:
	// https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties
	//
	// conditions represent the current state of the Backend resource.
	// Each condition has a unique type and reflects the status of a
	// specific aspect of the resource.
	//
	// Standard condition types include:
	//  - "Available": the resource is fully functional
	//  - "Progressing": the resource is being created or updated
	//  - "Degraded": the resource failed to reach or maintain its desired state
	//
	// The status of each condition is one of True, False, or Unknown.
	// +listType=map
	// +listMapKey=type
	// +optional
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```
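To sanity-check the shape above, here is a hypothetical manifest exercising most of the fields. The `apiVersion` is a placeholder and `mcp.example.com` is an invented host:

```yaml
apiVersion: gateway.networking.x-k8s.io/v1alpha1  # placeholder group/version
kind: Backend
metadata:
  name: example-mcp
  namespace: team-a
spec:
  destination:
    type: FQDN
    fqdn:
      hostname: mcp.example.com
    ports:
    - number: 443
      protocol: MCP
      tls:
        mode: SIMPLE
        sni: mcp.example.com
        caBundleRef:
        - name: vendor-ca
      protocolOptions:
        mcp:
          version: V2
          path: /mcp
  extensions:
  - name: inject-credentials
    type: gateway.networking.k8s.io/CredentialInjector:v1
```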
## Routing Modes

### Endpoint Mode
> Need to think about wildcard support here. This was a huge request in Istio, since folks didn't want to manually enumerate all of their backends (e.g. each individual S3 bucket). This likely intersects with the dynamic forward proxy use case though, so my plan for now is to defer it.
>
> I can see this as the forward-proxy section listed in the proposal, which is intentionally not covered first.
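Purely to record the shape of that request, a speculative sketch of what a wildcard FQDN might look like; wildcard hostnames and their matching semantics are explicitly deferred and not part of this proposal:

```yaml
spec:
  destination:
    type: FQDN
    fqdn:
      # Speculative: match any bucket under the S3 endpoint without
      # enumerating each one. Deferred to the forward-proxy discussion.
      hostname: "*.s3.amazonaws.com"
```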