Commit 6e1559e

Add API to GEP-3793.
Signed-off-by: Flynn <[email protected]>
1 parent de6d664 commit 6e1559e

1 file changed

+298
-9
lines changed

geps/gep-3793/index.md

Lines changed: 298 additions & 9 deletions
@@ -73,6 +73,11 @@ she doesn't have to explicitly name, and can simply trust to exist.
7373
not in scope for this GEP, in order to have a fighting chance of getting
7474
functionality into Gateway API 1.4.
7575

76+
Additionally, note that providing support for Chihiro to swap the default
77+
Gateway without downtime may very well require supporting multiple default
78+
Gateways at the same time, since Kubernetes does not support atomic swaps of
79+
resources.
80+
7681
- Allow Ana to override Chihiro's choice for the default Gateway for a given
7782
Route without explicitly specifying the Gateway.
7883

@@ -161,10 +166,10 @@ Gateways.
161166

162167
## API
163168

164-
Most of the API work for this GEP is TBD at this point. The challenge is to
165-
find a way to allow Ana to use Routes without requiring her to specify the
166-
Gateway explicitly, while still allowing Chihiro and Ian to retain control
167-
over the Gateway and its configuration.
169+
The main challenge in the API design is to find a way to allow Ana to use
170+
Routes without requiring her to specify the Gateway explicitly, while still
171+
allowing Chihiro and Ian to retain control over the Gateway and its
172+
configuration.
168173

169174
An additional concern is CD tools and GitOps workflows. In very broad terms,
170175
these tools function by applying manifests from a Git repository to a
@@ -189,10 +194,274 @@ will need resolution before this GEP can graduate.
189194

190195
[discussion]: https://github.com/kubernetes-sigs/gateway-api/pull/3852#discussion_r2140117567
191196

197+
Finally, although support for multiple default Gateways is a non-goal for this
198+
GEP, it's worth noting that allowing Chihiro full control over the default
199+
Gateway is very much a goal, which includes giving Chihiro a clean way to swap
200+
one default Gateway for another. This is important because a zero-downtime
201+
swap implies having two default Gateways running at the same time, since
202+
Kubernetes does not support any sort of atomic swap operation.
203+
192204
### Gateway for Ingress (North/South)
193205

206+
There are two main aspects to the API design for default Gateways:
207+
208+
1. Giving Ana a way to bind Routes to the default Gateway.
209+
210+
2. Giving Chihiro a way to control which Gateway is the default, and to
211+
enumerate which Routes are bound to it.
212+
213+
#### 1. Binding a Route to the Default Gateway
214+
215+
For Ana to indicate that a Route should use the default Gateway, she MUST
216+
leave `parentRefs` empty in the `spec` of the Route, for example:
217+
218+
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-route
spec:
  rules:
  - backendRefs:
    - name: my-service
      port: 80
```
229+
230+
This Route would send _all_ HTTP traffic arriving at the default Gateway to `my-service`
231+
on port 80.
232+
233+
Note that Ana MUST omit `parentRefs` entirely: specifying an empty array for
234+
`parentRefs` MUST fail validation. If a Route with an empty array for
235+
`parentRefs` somehow exists in the cluster, all Gateways in the cluster MUST
236+
refuse to accept it. (Omitting `parentRefs` entirely will work much more
237+
cleanly with GitOps tools than specifying an empty array.)
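
For contrast, here is a sketch of the form this proposal rejects: the same hypothetical `my-route`, but with `parentRefs` present and empty. Under the rule above, this manifest is expected to fail validation rather than bind to the default Gateway.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-route
spec:
  parentRefs: []   # present but empty: MUST fail validation under this GEP
  rules:
  - backendRefs:
    - name: my-service
      port: 80
```
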
238+
239+
Note also that if Ana specifies _any_ `parentRefs`, the default Gateway MUST
240+
NOT claim the Route unless one of the `parentRefs` explicitly names the default
241+
Gateway. To do otherwise makes it impossible for Ana to define mesh-only
242+
Routes, or to specify a Route that is meant to use only a specific Gateway
243+
that is not the default. This implies that for Ana to specify a Route intended
244+
to serve both north/south and east/west roles, she MUST explicitly specify the
245+
Gateway in `parentRefs`, even if that Gateway happens to be the default
246+
Gateway.
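
As an illustration, a Route meant to serve both roles might look like the sketch below. The Route name `my-dual-role-route` is hypothetical, and `my-default-gateway` in namespace `default` is borrowed from the `status` example in this document. The Service parent follows the existing mesh convention of using a Service as a `parentRef`.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-dual-role-route
spec:
  parentRefs:
  # North/south: the Gateway is named explicitly, even though it is the default.
  - name: my-default-gateway
    namespace: default
  # East/west: a Service parent makes this a mesh Route as well.
  - group: ""
    kind: Service
    name: my-service
  rules:
  - backendRefs:
    - name: my-service
      port: 80
```

Because `parentRefs` is non-empty here, no other default Gateway would claim this Route.
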
247+
248+
All other characteristics of a Route using the default Gateway MUST behave the
249+
same as if the default Gateway were explicitly specified in `parentRefs`.
250+
251+
The default Gateway MUST use `status.parents` to announce that it has bound
252+
the Route, for example:
253+
254+
```yaml
status:
  parents:
  - parentRef:
      name: my-default-gateway
      namespace: default
    controllerName: gateway.networking.k8s.io/some-gateway-controller
    conditions:
    - type: Accepted
      status: "True"
      lastTransitionTime: "2025-10-01T12:00:00Z"
      message: "Route is bound to default Gateway"
```
266+
267+
The default Gateway MUST NOT rewrite the `parentRefs` of a Route using the
268+
default Gateway; it MUST leave `parentRefs` empty. This becomes important if
269+
the default Gateway changes, or (in some situations) if GitOps tools are in
270+
play.
271+
272+
##### Enumerating Routes Bound to the Default Gateway
273+
274+
To enumerate Routes bound to the default Gateway, Ana can look for Routes with
275+
no `parentRefs` specified, and then check the `status.parents` of those Routes
276+
to see if the Route has been claimed. This will also tell Ana which Gateway is
277+
the default, even if she doesn't have RBAC to query Gateway resources
278+
directly.
279+
280+
While this is possible with `kubectl get -o yaml`, it's not exactly a friendly
281+
user experience, so adding this functionality to a tool like `gwctl` would be
282+
a dramatic improvement. In fact, looking at the `status` of a Route is very
283+
much something that we should expect Ana to do often, whether or not default
284+
Gateways are in play; `gwctl` or something similar SHOULD be able to show her
285+
which Routes are bound to which Gateways in every case, not just with default
286+
Gateways.
287+
288+
**Open Questions:**
289+
290+
Should the Gateway also add a `condition` explicitly expressing that the Route
291+
has been claimed by the default Gateway, perhaps with `type: DefaultGateway`?
292+
This could help tooling like `gwctl` more easily enumerate Routes bound to the
293+
default Gateway.
294+
295+
#### 2. Controlling which Gateway is the Default
296+
297+
Since Chihiro must be able to control which Gateway is the default, selecting
298+
the default Gateway must be an active configuration step taken by Chihiro,
299+
rather than any kind of implicit behavior. To that end, the Gateway resource
300+
will gain a new field, `spec.isDefault`:
301+
302+
```go
type GatewaySpec struct {
	// ... other fields ...
	IsDefault *bool `json:"isDefault,omitempty"`
}
```
308+
309+
If `spec.isDefault` is set to `true`, the Gateway MUST claim Routes that have
310+
specified no `parentRefs` (subject to the usual Gateway API rules about which
311+
Routes may be bound to a Gateway), and it MUST update its own `status` with
312+
a `condition` of type `DefaultGateway` and `status` true to indicate that it
313+
is the default Gateway, for example:
314+
315+
```yaml
status:
  conditions:
  - type: DefaultGateway
    status: "True"
    lastTransitionTime: "2025-10-01T12:00:00Z"
    message: "Gateway is the default Gateway"
```
323+
324+
If `spec.isDefault` is not present or is set to `false`, the Gateway MUST NOT
325+
claim those Routes and MUST NOT set the `DefaultGateway` condition in its
326+
`status`.
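
A minimal sketch of a Gateway manifest that opts in to being the default could look like this. Note that `isDefault` is the field proposed by this GEP and does not exist in released Gateway API versions; the class name `some-gateway-class` and the single HTTP listener are illustrative assumptions.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-default-gateway
  namespace: default
spec:
  gatewayClassName: some-gateway-class   # hypothetical GatewayClass
  isDefault: true                        # proposed by this GEP
  listeners:
  - name: http
    protocol: HTTP
    port: 80
```
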
327+
328+
##### Access to the Default Gateway
329+
330+
The rules for which Routes may bind to a Gateway do not change for the default
331+
Gateway. In particular, if a default Gateway should accept Routes from other
332+
namespaces, then it MUST include the appropriate `AllowedRoutes` definition,
333+
and without such an `AllowedRoutes`, a default Gateway MUST accept only Routes
334+
from its own namespace.
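
For example, a default Gateway that should accept Routes from every namespace might be configured as in the fragment below. In the released API, `allowedRoutes` is set per listener and defaults to `from: Same`. The `isDefault` field is the one proposed here, and the listener details are assumptions for illustration.

```yaml
spec:
  isDefault: true          # proposed by this GEP
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    allowedRoutes:
      namespaces:
        from: All          # without this, only same-namespace Routes may bind
```
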
335+
336+
##### Behavior with No Default Gateway
337+
338+
If no Gateway has `spec.isDefault` set to `true`, then the behavior is exactly
339+
the same as for Gateway API 1.3: all Routes MUST specify `parentRefs` in order
340+
to function, and no Gateway will claim Routes that do not specify
341+
`parentRefs`.
342+
343+
##### Deleting the Default Gateway
344+
345+
Deleting the default Gateway MUST behave the same as deleting any other
346+
Gateway: all Routes that were bound to the default Gateway MUST be unbound,
347+
and the `Accepted` conditions in the `status` of those Routes SHOULD be
348+
removed.
349+
350+
##### Multiple Default Gateways
351+
352+
Support for multiple default Gateways in a cluster is not one of the original
353+
goals of this GEP. However, allowing Chihiro to control which Gateway is the
354+
default - including being able to switch which Gateway is the default at
355+
runtime, without requiring downtime - is a goal.
356+
357+
Kubernetes itself will not prevent setting `spec.isDefault` to `true` on
358+
multiple Gateways in a cluster, and it also doesn't support any atomic swap
359+
mechanisms. If we want to enforce only a single default Gateway, the Gateway
360+
controllers will have to implement that enforcement logic. There are three
361+
possible options here.
362+
363+
1. Don't bother with any enforcement logic.
364+
365+
In this case, a Route with no `parentRefs` specified will be bound to _all_
366+
Gateways that have `spec.isDefault` set to `true`. Since Gateway API
367+
already allows a Route to be bound to multiple Gateways, and the Route
368+
`status` is already designed for it, this should function without
369+
difficulty.
370+
371+
2. Treat multiple Gateways with `spec.isDefault` set to `true` as if no
372+
Gateway has `spec.isDefault` set to `true`.
373+
374+
If we assume that all Gateway controllers in a cluster can see all the
375+
Gateways in the cluster, then detecting that multiple Gateways have
376+
`spec.isDefault` set to `true` is relatively straightforward.
377+
378+
For option 2, every Gateway with `spec.isDefault` set to `true` can simply
379+
refuse to accept Routes with no `parentRefs` specified, behaving as if no
380+
Gateway has been chosen as the default. Each Gateway would also update its
381+
`status` with a `condition` of type `DefaultGateway` and `status` false to
382+
indicate that it is not the default Gateway, for example:
383+
384+
```yaml
status:
  conditions:
  - type: DefaultGateway
    status: "False"
    lastTransitionTime: "2025-10-01T12:00:00Z"
    message: "Multiple Gateways are marked as default"
```
392+
393+
3. Perform conflict resolution as with Routes.
394+
395+
In this case, the oldest Gateway with `spec.isDefault` set to `true` will
396+
be considered the only default Gateway. That oldest Gateway will accept all
397+
Routes with no `parentRefs` specified, while all other Gateways with
398+
`spec.isDefault` set to `true` will ignore those Routes.
399+
400+
The oldest default Gateway will update its `status` to reflect that it is the
401+
default Gateway; all other Gateways with `spec.isDefault` set to `true`
402+
will update their `status` as in Option 2.
403+
404+
Unfortunately, option 2 will almost certainly cause downtime in any case where
405+
Chihiro wants to change the default Gateway:
406+
407+
- If Chihiro deletes the default Gateway before creating the new one, then all
408+
Routes using the default Gateway will be unbound during the time that
409+
there's no default Gateway, resulting in errors for any requests using those
410+
Routes.
411+
412+
- If Chihiro creates the new default Gateway before deleting the old one, then
413+
all Routes using the default Gateway are still unbound during the time that
414+
both Gateways exist.
415+
416+
Option 3 gives Chihiro a way to change the default Gateway without downtime:
417+
when they create the new default Gateway, it will not take effect until the
418+
old default Gateway is deleted. However, it doesn't give Chihiro any way to
419+
test the Routes through the new default Gateway before deleting the old
420+
Gateway.
421+
422+
Reluctantly, we must therefore conclude that option 1 is the only viable
423+
choice. Therefore: Gateways MUST NOT attempt to enforce a single default
424+
Gateway, and MUST allow Routes with no `parentRefs` to bind to _all_ Gateways
425+
that have `spec.isDefault` set to `true`. This is simplest to implement, it
426+
permits zero-downtime changes to the default Gateway, and it allows for
427+
testing of the new default Gateway before the old one is deleted.
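
Under option 1, a zero-downtime swap could be sketched as follows. All names and class names below are hypothetical, and `isDefault` is the field proposed by this GEP. While both Gateways exist, a Route with no `parentRefs` would be bound to both of them.

```yaml
# The existing default Gateway, left untouched during the swap.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: old-default-gateway
  namespace: default
spec:
  gatewayClassName: old-gateway-class   # hypothetical
  isDefault: true
  listeners:
  - name: http
    protocol: HTTP
    port: 80
---
# The new default Gateway, created alongside the old one.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: new-default-gateway
  namespace: default
spec:
  gatewayClassName: new-gateway-class   # hypothetical
  isDefault: true
  listeners:
  - name: http
    protocol: HTTP
    port: 80
```

Once the `status.parents` of the affected Routes show the new Gateway with an `Accepted` condition, and Chihiro has tested traffic through it, the old Gateway can be deleted.
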
428+
429+
##### Changes in Functionality
430+
431+
If Chihiro changes the default Gateway to a different implementation that does
432+
not support all the functionality of the previous default Gateway, then the
433+
Routes that were bound to the previous default Gateway will no longer function
434+
as expected. This is not a new problem: it already exists when Ana changes a
435+
Route's `parentRefs`, or when Chihiro changes the implementation of a Gateway
436+
that is explicitly specified in a Route's `parentRefs`.
437+
438+
At present, we do not propose any solution to this problem, other than to note
439+
that `gwctl` or similar tools SHOULD be able to show Ana not just the Gateways
440+
to which a Route is bound, but also the features supported by those Gateways,
441+
to at least help Ana understand if she is trying to use Gateways that don't
442+
support a feature that she needs. This is definitely an area for future
443+
work, and it is complicated by the fact that Ana may not have access to read
444+
Gateway resources in the cluster at all.
445+
446+
##### Listeners, ListenerSets, and Merging
447+
448+
Setting `spec.isDefault` on a Gateway affects which Routes will bind to the
449+
Gateway, not where the Gateway listens for traffic. As such, setting
450+
`spec.isDefault` MUST NOT alter a Gateway's behavior with respect to
451+
Listeners, ListenerSets, or merging.
452+
453+
In the future, we may want to consider allowing a default ListenerSet rather
454+
than only a default Gateway, but that is not in scope for this GEP. Even if it
455+
is considered later, the guiding principle SHOULD be that `spec.isDefault`
456+
SHOULD NOT affect where a Gateway listens for traffic or whether it can be
457+
merged with other Gateways.
458+
194459
### Gateway For Mesh (East/West)
195460

461+
Mesh traffic is defined by using a Service as a `parentRef` rather than a
462+
Gateway. As such, there is no case where a default Gateway would be used for
463+
mesh traffic.
464+
196465
## Conformance Details
197466

198467
#### Feature Names
@@ -204,14 +473,34 @@ not seem like a good choice.
204473

205474
### Conformance tests
206475

476+
TBD.
477+
207478
## Alternatives
208479

209-
A possible alternative API design is to modify the behavior of Listeners or
210-
ListenerSets; rather than having a "default Gateway", perhaps we would have
211-
"[default Listeners]". One challenge here is that the Route `status` doesn't
212-
currently expose information about which Listener is being used, though it
213-
does show which Gateway is being used.
480+
- A possible alternative API design is to modify the behavior of Listeners or
481+
ListenerSets; rather than having a "default Gateway", perhaps we would have
482+
"[default Listeners]". One challenge here is that the Route `status` doesn't
483+
currently expose information about which Listener is being used, though it
484+
does show which Gateway is being used.
214485

215486
[default Listeners]: https://github.com/kubernetes-sigs/gateway-api/pull/3852#discussion_r2149056246
216487

488+
- We could define the default Gateway as a Gateway with a magic name, e.g.
489+
"default". This doesn't actually make things that much simpler for Ana
490+
(she'd still have to specify `parentRefs`), and it raises questions about
491+
Chihiro's ability to control which Routes can bind to the default Gateway,
492+
as well as how namespacing would work -- it's especially unhelpful for Ana
493+
if she has to know the namespace of the default Gateway in order to use it.
494+
495+
- A default Gateway could overwrite a Route's empty `parentRefs` with a
496+
non-empty `parentRefs` pointing to the default Gateway. The main challenge
497+
with this approach is that once the `parentRefs` are overwritten, it's no
498+
longer possible to know that the Route was originally intended to use the
499+
default Gateway. Using the `status` to indicate that the Route is bound to
500+
the default Gateway instead both preserves Ana's original intent and also
501+
makes it possible to change the default Gateway without requiring Ana to
502+
recreate all her Routes.
503+
217504
## References
505+
506+
TBD.
