-
Notifications
You must be signed in to change notification settings - Fork 353
docs(MADR): add MADR-083 for Envoy Zone-Aware Routing #14532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Introduce a new MADR that defines how Envoy’s zone-aware routing feature will be integrated into Kuma’s control plane. ### What changed - Added MADR-083: “Enable Envoy Zone-Aware Routing in Kuma” - Defined decision context, drivers, and outcome for injecting zone_aware_lb_config via EDS/CDS - Specified new MeshLoadBalancingStrategy API with type: ZoneAware - Provided example YAML showing ZoneAware configuration using zoneIdentifier and minClusterSize ### Why - Ensure per-host throughput balance and minimize cross-AZ data transfer costs - Preserve Kuma’s declarative abstraction and existing kuma.io/zone labels Implements MADR for kumahq#9161 Signed-off-by: Arpit Mishra <[email protected]>
6411e22 to
79a67b1
Compare
Reviewer Checklist🔍 Each of these sections need to be checked by the reviewer of the PR 🔍:
|
slonka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense, would love some input from @justindavies as well
| - Extend `MeshLoadBalancingStrategy.spec.default.localityAwareness.localZone` to accept: | ||
| - `type: ZoneAware` | ||
| - `minClusterSize`: mapped to Envoy’s `zone_aware_lb_config.min_cluster_size` | ||
| - `zoneIdentifier`: tag key for zone (e.g., `topology/availability-zone`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about
upstream.zone_routing.force_local_zone.min_size
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also do you know what @lukidzi wanted to achieve with this subZoneIdentifier mentioned in the issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@slonka 'upstream.zone_routing.force_local_zone' is a runtime config and as far as i understand, runtime config does not support cluster level configurations.
Is it correct or I'm missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also not sure about why @lukidzi suggested subzoneid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lukidzi - can you weight in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lukidzi any update here?
lobkovilya
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the MADR, I think I understand the motivation, I left a comment related to the API
| name: backend | ||
| sectionName: http | ||
| default: | ||
| loadBalancer: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC zone_aware_lb_config and locality_weighted_lb_config are 2 alternative approaches and they're mutually exclusive. At this moment, we are already using locality_weighted_lb_config when configuring policy like
type: MeshLoadBalancingStrategy
name: local-zone-affinity-backend
mesh: default
spec:
to:
- targetRef:
kind: MeshService
name: backend
default:
localityAwareness:
localZone:
affinityTags:
- key: kubernetes.io/hostname
weight: 9000
- key: topology.kubernetes.io/zone
weight: 9Your suggested API change doesn't reflect the fact 2 approaches are mutually exclusive, i.e. we'd have to add extra validation to reject policies like this (because it's not possible to use zone_aware_lb and locality_weighted_lb at the same time):
type: MeshLoadBalancingStrategy
name: local-zone-affinity-backend
mesh: default
spec:
to:
- targetRef:
kind: MeshService
name: backend
default:
loadBalancer:
type: ZoneAware
zoneAware:
# Minimum total endpoints before zone-aware logic activates
minClusterSize: 1
# Tag used on dataplanes to identify their zone
zoneIdentifier: topology/availability-zone
localityAwareness:
localZone:
affinityTags:
- key: kubernetes.io/hostname
weight: 9000
- key: topology.kubernetes.io/zone
weight: 9I think MeshLoadBalancingStrategy policy API should make it clear to users that zoneAware and localZone are 2 mutually exclusive approaches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lobkovilya i was also concerned about this issue, but since the existing policy structure does not have generic keys like 'type' that takes kind of enums, I'm not able to solve this issue
All i can think of is adding validation layer during policy apply
If you have any better solution, please help
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lobkovilya any suggestions here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lobkovilya any update here?
Introduce a new MADR that defines how Envoy’s zone-aware routing feature will be integrated into Kuma’s control plane.
What changed
Why
Implements MADR for #9161