Contributor

@slonka slonka commented Jan 29, 2026

Motivation

Zone proxies are becoming mesh-scoped to enable MeshIdentity and policy support. This requires revisiting the deployment model.

Implementation

MADR 094 addresses:

  • Unified vs separate zone proxy deployments
  • Standalone deployment vs sidecar to fake container
  • Universal (VM) deployment model
  • Helm configuration structure and user flows (Konnect, self-hosted, unfederated)
  • Default installation behavior

Key decisions: unified zone proxy as Dataplane with kuma.io/zone-proxy-role label, standalone deployment, mesh-scoped resources.

Supporting documentation

  • docs/madr/decisions/090-zone-egress-identity.md

fix #9030

@slonka slonka added the ci/skip-test PR: Don't run unit and e2e tests (maybe this is just a doc change) label Jan 29, 2026
@github-actions
Contributor

Reviewer Checklist

🔍 Each of these sections needs to be checked by the reviewer of the PR 🔍:
If something doesn't apply, please check the box and add a justification if the reason is non-obvious.

  • Is the PR title satisfactory? Is this part of a larger feature and should be grouped using > Changelog?
  • PR description is clear and complete. It links to the relevant issue as well as docs and UI issues
  • This will not break child repos: it doesn't hardcode values (e.g. "kumahq" as an image registry)
  • IPv6 is taken into account (e.g. no string concatenation of host and port)
  • Tests (Unit test, E2E tests, manual test on universal and k8s)
    • Don't forget ci/ labels to run additional/fewer tests
  • Does this contain a change that needs to be notified to users? In this case, UPGRADE.md should be updated.
  • Does it need to be backported according to the backporting policy? (this GH action will add "backport" label based on these file globs, if you want to prevent it from adding the "backport" label use no-backport-autolabel label)

@slonka slonka marked this pull request as ready for review January 30, 2026 13:00
@slonka slonka requested a review from a team as a code owner January 30, 2026 13:00
@slonka slonka marked this pull request as draft January 30, 2026 13:00
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces MADR 094, which defines the deployment model for zone proxies as they transition from global-scoped to mesh-scoped resources. The architectural change addresses fundamental limitations with the current global zone proxy model, particularly around MeshIdentity issuance and policy application.

Changes:

  • Adds comprehensive architecture decision record for zone proxy deployment model
  • Proposes unified zone proxy approach where a single Dataplane type uses labels to determine ingress/egress/both capabilities
  • Defines deployment patterns for Kubernetes, Universal, Helm, Terraform, and Konnect workflows
  • Establishes defaults for Helm installations and migration paths from global to mesh-scoped zone proxies

@slonka slonka changed the title from "docs(MADR): initial commit of zone proxies new deployment model" to "docs(MADR): zone proxies new deployment model" Feb 2, 2026
Contributor Author

slonka commented Feb 9, 2026

General Q:

  • check what resources we need - do we need RBAC?

Automaat previously approved these changes Feb 9, 2026
bartsmykla previously approved these changes Feb 9, 2026
@slonka slonka marked this pull request as ready for review February 9, 2026 11:28
@slonka slonka dismissed stale reviews from bartsmykla and Automaat via 9196180 February 9, 2026 11:52
bartsmykla previously approved these changes Feb 9, 2026
Automaat previously approved these changes Feb 9, 2026
@slonka slonka dismissed stale reviews from Automaat and bartsmykla via c2f5f9e February 9, 2026 12:47

| Tooling Decision | Choice |
|------------------|--------|
| Per-mesh Services | **Yes** - each mesh gets its own Service/LoadBalancer for mTLS isolation |
Contributor

What does "Per-mesh Services" mean? What kind of services?

Contributor Author

@slonka slonka Feb 10, 2026

Kubernetes Services - specifically the LoadBalancer or NodePort Service that exposes the zone proxy to other zones. Today there's one global Service for zone ingress and one for zone egress. With mesh-scoped zone proxies, each mesh gets its own Service (e.g., kuma-payments-mesh-zoneproxy) because each mesh has a different mTLS CA, so they can't share a single LoadBalancer without complex SNI-based cert selection. This is detailed in the "Per-Mesh Services (Not Shared)" section below.
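
As a rough illustration of the one-Service-per-mesh idea, here is a minimal Python sketch. The helper names and the `kuma-<mesh>-zoneproxy` pattern are assumptions inferred from the single example name above, not a confirmed naming scheme from the chart.

```python
# Hypothetical helpers: the naming pattern is inferred from the one example
# "kuma-payments-mesh-zoneproxy" and is an assumption, not Kuma's actual scheme.

def zone_proxy_service_name(mesh: str) -> str:
    """Build the name of the Service that exposes a mesh's zone proxy."""
    return f"kuma-{mesh}-zoneproxy"


def services_for_meshes(meshes: list[str]) -> dict[str, str]:
    """Map every mesh to its own Service, so no two meshes share one."""
    return {mesh: zone_proxy_service_name(mesh) for mesh in meshes}


# Two meshes (illustrative names) end up with two distinct Services,
# each of which can terminate mTLS with that mesh's own CA.
services = services_for_meshes(["payments-mesh", "orders-mesh"])
for mesh, service in services.items():
    print(f"{mesh} -> {service}")
```

Because each mesh maps to exactly one Service, certificate selection stays a per-Service concern and no SNI-based switching is needed on a shared LoadBalancer.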

Contributor

That's Kubernetes-specific. What about Universal? I'm not sure "Service" is a good name here.

| Tooling Decision | Choice |
|------------------|--------|
| Per-mesh Services | **Yes** - each mesh gets its own Service/LoadBalancer for mTLS isolation |
| Namespace placement | **kuma-system** |
Contributor

Does it need to be the system namespace? With multiple meshes, is it possible to change it, or do we want to stick to a single namespace?

Contributor Author

For single-mesh (this MADR's scope), kuma-system keeps everything together with the CP. For multi-mesh, standalone kuma-zone-proxy releases can target any namespace — the Helm --namespace flag controls that. We could add a dedicated namespace option per mesh later, but the single-mesh default of kuma-system keeps things simple. I'll clarify this in the namespace section.

Contributor

We need to ensure that transparent proxy is disabled for the zone proxy: even if it's enabled on the namespace, it should be disabled on the deployment.

Contributor

This is not good because it assumes the kuma-system namespace will be marked as a "meshed" namespace, which we don't want, right?

Contributor

Also, kuma-system is sort of meant to be the admin realm. Here there's almost a Mesh operator persona coming in, no?

Contributor

@lahabana lahabana left a comment

Is there a PoC for this?


2. **Cannot apply policies on zone proxies**: Kuma policies (MeshTrafficPermission, MeshTimeout, etc.) are mesh-scoped.
A global zone proxy cannot be targeted by mesh-specific policies, limiting observability and traffic control for cross-zone communication.

Contributor

  1. The observability story is tricky (how do you share a subset of all the observability data with a single mesh?)

**Scope of this document**: This MADR focuses on **deployment tooling** — how users deploy zone proxies via Helm, Konnect UI, and Terraform.

**Single-mesh focus**: This document assumes **single-mesh-per-zone as the default** deployment pattern.
For multi-mesh scenarios, deploy additional zone proxies using separate Helm releases with a dedicated `kuma-zone-proxy` chart. A multi-mesh deployment guide will be provided separately.
Contributor

No, let's not, as we're just creating a maintenance burden for a path we don't recommend.

I think folks who NEED multi-mesh can:

  • Just write their own deployment
  • Push for Mesh to be supported by KO

Our goal for K3 is really to streamline what folks can do. Adding a new Helm chart is a complete no-go.

│ Mesh: [default ▼] │
│ (if multiple meshes detected there will be info here on │
│ how to handle that) │
└─────────────────────────────────────────────────────────┘
Contributor

What if no mesh?

# mesh defaults to "default"
```

If the user selected a non-default mesh name:
Contributor

Why not always do it?
