|
1 | 1 | ---
|
2 |
| -title: Introducint Ambient Multicluster |
3 |
| -description: How Ambient Multicluster lets you connect multiple clusters in a single ambient mesh. |
4 |
| -publishdate: 08-04-2025 |
5 |
| -attribution: Steven Jin Xuan (Microsoft) |
| 2 | +title: Introducing multicluster support for ambient mode |
| 3 | +description: Introducing multicluster support for ambient mode |
| 4 | +date: 2025-08-04 |
| 5 | +attribution: Jackie Maertens (Microsoft), Keith Mattix (Microsoft), Mikhail Krinkin (Microsoft), Steven Jin (Microsoft) |
6 | 6 | keywords: [ambient,multicluster]
|
7 | 7 | ---
|
8 | 8 |
|
9 |
| -Multicluster has been one of the most requested Ambient features — and as of Istio 1.27, it’s now available in alpha. |
10 |
| -Ambient Multicluster enables secure, transparent communication between clusters using the same lightweight, modular architecture users already rely on. |
11 |
| -While still in alpha, this release delivers the core functionality of a multicluster mesh and lays the groundwork for a complete feature set in upcoming releases. |
| 9 | +Multicluster has been one of the most requested Ambient features — and as of Istio 1.27, it's now available. |
| 10 | +We sought to capture the benefits and avoid the complications of multicluster architectures using the same modular design that ambient users love. |
| 11 | +While still in alpha, this release delivers the core functionality of a multicluster mesh and lays the groundwork for a full feature set in upcoming releases. |
12 | 12 |
|
13 |
| -## Connectivity |
| 13 | +## Multicluster's Many Benefits (and Challenges) |
14 | 14 |
|
15 |
| -In a single Kubernetes cluster, every pod can directly connect to another pod via a pod or service through a unique IP address as per the [Kubernetes Network Model](https://kubernetes.io/docs/concepts/services-networking/). |
16 |
| -However, in a multicluster mesh, there is no guarantee that the IP address spaces of different clusters are disjoint. |
17 |
| -Even if it was, there is no guarantee that routing tables are set up to route from one cluster to another. |
18 |
| -In Ambient Multicluster, we connect clusters by deploying east-west gateways with globally routable IP addresses and by marking services as global. |
| 15 | +Multicluster architectures increase outage resilience, shrink the blast radius of failures,
| 16 | +ease adoption of data residency policies, and simplify cost tracking.
| 17 | +That said, integrating multiple clusters poses connectivity, security, and operational hurdles.
| 18 | + |
| 19 | +In a single Kubernetes cluster, every pod can directly connect to another pod via a pod IP or service VIP. |
| 20 | +However, in a multicluster deployment, there is no guarantee that the IP address spaces of different clusters are disjoint. |
| 21 | +Even if the spaces were disjoint, users would need to configure routing tables to route traffic from one cluster to another. |
| 22 | +Cross-cluster connectivity means that pod-to-pod traffic can leave cluster boundaries -- and that pods may accept connections from outside the cluster. |
| 23 | +Without care, an attacker could connect to a vulnerable pod, or sniff unencrypted traffic. |
| 24 | +All of this must be orchestrated through APIs that are both secure and simple enough to keep pace with ever-changing environments. |
| 25 | + |
| 26 | +## Key Components
| 27 | + |
| 28 | +Ambient multicluster extends ambient with new components and minimal APIs to |
| 29 | +securely connect clusters using the same lightweight, modular architecture of ambient. |
| 30 | + |
| 31 | +### East-West Gateways |
| 32 | + |
| 33 | +Each cluster deploys an east-west gateway with a globally routable IP that acts as an entry point for cross-cluster communication.
| 34 | +A ztunnel communicates across clusters by connecting to the east-west gateway and sending the destination service FQDN. |
| 35 | +The east-west gateway will then forward the connection to a cluster-local pod of its choosing. |
| 36 | +As such, we do not need to worry about overlapping IP spaces because we never directly address a pod in a remote cluster. |
| 37 | +Ambient multicluster achieves cross-cluster connectivity without changes to cluster connectivity. |
| 38 | + |
| 39 | +The east-west gateways are configured using the Kubernetes Gateway API and controlled by istiod.
| 40 | +By using these ambient and declarative APIs, there is no need to restart workloads, manage IP address spaces, or configure routing tables. |
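| | +
| | +As a rough sketch of what such a declaration can look like, the manifest below describes an east-west gateway as a Gateway API resource. The gateway name, namespace, network label value, gateway class, and listener settings are all assumptions made for illustration and should be checked against the installation guide for your Istio version.
| | +
| | +{{< text yaml >}}
| | +# Sketch only: every value below is an assumption, not confirmed by this post.
| | +apiVersion: gateway.networking.k8s.io/v1
| | +kind: Gateway
| | +metadata:
| | +  name: istio-eastwestgateway        # hypothetical name
| | +  namespace: istio-system            # assumed control plane namespace
| | +  labels:
| | +    topology.istio.io/network: network1   # hypothetical network name
| | +spec:
| | +  gatewayClassName: istio-east-west  # assumed gateway class for east-west gateways
| | +  listeners:
| | +    - name: mesh
| | +      port: 15008                    # assumed HBONE port
| | +      protocol: HBONE
| | +      tls:
| | +        mode: Terminate
| | +        options:
| | +          gateway.istio.io/tls-terminate-mode: ISTIO_MUTUAL
| | +{{< /text >}}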
| 41 | + |
| 42 | +### Double HBONE |
| 43 | + |
| 44 | +To extend ambient's strong security across cluster boundaries, ambient multicluster uses nested [HBONE](https://istio.io/latest/docs/ambient/architecture/hbone/) connections.
| 45 | +An outer HBONE connection links the source ztunnel to the remote cluster's east-west gateway, while an inner HBONE connection is carried inside it and extends to the destination.
| 46 | +The outer HBONE connection encrypts cross-cluster traffic, encrypts the destination service FQDN, and allows the east-west gateway to verify the source's identity.
| 47 | +The inner HBONE connection encrypts traffic end-to-end, allowing for identity verification of the destination pod.
| 48 | +Put together, the two HBONE layers stop unauthenticated access, protect against data sniffing, and still allow ztunnel to verify the destination’s identity.
| 49 | +At the same time, HBONE allows ztunnel to reuse cross-cluster connections, minimizing TLS handshakes.
19 | 50 |
|
20 |
| -The `ServiceScope` API allows mesh administrators to mark which combinations of labels make a service global, |
21 |
| -and app developers can label their services accordingly. |
22 |
| -By default, services labeled `istio.io/global=true` are marked global. |
23 |
| -Then, `istiod` informs each ztunnel how many endpoints there are for each global service. |
24 |
| -If ztunnel decides to send traffic to a remote cluster, then it will direct the traffic to the remote cluster's east-west gateway |
25 |
| -and the east-west gateway will pick the destination pod. |
26 |
| -This architecture obviates the need for ztunnel to know about every pod in the mesh, while still providing enough information for ztunnel to load balance across clusters. |
27 |
| - |
28 |
| -By default ztunnel will load balance traffic uniformly across all clusters, |
29 |
| -but you can control the load balancing behavior of a service with its [`trafficDistribution`](https://kubernetes.io/docs/concepts/services-networking/service/#traffic-distribution). |
30 |
| - |
31 |
| -## Security |
32 |
| - |
33 |
| -In both Sidecar and Ambient Multicluster, proxies send traffic to east-west gateways indicating the destination service, and the east-west gateway picks the destination pod. |
34 |
| -Sidecar mode indicates the destination service using TLS SNI. |
35 |
| -Not only does this communicate the destination service with no encryption, |
36 |
| -there is no way for the east-west gateway to apply identity-based policy at the edge of your cluster. |
37 |
| - |
38 |
| -Rather than relying on SNI tricks, Ambient Multicluster uses nested [HBONE](https://istio.io/latest/docs/ambient/architecture/hbone/) connections to enable cross-cluster connectivity. |
39 |
| -We first establish an outer HBONE connection to the east-west gateway. |
40 |
| -Then, within the outer HBONE connection we create an inner HBONE connection that the east-west gateway forwards opaquely to the destination ztunnel of its choosing. |
41 |
| - |
42 |
| -Since the client ztunnel participates in two mTLS (once with the east-west gateway, and once with the destination ztunnel), identity is enforced both at the edge of the cluster and the destination. |
43 |
| -As such, non-mesh traffic cannot enter clusters through east-west gateways. |
44 |
| -Also, since ztunnel communicates the destination service in HBONE, it is invisible to outside observers. |
45 |
| -Further, HBONE allows us to reuse TLS connections between ztunnel proxies and east-west gateways (already implemented) as well as between ztunnel proxies in different clusters (to be implemented), thus reducing the total number of TCP/TLS handshakes and identity verification steps. |
46 | 51 | The one drawback is that we encrypt application data twice (once for the outer HBONE and once for the inner HBONE).
|
47 |
| -We found this to be an acceptable drawback because it allows us to stick with open standards, and we expect the extra encryption to be negligible compared to the cost of sending data across clusters. |
| 52 | +We found this to be an acceptable drawback because it allows us to stick with open standards, and we expect the extra encryption to be negligible compared to the cost of sending data across clusters. |
48 | 53 |
|
49 |
| -## Sameness |
| 54 | +{{< image link="./mc-ambient-traffic-flow.png" caption="Istio Ambient Multicluster traffic Flow" >}} |
50 | 55 |
|
51 |
| -Even though clusters in a multicluster mesh need not be identical, we do require some uniformity across clusters. |
52 |
| -Some requirements are necessary for two clusters to function in the same mesh, |
53 |
| -while others only exist because of Ambient Multicluster's alpha state. |
| 56 | +### ServiceScope API |
54 | 57 |
|
55 |
| -### Identity |
| 58 | +Once clusters are securely connected, services must be marked as global to allow cross-cluster communication.
| 59 | +The `ServiceScope` API allows mesh administrators to define which combinations of labels make a service global,
| 60 | +and app developers can label their services accordingly.
| 61 | +A global service is one that has endpoints in all clusters and can be accessed from any cluster.
| 62 | +The default `ServiceScope` is |
56 | 63 |
|
57 |
| -Since a core feature of double HBONE is allowing identity verification at the east-west gateway, we must define how identities change across cluster boundaries. |
58 |
| -Ambient Multicluster adopts {{< gloss "namespace sameness" >}}namespace sameness{{< /gloss >}} just like the rest of Istio. |
59 |
| -This means that the same identity is indistinguishable across clusters. |
60 |
| -Cluster boundaries have no effect on identity. |
61 |
| -We have no plans on departing from namespace sameness in any future releases. |
| 64 | +{{< text yaml >}} |
| 65 | +serviceScopeConfigs:
| 66 | +  - servicesSelector:
| 67 | +      matchExpressions:
| 68 | +        - key: istio.io/global
| 69 | +          operator: In
| 70 | +          values: ["true"]
| 71 | +    scope: GLOBAL
| 72 | +{{< /text >}} |
62 | 73 |
|
63 |
| -### Service configuration |
| 74 | +meaning that any service with the `istio.io/global=true` label is global. |
| 75 | +Although the default value is straightforward, the API is flexible and can express complex conditions using a mix of ANDs and ORs. |
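| | +
| | +As a minimal sketch, a Service that opts into the default scope only needs the `istio.io/global` label set to `"true"`. The service name, namespace, selector, and port below are hypothetical placeholders.
| | +
| | +{{< text yaml >}}
| | +apiVersion: v1
| | +kind: Service
| | +metadata:
| | +  name: reviews              # hypothetical service name
| | +  namespace: bookinfo        # hypothetical namespace
| | +  labels:
| | +    istio.io/global: "true"  # matched by the default ServiceScope selector
| | +spec:
| | +  selector:
| | +    app: reviews             # hypothetical pod label
| | +  ports:
| | +    - name: http
| | +      port: 9080             # hypothetical port
| | +{{< /text >}}
| | +
| | +Under the current uniformity requirement, the same label would need to be present on the service in every cluster.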
64 | 76 |
|
65 |
| -For our alpha release, we require all services and service entries to have the exact same configuration across clusters. |
66 |
| -Notably, waypoint configuration also has to be uniform. |
| 77 | +By default, ztunnel will load balance traffic uniformly across clusters, but this can be configured using the service's `trafficDistribution` field to only reach across clusters when there are no local endpoints. |
| 78 | +Thus users have control over whether and when traffic crosses cluster boundaries. |
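| | +
| | +For illustration, a global Service that prefers local endpoints might set `trafficDistribution` as sketched below. This assumes that `PreferClose`, the value Kubernetes defines for this field, is the setting that yields the local-first behavior described above. The service name, namespace, selector, and port are hypothetical.
| | +
| | +{{< text yaml >}}
| | +apiVersion: v1
| | +kind: Service
| | +metadata:
| | +  name: reviews                     # hypothetical service name
| | +  namespace: bookinfo               # hypothetical namespace
| | +  labels:
| | +    istio.io/global: "true"         # marks the service global under the default ServiceScope
| | +spec:
| | +  trafficDistribution: PreferClose  # assumed value for preferring nearby endpoints
| | +  selector:
| | +    app: reviews                    # hypothetical pod label
| | +  ports:
| | +    - name: http
| | +      port: 9080                    # hypothetical port
| | +{{< /text >}}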
67 | 79 |
|
68 |
| -One question we struggled with was that of where cross cluster traffic should traverse a waypoint. |
69 |
| -When sending cross cluster traffic to a service with a waypoint, should traffic traverse a waypoint in the client's cluster or the destination's cluster? |
70 |
| -Traversing waypoints in the client's cluster allows us to apply policies such as L7 cross-cluster failover. |
71 |
| -On the other hand, traversing waypoints in the destination cluster allows enforcing the destination cluster's L7 policy. |
72 |
| -Ultimately, we decided on the latter for our alpha release to avoid any authorization policy-related surprises. |
| 80 | +## Limitations and Roadmap |
73 | 81 |
|
74 |
| -There are many other nuances on how we apply L7 policy and how to handle cross-cluster configuration skew. |
75 |
| -That said, we are actively looking for ways to loosen these requirements and support L7 policy to be applied in the client cluster. |
76 |
| -This should ease the setup process of Ambient and allow for gradual configuration rollouts without the risk of undefined behavior. |
| 82 | +Although the current implementation of ambient multicluster has strong security and the basic feature set of a multicluster product, |
| 83 | +there is still a lot of work to be done. |
77 | 84 |
|
78 |
| -### Meshconfig |
| 85 | +For example, we currently require global services, their attached waypoints, and the `ServiceScope` configuration to be uniform across all clusters.
| 86 | +Although this greatly simplified our alpha implementation, we are looking to increase flexibility by allowing more configuration skew.
79 | 87 |
|
80 |
| -Given that we have multiple clusters in a single mesh, we assume that MeshConfig is uniform across clusters. |
81 |
| -Crucially, this assumption means that `ServiceScope` must be uniform across clusters, since `ServiceScope` is part of MeshConfig. |
82 |
| -In other words, the criteria for a service to be marked as global must be the same in all clusters. |
83 |
| -If we also consider the fact that all services must share the same configuration, services are marked global in every cluster, or no cluster. |
84 |
| -As with service configuration, we are exploring ways to loosen Meshconfig sameness requirements and more fine-grained ways of marking services global. |
| 88 | +Similarly, waypoints and L7 policy enforcement have proven difficult because different clusters might have different policies.
| 89 | +In our alpha implementation, traffic to a service with a waypoint traverses that waypoint in the destination cluster.
| 90 | +This reduces surprises by enforcing the destination cluster's L7 authorization policy, but it takes away the ability to perform L7 cross-cluster failover.
| 91 | +Eventually, we would like to also apply L7 policy in the source cluster, but this is not yet implemented.
85 | 92 |
|
86 |
| -## Looking ahead |
| 93 | +We are also looking to improve our reference documentation, guides, testing, and performance, and we are thinking about deployment models other than multi-primary.
87 | 94 |
|
88 |
| -Other than allowing configuration skew across clusters, there is a lot of work to do to promote Ambient Multicluster to beta. |
89 |
| -We are looking to improve our reference documentation, guides, testing, and performance. |
90 |
| -We are also thinking about deployment models other than multi-primary. |
91 | 95 | If you would like to try out Ambient Multicluster, please follow [this guide](TODO).
|
92 | 96 | Since many details are in discussion, we would love to hear any of your thoughts, comments, and use cases.
|
93 |
| -You can contact us through [Slack](TODO) or [GitHub](TODO). |
| 97 | +You can find ways to reach us on the [Istio community page](https://istio.io/latest/about/community/). |