content/en/blog/2025/ambient-multicluster/index.md
Lines changed: 17 additions & 23 deletions
Original file line number
Diff line number
Diff line change
@@ -6,11 +6,9 @@ attribution: Steven Jin Xuan (Microsoft)
6
6
keywords: [ambient,multicluster]
7
7
---
8
8
9
-
One of the most requested Ambient features is multicluster support.
10
-
We are excited to announce multicluster support as of Istio 1.27.0!
11
-
Our goal for Ambient Multicluster was to allow users to connect, control, secure, and observe multiple clusters
12
-
with the same modular architecture that Ambient users love.
13
-
While the current implementation's maturity is alpha, it offers the core functionality of a multicluster mesh while setting us up to provide a complete feature set in future releases.
9
+
Multicluster has been one of the most requested Ambient features — and as of Istio 1.27, it’s now available in alpha.
10
+
Ambient Multicluster enables secure, transparent communication between clusters using the same lightweight, modular architecture users already rely on.
11
+
While still in alpha, this release delivers the core functionality of a multicluster mesh and lays the groundwork for a complete feature set in upcoming releases.
14
12
15
13
## Connectivity
16
14
@@ -19,7 +17,7 @@ However, in a multicluster mesh, there is no guarantee that the IP address space
19
17
Even if it were, there is no guarantee that routing tables are set up to route from one cluster to another.
20
18
In Ambient Multicluster, we connect clusters by deploying east-west gateways with globally routable IP addresses and by marking services as global.
21
19
22
-
The `ServiceScope` API allows mesh administrators to mark which combinations of labels makes a service global,
20
+
The `ServiceScope` API allows mesh administrators to mark which combinations of labels make a service global,
23
21
and app developers can label their services accordingly.
24
22
By default, services labeled `istio.io/global=true` are marked global.
25
23
Then, `istiod` informs each ztunnel how many endpoints there are for each global service.
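The label-based scoping described above can be pictured with a small Python model. This is only a sketch under stated assumptions: `is_global`, `global_endpoint_counts`, and the selector-as-dict representation are hypothetical names invented for illustration and are not the `ServiceScope` API or istiod's implementation. The only detail taken from the post is the default label `istio.io/global=true`.

```python
# Default from the post: services labeled istio.io/global=true are marked global.
# Representing a selector as a plain dict of required labels is an assumption.
DEFAULT_SELECTORS = [{"istio.io/global": "true"}]


def is_global(service_labels, selectors=DEFAULT_SELECTORS):
    """Return True if the labels satisfy every key/value pair of at least one selector."""
    return any(
        all(service_labels.get(key) == value for key, value in selector.items())
        for selector in selectors
    )


def global_endpoint_counts(services):
    """Map each global service name to its endpoint count, skipping local services.

    `services` is a list of (name, labels, endpoint_count) tuples; this loosely
    mirrors the idea of telling each ztunnel how many endpoints a global service has.
    """
    return {
        name: endpoints
        for name, labels, endpoints in services
        if is_global(labels)
    }
```

In this model, a mesh administrator would change the selector list, while app developers would only add or remove labels on their services.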
@@ -32,20 +30,17 @@ but you can control the load balancing behavior of a service with its [`trafficD
32
30
33
31
## Security
34
32
35
-
In both Sidecar and Ambient Multicluster, proxies send traffic to east-west gateways indicating the destination service, and it the east-west gateway picks the destination pod.
33
+
In both Sidecar and Ambient Multicluster, proxies send traffic to east-west gateways indicating the destination service, and the east-west gateway picks the destination pod.
36
34
Sidecar mode indicates the destination service using TLS SNI.
37
35
Not only does this communicate the destination service with no encryption,
38
36
there is no way for the east-west gateway to apply identity-based policy at the edge of your cluster.
39
37
40
38
Rather than relying on SNI tricks, Ambient Multicluster uses nested [HBONE](https://istio.io/latest/docs/ambient/architecture/hbone/) connections to enable cross-cluster connectivity.
41
-
First, the client ztunnel establishes an outer HBONE connection to the remote cluster's east-west gateway, allowing both the client ztunnel and the east-west gateway to verify each others identity.
42
-
The ztunnel then creates an HTTP2 CONNECT stream in the outer HBONE connection with an `authority` of the destination service.
43
-
Using the authority of the HTTP2 CONNECT stream, the east-west gateway picks the destination and opaquely forwards the stream.
44
-
The source ztunnel then uses the HTTP2 CONNECT stream to establish an inner HBONE connection with the destination ztunnel allowing the ztunnels to verify each other's identities.
45
-
One last HTTP2 CONNECT stream is established to send plaintext data between the source and destination pods.
46
-
47
-
Since there are two TLS handshakes (one per HBONE connection), identity is enforced both at the edge of the cluster and the destination ztunnel.
48
-
As such, non mesh traffic cannot enter clusters through east-west gateways.
39
+
We first establish an outer HBONE connection to the east-west gateway.
40
+
Then, within the outer HBONE connection we create an inner HBONE connection that the east-west gateway forwards opaquely to the destination ztunnel of its choosing.
41
+
42
+
Since the client ztunnel performs two mTLS handshakes (one with the east-west gateway and one with the destination ztunnel), identity is enforced both at the edge of the cluster and at the destination.
43
+
As such, non-mesh traffic cannot enter clusters through east-west gateways.
49
44
Also, since ztunnel communicates the destination service in HBONE, it is invisible to outside observers.
50
45
Further, HBONE allows us to reuse TLS connections between ztunnel proxies and east-west gateways (already implemented) as well as between ztunnel proxies in different clusters (to be implemented), thus reducing the total number of TCP/TLS handshakes and identity verification steps.
51
46
The one drawback is that we encrypt application data twice (once for the outer HBONE and once for the inner HBONE).
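To make the nesting concrete, here is a toy Python model of the two HBONE layers. It performs no real TLS or HTTP/2; the `Tunnel` class, the function names, and the identity strings are all hypothetical, and the inner authority value is an assumption, since the post only specifies the authority of the outer stream (the destination service).

```python
from dataclasses import dataclass


@dataclass
class Tunnel:
    """One HBONE layer: the two mTLS peers plus the CONNECT authority it carries."""
    client_identity: str
    server_identity: str
    authority: str
    payload: object  # either a nested Tunnel or the application bytes


def open_cross_cluster(client, gateway, dest_ztunnel, service, inner_authority, data):
    # Inner HBONE: client ztunnel <-> destination ztunnel (second mTLS handshake).
    inner = Tunnel(client, dest_ztunnel, inner_authority, data)
    # Outer HBONE: client ztunnel <-> east-west gateway (first mTLS handshake).
    # Its authority names the destination service so the gateway can pick a destination.
    return Tunnel(client, gateway, service, inner)


def gateway_view(outer):
    """What the east-west gateway acts on: the verified peer and the requested service."""
    return {"peer": outer.client_identity, "service": outer.authority}


def encryption_layers(tunnel):
    """Count how many tunnel layers wrap the application data."""
    depth = 0
    while isinstance(tunnel, Tunnel):
        depth += 1
        tunnel = tunnel.payload
    return depth
```

Calling `encryption_layers` on the result of `open_cross_cluster` yields two layers, which corresponds to the double encryption noted above, while `gateway_view` reflects that the gateway forwards the inner connection without opening it.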
@@ -73,21 +68,20 @@ Notably, waypoint configuration also has to be uniform.
73
68
One question we struggled with was where cross-cluster traffic should traverse a waypoint.
74
69
When sending cross-cluster traffic to a service with a waypoint, should traffic traverse a waypoint in the client's cluster or the destination's cluster?
75
70
Traversing waypoints in the client's cluster allows us to apply policies such as L7 cross-cluster failover.
76
-
On the other hand, traversing waypoints in the destination cluster allows applying L7 authorization policy as configured in the destination.
71
+
On the other hand, traversing waypoints in the destination cluster allows enforcing the destination cluster's L7 policy.
77
72
Ultimately, we decided on the latter for our alpha release to avoid any authorization policy-related surprises.
78
73
79
-
The aforementioned service sameness requirements and waypoint implementation are negotiable.
80
-
In upcoming releases, we are working to define behavior when these configurations differ across clusters
74
+
There are many other nuances in how we apply L7 policy and how we handle cross-cluster configuration skew.
75
+
That said, we are actively looking for ways to loosen these requirements and to support applying L7 policy in the client cluster.
81
76
This should ease the setup process of Ambient and allow for gradual configuration rollouts without the risk of undefined behavior.
82
-
As for waypoints and L7 policy, we do plan on supporting L7 policies such as cross-cluster failover, though the exact design is still unknown.
83
77
84
78
### Meshconfig
85
79
86
-
Given that we have multiple clusters in a single mesh, we assume that MeshConfig are uniform across clusters.
80
+
Given that we have multiple clusters in a single mesh, we assume that MeshConfig is uniform across clusters.
87
81
Crucially, this assumption means that `ServiceScope` must be uniform across clusters, since `ServiceScope` is part of MeshConfig.
88
-
In other words, the criteria for a service to be marked as global must the same in all clusters.
82
+
In other words, the criteria for a service to be marked as global must be the same in all clusters.
89
83
If we also consider the fact that all services must share the same configuration, services are marked global in every cluster, or no cluster.
90
-
As with service configuration, we exploring ways to loosen Meshconfig sameness requirements and more fine-grained of marking services global.
84
+
As with service configuration, we are exploring ways to loosen MeshConfig sameness requirements and more fine-grained ways of marking services global.
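The "global in every cluster, or no cluster" requirement can be expressed as a simple consistency check. The sketch below is hypothetical tooling rather than anything Istio ships: it assumes the default `istio.io/global=true` criterion from the post and takes a plain dict mapping made-up cluster names to the labels a service carries in each cluster.

```python
# Assumes the default criterion from the post; a custom ServiceScope would differ.
GLOBAL_LABEL = ("istio.io/global", "true")


def marked_global(labels):
    """Return True if this set of labels carries the default global marker."""
    key, value = GLOBAL_LABEL
    return labels.get(key) == value


def scope_by_cluster(labels_by_cluster):
    """Map each cluster name to whether the service would be global there."""
    return {
        cluster: marked_global(labels)
        for cluster, labels in labels_by_cluster.items()
    }


def is_uniform(labels_by_cluster):
    """True when every cluster agrees: global everywhere or global nowhere."""
    return len(set(scope_by_cluster(labels_by_cluster).values())) <= 1
```

A check like this could run before a rollout to flag a service that is labeled global in one cluster but not in another.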
91
85
92
86
## Looking ahead
93
87
@@ -96,4 +90,4 @@ We are looking to improve our reference documentation, guides, testing, and perf
96
90
We are also thinking about deployment models other than multi-primary.
97
91
If you would like to try out Ambient Multicluster, please follow [this guide](TODO).
98
92
Since many details are in discussion, we would love to hear any of your thoughts, comments, and use cases.
99
-
You can contact us here.
93
+
You can contact us through [Slack](TODO) or [GitHub](TODO).