---
title: Introducing Ambient Multicluster
description: How Ambient Multicluster lets you connect multiple clusters in a single ambient mesh.
publishdate: 08-04-2025
attribution: Steven Jin Xuan (Microsoft)
keywords: [ambient,multicluster]
---

One of the most requested Ambient features is multicluster support.
We are excited to announce that multicluster support is available in alpha as of Istio 1.27.0!
Our goal for Ambient Multicluster was to let users connect, control, secure, and observe multiple clusters with the same modular architecture that Ambient users love.
While the current implementation is alpha, it offers the core functionality of a multicluster mesh and sets us up to provide a complete feature set in future releases.

## Connectivity

In a single Kubernetes cluster, every pod has a unique IP address and can connect directly to any other pod, either by pod IP or through a service, as per the [Kubernetes Network Model](https://kubernetes.io/docs/concepts/services-networking/).
However, in a multicluster mesh, there is no guarantee that the IP address spaces of different clusters are disjoint.
Even if they are, there is no guarantee that routing tables are set up to route traffic from one cluster to another.
In Ambient Multicluster, we connect clusters by deploying east-west gateways with globally routable IP addresses and by marking services as global.

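To make the address-space problem concrete, here is a minimal Go sketch that flags clusters whose pod CIDRs overlap, assuming you already know each cluster's pod CIDR (the cluster names and ranges are hypothetical). It relies only on the standard library's `netip.Prefix.Overlaps`. Note that disjoint ranges still say nothing about whether the clusters can route to each other, which is the second problem east-west gateways address.

```go
package main

import (
	"fmt"
	"net/netip"
)

// ClusterCIDR pairs a (hypothetical) cluster name with its pod CIDR.
type ClusterCIDR struct {
	Cluster string
	PodCIDR string
}

// overlappingPairs returns every pair of clusters whose pod CIDRs overlap.
// An overlap means a bare pod IP cannot identify which cluster a pod is in.
func overlappingPairs(clusters []ClusterCIDR) ([][2]string, error) {
	prefixes := make([]netip.Prefix, len(clusters))
	for i, c := range clusters {
		p, err := netip.ParsePrefix(c.PodCIDR)
		if err != nil {
			return nil, fmt.Errorf("cluster %s: %w", c.Cluster, err)
		}
		prefixes[i] = p
	}
	var pairs [][2]string
	for i := range prefixes {
		for j := i + 1; j < len(prefixes); j++ {
			if prefixes[i].Overlaps(prefixes[j]) {
				pairs = append(pairs, [2]string{clusters[i].Cluster, clusters[j].Cluster})
			}
		}
	}
	return pairs, nil
}

func main() {
	clusters := []ClusterCIDR{
		{Cluster: "cluster-1", PodCIDR: "10.244.0.0/16"},
		{Cluster: "cluster-2", PodCIDR: "10.244.0.0/16"}, // same range as cluster-1
		{Cluster: "cluster-3", PodCIDR: "10.100.0.0/16"},
	}
	pairs, err := overlappingPairs(clusters)
	if err != nil {
		panic(err)
	}
	fmt.Println(pairs) // [[cluster-1 cluster-2]]
}
```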
The `ServiceScope` API allows mesh administrators to specify which combinations of labels make a service global, and app developers can label their services accordingly.
By default, services labeled `istio.io/global=true` are marked global.

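As an illustration only, the sketch below models a `ServiceScope`-style rule as a set of labels that must all be present on a service for it to be global. The `ScopeRule` type and `isGlobal` function are hypothetical and do not mirror the real MeshConfig schema; only the default `istio.io/global=true` label comes from the text above.

```go
package main

import "fmt"

// ScopeRule is a hypothetical stand-in for one ServiceScope entry: a
// combination of labels that, when all present on a service, marks it global.
type ScopeRule struct {
	MatchLabels map[string]string
}

// defaultRules mirrors the default described above:
// services labeled istio.io/global=true are global.
var defaultRules = []ScopeRule{
	{MatchLabels: map[string]string{"istio.io/global": "true"}},
}

// isGlobal reports whether every label of at least one rule is present,
// with the same value, on the service.
func isGlobal(rules []ScopeRule, serviceLabels map[string]string) bool {
	for _, r := range rules {
		matched := true
		for k, v := range r.MatchLabels {
			if serviceLabels[k] != v {
				matched = false
				break
			}
		}
		if matched {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isGlobal(defaultRules, map[string]string{"app": "helloworld", "istio.io/global": "true"})) // true
	fmt.Println(isGlobal(defaultRules, map[string]string{"app": "helloworld"}))                            // false
}
```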
Once a service is global, `istiod` informs each ztunnel how many endpoints the service has in each cluster.
When ztunnel decides to send traffic to a remote cluster, it directs the traffic to that cluster's east-west gateway, which then picks the destination pod.
This architecture removes the need for ztunnel to know about every pod in the mesh, while still giving ztunnel enough information to load balance across clusters.

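Endpoint counts are enough to balance load without per-pod knowledge. The sketch below shows one plausible way to use them, assuming each cluster is picked with probability proportional to its endpoint count, which spreads requests evenly across endpoints regardless of the cluster they live in. The `ClusterEndpoints` type and `pickCluster` function are hypothetical and are not ztunnel's actual code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// ClusterEndpoints is a hypothetical summary of what ztunnel might know about
// one global service in one cluster: the cluster and its endpoint count.
type ClusterEndpoints struct {
	Cluster   string
	Endpoints int
}

// totalEndpoints sums the endpoint counts across all clusters.
func totalEndpoints(clusters []ClusterEndpoints) int {
	total := 0
	for _, c := range clusters {
		total += c.Endpoints
	}
	return total
}

// pickCluster maps r, a value in [0, totalEndpoints), to a cluster so that a
// uniformly random r selects each cluster in proportion to its endpoints.
// It returns "" if r is out of range.
func pickCluster(clusters []ClusterEndpoints, r int) string {
	for _, c := range clusters {
		if r < c.Endpoints {
			return c.Cluster
		}
		r -= c.Endpoints
	}
	return ""
}

func main() {
	clusters := []ClusterEndpoints{
		{Cluster: "cluster-1", Endpoints: 3}, // local cluster (hypothetical)
		{Cluster: "cluster-2", Endpoints: 1}, // remote: reached via its east-west gateway
	}
	picks := map[string]int{}
	for i := 0; i < 10000; i++ {
		picks[pickCluster(clusters, rand.Intn(totalEndpoints(clusters)))]++
	}
	fmt.Println(picks) // roughly a 3:1 split between cluster-1 and cluster-2
}
```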
By default, ztunnel load balances traffic uniformly across all clusters, but you can control the load balancing behavior of a service with its [`trafficDistribution`](https://kubernetes.io/docs/concepts/services-networking/service/#traffic-distribution) field.

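Kubernetes defines `PreferClose` as one `trafficDistribution` value. The sketch below is a deliberately simplified model of how such a preference could narrow the set of candidate clusters: it assumes "close" simply means "the client's own cluster" and falls back to all clusters when the local one has no endpoints. The `Cluster` type and `candidates` function are hypothetical, and Istio's actual locality handling is richer than this.

```go
package main

import "fmt"

// Cluster is a hypothetical view of one cluster's endpoints for a service.
type Cluster struct {
	Name      string
	Local     bool // true for the client's own cluster
	Endpoints int  // healthy endpoints in this cluster
}

// candidates returns the clusters eligible to receive traffic.
// With no trafficDistribution set, every cluster with endpoints is eligible.
// With "PreferClose", we assume (for illustration) that "close" means the
// client's own cluster, falling back to all clusters when it has no endpoints.
func candidates(clusters []Cluster, trafficDistribution string) []string {
	var all, local []string
	for _, c := range clusters {
		if c.Endpoints == 0 {
			continue
		}
		all = append(all, c.Name)
		if c.Local {
			local = append(local, c.Name)
		}
	}
	if trafficDistribution == "PreferClose" && len(local) > 0 {
		return local
	}
	return all
}

func main() {
	clusters := []Cluster{
		{Name: "cluster-1", Local: true, Endpoints: 2},
		{Name: "cluster-2", Endpoints: 5},
	}
	fmt.Println(candidates(clusters, ""))            // [cluster-1 cluster-2]
	fmt.Println(candidates(clusters, "PreferClose")) // [cluster-1]
	clusters[0].Endpoints = 0
	fmt.Println(candidates(clusters, "PreferClose")) // [cluster-2]
}
```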
## Security

In both Sidecar and Ambient Multicluster, proxies send traffic to east-west gateways along with an indication of the destination service, and the east-west gateway picks the destination pod.
Sidecar mode indicates the destination service using TLS Server Name Indication (SNI).
This causes two problems.
Because SNI is sent unencrypted in the TLS ClientHello, the destination service is visible to anyone observing the traffic.
And because the east-west gateway passes the TLS connection through without terminating it, there is no way for the gateway to apply identity-based policy at the edge of your cluster.

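To see why SNI exposes the destination, the Go sketch below plays the role of a passthrough gateway: it reads a client's TLS ClientHello and recovers the server name without holding any certificate or private key, then aborts the handshake. It uses only the standard library (`crypto/tls` with `Config.GetConfigForClient`, over an in-memory `net.Pipe`); the hostname is a hypothetical example.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
	"net"
)

// errStop aborts the handshake once we have seen the ClientHello.
var errStop = errors.New("stop after ClientHello")

// peekSNI reads the TLS ClientHello arriving on conn and returns its server
// name (SNI). The "gateway" has no certificate or key, yet it can read the
// name, because SNI is sent before any encryption is negotiated.
func peekSNI(conn net.Conn) (string, error) {
	var sni string
	srv := tls.Server(conn, &tls.Config{
		GetConfigForClient: func(hello *tls.ClientHelloInfo) (*tls.Config, error) {
			sni = hello.ServerName
			return nil, errStop // deliberately fail the handshake
		},
	})
	err := srv.Handshake() // always fails here; we only want the ClientHello
	if sni == "" {
		return "", fmt.Errorf("no SNI observed: %w", err)
	}
	return sni, nil
}

// sniSeenByGateway simulates a client dialing a service through a
// passthrough gateway and returns the name the gateway observes.
func sniSeenByGateway(serviceHost string) (string, error) {
	clientSide, gatewaySide := net.Pipe()
	defer gatewaySide.Close()
	go func() {
		defer clientSide.Close()
		// The client's handshake fails once the gateway aborts; that is expected.
		_ = tls.Client(clientSide, &tls.Config{ServerName: serviceHost}).Handshake()
	}()
	return peekSNI(gatewaySide)
}

func main() {
	name, err := sniSeenByGateway("helloworld.sample.svc.cluster.local")
	if err != nil {
		panic(err)
	}
	fmt.Println("gateway saw:", name)
}
```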
Rather than relying on SNI tricks, Ambient Multicluster uses nested [HBONE](https://istio.io/latest/docs/ambient/architecture/hbone/) connections to enable cross-cluster connectivity.
First, the client ztunnel establishes an outer HBONE connection to the remote cluster's east-west gateway, allowing both the client ztunnel and the east-west gateway to verify each other's identities.
The client ztunnel then opens an HTTP/2 CONNECT stream over the outer HBONE connection, with the destination service as its `:authority`.
Using that authority, the east-west gateway picks the destination and opaquely forwards the stream.
Over this stream, the client ztunnel establishes an inner HBONE connection with the destination ztunnel, allowing the two ztunnels to verify each other's identities.
Finally, one last HTTP/2 CONNECT stream is opened inside the inner HBONE connection to carry application data between the source and destination pods.

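The nesting can be summarized as a stack of layers, outermost first. The Go sketch below is a toy model, not ztunnel's implementation: the `Layer` type, party names, and entry points are our own illustration of the steps above. It lists which layers each party can read, assuming a party peels layers from where the traffic reaches it until it hits an mTLS layer terminated by someone else.

```go
package main

import "fmt"

// Layer is one level of the nested tunnel, listed outermost first.
// Terminator names the party that terminates an mTLS layer ("" for
// layers that are not TLS). Illustrative only.
type Layer struct {
	Name       string
	Terminator string
}

var tunnel = []Layer{
	{Name: "outer HBONE mTLS", Terminator: "east-west gateway"},
	{Name: "outer CONNECT stream (authority: destination service)"},
	{Name: "inner HBONE mTLS", Terminator: "destination ztunnel"},
	{Name: "inner CONNECT stream"},
	{Name: "application data"},
}

// entry is the index of the first layer that reaches each party. The
// east-west gateway forwards the outer stream's contents opaquely, so the
// destination ztunnel receives the inner HBONE connection directly.
var entry = map[string]int{
	"network observer":    0,
	"east-west gateway":   0,
	"destination ztunnel": 2,
}

// readable returns the layers a party can read: starting at its entry point,
// it peels layers inward, including mTLS layers it terminates (where it
// verifies its peer's identity), and stops at the first mTLS layer that is
// terminated by someone else.
func readable(party string) []string {
	var out []string
	for _, l := range tunnel[entry[party]:] {
		if l.Terminator != "" && l.Terminator != party {
			break
		}
		out = append(out, l.Name)
	}
	return out
}

func main() {
	for _, p := range []string{"network observer", "east-west gateway", "destination ztunnel"} {
		fmt.Printf("%-20s %q\n", p, readable(p))
	}
}
```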
Since there are two TLS handshakes (one per HBONE connection), identity is enforced both at the edge of the cluster and at the destination ztunnel.
As a result, non-mesh traffic cannot enter a cluster through its east-west gateway.
Also, since ztunnel communicates the destination service inside the encrypted outer HBONE connection, the destination service is invisible to outside observers.
Further, HBONE allows us to reuse TLS connections between ztunnel proxies and east-west gateways (already implemented) as well as between ztunnel proxies in different clusters (to be implemented), reducing the total number of TCP/TLS handshakes and identity verification steps.
The one drawback is that we encrypt application data twice (once for the outer HBONE connection and once for the inner one).
We consider this an acceptable trade-off because it lets us stick with open standards, and we expect the cost of the extra encryption to be negligible compared to the cost of sending data across clusters.

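Connection reuse is where HBONE's multiplexing pays off. The sketch below, with hypothetical `pool` and `outerConn` types, models a ztunnel that keeps one outer HBONE connection per east-west gateway and opens a new HTTP/2 CONNECT stream on it for each request, so only the first request to a gateway pays for a TCP and TLS handshake. Real dialing, concurrency control, and stream limits are omitted.

```go
package main

import "fmt"

// outerConn stands in for one established outer HBONE (mTLS) connection
// to an east-west gateway. Hypothetical type for illustration.
type outerConn struct {
	gateway string
	streams int // HTTP/2 CONNECT streams opened on this connection
}

// pool keeps at most one outer connection per east-west gateway.
// A real implementation would also need locking, health checks, and
// limits on concurrent streams per connection.
type pool struct {
	conns      map[string]*outerConn
	handshakes int // TCP + TLS handshakes performed so far
}

func newPool() *pool { return &pool{conns: map[string]*outerConn{}} }

// stream returns the outer connection to carry a new CONNECT stream to
// gateway, dialing and handshaking only if no connection exists yet.
func (p *pool) stream(gateway string) *outerConn {
	c, ok := p.conns[gateway]
	if !ok {
		p.handshakes++ // a real proxy would dial and complete mTLS here
		c = &outerConn{gateway: gateway}
		p.conns[gateway] = c
	}
	c.streams++
	return c
}

func main() {
	p := newPool()
	for i := 0; i < 100; i++ {
		p.stream("cluster-2-eastwest")
	}
	p.stream("cluster-3-eastwest")
	fmt.Println(p.handshakes, p.conns["cluster-2-eastwest"].streams) // 2 100
}
```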
## Sameness

Even though clusters in a multicluster mesh need not be identical, we do require some uniformity across clusters.
Some requirements are necessary for two clusters to function in the same mesh, while others exist only because Ambient Multicluster is still in alpha.

### Identity

Since a core feature of nested HBONE is identity verification at the east-west gateway, we must define how identities behave across cluster boundaries.
Ambient Multicluster adopts {{< gloss "namespace sameness" >}}namespace sameness{{< /gloss >}} just like the rest of Istio.
This means that workloads with the same identity are indistinguishable from one another, regardless of which cluster they run in.
Cluster boundaries have no effect on identity.
We have no plans to depart from namespace sameness in any future release.

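Concretely, Istio workload identities are SPIFFE IDs built from a trust domain, a namespace, and a service account, with no cluster component. The sketch below assumes both clusters share Istio's default `cluster.local` trust domain; the `Workload` type and `spiffeID` helper are hypothetical.

```go
package main

import "fmt"

// Workload is a hypothetical description of where a pod runs and which
// Kubernetes service account it uses.
type Workload struct {
	Cluster        string
	Namespace      string
	ServiceAccount string
}

// spiffeID returns the workload's identity in Istio's SPIFFE format.
// Cluster is deliberately not part of the identity.
func spiffeID(trustDomain string, w Workload) string {
	return fmt.Sprintf("spiffe://%s/ns/%s/sa/%s", trustDomain, w.Namespace, w.ServiceAccount)
}

func main() {
	a := Workload{Cluster: "cluster-1", Namespace: "sample", ServiceAccount: "helloworld"}
	b := Workload{Cluster: "cluster-2", Namespace: "sample", ServiceAccount: "helloworld"}
	fmt.Println(spiffeID("cluster.local", a))                                // spiffe://cluster.local/ns/sample/sa/helloworld
	fmt.Println(spiffeID("cluster.local", a) == spiffeID("cluster.local", b)) // true
}
```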
### Service configuration

For our alpha release, we require all services and service entries to have the exact same configuration across clusters.
Notably, waypoint configuration must also be uniform.

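Because mismatched configuration is unsupported in the alpha release, it can help to compare each service's settings across clusters before rolling out a change. The Go sketch below assumes you have already fetched a trimmed-down view of a service from every cluster; `ServiceConfig` and `mismatchedClusters` are hypothetical and far from a complete comparison.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// ServiceConfig is a hypothetical, trimmed-down view of one service's
// settings in one cluster.
type ServiceConfig struct {
	Ports    []int
	Labels   map[string]string
	Waypoint string // name of the waypoint the service uses, if any
}

// mismatchedClusters returns, in sorted order, the clusters whose copy of a
// service differs from the copy in the reference cluster. reflect.DeepEqual
// treats a nil map and an empty map as different, so normalize inputs first.
func mismatchedClusters(ref string, byCluster map[string]ServiceConfig) []string {
	want := byCluster[ref]
	var bad []string
	for cluster, got := range byCluster {
		if !reflect.DeepEqual(got, want) {
			bad = append(bad, cluster)
		}
	}
	sort.Strings(bad) // map iteration order is random
	return bad
}

func main() {
	base := ServiceConfig{
		Ports:    []int{5000},
		Labels:   map[string]string{"app": "helloworld", "istio.io/global": "true"},
		Waypoint: "waypoint",
	}
	drifted := base
	drifted.Waypoint = "" // cluster-3 is missing the waypoint
	fmt.Println(mismatchedClusters("cluster-1", map[string]ServiceConfig{
		"cluster-1": base,
		"cluster-2": base,
		"cluster-3": drifted,
	})) // [cluster-3]
}
```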
One question we struggled with was where cross-cluster traffic should traverse a waypoint.
When sending cross-cluster traffic to a service with a waypoint, should the traffic traverse a waypoint in the client's cluster or in the destination's cluster?
Traversing waypoints in the client's cluster allows us to apply policies such as L7 cross-cluster failover.
On the other hand, traversing waypoints in the destination cluster allows us to apply L7 authorization policy as configured in the destination.
Ultimately, we chose the latter for our alpha release to avoid surprises related to authorization policy.

The service sameness requirements and waypoint behavior described above are not set in stone.
In upcoming releases, we are working to define behavior when these configurations differ across clusters.
This should ease the setup of Ambient Multicluster and allow gradual configuration rollouts without the risk of undefined behavior.
As for waypoints and L7 policy, we do plan to support L7 policies such as cross-cluster failover, though the exact design has not been settled.

### MeshConfig

Since multiple clusters share a single mesh, we assume that MeshConfig is uniform across clusters.
Crucially, this assumption means that `ServiceScope` must be uniform across clusters, since `ServiceScope` is part of MeshConfig.
In other words, the criteria for a service to be marked as global must be the same in all clusters.
Combined with the requirement that all services share the same configuration, this means a service is marked global either in every cluster or in none.
As with service configuration, we are exploring ways to loosen MeshConfig sameness requirements and to offer more fine-grained ways of marking services as global.

## Looking ahead

Beyond allowing configuration skew across clusters, there is still a lot of work to do before Ambient Multicluster can be promoted to beta.
We are looking to improve our reference documentation, guides, testing, and performance.
We are also considering deployment models other than multi-primary.
If you would like to try out Ambient Multicluster, please follow [this guide](TODO).
Since many details are still under discussion, we would love to hear your thoughts, comments, and use cases.
You can contact us here.
