Skip to content

Latest commit

 

History

History

README.md

Multi-Cluster Istio Setup: Lessons from the Trenches

TL;DR: Setting up multi-cluster Istio connectivity is complex, but it's a game-changer for distributed microservices. Here's what we learned the hard way so you don't have to.

The Problem We Faced

Picture this: You're an SRE team managing a SaaS platform with 200+ microservices, and the business wants to expand to a new region on the opposite side of the world. The good news? 99% of your components are region-agnostic and just need a simple YAML change. The bad news? How do your application pods communicate back with services in the main cluster?

The Traditional Approach (And Why It Sucks)

The "obvious" solution was to expose services publicly:

  1. Deploy an ingress controller/load balancer
  2. Add DNS records to your provider
  3. Make your services internet-facing

What we learned: This approach is a security nightmare. Suddenly, your internal services need to be hardened for the wild internet, and you're dealing with additional attack vectors, compliance issues, and maintenance overhead.

The Replication Trap

Another tempting option was to replicate all 200+ microservices in the second cluster. Here's why we noped out of that:

  • Cost explosion: 2x infrastructure costs
  • Operational overhead: Maintaining 400+ microservices instead of 200+
  • Consistency nightmares: Keeping configurations, secrets, and deployments in sync across regions
  • SRE burnout: Our team would need to double in size

The Solution: Istio Multi-Cluster Magic

We discovered that Istio could give us exactly what we needed: seamless inter-cluster communication that feels like everything is running in the same cluster. No public exposure, no replication, just pure networking magic.

Our Implementation Journey

Step 1: Primary Cluster Setup

Pro tip: Label your namespaces properly from the start. We learned this the hard way when we had to redo our topology configuration.

kubectl label namespace istio-system topology.istio.io/network=production-vpc

Install Istio with a specific revision (trust me, you'll thank yourself later):

istioctl install --set revision=<revision-name> -f primary.yaml

Create the east-west gateway - this is where the magic happens:

bash gen-eastwest-gateway.sh --network production-vpc --revision <revision> | istioctl install --set revision=<new-revision> -y -f - --set values.meshConfig.defaultConfig.holdApplicationUntilProxyStarts=true --set values.cni.cniBinDir=/home/kubernetes/bin --readiness-timeout 30m

Expose istiod - this is crucial for cross-cluster communication:

sed 's/{{.Revision}}/<your-revision>/g' expose-istiod-with-rev.yaml | kubectl apply -n istio-system -f -

Step 2: Secondary Cluster Configuration

Label the secondary cluster with a different network topology:

kubectl label namespace istio-system topology.istio.io/network=secondary-vpc

Mark it as a remote control plane - this tells Istio not to deploy the full control plane:

kubectl annotate namespace istio-system topology.istio.io/controlPlaneClusters=Kubernetes

Install Istio with the remote profile - Make sure you use the same revision as the primary:

istioctl install --set revision=<same-revision-as-primary> -f secondary.yaml

Create the authentication secret - this is the handshake between clusters:

istioctl create-remote-secret \
    --context=<secondary-cluster-context> \
    --name=<secondary-cluster-name> | \
    kubectl apply -f - --context=<primary-cluster-context>

Deploy the east-west gateway in the secondary cluster:

bash gen-eastwest-gateway.sh --network secondary-vpc --revision <revision> | istioctl install --set revision=<new-revision> -y -f - --set values.meshConfig.defaultConfig.holdApplicationUntilProxyStarts=true --set values.cni.cniBinDir=/home/kubernetes/bin --readiness-timeout 30m

Expose services - this is the final piece that makes everything work:

kubectl apply -n istio-system -f expose-service.yaml

What We Learned (The Hard Way)

1. Revision Consistency is Critical

We initially used different Istio revisions between clusters. The result? Complete communication failure. Always use the same revision across all clusters.

2. Network Topology Labels Matter

We had to redo our entire setup because we didn't properly label our namespaces. Don't skip this step.

3. The East-West Gateway is Your Friend

This component handles all the cross-cluster traffic routing. Without it properly configured, you're just running two separate Istio instances.

The Result

After implementing this setup, our application pods in the secondary cluster could access services in the primary cluster using the same DNS names (abc.default) as if they were running locally. No public exposure, no replication, just pure networking magic.

Resources and Further Reading

Final Thoughts

Multi-cluster Istio setup is complex, but it's worth it. The initial investment in time and complexity pays off in operational simplicity, cost savings, and security improvements.

Remember: Start simple, test each step thoroughly, and don't rush the process. We learned that the hard way, and now you don't have to.


This guide is based on our real-world experience implementing multi-cluster Istio connectivity. If you run into issues, check the troubleshooting section of the official docs first - we've been there, and that's usually where the answers are.