This project provides a robust, minimal setup for a multi-cloud Kubernetes architecture spanning AWS and Google Cloud Platform (GCP). It uses Terraform for infrastructure provisioning, AWS Route 53 for DNS-based failover, and HashiCorp Consul for cross-cluster service discovery and service mesh capabilities.
The primary goal is to create a resilient Active-Passive setup where AWS serves as the primary region and GCP acts as a hot standby.
The architecture is divided into two main traffic patterns:
- North-South Traffic (User to Application): Managed by AWS Route 53. User requests are directed to the primary AWS region (set as ap-south-1 in this example). If health checks for the AWS Application Load Balancer (ALB) fail, Route 53 automatically reroutes all traffic to the GCP Cloud Load Balancer.
- East-West Traffic (Service-to-Service): Managed by HashiCorp Consul. A federated Consul service mesh is deployed across both Kubernetes clusters. This allows services to communicate seamlessly with each other across cloud boundaries using consistent DNS names (e.g., `billing-api.service.consul`), regardless of which region is currently active.
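For reference, the DNS failover behaviour described above could be expressed in Terraform roughly as follows. This is a hypothetical sketch, not the project's actual code: the health-check endpoint, record name, zone variable, ALB DNS name, and GCP IP variable are all illustrative assumptions.

```hcl
# Health check against the primary (AWS) load balancer.
resource "aws_route53_health_check" "aws_primary" {
  fqdn              = "primary-alb.example.com" # assumed ALB DNS name
  port              = 443
  type              = "HTTPS"
  resource_path     = "/healthz" # assumed health endpoint
  failure_threshold = 3
  request_interval  = 30
}

# PRIMARY record: serves traffic while the health check passes.
resource "aws_route53_record" "primary" {
  zone_id         = var.zone_id
  name            = "app.your-domain.com"
  type            = "CNAME"
  ttl             = 60
  set_identifier  = "aws-primary"
  records         = ["primary-alb.example.com"]
  health_check_id = aws_route53_health_check.aws_primary.id

  failover_routing_policy {
    type = "PRIMARY"
  }
}

# SECONDARY record: Route 53 answers with this only when the primary is unhealthy.
resource "aws_route53_record" "secondary" {
  zone_id        = var.zone_id
  name           = "app.your-domain.com"
  type           = "A"
  ttl            = 60
  set_identifier = "gcp-secondary"
  records        = [var.gcp_lb_ip] # assumed static IP of the GCP Cloud LB

  failover_routing_policy {
    type = "SECONDARY"
  }
}
```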
```mermaid
graph TD
    subgraph Internet
        User
    end
    subgraph "AWS Route 53"
        DNS[your-domain.com]
    end
    subgraph "AWS (Primary)"
        AWS_ALB[Application LB] --> EKS[EKS Cluster]
    end
    subgraph "GCP (Secondary)"
        GCP_LB[Cloud LB] --> GKE[GKE Cluster]
    end

    User -- North-South --> DNS
    DNS -- Primary --> AWS_ALB
    DNS -- Failover --> GCP_LB
    EKS <-- Consul Federation --> GKE
```
```
├── consul/
│   ├── consul-aws-values.yaml   # Helm values for the primary AWS Consul datacenter
│   └── consul-gcp-values.yaml   # Helm values for the secondary GCP Consul datacenter
│
└── infra/
    ├── aws/        # Terraform root module for all AWS resources
    ├── gcp/        # Terraform root module for all GCP resources
    ├── dns/        # Terraform root module for Route 53 failover records
    ├── global/     # Terraform for global resources (Route 53 zone, TF state backends)
    ├── modules/    # Reusable Terraform modules (EKS, GKE, VPC, etc.)
    └── scripts/    # Helper scripts for running Terraform
```
Follow these steps to provision the entire multi-cloud environment.
- Tools:
  - Terraform (>= 1.0)
  - AWS CLI (>= 2.0)
  - Google Cloud SDK (`gcloud`)
  - kubectl
  - helm
- Configuration:
- Ensure your AWS and GCP credentials are configured locally.
- Purchase a domain name and have it managed by AWS Route 53.
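With credentials configured, the root modules would declare both providers along these lines. This is a hypothetical sketch; the AWS region matches the example above, while the GCP project variable and region are illustrative assumptions.

```hcl
# Assumed provider blocks; region and project values are placeholders.
provider "aws" {
  region = "ap-south-1"
}

provider "google" {
  project = var.gcp_project_id # assumed variable name
  region  = "asia-south1"      # assumed GCP region
}
```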
This project uses a bootstrapping approach for Terraform state management. The `global` component is responsible for creating the remote backends (an S3 bucket in AWS and a GCS bucket in GCP) that will be used by all other components.

For this project, the `global` component itself will store its state locally in a `terraform.tfstate` file within the `infra/global/` directory.
Note on Best Practices: In a production or team environment, it is highly recommended to create the backend resources manually and configure a remote backend for the `global` component as well. This prevents the initial state from being tied to a single machine. However, for simplicity and ease of setup in this project, we use a local state for the initial bootstrapping.

No manual action is required here if you follow the scripts, but be aware of this design decision.
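Once the `global` component has created the buckets, each other component would point at its remote backend with a standard backend block, roughly like the following sketch for the AWS component. The bucket name, state key, and lock table are illustrative assumptions, not the project's actual values.

```hcl
terraform {
  backend "s3" {
    bucket  = "my-multicloud-tf-state" # assumed: bucket created by the global component
    key     = "aws/terraform.tfstate"  # assumed per-component state key
    region  = "ap-south-1"
    encrypt = true
  }
}
```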
Use the provided scripts from the `infra/scripts` directory to deploy the infrastructure. This ensures all components are planned and applied in the correct dependency order.
- Plan all changes:

  ```sh
  ./plan.sh staging
  ```

  Review the generated plans in each component directory (`../aws`, `../gcp`, etc.) to ensure the changes are expected.

- Apply all changes:

  ```sh
  ./apply.sh staging
  ```

  This will execute the plans created in the previous step.
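The scripts themselves are not reproduced here, but the dependency ordering they enforce can be sketched as follows. This is a hypothetical outline: component names come from the directory layout, and the actual `terraform` invocation is stubbed out as a comment so the ordering logic reads in isolation.

```shell
#!/bin/sh
# Hypothetical sketch of infra/scripts/plan.sh: plan each component in
# dependency order (state backends first, then clusters, then DNS records).
plan_all() {
  workspace="$1"
  for component in global aws gcp dns; do
    echo "plan infra/$component ($workspace)"
    # terraform -chdir="infra/$component" plan -var-file="$workspace.tfvars" -out="$workspace.tfplan"
  done
}

plan_all staging
```

An `apply.sh` counterpart would walk the same list, executing each saved plan in turn.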
After provisioning, configure `kubectl` to communicate with your new clusters.

- For EKS (AWS):

  ```sh
  aws eks update-kubeconfig --region $(terraform -chdir=infra/aws output -raw region) --name $(terraform -chdir=infra/aws output -raw eks_cluster_name)
  ```

- For GKE (GCP):

  ```sh
  gcloud container clusters get-credentials $(terraform -chdir=infra/gcp output -raw gke_cluster_name) --region $(terraform -chdir=infra/gcp output -raw region)
  ```
Consul is deployed using Helm after the infrastructure is ready. This process involves a few manual steps to securely link the two datacenters.
- Get GCP Egress IPs: Find the static outbound IP addresses of your GCP cluster. These are used to firewall the AWS Consul endpoint.

  ```sh
  terraform -chdir=infra/gcp output nat_egress_ips
  ```

  Copy the output and replace the placeholder IPs in `consul/consul-aws-values.yaml` under the `service.beta.kubernetes.io/aws-load-balancer-source-ranges` annotation.

- Deploy Consul to AWS (Primary): Switch your `kubectl` context to the EKS cluster.

  ```sh
  helm repo add hashicorp https://helm.releases.hashicorp.com
  helm install consul hashicorp/consul --values consul/consul-aws-values.yaml --namespace consul --create-namespace
  ```

- Get AWS Consul Load Balancer Address: Wait a few minutes for the AWS Load Balancer to be provisioned, then retrieve its address.

  ```sh
  kubectl get service consul-server -n consul -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
  ```

  Copy the output address (e.g., `xxxx.elb.us-east-1.amazonaws.com`).

- Update GCP Consul Config: Paste the AWS Load Balancer address into `consul/consul-gcp-values.yaml`, replacing the `<REPLACE_WITH_AWS_LOAD_BALANCER_ADDRESS>` placeholder in the `retry_join` block.

- Deploy Consul to GCP (Secondary): Switch your `kubectl` context to the GKE cluster.

  ```sh
  helm install consul-gcp hashicorp/consul --values consul/consul-gcp-values.yaml --namespace consul --create-namespace
  ```
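For orientation, the federation-relevant portion of the GCP values file might look roughly like the excerpt below. This is a hypothetical sketch using keys from the official `hashicorp/consul` Helm chart; the project's actual values files may differ.

```yaml
# Hypothetical excerpt of consul/consul-gcp-values.yaml (assumed structure).
global:
  name: consul
  datacenter: gcp            # "aws" in consul-aws-values.yaml
server:
  extraConfig: |
    {
      "primary_datacenter": "aws",
      "retry_join_wan": ["<REPLACE_WITH_AWS_LOAD_BALANCER_ADDRESS>"]
    }
```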
Your multi-cloud service mesh is now operational. You can verify that the two datacenters are federated using the following methods.

- Via the UI (from AWS cluster):

  ```sh
  # Port-forward the Consul UI service
  kubectl port-forward service/consul-ui -n consul 8501:80
  ```

  Navigate to http://localhost:8501 in your browser. You should see both `aws` and `gcp` datacenters in the dropdown menu at the top left.

- Via the CLI (from any server pod):

  ```sh
  # Exec into a Consul server pod in the AWS cluster
  kubectl exec -it consul-server-0 -n consul -- /bin/sh
  # Run the members command to see servers in both datacenters
  consul members -wan
  ```
- Cloud Providers: AWS, Google Cloud Platform
- IaC: Terraform
- Container Orchestration: Amazon EKS, Google GKE
- DNS & Failover: AWS Route 53
- Service Mesh: HashiCorp Consul
- Databases: Amazon RDS (Postgres), Amazon ElastiCache (Redis)
- Load Balancing: AWS Application Load Balancer, GCP Global Cloud Load Balancer
- Ingress Controllers: AWS Load Balancer Controller (for EKS), GKE Ingress Controller