This directory contains the complete Terraform infrastructure for deploying a high-availability NestJS application on Amazon Elastic Kubernetes Service (EKS). The implementation follows Infrastructure as Code (IaC) principles with modular, reusable components and leverages Kubernetes for container orchestration.
| Document | Description |
|---|---|
| Prerequisites and Setup | Complete guide for first-time configuration, AWS credentials setup, kubectl/helm installation, and domain configuration |
| AWS Resources Deep Dive | In-depth technical documentation of all AWS resources, module architecture, IAM policies, and security groups (consolidated) |
| CI/CD Workflows | GitHub Actions workflows for automated deployment and teardown |
| Terraform Testing | Testing framework, troubleshooting guide (including Kubernetes provider issues), and detailed test suite documentation |
| Kubernetes Basics | Introduction to Kubernetes concepts, AWS EKS specifics, and learning resources |
Required Setup: Before deploying this infrastructure, you must configure several variables and files with your own values. See the Prerequisites and Setup Guide for:
- AWS account and domain requirements
- S3 backend configuration
- Project name and environment setup
- kubectl and Helm installation
- GitHub secrets for CI/CD
- DNS configuration steps
Contents:
- High-Level Overview
- Key Components
- Environment Configuration Differences
- CI/CD Workflows
- Terraform Testing
- Project Structure
The EKS infrastructure implements a production-ready, highly available Kubernetes platform on AWS. The architecture leverages managed Kubernetes services (EKS), AWS Load Balancer Controller for native ALB integration, and Horizontal Pod Autoscaling for dynamic workload management.
```
Internet
   ↓
[Route 53] → Points to ALB DNS (created by Ingress)
   ↓
[Application Load Balancer (ALB)] ← Created by AWS Load Balancer Controller
   ↓  (HTTPS:443 / HTTP:80→HTTPS)
[Kubernetes Service] ← ClusterIP, load balanced internally
   ↓
[Kubernetes Pods] (running containers)
   ↓  Scheduled on
[EC2 Worker Nodes] (in private subnets)
   ↓  Managed by
[EKS Node Group with Auto Scaling]
   ↓  Part of
[EKS Cluster] (Managed Kubernetes Control Plane)
```
- Inbound Traffic: User requests hit Route 53 → ALB (terminates TLS with the ACM certificate) → Kubernetes Ingress → Kubernetes Service → Pods on worker nodes
- Outbound Traffic: Pods → NAT Gateway (in public subnets) → Internet Gateway → Internet
- High Availability: Multi-AZ deployment for both control plane and worker nodes
- Security: Private subnets for compute, security groups with least-privilege access, IAM roles for service accounts (IRSA)
- Scalability: Cluster Autoscaler for worker nodes, Horizontal Pod Autoscaler (HPA) for pods
- Kubernetes-Native: Ingress resources for ALB management, native Kubernetes service discovery and load balancing
- Modularity: Reusable Terraform modules following Single Responsibility Principle
- Environment Flexibility: Configuration-driven differences between dev and prod environments
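The Ingress-driven ALB provisioning described above can be sketched in Terraform. This is a minimal illustration, not the repository's actual `k8s_app` module; the resource names, service name, and `var.certificate_arn` are assumptions:

```hcl
# Hypothetical Ingress: the annotations instruct the AWS Load Balancer
# Controller to provision an internet-facing ALB that terminates TLS
# and forwards traffic directly to pod IPs (target-type: ip).
resource "kubernetes_ingress_v1" "app" {
  metadata {
    name = "app-ingress"
    annotations = {
      "kubernetes.io/ingress.class"               = "alb"
      "alb.ingress.kubernetes.io/scheme"          = "internet-facing"
      "alb.ingress.kubernetes.io/target-type"     = "ip"
      "alb.ingress.kubernetes.io/listen-ports"    = "[{\"HTTP\": 80}, {\"HTTPS\": 443}]"
      "alb.ingress.kubernetes.io/ssl-redirect"    = "443"
      "alb.ingress.kubernetes.io/certificate-arn" = var.certificate_arn
    }
  }

  spec {
    rule {
      http {
        path {
          path      = "/"
          path_type = "Prefix"
          backend {
            service {
              name = "app-service"
              port {
                number = 80
              }
            }
          }
        }
      }
    }
  }
}
```

Applying a resource like this is what triggers the controller to create the ALB, listeners, and target groups shown in the flow above.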
```
┌───────────────────────────────────────────────────────────────┐
│                           AWS Cloud                           │
│                                                               │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │                       VPC (Shared)                      │  │
│  │                                                         │  │
│  │   ┌──────────────┐          ┌──────────────┐            │  │
│  │   │   Public     │          │   Public     │            │  │
│  │   │   Subnet 1   │          │   Subnet 2   │            │  │
│  │   │              │          │              │            │  │
│  │   │  ┌────────┐  │          │  ┌────────┐  │            │  │
│  │   │  │  ALB   │  │          │  │  ALB   │  │            │  │
│  │   │  │ (EKS)  │  │          │  │ (EKS)  │  │            │  │
│  │   │  └────────┘  │          │  └────────┘  │            │  │
│  │   └──────────────┘          └──────────────┘            │  │
│  │                                                         │  │
│  │   ┌──────────────┐          ┌──────────────┐            │  │
│  │   │   Private    │          │   Private    │            │  │
│  │   │   Subnet 1   │          │   Subnet 2   │            │  │
│  │   │              │          │              │            │  │
│  │   │  ┌────────┐  │          │  ┌────────┐  │            │  │
│  │   │  │  EKS   │  │          │  │  EKS   │  │            │  │
│  │   │  │  Node  │  │          │  │  Node  │  │            │  │
│  │   │  │        │  │          │  │        │  │            │  │
│  │   │  │ [Pods] │  │          │  │ [Pods] │  │            │  │
│  │   │  └────────┘  │          │  └────────┘  │            │  │
│  │   └──────────────┘          └──────────────┘            │  │
│  │                                                         │  │
│  └─────────────────────────────────────────────────────────┘  │
│                                                               │
│  ┌────────────┐  ┌────────────┐  ┌─────────────────────────┐  │
│  │    ECR     │  │    ACM     │  │   EKS Control Plane     │  │
│  │  (Shared)  │  │  (Shared)  │  │    (EKS-Specific)       │  │
│  └────────────┘  └────────────┘  └─────────────────────────┘  │
│                                                               │
└───────────────────────────────────────────────────────────────┘
```
| AWS/Kubernetes Component | Role in the Architecture |
|---|---|
| Route 53 & ACM | The Route 53 Hosted Zone manages DNS records. AWS Certificate Manager (ACM) provides and validates the SSL certificate, which is attached to the ALB (created by Ingress) for secure HTTPS communication. |
| Application Load Balancer (ALB) | Dynamically created by AWS Load Balancer Controller when Kubernetes Ingress resources are deployed. Distributes incoming traffic, listens on Port 443 (HTTPS), and redirects Port 80 (HTTP) to HTTPS. Routes traffic directly to pod IPs when using target-type: ip. |
| Kubernetes Ingress | Defines HTTP/HTTPS routing rules from outside the cluster to Services. When an Ingress is created, the AWS Load Balancer Controller automatically provisions an ALB with the specified configuration (SSL certificate, listeners, routing rules). |
| Kubernetes Service | Provides a stable internal endpoint (ClusterIP) for a set of pods. Acts as an internal load balancer within the cluster, distributing traffic across healthy pods based on their readiness probe status. |
| Kubernetes Pod | The smallest deployable unit containing one or more containers. Pods are scheduled on worker nodes and receive VPC IP addresses via AWS VPC CNI plugin, enabling direct communication with ALB and other AWS services. |
| Kubernetes Deployment | Manages the desired state of pod replicas, handles rolling updates, rollbacks, and ensures the specified number of pods are always running. Works with HPA for automatic scaling. |
| Horizontal Pod Autoscaler (HPA) | Automatically scales the number of pod replicas based on CPU/memory utilization or custom metrics. Queries metrics from Kubernetes Metrics Server and adjusts Deployment replica count. |
| EKS Cluster (Control Plane) | AWS-managed Kubernetes control plane running across multiple AZs. Includes the API server, etcd, scheduler, and controller manager. Provides the Kubernetes API endpoint that kubectl and other tools communicate with. |
| EKS Node Group | Manages EC2 worker nodes that run Kubernetes pods. Uses Auto Scaling Groups for dynamic scaling based on pod resource requests. Nodes are deployed in private subnets with security groups controlling access. |
| AWS Load Balancer Controller | Kubernetes controller (deployed via Helm) that watches for Ingress/Service resources and automatically creates/manages ALBs, target groups, and listeners in AWS. Uses IAM Roles for Service Accounts (IRSA) for AWS API access. |
| Virtual Private Cloud (VPC) | Provides isolated network infrastructure with EKS-specific subnet tags. Public subnets host ALBs and NAT Gateways. Private subnets host worker nodes. Tags like kubernetes.io/role/elb enable automatic subnet discovery by the Load Balancer Controller. |
| Elastic Container Registry (ECR) | A private Docker registry storing application container images with lifecycle policies for automated cleanup. Images are pulled by worker nodes during pod deployment. |
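IRSA, mentioned for the AWS Load Balancer Controller above, binds a Kubernetes service account to an IAM role through the cluster's OIDC identity provider. A hedged sketch follows; the variable names and role name are illustrative, not the repository's exact module code:

```hcl
# Look up the cluster to obtain its OIDC issuer URL
data "aws_eks_cluster" "this" {
  name = var.cluster_name
}

# Trust policy: only the aws-load-balancer-controller service account in
# kube-system may assume this role via the cluster's OIDC provider.
data "aws_iam_policy_document" "irsa_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [var.oidc_provider_arn]
    }

    condition {
      test     = "StringEquals"
      variable = "${replace(data.aws_eks_cluster.this.identity[0].oidc[0].issuer, "https://", "")}:sub"
      values   = ["system:serviceaccount:kube-system:aws-load-balancer-controller"]
    }
  }
}

resource "aws_iam_role" "lb_controller" {
  name               = "${var.project_name}-lb-controller-irsa"
  assume_role_policy = data.aws_iam_policy_document.irsa_trust.json
}
```

With this pattern, pods running under the annotated service account receive temporary AWS credentials scoped to the role, with no node-level keys involved.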
For detailed technical documentation on module architecture, IAM policies (consolidated section with all 3 roles), security groups, and resource configurations, see AWS Resources Deep Dive.
For Kubernetes fundamentals and AWS EKS-specific concepts, see Kubernetes Basics.
The infrastructure supports two environments (dev and prod) with configuration-driven differences to balance cost, performance, and reliability.
| Component | Setting | dev | prod | Rationale |
|---|---|---|---|---|
| VPC | NAT Gateway | Single (single_nat_gateway = true) | Multiple (one per AZ) | Cost savings in dev; high availability in prod |
| ECR | Tagged Image Retention | 3 images | 10 images | Minimal storage in dev; deeper rollback history in prod |
| EKS Node Group | Instance Type | t3.small | t3.medium | Lower cost in dev; more capacity in prod |
| EKS Node Group | Min Nodes | 1 | 2 | Lower baseline cost in dev; always-on capacity in prod |
| EKS Node Group | Max Nodes | 5 | 10 | Limited scaling in dev; room for growth in prod |
| EKS Node Group | Desired Count | 2 | 3 | Minimum for HA in dev; baseline capacity in prod |
| EKS Node Group | Capacity Type | SPOT | ON_DEMAND | Cost savings in dev; reliability in prod |
| EKS Node Group | Disk Size | 20GB | 40GB | Minimal storage in dev; more cache in prod |
| K8s Deployment | Replicas | 2 | 3 | Lower baseline in dev; HA in prod |
| K8s Deployment | CPU Request | 50m | 250m | Minimal resources in dev; proper allocation in prod |
| K8s Deployment | CPU Limit | 250m | 500m | Lower ceiling in dev; more burst capacity in prod |
| K8s Deployment | Memory Request | 128Mi | 512Mi | Minimal memory in dev; proper allocation in prod |
| K8s Deployment | Memory Limit | 512Mi | 1024Mi | Lower ceiling in dev; more headroom in prod |
| HPA | Min Replicas | 2 | 3 | Lower baseline in dev; HA in prod |
| HPA | Max Replicas | 5 | 10 | Limited scaling in dev; more capacity in prod |
| Route 53 | force_destroy | true | false | Allow cleanup in dev; protect domain in prod |
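Each per-environment setting in the table is typically expressed as a Terraform map keyed by environment name and resolved with an index lookup. A simplified sketch (the variable and local names are illustrative):

```hcl
variable "environment" {
  type = string
}

# Per-environment minimum node count, mirroring the table above
variable "min_nodes" {
  type = map(number)
  default = {
    dev  = 1
    prod = 2
  }
}

# Resolve the value for the active environment at plan time
locals {
  min_nodes = var.min_nodes[var.environment]
}
```

Setting `environment = "prod"` in `common.tfvars` then selects the prod column of the table throughout the configuration.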
Configuration Files:
- `infra-eks/deployment/common.tfvars` - Common variables
- `infra-eks/deployment/app/eks_node_group/vars.tf` - Node group defaults
- `infra-eks/deployment/app/k8s_app/vars.tf` - Application defaults

Example from `common.tfvars`:
```hcl
environment  = "prod"
project_name = "myapp"

# VPC Configuration
single_nat_gateway = {
  dev  = true
  prod = false
}

# ECR Configuration
image_retention_max_count = {
  dev  = 3
  prod = 10
}
```

Example from `eks_node_group/vars.tf`:
```hcl
variable "instance_type" {
  type = map(string)
  default = {
    dev  = "t3.small"
    prod = "t3.medium"
  }
}

variable "capacity_type" {
  type = map(string)
  default = {
    dev  = "SPOT"
    prod = "ON_DEMAND"
  }
}
```

All infrastructure deployment and teardown is managed through GitHub Actions workflows located in `.github/workflows/eks/`. These workflows automate Terraform operations in a dependency-aware order.
Deployment Stages:
- Initial Setup (Manual) - Deploy S3 state bucket and Route53 hosted zone
- Full Deployment (On push to main) - Complete infrastructure from ECR to running Kubernetes application (11 steps)
- Teardown (Manual) - Clean removal in reverse dependency order
- `eks-deploy-hosted-zone.yaml` - One-time setup of foundational infrastructure
- `eks-deploy-aws-infra.yaml` - Full deployment pipeline (ECR → SSL → Docker image → VPC → EKS Cluster → Node Group → AWS LB Controller → K8s App → Routing)
- `eks-destroy-aws-infra.yaml` - Application infrastructure teardown (includes cleanup of orphaned ALBs/SGs)
- `eks-destroy-hosted-zone.yaml` - DNS and state storage cleanup
Required GitHub Secrets:
- `AWS_ACCESS_KEY_ID` - AWS IAM user access key
- `AWS_SECRET_ACCESS_KEY` - AWS IAM user secret key
- kubectl Configuration: Workflows configure kubectl access to EKS cluster using AWS CLI
- Helm Installation: AWS Load Balancer Controller is installed via Terraform Helm provider
- Ingress Wait Time: Deployment waits 5-10 minutes for ALB provisioning after Ingress creation
- Manual Ingress Deletion: During teardown, Ingress resources are manually deleted to ensure proper ALB cleanup
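Installing the AWS Load Balancer Controller through the Terraform Helm provider, as noted above, generally looks like the sketch below. The values and service-account wiring are illustrative assumptions, not the repository's exact `aws_lb_controller` module:

```hcl
resource "helm_release" "aws_lb_controller" {
  name       = "aws-load-balancer-controller"
  repository = "https://aws.github.io/eks-charts"
  chart      = "aws-load-balancer-controller"
  namespace  = "kube-system"

  # The controller must know which cluster's resources to manage
  set {
    name  = "clusterName"
    value = var.cluster_name
  }

  # Reuse a pre-created service account annotated with the IRSA role
  # instead of letting the chart create an unprivileged one
  set {
    name  = "serviceAccount.create"
    value = "false"
  }

  set {
    name  = "serviceAccount.name"
    value = "aws-load-balancer-controller"
  }
}
```

Because the release depends on a reachable cluster API, this resource is typically applied only after the EKS cluster and node group are up.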
For detailed workflow documentation, job sequences, manual steps, and troubleshooting, see CI/CD Workflows.
```bash
cd infra-eks/
chmod +x run-tests.sh
./run-tests.sh
```

What It Does:
- Runs all `.tftest.hcl` files in `tests/unit/` using `terraform test`
- Tests run in plan mode (no real AWS resources created)
- Uses mock AWS credentials
- Outputs test results to `test.log`
The test suite validates 7 core modules:
- ecr.tftest.hcl - Repository configuration, lifecycle policies, image scanning
- ssl.tftest.hcl - Certificate configuration, DNS validation, SANs
- hosted_zone.tftest.hcl - Route 53 zone configuration, force_destroy settings
- eks_cluster.tftest.hcl - Cluster, IAM roles, security groups, logging
- eks_node_group.tftest.hcl - Node group, scaling, capacity types, launch template
- aws_lb_controller.tftest.hcl - OIDC provider, IAM configuration, Helm deployment
- k8s_app.tftest.hcl - Deployment, Service, HPA, Ingress, security context
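A unit test file in `tests/unit/` follows Terraform's native test syntax: plan-mode `run` blocks containing assertions. The excerpt below is a hedged sketch; the resource address and condition are illustrative, not taken verbatim from the actual test suite:

```hcl
# Hypothetical excerpt of an ECR unit test running entirely in plan mode
run "ecr_scan_on_push_enabled" {
  command = plan

  assert {
    condition     = aws_ecr_repository.this.image_scanning_configuration[0].scan_on_push == true
    error_message = "ECR repository should scan images on push"
  }
}
```

Because `command = plan` never applies changes, assertions like this validate module configuration without touching a real AWS account.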
- Mock AWS Credentials: Required for provider initialization even in plan mode
- Kubernetes Provider: Tests use mock configuration - unset `KUBECONFIG` to avoid interference
- No Cluster Required: All tests run in plan mode without actual cluster connectivity
Tests automatically run in the `test-eks-terraform-modules` job before any deployment to validate all modules.
For detailed test suite documentation, troubleshooting guide (including Kubernetes provider issues), and comprehensive test explanations, see Terraform Testing.
```
infra-eks/
├── deployment/                  # Root modules (environment-specific)
│   ├── backend/                 # S3 state bucket
│   ├── ecr/                     # ECR repository
│   ├── hosted_zone/             # Route 53 Hosted Zone
│   ├── ssl/                     # ACM certificate
│   ├── app/                     # Application infrastructure (EKS-specific)
│   │   ├── vpc/                 # VPC and networking (with EKS tags)
│   │   ├── eks_cluster/         # EKS cluster (control plane)
│   │   ├── eks_node_group/      # Worker nodes with Auto Scaling
│   │   ├── aws_lb_controller/   # AWS Load Balancer Controller (Helm)
│   │   ├── k8s_app/             # Kubernetes Deployment/Service/Ingress/HPA
│   │   └── routing/             # Route 53 A records
│   ├── common.tfvars            # Shared configuration
│   ├── domain.tfvars            # Domain-specific configuration
│   ├── backend.tfvars           # Backend configuration
│   └── backend-config.hcl       # Backend initialization config
│
├── modules/                     # Child modules (reusable)
│   ├── aws_lb_controller/       # AWS Load Balancer Controller module
│   ├── ecr/                     # ECR module
│   ├── eks_cluster/             # EKS cluster module
│   ├── eks_node_group/          # EKS node group module
│   ├── hosted_zone/             # Route 53 Hosted Zone module
│   ├── k8s_app/                 # Kubernetes application module
│   ├── routing/                 # Route 53 routing module
│   └── ssl/                     # ACM certificate module
│
├── tests/                       # Terraform tests
│   ├── unit/                    # Unit tests for modules
│   │   ├── aws_lb_controller.tftest.hcl
│   │   ├── ecr.tftest.hcl
│   │   ├── eks_cluster.tftest.hcl
│   │   ├── eks_node_group.tftest.hcl
│   │   ├── hosted_zone.tftest.hcl
│   │   ├── k8s_app.tftest.hcl
│   │   └── ssl.tftest.hcl
│   └── versions.tf              # Provider versions for tests
│
├── docs/                        # Documentation
├── run-tests.sh                 # Test runner script
├── test-runner.tf               # Test configuration
└── test.log                     # Test output log
```
For questions or issues, please refer to the root README or open an issue in the GitHub repository.