This directory contains the complete Terraform infrastructure for deploying a high-availability NestJS application using AWS Elastic Container Service (ECS). The implementation follows Infrastructure as Code (IaC) principles with modular, reusable components.
| Document | Description |
|---|---|
| Prerequisites and Setup | Complete guide for first-time configuration, AWS credentials setup, and domain configuration |
| AWS Resources Deep Dive | In-depth technical documentation of all AWS resources, module architecture, IAM policies, and security groups |
| CI/CD Workflows | GitHub Actions workflows for automated deployment and teardown |
| Terraform Testing | Testing framework, troubleshooting guide, and detailed test suite documentation |
| Security Decisions | Security architecture decisions, threat model, and compliance considerations |
Required Setup: Before deploying this infrastructure, you must configure several variables and files with your own values. See the Prerequisites and Setup Guide for:
- AWS account and domain requirements
- S3 backend configuration
- Project name and environment setup
- GitHub secrets for CI/CD
- DNS configuration steps

Table of Contents:
- High-Level Overview
- Key Components
- Environment Configuration Differences
- CI/CD Workflows
- Terraform Testing
- Project Structure
The ECS infrastructure implements a production-ready, highly available container orchestration platform on AWS. The architecture leverages AWS-native services to provide automated scaling, load balancing, and secure HTTPS communication.
Internet
↓
[Route 53] → Points to ALB DNS
↓
[Application Load Balancer (ALB)]
↓ (HTTPS:443 / HTTP:80→HTTPS)
[ALB Target Group] ← Health checks ECS Tasks
↓
[ECS Tasks] (in awsvpc mode)
↓ Running on
[EC2 Instances] (in private VPC subnets)
↓ Managed by
[Auto Scaling Group + Capacity Provider]
↓ Part of
[ECS Cluster]
- Inbound Traffic: User requests hit Route 53 → ALB (validates SSL certificate) → Target Group → ECS Tasks on EC2 instances
- Outbound Traffic: ECS Tasks → NAT Gateway (in public subnets) → Internet Gateway → Internet
- High Availability: Multi-AZ deployment ensures infrastructure resilience
- Security: Private subnets for compute, security groups with least-privilege access, IAM roles for service-to-service authorization
- Scalability: Auto Scaling Groups for EC2 instances, ECS Service auto-scaling for tasks
- Modularity: Reusable Terraform modules following Single Responsibility Principle
- Environment Flexibility: Configuration-driven differences between dev and prod environments
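The HTTP:80→HTTPS step in the diagram above corresponds to a redirect listener on the ALB. The following is a minimal sketch of such a listener, not the project's actual module code: the resource name `http_redirect` and the `alb_arn` variable are hypothetical, and the real ALB module may be organized differently.

```hcl
# Hypothetical input: ARN of an existing Application Load Balancer.
variable "alb_arn" {
  type        = string
  description = "ARN of the ALB that receives public traffic"
}

# Listener on port 80 that never forwards traffic to a target group.
# It only answers with a permanent redirect to HTTPS on port 443.
resource "aws_lb_listener" "http_redirect" {
  load_balancer_arn = var.alb_arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "redirect"

    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}
```

Keeping the redirect in its own listener means the port 80 path carries no application traffic; only the HTTPS listener is attached to the target group.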
| AWS Service | Role in the Architecture |
|---|---|
| Route 53 & ACM | The Route 53 Hosted Zone manages DNS records. AWS Certificate Manager (ACM) provides and validates the SSL certificate, which is attached to the ALB's HTTPS listener to enable secure communication. |
| Application Load Balancer (ALB) | Distributes incoming traffic. It listens on Port 443 (HTTPS) and redirects all Port 80 (HTTP) traffic to HTTPS (301 Permanent Redirect). The ALB forwards traffic to an ALB Target Group, which acts as the dynamic list of healthy ECS Tasks. |
| ECS Task | The fundamental unit of deployment (the running container). Deployed onto EC2 instances, tasks receive a private IP via awsvpc networking and are registered with the ALB Target Group. |
| Virtual Private Cloud (VPC) | Provides an isolated network, defining public and private subnets across multiple AZs for high availability. NAT Gateways enable private resources to access the internet. |
| ECS Service | The deployment mechanism that defines how many copies of a specific task definition should run on a given ECS cluster, automatically maintaining that desired count and integrating with an Elastic Load Balancer for traffic distribution. |
| ECS Cluster | The compute capacity (EC2 instances) running within private subnets. It uses an Auto Scaling Group (ASG) and a Capacity Provider that tells ECS how to manage ASG scaling. A critical element of the cluster is the ECS control plane, the central component that coordinates containers (i.e., tasks) and monitors cluster health. Each EC2 instance also runs an ECS agent that reports container health to the control plane. |
| Elastic Container Registry (ECR) | A private Docker registry storing application container images. Its lifecycle policy uses prioritized rules (Rule 1: untagged images, Rule 2: tagged images) to expire images aggressively while retaining a configurable number of environment-tagged (dev-, prod-) images. |
For detailed technical documentation on module architecture, IAM policies, security groups, and resource configurations, see AWS Resources Deep Dive.
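The two ECR lifecycle rules described in the table can be sketched roughly as follows. This is an illustration under assumptions rather than the module's real policy: the `repository_name` variable, the resource name, and the one-day window for untagged images are hypothetical, while `image_retention_max_count` mirrors the variable used in common.tfvars.

```hcl
variable "environment" {
  type    = string
  default = "dev"
}

# Per-environment retention counts, keyed by environment name.
variable "image_retention_max_count" {
  type    = map(number)
  default = { dev = 3, prod = 10 }
}

# Hypothetical input: name of an existing ECR repository.
variable "repository_name" {
  type = string
}

resource "aws_ecr_lifecycle_policy" "this" {
  repository = var.repository_name

  policy = jsonencode({
    rules = [
      {
        # Rule 1: expire untagged images quickly (window is an assumption).
        rulePriority = 1
        description  = "Expire untagged images"
        selection = {
          tagStatus   = "untagged"
          countType   = "sinceImagePushed"
          countUnit   = "days"
          countNumber = 1
        }
        action = { type = "expire" }
      },
      {
        # Rule 2: keep only the newest N images tagged for this environment.
        rulePriority = 2
        description  = "Retain the most recent environment-tagged images"
        selection = {
          tagStatus     = "tagged"
          tagPrefixList = ["${var.environment}-"]
          countType     = "imageCountMoreThan"
          countNumber   = var.image_retention_max_count[var.environment]
        }
        action = { type = "expire" }
      }
    ]
  })
}
```

ECR evaluates rules in priority order, so untagged images are handled first and the retention count applies only to images carrying the environment prefix.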
The infrastructure supports two environments (dev and prod) with configuration-driven differences to balance cost, performance, and reliability.
| Component | Setting | dev | prod | Rationale |
|---|---|---|---|---|
| VPC | NAT Gateway | Single (single_nat_gateway = true) | Multiple (one per AZ) | Cost savings in dev; high availability in prod |
| ECR | Tagged Image Retention | 3 images | 10 images | Minimal storage in dev; deeper rollback history in prod |
| ECS Cluster (ASG) | Min Instances | 1 | 2 | Lower baseline cost in dev; always-on capacity in prod |
| ECS Cluster (ASG) | Max Instances | 2 | 4 | Limited scaling in dev; room for growth in prod |
| ECS Cluster | Max Utilization | 100% | 75% | Run instances at full capacity in dev; maintain scaling buffer in prod |
| ECS Cluster (ASG) | Scale-In Protection | false | true | Quick teardown in dev; protect running tasks in prod |
| ECS Service | Task Placement | binpack:cpu | spread:az, spread:instanceId | Cost optimization in dev; fault tolerance in prod |
| ALB | Deletion Protection | false | true | Easy cleanup in dev; prevent accidental deletion in prod |
| Route 53 | force_destroy | true | false | Allow cleanup in dev; protect domain in prod |
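One way to apply the per-environment task placement setting from the table is a dynamic block that looks up the strategy list for the active environment. The sketch below assumes a map-shaped `task_placement_strategies` variable; the `cluster_arn` and `task_definition_arn` variables and the service name are hypothetical, and the network and load balancer settings a real service needs are omitted for brevity.

```hcl
variable "environment" {
  type    = string
  default = "dev"
}

# Placement strategies keyed by environment name.
variable "task_placement_strategies" {
  type = map(list(object({
    type  = string
    field = string
  })))
}

# Hypothetical inputs referring to resources created elsewhere.
variable "cluster_arn" {
  type = string
}

variable "task_definition_arn" {
  type = string
}

resource "aws_ecs_service" "app" {
  name            = "app-${var.environment}"
  cluster         = var.cluster_arn
  task_definition = var.task_definition_arn
  desired_count   = 1

  # Emits one ordered_placement_strategy block per entry, in list order.
  dynamic "ordered_placement_strategy" {
    for_each = var.task_placement_strategies[var.environment]

    content {
      type  = ordered_placement_strategy.value.type
      field = ordered_placement_strategy.value.field
    }
  }
}
```

With this shape, switching from binpack in dev to spread in prod requires only a different `environment` value, not a change to the resource itself.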
Configuration File: infra-ecs/deployment/common.tfvars
Example:
environment = "prod"
project_name = "high-availability-app"
# VPC Configuration
single_nat_gateway = false # prod uses multiple
# ECR Configuration
image_retention_max_count = {
  dev  = 3
  prod = 10
}
# ECS Cluster Configuration
instance_min_size = {
  dev  = 1
  prod = 2
}
instance_max_size = {
  dev  = 2
  prod = 4
}
cluster_max_capacity_provider_reservation = {
  dev  = 100
  prod = 75
}
protect_asg_from_scale_in = {
  dev  = false
  prod = true
}
# ECS Service Configuration
task_placement_strategies = {
  dev = [
    {
      type  = "binpack"
      field = "cpu"
    }
  ]
  prod = [
    {
      type  = "spread"
      field = "attribute:ecs.availability-zone"
    },
    {
      type  = "spread"
      field = "instanceId"
    }
  ]
}

All infrastructure deployment and teardown is managed through GitHub Actions workflows located in .github/workflows/ecs/. These workflows automate Terraform operations in a dependency-aware order.
Deployment Stages:
- Initial Setup (Manual) - Deploy S3 state bucket and Route53 hosted zone
- Full Deployment (On push to main) - Complete infrastructure from ECR to running application (10 steps)
- Teardown (Manual) - Clean removal in reverse dependency order
Workflow Files:
- ecs-deploy-hosted-zone.yaml - One-time setup of foundational infrastructure
- ecs-deploy-aws-infra.yaml - Full deployment pipeline (ECR → SSL → Docker image → VPC → Cluster → ALB → Service → Routing)
- ecs-destroy-aws-infra.yaml - Application infrastructure teardown
- ecs-destroy-hosted-zone.yaml - DNS and state storage cleanup

Required GitHub Secrets:
- AWS_ACCESS_KEY_ID - AWS IAM user access key
- AWS_SECRET_ACCESS_KEY - AWS IAM user secret key
For detailed workflow documentation, job sequences, manual steps, and troubleshooting, see CI/CD Workflows.
cd infra-ecs/
chmod +x run-tests.sh
./run-tests.sh

What It Does:
- Runs all .tftest.hcl files in tests/unit/ using terraform test
- Tests run in plan mode (no real AWS resources created)
- Uses mock AWS credentials
- Outputs test results to test.log
The test suite validates 5 core modules:
- alb.tftest.hcl - ALB configuration, listeners, target groups, security groups
- ecr.tftest.hcl - Repository configuration, lifecycle policies, image scanning
- ecs_cluster.tftest.hcl - Cluster, ASG, Launch Template, IAM configuration
- ecs_service.tftest.hcl - Task definition, service configuration, auto-scaling
- ssl.tftest.hcl - Certificate configuration, DNS validation, SANs
Tests automatically run in the test-terraform-modules job before any deployment to validate all modules.
For detailed test suite documentation, troubleshooting guide, and ECS-specific testing issues, see Terraform Testing.
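For orientation, a plan-mode test with mock credentials might look like the sketch below. It is not copied from the repository's test suite: the run block name, the module input variables, and the `aws_ecr_repository.this` resource address are all hypothetical and would need to match the real module.

```hcl
# Placeholder credentials; the skip_* flags stop the provider from
# contacting AWS to validate them during planning.
provider "aws" {
  region                      = "us-east-1"
  access_key                  = "mock_access_key"
  secret_key                  = "mock_secret_key"
  skip_credentials_validation = true
  skip_requesting_account_id  = true
  skip_metadata_api_check     = true
}

run "ecr_repository_has_a_name" {
  # Plan only: nothing is created in AWS.
  command = plan

  # Module under test, relative to the directory where terraform test runs.
  module {
    source = "./modules/ecr"
  }

  # Hypothetical module inputs.
  variables {
    project_name = "high-availability-app"
    environment  = "dev"
  }

  # Hypothetical resource address inside the module.
  assert {
    condition     = aws_ecr_repository.this.name != ""
    error_message = "ECR repository name must not be empty"
  }
}
```

Because the run uses `command = plan`, assertions can only reference values known at plan time, such as attributes derived directly from input variables.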
infra-ecs/
├── deployment/ # Root modules (environment-specific)
│ ├── backend/ # S3 state bucket
│ ├── ecr/ # ECR repository
│ ├── hosted_zone/ # Route 53 Hosted Zone
│ ├── ssl/ # ACM certificate
│ ├── app/ # Application infrastructure (ECS-specific)
│ │ ├── vpc/ # VPC and networking
│ │ ├── ecs_cluster/ # ECS cluster and ASG
│ │ ├── alb/ # Application Load Balancer
│ │ ├── ecs_service/ # ECS service and tasks
│ │ └── routing/ # Route 53 A records
│ ├── common.tfvars # Shared configuration
│ ├── domain.tfvars # Domain-specific configuration
│ ├── backend.tfvars # Backend configuration
│ └── backend-config.hcl # Backend initialization config
│
├── modules/ # Child modules (reusable)
│ ├── alb/ # ALB module
│ ├── alb_rule/ # ALB listener rule module
│ ├── ecr/ # ECR module
│ ├── ecs_cluster/ # ECS cluster module
│ ├── ecs_service/ # ECS service module
│ ├── hosted_zone/ # Route 53 Hosted Zone module
│ ├── routing/ # Route 53 routing module
│ └── ssl/ # ACM certificate module
│
├── tests/ # Terraform tests
│ ├── unit/ # Unit tests for modules
│ │ ├── alb.tftest.hcl
│ │ ├── ecr.tftest.hcl
│ │ ├── ecs_cluster.tftest.hcl
│ │ ├── ecs_service.tftest.hcl
│ │ └── ssl.tftest.hcl
│ └── versions.tf # Provider versions for tests
│
├── docs/ # Documentation
├── run-tests.sh # Test runner script
├── test-runner.tf # Test configuration
└── test.log # Test output log
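To illustrate how a root module under deployment/ might wire a child module from modules/, here is a minimal sketch for a hypothetical deployment/ecr/main.tf. The input variable names are assumptions, and the empty S3 backend block assumes the backend settings are supplied at init time from a file such as backend-config.hcl.

```hcl
terraform {
  # Partial backend configuration: bucket, key, and region are assumed to be
  # passed with `terraform init -backend-config=<file>`.
  backend "s3" {}
}

# Hypothetical inputs, typically set from a shared tfvars file.
variable "project_name" {
  type = string
}

variable "environment" {
  type = string
}

# The root module stays thin and delegates resource creation to the
# reusable child module two directories up.
module "ecr" {
  source = "../../modules/ecr"

  project_name = var.project_name
  environment  = var.environment
}
```

Keeping root modules this small lets each deployment step hold its own state while the resource logic lives in one reusable place.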
For questions or issues, please refer to the root README or open an issue in the GitHub repository.