This Terraform configuration deploys a complete EKS cluster in AWS GovCloud with KEDA, Karpenter, and Airflow, including all necessary dependencies and support for a custom Airflow Docker image.
This configuration is designed specifically for AWS GovCloud environments and works within GovCloud's internet access restrictions: all container images are sourced from ECR repositories within your AWS account, so there are no external registry dependencies.
The infrastructure is organized into modular components:
Module | Purpose | Key Features |
---|---|---|
VPC (./modules/vpc) | Network foundation | Public/private subnets, NAT Gateway, Karpenter discovery tags |
IAM (./modules/iam) | Identity & access | EKS roles, Karpenter policies, service account permissions |
EKS (./modules/eks) | Kubernetes cluster | EKS cluster, node groups, OIDC provider via JPL IAM as Code |
EFS (./modules/efs) | Persistent storage | File system, access points for Airflow DAGs/logs |
SQS (./modules/sqs) | Message queuing | Karpenter interruption handling with dead letter queue |
ECR (./modules/ecr) | Container registry | Image repositories with lifecycle policies |
Kubernetes (./modules/kubernetes) | Application deployment | KEDA, Karpenter, Airflow via Helm charts |
- Terraform >= 1.0
- AWS CLI configured for GovCloud
- Docker for building custom Airflow image
- kubectl for cluster interaction
- helm for chart management
- pre-commit (optional, for development workflow)
- EKS cluster management
- VPC and networking resources
- IAM roles and policies
- EFS file systems
- SQS queues
- ECR repositories
CRITICAL: This Terraform configuration requires an existing VPC with specific configuration:
VPC Requirements:
- VPC Tag: Must have tag `JplVpcType = "TGW-Internal"`
- DNS Settings: Both `enableDnsSupport` and `enableDnsHostnames` must be enabled
- Subnet Tags: Private subnets must have tag `karpenter.sh/discovery = "<cluster-name>"` (replace with your actual cluster name)
Verification Commands:
# Verify VPC exists with correct tag
aws ec2 describe-vpcs --filters "Name=tag:JplVpcType,Values=TGW-Internal" --region us-gov-west-1
# Get VPC ID from the above command output, then verify VPC DNS settings
VPC_ID=$(aws ec2 describe-vpcs --filters "Name=tag:JplVpcType,Values=TGW-Internal" --query 'Vpcs[0].VpcId' --output text --region us-gov-west-1)
aws ec2 describe-vpc-attribute --vpc-id $VPC_ID --attribute enableDnsSupport --region us-gov-west-1
aws ec2 describe-vpc-attribute --vpc-id $VPC_ID --attribute enableDnsHostnames --region us-gov-west-1
# Verify subnets have Karpenter discovery tags (replace CLUSTER_NAME with your actual cluster name)
CLUSTER_NAME="your-cluster-name"
aws ec2 describe-subnets --filters "Name=vpc-id,Values=$VPC_ID" "Name=tag:karpenter.sh/discovery,Values=$CLUSTER_NAME" --region us-gov-west-1
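The DNS checks above can be wrapped in a small fail-fast guard. The sketch below is our own helper (not part of the repository); it assumes the attribute value was extracted with the standard `--query 'EnableDnsSupport.Value' --output text` flags of the AWS CLI:

```shell
# Sketch: fail fast when a required DNS attribute is disabled.
# Feed check_attr the value returned by, e.g.:
#   aws ec2 describe-vpc-attribute --vpc-id "$VPC_ID" \
#     --attribute enableDnsSupport \
#     --query 'EnableDnsSupport.Value' --output text --region us-gov-west-1
check_attr() {
  attr_name=$1
  value=$2
  if [ "$value" = "True" ]; then
    echo "OK: $attr_name is enabled"
  else
    echo "FAIL: $attr_name must be enabled before creating the cluster" >&2
    return 1
  fi
}

check_attr enableDnsSupport "True"
```

Run once per attribute (`enableDnsSupport`, `enableDnsHostnames`) and abort the deployment if either check fails.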
CRITICAL: Before running this Terraform configuration, your AWS GovCloud administrators must deploy the JPL IAM as Code CloudFormation stack:
Required CloudFormation Stack:
- Stack Name Pattern: `StackSet-jpl-roles-as-code-*` (dynamically generated)
- Purpose: Provides the `Custom::JplEksFederation` resource for OIDC provider creation
- Deployment: Must be deployed by administrators with elevated permissions
- Status: Human-in-the-loop process that cannot be automated
Verification Commands:
# Check if the CloudFormation stack exists
aws cloudformation list-stacks --region us-gov-west-1 \
--query 'StackSummaries[?contains(StackName, `StackSet-jpl-roles-as-code`) && StackStatus==`CREATE_COMPLETE`]'
# Check if custom resource exports are available
aws cloudformation list-exports --region us-gov-west-1 \
--query 'Exports[?Name==`Custom::JplEksFed::ServiceToken`]'
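A script can turn the export check above into a go/no-go decision. This is a sketch of our own making; the inline JSON is a stand-in for the output of the `aws cloudformation list-exports` command above:

```shell
# Sketch: treat a missing ServiceToken export as "stack not deployed".
exports_json='{"Exports": []}'   # sample; replace with real list-exports output
if echo "$exports_json" | grep -q 'Custom::JplEksFed::ServiceToken'; then
  echo "IAM as Code stack: ready"
else
  echo "IAM as Code stack: NOT deployed - contact your GovCloud administrators"
fi
```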
This configuration addresses AWS GovCloud's internet access restrictions by sourcing all container images from ECR repositories within your AWS account.
Before deployment, mirror all required images to ECR:
# Mirror all required images to ECR
./scripts/mirror-images.sh
Required Images:
- KEDA: `kedacore/keda`, `kedacore/keda-metrics-apiserver`, `kedacore/keda-admission-webhooks`
- Karpenter: `karpenter/controller`, `karpenter/webhook`
- Supporting: `statsd-exporter`, `redis`, `git-sync/git-sync`, `postgresql`
- Base Images: `unity/alpine`, `unity/busybox`, `unity/nginx` (for sidecars)
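After mirroring, every chart value references an in-account ECR URI. The sketch below shows the URI shape; the account ID is a placeholder (in practice obtain it with `aws sts get-caller-identity --query Account --output text`):

```shell
# Sketch: compose the in-account ECR URI that Helm values will reference.
ACCOUNT_ID="123456789012"   # placeholder; use your GovCloud account ID
REGION="us-gov-west-1"
REPO="kedacore/keda"
echo "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO}"
```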
# Setup pre-commit hooks and tools
./scripts/setup-pre-commit.sh
Available Checks:
- Terraform: Formatting, validation, security scanning
- Security: Credential detection, private key detection
- Documentation: Markdown linting, YAML validation
- Code Quality: JSON validation, trailing whitespace
# Ensure CloudFormation stack is deployed
aws cloudformation list-stacks --region us-gov-west-1 \
--query 'StackSummaries[?contains(StackName, `StackSet-jpl-roles-as-code`) && StackStatus==`CREATE_COMPLETE`]'
# Verify VPC exists and get its ID
aws ec2 describe-vpcs --filters "Name=tag:JplVpcType,Values=TGW-Internal" --region us-gov-west-1
# Get VPC ID and verify DNS settings are enabled (required for EKS)
VPC_ID=$(aws ec2 describe-vpcs --filters "Name=tag:JplVpcType,Values=TGW-Internal" --query 'Vpcs[0].VpcId' --output text --region us-gov-west-1)
aws ec2 describe-vpc-attribute --vpc-id $VPC_ID --attribute enableDnsSupport --region us-gov-west-1
aws ec2 describe-vpc-attribute --vpc-id $VPC_ID --attribute enableDnsHostnames --region us-gov-west-1
# Verify subnets have required Karpenter discovery tags (replace CLUSTER_NAME with your actual cluster name)
CLUSTER_NAME="your-cluster-name"
aws ec2 describe-subnets --filters "Name=vpc-id,Values=$VPC_ID" "Name=tag:karpenter.sh/discovery,Values=$CLUSTER_NAME" --region us-gov-west-1
# Mirror container images to ECR and build the custom Airflow image
./scripts/mirror-images.sh
./scripts/build-airflow-image.sh
# Initialize, review, and apply the Terraform configuration
terraform init
terraform plan
terraform apply
# Check cluster status
kubectl get nodes
# Check application deployments
kubectl get pods -n keda
kubectl get pods -n karpenter
kubectl get pods -n sps
# Check Karpenter resources
kubectl get nodepools
kubectl get nodeclasses
kubectl port-forward -n sps svc/airflow-webserver 8080:8080
Then open http://localhost:8080 in your browser.
Customize the deployment by modifying `terraform.tfvars`:
cluster_name = "your-cluster-name"
kubernetes_version = "1.32"
vpc_cidr = "10.0.0.0/16"
node_group_instance_types = ["t3.medium"]
node_group_desired_size = 4
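A loose sanity check on CIDR values before `terraform plan` can catch typos early. This sketch is our own; the regex checks shape only (four dot-separated numbers and a prefix length), not octet ranges:

```shell
# Sketch: loose format check for a CIDR value from terraform.tfvars.
cidr="10.0.0.0/16"
if echo "$cidr" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$'; then
  echo "CIDR format OK: $cidr"
else
  echo "Invalid CIDR: $cidr" >&2
fi
```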
- KEDA: Autoscaling for Airflow workers (1-10 replicas)
- Karpenter: Node provisioning with c6i.large instances
  - Uses subnets and security groups tagged with `karpenter.sh/discovery`
  - Supports both Spot and On-Demand instances
  - Automatic node lifecycle management
- Airflow: Custom image with Unity SPS plugins
- EFS: Persistent storage for DAGs, logs, and shared data
- SQS: Interruption handling for Karpenter
The configuration provides comprehensive outputs for integration with other systems:
- Cluster: `cluster_id`, `cluster_endpoint`, `cluster_certificate_authority_data`
- Networking: `vpc_id`, `private_subnet_ids`, `public_subnet_ids`
- Storage: `efs_file_system_id`, `efs_security_group_id`
- Queuing: `karpenter_queue_url`, `karpenter_queue_arn`
- Registries: `airflow_repository_url`, `karpenter_controller_repository_url`, `keda_operator_repository_url`
- IAM: `karpenter_controller_role_arn`, `karpenter_node_instance_profile_name`
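Downstream scripts can consume these outputs as JSON. The sketch below is ours; the inline string is a sample standing in for the output of the standard `terraform output -json` command:

```shell
# Sketch: extract one output value from `terraform output -json`.
outputs_json='{"vpc_id":{"value":"vpc-0abc"},"efs_file_system_id":{"value":"fs-123"}}'
vpc_id=$(echo "$outputs_json" | sed -n 's/.*"vpc_id":{"value":"\([^"]*\)".*/\1/p')
echo "$vpc_id"
# prints vpc-0abc
```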
- Private subnets for worker nodes
- Security groups with minimal required access
- IAM roles with least privilege policies
- OIDC provider for secure service account integration
- Public access restricted to specified CIDR blocks
- Encrypted EFS file system
- SQS queue policies for secure message handling
- ECR image scanning enabled
- NAT Gateway: ~$0.045/hour
- EKS cluster: ~$0.10/hour
- Worker nodes: Based on instance type and usage
- EFS: Storage and throughput costs
- SQS: Per-message charges
- ECR: Storage costs
- Cost Optimization: Karpenter can provision Spot instances
- Kubernetes Updates: Plan carefully for version upgrades
- Node Groups: Zero-downtime updates supported
- Security Groups: Can be modified without cluster downtime
- IAM Policies: Can be updated without affecting workloads
- ECR Lifecycle: Automatic cleanup of old images
- Karpenter: Automatic node lifecycle management
- KEDA: Automatic scaling based on workload demand
Common Error Messages:
- `No exports found for Custom::JplEksFed::ServiceToken`: CloudFormation stack not deployed
- `error creating CloudFormation stack`: Insufficient permissions
- `No IAM OpenID Connect Provider found`: OIDC provider doesn't exist
- `Access Denied`: Insufficient permissions
Resolution Steps:
- Verify CloudFormation stack deployment
- Check custom resource exports availability
- Ensure AWS credentials have necessary permissions
- Re-run `terraform plan` and `terraform apply`
Getting Stack Name:
aws cloudformation list-stacks --region us-gov-west-1 \
--query 'StackSummaries[?contains(StackName, `StackSet-jpl-roles-as-code`)].{StackName:StackName,Status:StackStatus}' \
--output table
Check Discovery Tags:
# Verify subnets have karpenter.sh/discovery tags
aws ec2 describe-subnets --region us-gov-west-1 \
--filters "Name=tag:karpenter.sh/discovery,Values=your-cluster-name"
# Verify security groups have karpenter.sh/discovery tags
aws ec2 describe-security-groups --region us-gov-west-1 \
--filters "Name=tag:karpenter.sh/discovery,Values=your-cluster-name"
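If a subnet is missing the discovery tag, `aws ec2 create-tags` adds it. The sketch below (our own helper) prints the remediation command rather than running it, so it can be reviewed first; the subnet ID and cluster name are placeholders:

```shell
# Sketch: emit the create-tags command for an untagged subnet (dry run).
make_tag_cmd() {
  printf 'aws ec2 create-tags --resources %s --tags Key=karpenter.sh/discovery,Value=%s --region us-gov-west-1\n' "$1" "$2"
}

make_tag_cmd subnet-0abc123 your-cluster-name
```

Pipe the printed command to `sh` (or paste it) once you have confirmed the subnet ID.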
AWS Auth Configuration:
# Check if Karpenter node role is in aws-auth
kubectl get configmap aws-auth -n kube-system -o yaml
# The aws-auth ConfigMap is automatically managed by Terraform
# If Karpenter nodes can't authenticate, verify the role ARN is correct
# and the role has the necessary permissions
Common Karpenter Node Issues:
- Authentication failures: Check aws-auth ConfigMap configuration
- API server connectivity: Verify security groups and subnet routing
- Node not joining: Check IAM role permissions and instance profile
- AWS EKS Documentation
- KEDA Documentation
- Karpenter Documentation
- Apache Airflow Documentation
- AWS GovCloud Documentation
- Fork the repository
- Create a feature branch
- Make your changes
- Run pre-commit hooks: `pre-commit run --all-files`
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
Name | Version |
---|---|
terraform | >= 1.0 |
aws | ~> 5.0 |
helm | ~> 2.12 |
kubernetes | ~> 2.25 |
null | ~> 3.0 |
tls | ~> 4.0 |
Name | Version |
---|---|
aws | 5.100.0 |
null | 3.2.4 |
Name | Source | Version |
---|---|---|
ecr | ./modules/ecr | n/a |
efs | ./modules/efs | n/a |
eks | ./modules/eks | n/a |
iam | ./modules/iam | n/a |
kubernetes | ./modules/kubernetes | n/a |
sqs | ./modules/sqs | n/a |
Name | Type |
---|---|
null_resource.validate_subnets | resource |
aws_caller_identity.current | data source |
aws_region.current | data source |
aws_subnets.private | data source |
aws_vpc.existing | data source |
Name | Description | Type | Default | Required |
---|---|---|---|---|
cluster_name | Name of the EKS cluster | string | "gman-test" | no |
deploy_dags | Whether to deploy the default DAGs (rdrgen, edrgen, vic2png) to Airflow | bool | true | no |
kubernetes_version | Kubernetes version for the EKS cluster | string | "1.32" | no |
node_group_desired_size | Desired number of nodes in the node group | number | 4 | no |
node_group_instance_types | Instance types for the node group | list(string) | [ | no |
node_group_max_size | Maximum number of nodes in the node group | number | 4 | no |
node_group_min_size | Minimum number of nodes in the node group | number | 0 | no |
public_access_cidrs | CIDR blocks for public access to EKS cluster | list(string) | [ | no |
service_ipv4_cidr | CIDR block for Kubernetes services | string | "10.100.0.0/16" | no |
tags | Tags to apply to all resources | map(string) | { | no |
Name | Description |
---|---|
airflow_release_name | Airflow Helm release name |
airflow_repository_url | Airflow ECR repository URL |
alpine_repository_url | Alpine base image ECR repository URL |
aws_ebs_csi_driver_repository_url | AWS EBS CSI Driver ECR repository URL |
busybox_repository_url | Busybox base image ECR repository URL |
cluster_arn | EKS cluster ARN |
cluster_certificate_authority_data | EKS cluster certificate authority data |
cluster_endpoint | EKS cluster endpoint |
cluster_id | EKS cluster ID |
cluster_name | EKS cluster name |
cluster_oidc_issuer_url | EKS cluster OIDC issuer URL |
cluster_security_group_id | EKS cluster security group ID |
edrgen_repository_url | Unity EDRGEN ECR repository URL |
efs_file_system_id | EFS file system ID |
efs_security_group_id | EFS security group ID |
eks_pause_repository_url | EKS pause image ECR repository URL |
external_attacher_repository_url | External Attacher ECR repository URL |
external_provisioner_repository_url | External Provisioner ECR repository URL |
external_resizer_repository_url | External Resizer ECR repository URL |
karpenter_controller_repository_url | Karpenter controller ECR repository URL |
karpenter_controller_role_arn | Karpenter controller IAM role ARN |
karpenter_node_instance_profile_name | Karpenter node instance profile name |
karpenter_queue_arn | Karpenter interruption queue ARN |
karpenter_queue_url | Karpenter interruption queue URL |
karpenter_release_name | Karpenter Helm release name |
keda_operator_repository_url | KEDA operator ECR repository URL |
keda_release_name | KEDA Helm release name |
livenessprobe_repository_url | Liveness Probe ECR repository URL |
nginx_repository_url | Nginx base image ECR repository URL |
node_driver_registrar_repository_url | Node Driver Registrar ECR repository URL |
node_group_arn | EKS node group ARN |
node_group_id | EKS node group ID |
node_security_group_id | EKS node security group ID |
oidc_provider_arn | EKS OIDC provider ARN |
oidc_provider_stack_arn | CloudFormation stack ARN for OIDC provider |
private_subnet_ids | Private subnet IDs |
public_subnet_ids | Public subnet IDs |
rdrgen_repository_url | Unity RDRGEN ECR repository URL |
vic2png_repository_url | Unity VIC2PNG ECR repository URL |
vpc_id | VPC ID |