
Commit 944cbda

Add automated EC2 cleanup and reduce workflow parallelism
Changes:
- Add EC2 cleanup step in workflow to terminate orphaned test instances
- Reduce max-parallel from 9 to 5 for both test jobs (cost optimization)
- Add cleanup Lambda module for daily automated cleanup (safety net)
- Add comprehensive documentation for EC2 cleanup strategy

Workflow improvements:
- Cleanup runs before terraform destroy with if: always()
- Terminates instances matching test patterns immediately
- Reduces cost from long-running orphaned instances

Cost optimization:
- Lower parallelism reduces simultaneous instance count
- Immediate cleanup prevents instances running for hours
- Lambda provides backup cleanup at 1 AM UTC daily

Documentation:
- EC2_CLEANUP_PLAN.md: Multi-layer cleanup strategy
- IAM_SECURITY_INVESTIGATION.md: IAM security analysis
- modules/cleanup/: Complete Terraform module for Lambda cleanup
1 parent ded1ec6 commit 944cbda
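Both test jobs fan out over the same 21-entry `vpu` matrix (with `timeout-minutes: 90` and `timeout-minutes: 150` respectively), so lowering `max-parallel` from 9 to 5 trades peak instance count for wall-clock time. The sketch below estimates that trade-off under a simplifying assumption: every matrix leg takes the same time, so legs run in back-to-back waves of `max-parallel`. In practice GitHub starts a queued leg as soon as any slot frees, and `timeout-minutes` is only an upper bound per leg, so the minute figures are worst-case bounds, not predictions. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Rough capacity estimate for a GitHub Actions matrix under max-parallel.
# Assumption: every matrix leg runs for the same duration, so legs run in
# ceil(legs / max_parallel) back-to-back waves.
set -euo pipefail

# ceil(a / b) with integer arithmetic
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Prints: legs max_parallel waves peak_concurrency worst_case_minutes
estimate() {
  local legs=$1 max_parallel=$2 leg_minutes=$3
  local waves peak
  waves=$(ceil_div "$legs" "$max_parallel")
  # Peak concurrency can never exceed the number of legs.
  peak=$(( legs < max_parallel ? legs : max_parallel ))
  echo "$legs $max_parallel $waves $peak $(( waves * leg_minutes ))"
}

VPU_COUNT=21  # vpu matrix: "01" ... "18", incl. 03N/03S/03W and 10L/10U

for timeout in 90 150; do   # timeout-minutes of the two test jobs
  for mp in 9 5; do         # max-parallel before and after this commit
    estimate "$VPU_COUNT" "$mp" "$timeout"
  done
done
```

Under these assumptions each job's peak drops from 9 to 5 concurrent instances, while its worst-case duration grows from 3 to 5 waves (270 to 450 minutes for the 90-minute job, 450 to 750 minutes for the 150-minute job). If each leg launches its own instance for the length of the leg, total instance-hours are the same at either setting; the cap mainly bounds how many instances run, or can be orphaned, at once.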


7 files changed (+2031 / -3 lines)


.github/workflows/infra_deploy_val.yaml

Lines changed: 36 additions & 3 deletions
@@ -325,7 +325,7 @@ jobs:
     timeout-minutes: 90
     strategy:
       fail-fast: false
-      max-parallel: 9
+      max-parallel: 5
       matrix:
         vpu: ["01", "02", "03N", "03S", "03W", "04", "05", "06", "07", "08", "09", "10L", "10U", "11", "12", "13", "14", "15", "16", "17", "18"]
     permissions:
@@ -517,7 +517,7 @@ jobs:
     timeout-minutes: 150
     strategy:
       fail-fast: false
-      max-parallel: 9
+      max-parallel: 5
       matrix:
         vpu: ["01", "02", "03N", "03S", "03W", "04", "05", "06", "07", "08", "09", "10L", "10U", "11", "12", "13", "14", "15", "16", "17", "18"]
     permissions:
@@ -807,7 +807,40 @@ jobs:
             done
           fi
         continue-on-error: true
-
+
+      - name: Terminate Orphaned Test EC2 Instances
+        if: always()
+        run: |
+          echo "Checking for orphaned test EC2 instances..."
+
+          # Get all running/pending test instances from this workflow
+          INSTANCE_IDS=$(aws ec2 describe-instances \
+            --filters \
+              "Name=tag:Project,Values=test_short_range_vpu_*,test_medium_range_vpu_*,test_analysis_assim_extend_vpu_*" \
+              "Name=instance-state-name,Values=running,pending" \
+            --query 'Reservations[*].Instances[*].[InstanceId,Tags[?Key==`Project`].Value|[0],LaunchTime]' \
+            --output text)
+
+          if [ -z "$INSTANCE_IDS" ]; then
+            echo "✓ No orphaned test instances found"
+          else
+            echo "Found orphaned instances:"
+            echo "$INSTANCE_IDS"
+
+            # Extract just instance IDs
+            INSTANCE_LIST=$(echo "$INSTANCE_IDS" | awk '{print $1}' | tr '\n' ' ')
+
+            if [ -n "$INSTANCE_LIST" ]; then
+              echo "Terminating instances: $INSTANCE_LIST"
+              aws ec2 terminate-instances --instance-ids $INSTANCE_LIST || echo "Failed to terminate some instances"
+
+              # Count terminated instances
+              INSTANCE_COUNT=$(echo $INSTANCE_LIST | wc -w)
+              echo "✓ Terminated $INSTANCE_COUNT orphaned test instances"
+            fi
+          fi
+        continue-on-error: true
+
       - name: Terraform Destroy
         id: destroy_attempt
         run: |
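The cleanup step depends on `--output text` emitting one tab-separated row per instance, in the column order set by `--query` (`InstanceId`, the `Project` tag, `LaunchTime`), and keeps only the first column. The filters match only on the `Project` tag pattern and instance state, not on the current workflow run, so the sweep can also select instances that other, still-running workflow runs launched. The sketch below isolates the parsing as small functions and adds an optional age cutoff on the `LaunchTime` column, which the step already queries but does not use. The function names, the cutoff policy, and the 60-minute value in the wiring comment are illustrative assumptions, not part of this commit; the wiring comment also assumes GNU `date` and bash 4+ (`mapfile`).

```shell
#!/usr/bin/env bash
# Sketch: turn `aws ec2 describe-instances ... --output text` rows into
# instance IDs. Assumed row layout (from the step's --query):
#   <InstanceId>\t<Project tag>\t<LaunchTime>
set -euo pipefail

# Print every instance ID (first column), one per line, skipping blank lines.
extract_ids() {
  awk -F'\t' 'NF > 0 && $1 != "" { print $1 }'
}

# Print IDs whose LaunchTime is strictly earlier than the given cutoff.
# Both timestamps must be UTC. Comparing only the first 19 characters
# (YYYY-MM-DDTHH:MM:SS) lets "...+00:00" and "...Z" suffixes coexist.
extract_ids_launched_before() {
  local cutoff=$1
  awk -F'\t' -v cutoff="$cutoff" \
    'NF >= 3 && substr($3, 1, 19) < substr(cutoff, 1, 19) { print $1 }'
}

# Count non-empty lines on stdin; prints 0 (and still succeeds) on empty input.
count_ids() {
  grep -c . || true
}

# Hypothetical wiring inside the workflow step (not executed here):
#   ROWS=$(aws ec2 describe-instances --filters ... --query ... --output text)  # same filters/query as the step
#   CUTOFF=$(date -u -d '60 minutes ago' +%Y-%m-%dT%H:%M:%S)
#   mapfile -t IDS < <(printf '%s\n' "$ROWS" | extract_ids_launched_before "$CUTOFF")
#   if (( ${#IDS[@]} > 0 )) && aws ec2 terminate-instances --instance-ids "${IDS[@]}" > /dev/null; then
#     echo "Terminated ${#IDS[@]} instance(s)"
#   fi
```

As committed, the step prints its "✓ Terminated N" line even when `terminate-instances` fails, because the `|| echo` fallback lets execution continue. The wiring comment reports a count only after a successful call.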
