
Commit 69b61ca

feat: OpenShift cluster cleanup Lambda (v3.0.0)
Automated Lambda function for comprehensive OpenShift cluster cleanup across AWS regions. Handles VPC, Load Balancers, Route53, S3, and EC2 instances.

Features:
- Automatic cluster detection via master node identification
- Comprehensive resource cleanup in dependency order
- Multi-region support with configurable targeting
- DRY_RUN mode for safe testing
- CloudWatch metrics and SNS notifications
- Scheduled execution via EventBridge (default: 15 minutes)

Infrastructure:
- Python 3.13 ARM64 runtime
- 1024MB memory, 600s timeout
- Concurrency: 1 (prevents race conditions)
- IAM permissions: OpenShift-specific (VPC, ELB, Route53, S3)

Deployment:
- CDK-based infrastructure as code
- CloudFormation parameters for configuration
- Just commands for easy management

Testing:
- 27 unit tests covering OpenShift operations
- Integration tests for orchestration flow
- All tests passing with good coverage
1 parent b025060 commit 69b61ca

29 files changed, with 2821 additions and 4289 deletions.
Lines changed: 42 additions & 29 deletions
@@ -1,19 +1,19 @@
-# AWS Resources Cleanup
+# OpenShift Cluster Cleanup
 
-Automated Lambda for EC2, EBS, EKS, and OpenShift cleanup across AWS regions.
+Automated Lambda for OpenShift cluster cleanup across AWS regions.
 
 **Runtime**: Python 3.13 ARM64, 1024MB, 600s timeout
 **Default**: DRY_RUN mode (logs only)
 **Concurrency**: 1 (prevents race conditions)
 
 ## Features
 
-- **EC2**: TTL expiration, stop policy, long-stopped instances, untagged cleanup
-- **EBS**: Unattached volume deletion
-- **EKS**: CloudFormation stack deletion (skip pattern: `pe-.*`)
-- **OpenShift**: Full cluster cleanup (VPC, ELB, Route53, S3)
-
-**Protection**: Persistent tags (`jenkins-*`, `pmm-dev`), valid billing tags, `PerconaKeep`, "do not remove" in names
+- **OpenShift Detection**: Automatic discovery via master node identification
+- **Comprehensive Cleanup**: VPC, Load Balancers, Route53, S3, EC2 instances
+- **Multi-Region**: Scans all or specific AWS regions
+- **DRY_RUN Mode**: Safe testing without actual resource deletion
+- **Monitoring**: CloudWatch logs, metrics, and SNS notifications
+- **Scheduled Execution**: Configurable via EventBridge (default: 15 minutes)
 
 ## Quick Start
 
@@ -33,7 +33,7 @@ just deploy # Deploy (DRY_RUN)
 just logs # Tail logs
 just invoke-aws # Manual trigger
 just params # Show config
-just test # Run tests (176, 87% coverage)
+just test # Run tests
 ```
 
 Run `just` for all commands.
@@ -44,33 +44,44 @@ Key parameters (CloudFormation):
 
 | Parameter | Default | Description |
 |-----------|---------|-------------|
-| `DryRunMode` | `true` | Safe mode |
+| `DryRunMode` | `true` | Safe mode - logs only |
 | `ScheduleRateMinutes` | `15` | Run frequency |
 | `TargetRegions` | `all` | Regions to scan |
 | `LogLevel` | `INFO` | Log verbosity |
-| `UntaggedThresholdMinutes` | `30` | Grace period |
-| `VolumeCleanupEnabled` | `true` | Enable volume cleanup |
-| `EKSCleanupEnabled` | `true` | Enable EKS cleanup |
 | `OpenShiftCleanupEnabled` | `true` | Enable OpenShift cleanup |
+| `OpenShiftBaseDomain` | `cd.percona.com` | DNS base domain |
 
 View all: `just params`
 
-## Cleanup Policies
-
-Priority order:
-
-1. **TTL** - `creation-time` + `delete-cluster-after-hours` → TERMINATE
-2. **Stop** - `stop-after-days` → STOP
-3. **Long Stopped** - >30 days → TERMINATE
-4. **Untagged** - Missing `iit-billing-tag` → TERMINATE
+## OpenShift Cluster Detection
+
+The Lambda automatically detects OpenShift clusters by:
+1. Scanning EC2 instances for master nodes (naming pattern: `*-master-*`)
+2. Detecting infrastructure ID from VPC tags
+3. Identifying all cluster resources (VPC, ELB, Route53, S3)
+4. Orchestrating complete cleanup in dependency order
+
+Cleanup process:
+1. Load balancers (ELB, ALB, NLB)
+2. NAT gateways and Elastic IPs
+3. Network interfaces
+4. VPC endpoints
+5. Security groups
+6. Subnets and route tables
+7. Internet gateways
+8. VPC
+9. Route53 DNS records
+10. S3 state buckets
+11. EC2 instances
 
 ## Logging
 
 ```
-Instance i-0d09... protected: Valid billing tag 'ps-package-testing'
-[DRY-RUN] Would TERMINATE instance i-085e... in us-east-2: Missing billing tag
-Instance scan for us-west-2: 11 scanned, 1 actions, 10 protected
-Cleanup complete: 31 actions across 17 regions (15.4s)
+Processing region for OpenShift cleanup: us-east-2
+OpenShift cluster detected: my-cluster (infra-id: my-cluster-abc123)
+[DRY-RUN] Would TERMINATE_OPENSHIFT_CLUSTER: my-cluster
+OpenShift scan complete for us-east-2: 342 instances scanned, 1 cluster found
+Cleanup complete: 1 action across 17 regions (18.2s)
 ```
 
 ## Troubleshooting
@@ -83,13 +94,15 @@ just invoke-aws # Test manually
 
 **Issues:**
 - No actions: Set `DryRunMode=false`
-- Volume cleanup fails: Check `VolumeCleanupEnabled=true`, volumes `available`
-- OpenShift errors: Auto-retries 3 times
+- OpenShift errors: Check CloudWatch logs for details
+- No clusters detected: Verify master node naming pattern
 
 ## Architecture
 
 ```
-EventBridge → Lambda → EC2/Volumes/EKS/OpenShift → SNS
+EventBridge (15min) → Lambda → OpenShift Detection → Cleanup Orchestration → SNS
+
+VPC, ELB, Route53, S3, EC2
 ```
 
-Justfile retrieves function name from CDK outputs for alignment.
+Lambda retrieves function name from CDK outputs for alignment with infrastructure.
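
The README's "OpenShift Cluster Detection" steps in the diff above (match master nodes by the `*-master-*` naming pattern, then read the infrastructure ID from VPC tags) can be sketched roughly as follows. This is a minimal illustration, not the Lambda's actual code: the function names are hypothetical, instances are assumed to be dicts shaped like boto3 `describe_instances` results, and the `kubernetes.io/cluster/<infra-id>` tag key is the convention used by the OpenShift installer on AWS, which this commit does not confirm the Lambda relies on.

```python
from fnmatch import fnmatchcase

# Pattern taken from the README; everything else here is an assumption.
MASTER_PATTERN = "*-master-*"
# Assumed tag-key convention: kubernetes.io/cluster/<infra-id>
CLUSTER_TAG_PREFIX = "kubernetes.io/cluster/"


def find_master_nodes(instances, pattern=MASTER_PATTERN):
    """Return the instances whose Name tag matches the master-node pattern."""
    masters = []
    for instance in instances:
        # Instances without tags are simply skipped.
        tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
        if fnmatchcase(tags.get("Name", ""), pattern):
            masters.append(instance)
    return masters


def infra_id_from_vpc_tags(vpc_tags):
    """Return the infra ID encoded in a cluster tag key, or None if absent."""
    for tag in vpc_tags:
        key = tag["Key"]
        if key.startswith(CLUSTER_TAG_PREFIX):
            return key[len(CLUSTER_TAG_PREFIX):]
    return None
```

Using `fnmatchcase` rather than `fnmatch` keeps the match case-sensitive on every platform, so a pattern check behaves the same locally and inside the Lambda runtime.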

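The eleven-step "Cleanup process" list in the README diff, combined with the DRY_RUN default, suggests an orchestration loop along these lines. This is a hedged sketch only: the step names, the `run_cleanup` function, and the handler mapping are hypothetical, and the message format merely imitates the `[DRY-RUN] Would ...` style of the README's logging sample.

```python
from typing import Callable, Dict, List

# Step names are hypothetical; the order mirrors the README's "Cleanup process" list.
CLEANUP_ORDER: List[str] = [
    "load_balancers",
    "nat_gateways_and_elastic_ips",
    "network_interfaces",
    "vpc_endpoints",
    "security_groups",
    "subnets_and_route_tables",
    "internet_gateways",
    "vpc",
    "route53_records",
    "s3_state_buckets",
    "ec2_instances",
]


def run_cleanup(
    infra_id: str,
    handlers: Dict[str, Callable[[str], None]],
    dry_run: bool = True,
) -> List[str]:
    """Run each step in order; in dry-run mode only record what would happen."""
    messages = []
    for step in CLEANUP_ORDER:
        if dry_run:
            messages.append(f"[DRY-RUN] Would delete {step} for {infra_id}")
            continue
        # An exception raised by a handler stops the remaining steps.
        handlers[step](infra_id)
        messages.append(f"Deleted {step} for {infra_id}")
    return messages
```

Keeping the order in a single list makes the dependency sequence easy to review and lets the dry-run path exercise exactly the same loop as a real run.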
IaC/cdk/aws-resources-cleanup/app.py

Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-"""CDK app for AWS Resources Cleanup Lambda."""
+"""CDK app for OpenShift Cluster Cleanup Lambda."""
 
 import os
 import aws_cdk as cdk
@@ -10,15 +10,15 @@
 ResourceCleanupStack(
     app,
     "AWSResourcesCleanupStack",
-    description="Comprehensive AWS resource cleanup: EC2, EKS, OpenShift infrastructure",
+    description="OpenShift cluster infrastructure cleanup for AWS",
     env=cdk.Environment(
         account=os.getenv('CDK_DEFAULT_ACCOUNT'),
         region=os.getenv('CDK_DEFAULT_REGION', 'us-east-2')
     ),
     tags={
         "Project": "PlatformEngineering",
         "ManagedBy": "CDK",
-        "iit-billing-tag": "resource-cleanup"
+        "iit-billing-tag": "openshift-cleanup"
     }
 )
 
Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,8 @@
1-
"""EC2 Cleanup Lambda - Modular implementation."""
1+
"""OpenShift Cluster Cleanup Lambda for AWS."""
22

33
from .handler import lambda_handler
44

5+
__version__ = "3.0.0"
6+
__description__ = "Automated OpenShift cluster infrastructure cleanup"
7+
58
__all__ = ["lambda_handler"]
Lines changed: 2 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -1,31 +1,7 @@
1-
"""EC2 instance management, volume cleanup, and cleanup policies."""
1+
"""EC2 operations for OpenShift cluster cleanup."""
22

3-
from .instances import (
4-
cirrus_ci_add_iit_billing_tag,
5-
is_protected,
6-
execute_cleanup_action,
7-
)
8-
from .policies import (
9-
check_ttl_expiration,
10-
check_stop_after_days,
11-
check_long_stopped,
12-
check_untagged,
13-
)
14-
from .volumes import (
15-
check_unattached_volume,
16-
delete_volume,
17-
is_volume_protected,
18-
)
3+
from .instances import execute_cleanup_action
194

205
__all__ = [
21-
"cirrus_ci_add_iit_billing_tag",
22-
"is_protected",
236
"execute_cleanup_action",
24-
"check_ttl_expiration",
25-
"check_stop_after_days",
26-
"check_long_stopped",
27-
"check_untagged",
28-
"check_unattached_volume",
29-
"delete_volume",
30-
"is_volume_protected",
317
]
