Commit 4289348

feat: [PMM-14303] OpenShift cluster cleanup Lambda automation
Automated Lambda function for comprehensive OpenShift cluster infrastructure cleanup across AWS regions. Implements long-term solution for automatic cleanup of OpenShift clusters after retention period.

Core Capabilities:
- Automatic cluster detection via master node identification
- Complete infrastructure removal: VPC, ELB, Route53, S3, EC2
- Dependency-aware cleanup ordering
- Multi-region support with configurable targeting

Safety & Monitoring:
- DRY_RUN mode (default) for safe testing
- CloudWatch metrics and alarms
- SNS notifications for cleanup reports
- Structured JSON logging with AWS Lambda Powertools

Technical Stack:
- Python 3.13 on ARM64 architecture
- 1024MB memory, 600s timeout
- CDK infrastructure as code
- EventBridge scheduled execution (configurable, default 15min)

Testing:
- 27 unit tests covering all OpenShift operations
- Integration tests for end-to-end workflows
- All tests passing

Jira: PMM-14303
1 parent 1b65d67 commit 4289348

49 files changed: 7442 additions, 0 deletions

IaC/cdk/README.md

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
# CDK Projects

AWS CDK (Cloud Development Kit) infrastructure as code implementations.

## Projects

### aws-resources-cleanup
Comprehensive AWS resource cleanup Lambda with CDK deployment.

**Purpose**: Automated cleanup of EC2 instances, EKS clusters, and OpenShift infrastructure based on TTL policies and billing tags.

**Features**:
- TTL-based expiration (8h, 24h policies)
- Billing tag validation (category + Unix timestamps)
- EKS CloudFormation deletion
- OpenShift comprehensive cleanup (VPC, ELB, Route53, S3, NAT, security groups)
- DRY_RUN mode (default)
- SNS notifications
- Hourly EventBridge schedule
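
The TTL and billing-tag features listed above could be evaluated roughly as follows. This is a minimal sketch, not the Lambda's actual code: the policy names, the function names, and the rule that a purely numeric tag value is a Unix expiry timestamp are all assumptions made for illustration. Only the 8h and 24h durations come from the list.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy table: the 8h and 24h durations come from the README,
# the policy names "short" and "default" do not.
TTL_POLICIES = {"short": timedelta(hours=8), "default": timedelta(hours=24)}


def ttl_expired(launch_time: datetime, policy: str, now: datetime) -> bool:
    """Return True once a resource has outlived the TTL of its policy."""
    return now - launch_time >= TTL_POLICIES[policy]


def billing_tag_expired(tag_value: str, now: datetime) -> bool:
    """Assumption: a purely numeric tag value is a Unix expiry timestamp.

    Any other value is treated as a category label and never expires here.
    """
    if not tag_value.isdigit():
        return False
    expiry = datetime.fromtimestamp(int(tag_value), tz=timezone.utc)
    return now >= expiry
```

Passing `now` explicitly keeps both checks deterministic, so they can be unit tested without mocking the clock.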

**Quick Start**:
```bash
cd aws-resources-cleanup
just install    # Install dependencies
just deploy     # Deploy in DRY_RUN mode
just logs       # Tail CloudWatch logs
```

📖 **Full documentation**: [aws-resources-cleanup/README.md](aws-resources-cleanup/README.md)

## Requirements

- AWS CLI configured with appropriate profile
- `uv` package manager: `brew install uv`
- `just` task runner: `brew install just`

## Common Commands

All projects use Justfile for consistent automation:

| Command | Description |
|---------|-------------|
| `just install` | Install all dependencies |
| `just synth` | Generate CloudFormation template |
| `just diff` | Preview infrastructure changes |
| `just deploy` | Deploy stack |
| `just destroy` | Remove stack |
| `just logs` | Tail CloudWatch logs (if applicable) |

## Adding New CDK Projects

When creating a new CDK project in this directory:

1. Create project directory: `mkdir project-name`
2. Initialize CDK: `cdk init app --language python`
3. Add Justfile for automation
4. Add project-specific README.md
5. Update this README with project description

## Resources

- [AWS CDK Documentation](https://docs.aws.amazon.com/cdk/)
- [CDK Python API Reference](https://docs.aws.amazon.com/cdk/api/v2/python/)
- [Justfile Documentation](https://github.com/casey/just)
Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
# CDK
cdk.out/
.cdk.staging/
cdk.context.json

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# Virtual environments
venv/
ENV/
env/
.venv

# Testing
.pytest_cache/
.coverage
htmlcov/
.tox/

# IDEs
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Lambda artifacts
*.zip
/tmp/

# Lambda dependencies (installed at build time)
lambda/*.dist-info/
lambda/bin/
lambda/boto3/
lambda/boto3-*.dist-info/
lambda/botocore/
lambda/botocore-*.dist-info/
lambda/aws_lambda_powertools/
lambda/aws_lambda_powertools-*.dist-info/
lambda/aws_xray_sdk/
lambda/aws_xray_sdk-*.dist-info/
lambda/s3transfer/
lambda/s3transfer-*.dist-info/
lambda/jmespath/
lambda/jmespath-*.dist-info/
lambda/dateutil/
lambda/python_dateutil-*.dist-info/
lambda/urllib3/
lambda/urllib3-*.dist-info/
lambda/six.py
lambda/six-*.dist-info/
lambda/typing_extensions.py
lambda/typing_extensions-*.dist-info/
lambda/wrapt/
lambda/wrapt-*.dist-info/
lambda/.lock

# Logs
*.log
Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
# OpenShift Cluster Cleanup

Automated Lambda for OpenShift cluster cleanup across AWS regions.

**Runtime**: Python 3.13 ARM64, 1024MB, 600s timeout
**Default**: DRY_RUN mode (logs only)
**Concurrency**: 1 (prevents race conditions)

## Features

- **OpenShift Detection**: Automatic discovery via master node identification
- **Comprehensive Cleanup**: VPC, Load Balancers, Route53, S3, EC2 instances
- **Multi-Region**: Scans all or specific AWS regions
- **DRY_RUN Mode**: Safe testing without actual resource deletion
- **Monitoring**: CloudWatch logs, metrics, and SNS notifications
- **Scheduled Execution**: Configurable via EventBridge (default: 15 minutes)

## Quick Start

```bash
brew install uv just
cd IaC/cdk/aws-resources-cleanup
just install
just bootstrap      # First time only
just deploy         # DRY_RUN mode
just deploy-live    # LIVE mode (destructive!)
```

## Commands

```bash
just deploy        # Deploy (DRY_RUN)
just logs          # Tail logs
just invoke-aws    # Manual trigger
just params        # Show config
just test          # Run tests
```

Run `just` for all commands.

## Configuration

Key parameters (CloudFormation):

| Parameter | Default | Description |
|-----------|---------|-------------|
| `DryRunMode` | `true` | Safe mode - logs only |
| `ScheduleRateMinutes` | `15` | Run frequency |
| `TargetRegions` | `all` | Regions to scan |
| `LogLevel` | `INFO` | Log verbosity |
| `OpenShiftCleanupEnabled` | `true` | Enable OpenShift cleanup |
| `OpenShiftBaseDomain` | `cd.percona.com` | DNS base domain |

View all: `just params`
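
One way a handler could consume these parameters is sketched below. It assumes the stack forwards them to the Lambda as environment variables named `DRY_RUN`, `TARGET_REGIONS`, and `LOG_LEVEL`, and that `TargetRegions` is a comma-separated list; those variable names, the list format, and the `CleanupConfig` type are hypothetical and not confirmed by this commit.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class CleanupConfig:
    dry_run: bool
    target_regions: Optional[List[str]]  # None means "scan all regions"
    log_level: str


def load_config(env: Mapping[str, str]) -> CleanupConfig:
    """Build the config from an environment-like mapping (hypothetical names)."""
    # Fail safe: only an explicit "false" switches DRY_RUN off.
    dry_run = env.get("DRY_RUN", "true").strip().lower() != "false"

    raw_regions = env.get("TARGET_REGIONS", "all").strip()
    if raw_regions.lower() == "all":
        regions = None
    else:
        regions = [r.strip() for r in raw_regions.split(",") if r.strip()]

    return CleanupConfig(dry_run, regions, env.get("LOG_LEVEL", "INFO").upper())
```

In a handler this would be called as `load_config(os.environ)`. Treating any unrecognized value as "stay in DRY_RUN" means a typo in the parameter cannot enable destructive mode by accident.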

## OpenShift Cluster Detection

The Lambda automatically detects OpenShift clusters by:
1. Scanning EC2 instances for master nodes (naming pattern: `*-master-*`)
2. Detecting infrastructure ID from VPC tags
3. Identifying all cluster resources (VPC, ELB, Route53, S3)
4. Orchestrating complete cleanup in dependency order
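
Steps 1 and 2 above can be sketched as pure functions over dictionaries shaped like the EC2 API's `Tags` lists. This is an illustration under assumptions, not the Lambda's code: the function names are hypothetical, and the `kubernetes.io/cluster/<infra-id>` tag with value `owned` is the convention commonly applied by the OpenShift installer, which may differ from what this Lambda actually checks.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional

MASTER_NAME_PATTERN = "*-master-*"              # pattern from the README
CLUSTER_TAG_PREFIX = "kubernetes.io/cluster/"   # assumed tag convention


def tags_to_dict(resource: dict) -> Dict[str, str]:
    """Flatten an EC2-style [{'Key': ..., 'Value': ...}] tag list."""
    return {t["Key"]: t["Value"] for t in resource.get("Tags", [])}


def find_master_nodes(instances: List[dict]) -> List[dict]:
    """Keep instances whose Name tag matches the master naming pattern."""
    return [
        inst for inst in instances
        if fnmatchcase(tags_to_dict(inst).get("Name", ""), MASTER_NAME_PATTERN)
    ]


def infra_id_from_tags(resource: dict) -> Optional[str]:
    """Extract the infrastructure ID from an 'owned' cluster tag, if any."""
    for key, value in tags_to_dict(resource).items():
        if key.startswith(CLUSTER_TAG_PREFIX) and value == "owned":
            return key[len(CLUSTER_TAG_PREFIX):]
    return None
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every platform.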

Cleanup process:
1. Load balancers (ELB, ALB, NLB)
2. NAT gateways and Elastic IPs
3. Network interfaces
4. VPC endpoints
5. Security groups
6. Subnets and route tables
7. Internet gateways
8. VPC
9. Route53 DNS records
10. S3 state buckets
11. EC2 instances
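
The ordering above matters because AWS refuses to delete a resource that others still depend on (a VPC with remaining subnets, for example). A minimal sketch of a dependency-ordered runner with a DRY_RUN guard might look like this; the step identifiers, the handler signature, and `run_cleanup` itself are hypothetical and only mirror the list above.

```python
from typing import Callable, Dict, List

# Step identifiers mirror the README list, in the same order.
CLEANUP_ORDER = [
    "load_balancers",
    "nat_gateways_and_eips",
    "network_interfaces",
    "vpc_endpoints",
    "security_groups",
    "subnets_and_route_tables",
    "internet_gateways",
    "vpc",
    "route53_records",
    "s3_state_buckets",
    "ec2_instances",
]


def run_cleanup(
    infra_id: str,
    handlers: Dict[str, Callable[[str], None]],
    dry_run: bool = True,
) -> List[str]:
    """Run every step in order; in dry-run mode only record the intent."""
    report = []
    for step in CLEANUP_ORDER:
        if dry_run:
            report.append(f"[DRY-RUN] Would delete {step} for {infra_id}")
            continue
        # An exception here aborts the run on purpose: later steps
        # depend on earlier ones having succeeded.
        handlers[step](infra_id)
        report.append(f"Deleted {step} for {infra_id}")
    return report
```

Because `dry_run` defaults to `True`, a caller has to opt in explicitly before any handler is invoked.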

## Logging

```
Processing region for OpenShift cleanup: us-east-2
OpenShift cluster detected: my-cluster (infra-id: my-cluster-abc123)
[DRY-RUN] Would TERMINATE_OPENSHIFT_CLUSTER: my-cluster
OpenShift scan complete for us-east-2: 342 instances scanned, 1 cluster found
Cleanup complete: 1 action across 17 regions (18.2s)
```

## Troubleshooting

```bash
just logs-recent    # Check logs
just params         # Verify config
just invoke-aws     # Test manually
```

**Issues:**
- No actions: Set `DryRunMode=false`
- OpenShift errors: Check CloudWatch logs for details
- No clusters detected: Verify master node naming pattern

## Architecture

```
EventBridge (15min) → Lambda → OpenShift Detection → Cleanup Orchestration → SNS

VPC, ELB, Route53, S3, EC2
```

Lambda retrieves function name from CDK outputs for alignment with infrastructure.
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
#!/usr/bin/env python3
"""CDK app for OpenShift Cluster Cleanup Lambda."""

import os
import aws_cdk as cdk
from stacks.resource_cleanup_stack import ResourceCleanupStack

app = cdk.App()

ResourceCleanupStack(
    app,
    "AWSResourcesCleanupStack",
    description="OpenShift cluster infrastructure cleanup for AWS",
    env=cdk.Environment(
        account=os.getenv('CDK_DEFAULT_ACCOUNT'),
        region=os.getenv('CDK_DEFAULT_REGION', 'us-east-2')
    ),
    tags={
        "Project": "PlatformEngineering",
        "ManagedBy": "CDK",
        "iit-billing-tag": "openshift-cleanup"
    }
)

app.synth()
Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
{
  "app": "python3 app.py",
  "watch": {
    "include": [
      "**"
    ],
    "exclude": [
      "README.md",
      "cdk*.json",
      "requirements*.txt",
      "source.bat",
      "**/__pycache__",
      "**/*.pyc",
      ".pytest_cache"
    ]
  },
  "context": {
    "@aws-cdk/aws-lambda:recognizeLayerVersion": true,
    "@aws-cdk/core:checkSecretUsage": true,
    "@aws-cdk/core:target-partitions": [
      "aws",
      "aws-cn"
    ],
    "@aws-cdk-containers/ecs-service-extensions:enableDefaultLogDriver": true,
    "@aws-cdk/aws-ec2:uniqueImdsv2TemplateName": true,
    "@aws-cdk/aws-ecs:arnFormatIncludesClusterName": true,
    "@aws-cdk/aws-iam:minimizePolicies": true,
    "@aws-cdk/core:validateSnapshotRemovalPolicy": true,
    "@aws-cdk/aws-codepipeline:crossAccountKeyAliasStackSafeResourceName": true,
    "@aws-cdk/aws-s3:createDefaultLoggingPolicy": true,
    "@aws-cdk/aws-sns-subscriptions:restrictSqsDescryption": true,
    "@aws-cdk/aws-apigateway:disableCloudWatchRole": true,
    "@aws-cdk/core:enablePartitionLiterals": true,
    "@aws-cdk/aws-events:eventsTargetQueueSameAccount": true,
    "@aws-cdk/aws-iam:standardizedServicePrincipals": true,
    "@aws-cdk/aws-ecs:disableExplicitDeploymentControllerForCircuitBreaker": true,
    "@aws-cdk/aws-iam:importedRoleStackSafeDefaultPolicyName": true,
    "@aws-cdk/aws-s3:serverAccessLogsUseBucketPolicy": true,
    "@aws-cdk/aws-route53-patternslibrary:useCertificate": true,
    "@aws-cdk/customresources:installLatestAwsSdkDefault": false,
    "@aws-cdk/aws-rds:databaseProxyUniqueResourceName": true,
    "@aws-cdk/aws-codedeploy:removeAlarmsFromDeploymentGroup": true,
    "@aws-cdk/aws-apigateway:authorizerChangeDeploymentLogicalId": true,
    "@aws-cdk/aws-ec2:launchTemplateDefaultUserData": true,
    "@aws-cdk/aws-secretsmanager:useAttachedSecretResourcePolicyForSecretTargetAttachments": true,
    "@aws-cdk/aws-redshift:columnId": true,
    "@aws-cdk/aws-stepfunctions-tasks:enableEmrServicePolicyV2": true,
    "@aws-cdk/aws-ec2:restrictDefaultSecurityGroup": true,
    "@aws-cdk/aws-apigateway:requestValidatorUniqueId": true,
    "@aws-cdk/aws-kms:aliasNameRef": true,
    "@aws-cdk/aws-autoscaling:generateLaunchTemplateInsteadOfLaunchConfig": true,
    "@aws-cdk/core:includePrefixInUniqueNameGeneration": true,
    "@aws-cdk/aws-efs:denyAnonymousAccess": true,
    "@aws-cdk/aws-opensearchservice:enableOpensearchMultiAzWithStandby": true,
    "@aws-cdk/aws-lambda-nodejs:useLatestRuntimeVersion": true,
    "@aws-cdk/aws-efs:mountTargetOrderInsensitiveLogicalId": true,
    "@aws-cdk/aws-rds:auroraClusterChangeScopeOfInstanceParameterGroupWithEachParameters": true,
    "@aws-cdk/aws-appsync:useArnForSourceApiAssociationIdentifier": true,
    "@aws-cdk/aws-rds:preventRenderingDeprecatedCredentials": true,
    "@aws-cdk/aws-codepipeline-actions:useNewDefaultBranchForCodeCommitSource": true,
    "@aws-cdk/aws-cloudwatch-actions:changeLambdaPermissionLogicalIdForLambdaAction": true,
    "@aws-cdk/aws-codepipeline:crossAccountKeysDefaultValueToFalse": true,
    "@aws-cdk/aws-codepipeline:defaultPipelineTypeToV2": true,
    "@aws-cdk/aws-kms:reduceCrossAccountRegionPolicyScope": true,
    "@aws-cdk/aws-eks:nodegroupNameAttribute": true,
    "@aws-cdk/aws-ec2:ebsDefaultGp3Volume": true,
    "@aws-cdk/aws-ecs:removeDefaultDeploymentAlarm": true,
    "@aws-cdk/custom-resources:logApiResponseDataPropertyTrueDefault": false,
    "@aws-cdk/aws-s3:keepNotificationInImportedBucket": false,
    "@aws-cdk/aws-apigateway:usagePlanKeyOrderInsensitiveId": true,
    "@aws-cdk/aws-appsync:appSyncGraphQLAPIScopeLambdaPermission": true,
    "@aws-cdk/aws-cloudfront:defaultSecurityPolicyTLSv1.2_2021": true,
    "@aws-cdk/aws-dynamodb:resourcePolicyPerReplica": true,
    "@aws-cdk/aws-dynamodb:retainTableReplica": true,
    "@aws-cdk/aws-ec2-alpha:useResourceIdForVpcV2Migration": false,
    "@aws-cdk/aws-ec2:bastionHostUseAmazonLinux2023ByDefault": true,
    "@aws-cdk/aws-ec2:ec2SumTImeoutEnabled": true,
    "@aws-cdk/aws-ec2:requirePrivateSubnetsForEgressOnlyInternetGateway": true,
    "@aws-cdk/aws-ecs-patterns:secGroupsDisablesImplicitOpenListener": true,
    "@aws-cdk/aws-ecs:disableEcsImdsBlocking": true,
    "@aws-cdk/aws-ecs:enableImdsBlockingDeprecatedFeature": false,
    "@aws-cdk/aws-ecs:reduceEc2FargateCloudWatchPermissions": true,
    "@aws-cdk/aws-elasticloadbalancingV2:albDualstackWithoutPublicIpv4SecurityGroupRulesDefault": true,
    "@aws-cdk/aws-events:requireEventBusPolicySid": true,
    "@aws-cdk/aws-iam:oidcRejectUnauthorizedConnections": true,
    "@aws-cdk/aws-kms:applyImportedAliasPermissionsToPrincipal": true,
    "@aws-cdk/aws-lambda-nodejs:sdkV3ExcludeSmithyPackages": true,
    "@aws-cdk/aws-lambda:createNewPoliciesWithAddToRolePolicy": false,
    "@aws-cdk/aws-lambda:recognizeVersionProps": true,
    "@aws-cdk/aws-lambda:useCdkManagedLogGroup": true,
    "@aws-cdk/aws-rds:lowercaseDbIdentifier": true,
    "@aws-cdk/aws-rds:setCorrectValueForDatabaseInstanceReadReplicaInstanceResourceId": true,
    "@aws-cdk/aws-route53-patters:useCertificate": true,
    "@aws-cdk/aws-route53-targets:userPoolDomainNameMethodWithoutCustomResource": true,
    "@aws-cdk/aws-s3:publicAccessBlockedByDefault": true,
    "@aws-cdk/aws-s3:setUniqueReplicationRoleName": true,
    "@aws-cdk/aws-signer:signingProfileNamePassedToCfn": true,
    "@aws-cdk/aws-stepfunctions-tasks:fixRunEcsTaskPolicy": true,
    "@aws-cdk/aws-stepfunctions-tasks:useNewS3UriParametersForBedrockInvokeModelTask": true,
    "@aws-cdk/aws-stepfunctions:useDistributedMapResultWriterV2": true,
    "@aws-cdk/cognito:logUserPoolClientSecretValue": false,
    "@aws-cdk/core:aspectPrioritiesMutating": true,
    "@aws-cdk/core:aspectStabilization": true,
    "@aws-cdk/core:cfnIncludeRejectComplexResourceUpdateCreatePolicyIntrinsics": true,
    "@aws-cdk/core:enableAdditionalMetadataCollection": true,
    "@aws-cdk/core:explicitStackTags": true,
    "@aws-cdk/core:newStyleStackSynthesis": true,
    "@aws-cdk/core:stackRelativeExports": true,
    "@aws-cdk/pipelines:reduceAssetRoleTrustScope": true,
    "@aws-cdk/pipelines:reduceCrossAccountActionRoleTrustScope": true,
    "@aws-cdk/pipelines:reduceStageRoleTrustScope": true,
    "@aws-cdk/s3-notifications:addS3TrustKeyPolicyForSnsSubscriptions": true
  }
}
