Draft
Conversation
daxelrod-rh
commented
Apr 2, 2026
| done | ||
|
|
||
| echo "" | ||
| echo "Cleanup complete: deleted=$DELETED_COUNT skipped=$SKIPPED_COUNT" |
Collaborator
Author
There was a problem hiding this comment.
Not a blocker: No reason for the script to be duplicated in the repo. Either have just the YAML or just this and template it into the YAML.
| - name: HOME | ||
| value: /tmp | ||
| command: | ||
| - /bin/bash |
Collaborator
Author
There was a problem hiding this comment.
Not a blocker: Would this be cleaner in Python? aws-cli containers should have Python in them.
| # 2. Get all daemonset-managed peers (exclude already deleted) | ||
| echo "Querying Route Server for managed BGP peers..." | ||
| ALL_PEERS=$(aws ec2 describe-route-server-peers \ | ||
| --query "RouteServerPeers[?Tags[?Key=='managed-by' && Value=='daemonset'] && State!='deleted'].{Id:RouteServerPeerId,Ip:PeerAddress,State:State,Tags:Tags}" \ |
Collaborator
Author
There was a problem hiding this comment.
Not a blocker: Without a specific cluster tag, this has the limitation that a particular Route Server can only peer with one ROSA cluster.
| AGE_MINUTES=$(( (NOW_EPOCH - CREATED_EPOCH) / 60 )) | ||
| else | ||
| # No created-at tag (legacy peer) - treat as very old to allow deletion | ||
| AGE_MINUTES=999999 |
Collaborator
Author
There was a problem hiding this comment.
Not a blocker: might be safer to not clean up peers without created-at tag.
556c5cd to
242e7a2
Compare
Implements a Kubernetes CronJob that periodically identifies and deletes stale Route Server BGP peers from deleted or replaced ROSA worker nodes. Changes: - Add cleanup-cronjob.yaml: CronJob running every 15 minutes to clean up stale peers by comparing EC2 instance IPs with Route Server peer IPs - Add cleanup-script.sh: Standalone version for manual testing - Update eni-srcdst-iam.tf: Add ec2:DeleteRouteServerPeer and ec2:DescribeInstances permissions - Update deploy.sh: Deploy CronJob with environment variable substitution - Update README.md: Document cleanup functionality, troubleshooting, and manual testing procedures Safety mechanisms: - Only processes peers tagged managed-by=daemonset - Queries EC2 for active router node IPs (source of truth) - Dry-run mode available for testing (DRY_RUN=true) - Concurrency control prevents overlapping runs - Skips peers in 'deleted' state Note: AWS Route Server Peer objects lack CreationTime field, so the planned 30-minute age threshold cannot be reliably implemented. For now, age calculation defaults to 0 minutes. Future enhancement: add created-at timestamp tag when creating peers. Tested: Successfully deleted 6 stale peers from old non-router nodes while preserving 6 active peers on current router nodes (10.0.1.49, 10.0.2.33, 10.0.3.56). Assisted-by: Claude Code (Claude Sonnet 4.5)
Implements proper age-based safety threshold for BGP peer cleanup by adding a created-at timestamp tag when peers are created. Changes: - Update daemonset.yaml: Add created-at tag with ISO8601 timestamp when creating BGP peers - Update cleanup-script.sh: Read created-at from tag instead of non-existent CreationTime field; treat legacy peers without tag as very old (eligible for cleanup if IP is stale) - Update cleanup-cronjob.yaml: Same age calculation logic as script How it works: - DaemonSet adds created-at tag: "2026-04-02T13:06:03Z" - Cleanup reads tag and calculates age in minutes - Only deletes stale peers older than MIN_AGE_MINUTES (default: 30) - Legacy peers without tag get age=999999 (always eligible if stale) Tested: - Created 3 new peers with created-at tags - Cleanup script correctly reads timestamps and calculates ages - CronJob runs successfully with 30-minute threshold - Legacy peers (without tag) handled gracefully This addresses the limitation where AWS Route Server Peer objects lack a CreationTime field, enabling the 30-minute safety threshold to prevent race conditions during peer creation. Assisted-by: Claude Code (Claude Sonnet 4.5)
Use the OpenShift mechanism for assuming pod identity, rather than the incorrect EKS one. This was again somehow uncommitted before.
a09678c to
49e5537
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Please merge #8 before this