roachprod: Add support for ASG groups in AWS #161374
craig[bot] merged 1 commit into cockroachdb:master from
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Potential Bug(s) Detected
The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation. Next Steps: Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary. After you review the findings, please tag the issue as follows:
nameisbhaskar left a comment
Thanks for the change. As discussed, let's break it into smaller, modular functions.
nameisbhaskar left a comment
LGTM! Please do test the flows, including failure flows. Thanks!
golgeek left a comment
Left a few comments here and there, but great job!
I wonder how much longer it takes to spin up a managed cluster than a non-managed one?
The idea behind the question is to make grow and shrink something that can be an afterthought: you provisioned a non-managed cluster, but would like to grow it later on.
I assume that creating the launch template and the ASG and then attaching the instances makes it much slower than a regular cluster, so we shouldn't go that route?
That being said, with the way you implemented managed clusters, we could probably convert a regular cluster to a managed one later on?
In the same general direction, what are your thoughts on using Unmanaged Instance Groups instead of MIGs in GCP?
There was a problem hiding this comment.
Good work! Did a quick pass, nothing jumped out at me, that @golgeek hasn't already mentioned.
bors r+
Build succeeded:
This PR adds AWS Auto Scaling Group (ASG) support for roachprod, mirroring the existing GCE Managed Instance Group (MIG) implementation. Managed clusters enable dynamic cluster resizing via roachprod grow and roachprod shrink operations.
Summary
- Add --aws-managed flag to create clusters using Launch Templates and Auto Scaling Groups
- Implement Grow() and Shrink() operations for managed AWS clusters
- Implement proper cleanup of ASG, launch templates, and load balancer resources on cluster destroy
AWS ASG doesn't support creating instances with custom names (unlike GCE MIG which has create-instance --instance ). To preserve roachprod's naming conventions (e.g., clustername-0001, clustername-0002), we:
1. Create the launch template (stores instance configuration)
2. Create the ASG with desiredCapacity=0 (management structure only)
3. Create instances using runInstance() with proper names
4. Attach instances to the ASG using attach-instances
This approach maintains compatibility with roachprod's cluster naming while gaining ASG benefits for grow/shrink operations.
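The four steps above can be sketched in Go as follows. This is only an illustration of the ordering, not the code in this PR: the `asgAPI` interface, the `recorder` fake, `createManaged`, and `nodeName` are hypothetical names introduced here, and the cluster name and region are placeholders.

```go
package main

import "fmt"

// asgAPI is a hypothetical interface standing in for the AWS operations
// named in the description; it is not roachprod's actual API.
type asgAPI interface {
	CreateLaunchTemplate(name string) error
	CreateASG(name, launchTemplate string, desiredCapacity int) error
	RunInstance(name string) (instanceID string, err error)
	AttachInstances(asg string, instanceIDs []string) error
}

// nodeName follows the clustername-0001 convention from the description.
func nodeName(cluster string, idx int) string {
	return fmt.Sprintf("%s-%04d", cluster, idx)
}

// createManaged runs the four steps in order: launch template, empty ASG,
// individually named instances, then attachment of those instances.
func createManaged(api asgAPI, cluster, region string, nodes int) error {
	lt := cluster + "-lt"
	asg := fmt.Sprintf("%s-%s-asg", cluster, region)
	if err := api.CreateLaunchTemplate(lt); err != nil {
		return err
	}
	// desiredCapacity=0: the ASG launches nothing by itself.
	if err := api.CreateASG(asg, lt, 0); err != nil {
		return err
	}
	ids := make([]string, 0, nodes)
	for i := 1; i <= nodes; i++ {
		id, err := api.RunInstance(nodeName(cluster, i))
		if err != nil {
			return err
		}
		ids = append(ids, id)
	}
	return api.AttachInstances(asg, ids)
}

// recorder is a fake asgAPI that only records the calls it receives.
type recorder struct{ calls []string }

func (r *recorder) CreateLaunchTemplate(name string) error {
	r.calls = append(r.calls, "create-launch-template "+name)
	return nil
}

func (r *recorder) CreateASG(name, lt string, desired int) error {
	r.calls = append(r.calls, fmt.Sprintf("create-asg %s lt=%s desired=%d", name, lt, desired))
	return nil
}

func (r *recorder) RunInstance(name string) (string, error) {
	r.calls = append(r.calls, "run-instance "+name)
	return "i-" + name, nil // made-up instance ID for the fake
}

func (r *recorder) AttachInstances(asg string, ids []string) error {
	r.calls = append(r.calls, fmt.Sprintf("attach-instances %s %v", asg, ids))
	return nil
}

func main() {
	r := &recorder{}
	if err := createManaged(r, "demo", "us-east-2", 2); err != nil {
		panic(err)
	}
	for _, c := range r.calls {
		fmt.Println(c)
	}
}
```

Keeping instance creation outside the ASG is what lets each node receive its own name before it joins the group.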
For managed clusters, target groups are attached directly to the ASG using attach-load-balancer-target-groups. This provides:
- Automatic instance registration: New instances launched or attached to the ASG are automatically registered with the target group
- Automatic deregistration: Instances removed from the ASG are automatically deregistered
- Simplified grow operations: No need for manual register-targets calls when adding nodes
Resource Naming
- Launch Template: {cluster}-lt
- Auto Scaling Group: {cluster}-{region}-asg
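The naming patterns above amount to simple string formatting. A minimal sketch, assuming hypothetical helper names `launchTemplateName` and `autoScalingGroupName` (the PR's real function names are not given in the description):

```go
package main

import "fmt"

// launchTemplateName returns {cluster}-lt.
func launchTemplateName(cluster string) string {
	return cluster + "-lt"
}

// autoScalingGroupName returns {cluster}-{region}-asg.
func autoScalingGroupName(cluster, region string) string {
	return fmt.Sprintf("%s-%s-asg", cluster, region)
}

func main() {
	// "demo" and "us-east-2" are placeholder values.
	fmt.Println(launchTemplateName("demo"))
	fmt.Println(autoScalingGroupName("demo", "us-east-2"))
}
```

Because the launch template name carries no region while the ASG name does, a cluster spanning several regions would get one ASG name per region under this scheme.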
Epic: None
Fixes: #153071