roachprod: Add support for ASG groups in AWS #161374

Merged
craig[bot] merged 1 commit into cockroachdb:master from cpj2195:roachprod/add_ASG_support_AWS
Jan 29, 2026

Contributor

@cpj2195 cpj2195 commented Jan 19, 2026

This PR adds AWS Auto Scaling Group (ASG) support to roachprod, mirroring the existing GCE Managed Instance Group (MIG) implementation. Managed clusters can be resized dynamically with the roachprod grow and roachprod shrink operations.

Summary

  • Add --aws-managed flag to create clusters using Launch Templates and Auto Scaling Groups
  • Implement Grow() and Shrink() operations for managed AWS clusters
  • Implement proper cleanup of ASG, launch templates, and load balancer resources on cluster destroy

AWS ASG doesn't support creating instances with custom names (unlike GCE MIG which has create-instance --instance ). To preserve roachprod's naming conventions (e.g., clustername-0001, clustername-0002), we:

  • Create the launch template (stores instance configuration)
  • Create the ASG with desiredCapacity=0 (management structure only)
  • Create instances using runInstance() with proper names
  • Attach instances to the ASG using attach-instances

This approach maintains compatibility with roachprod's cluster naming while gaining ASG benefits for grow/shrink operations.
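
The creation sequence described above can be sketched in Go roughly as follows. This is an illustration only, not the code from this PR: the asgClient interface, fakeClient, nodeName, and createManagedCluster are hypothetical names, and the fake client simply records calls so that the ordering is visible without touching AWS.

```go
package main

import (
	"fmt"
	"strings"
)

// asgClient is a hypothetical abstraction over the AWS operations involved.
// It is NOT roachprod's actual interface.
type asgClient interface {
	CreateLaunchTemplate(name string) error
	CreateASG(name, launchTemplate string, desiredCapacity int) error
	RunInstance(name string) (instanceID string, err error)
	AttachInstances(asg string, instanceIDs []string) error
}

// fakeClient records calls in order instead of talking to AWS.
type fakeClient struct {
	calls []string
}

func (f *fakeClient) CreateLaunchTemplate(name string) error {
	f.calls = append(f.calls, "create-launch-template "+name)
	return nil
}

func (f *fakeClient) CreateASG(name, lt string, desired int) error {
	f.calls = append(f.calls, fmt.Sprintf("create-asg %s lt=%s desired=%d", name, lt, desired))
	return nil
}

func (f *fakeClient) RunInstance(name string) (string, error) {
	f.calls = append(f.calls, "run-instance "+name)
	return "i-" + name, nil // fake instance ID derived from the node name
}

func (f *fakeClient) AttachInstances(asg string, ids []string) error {
	f.calls = append(f.calls, "attach-instances "+asg+" "+strings.Join(ids, ","))
	return nil
}

// nodeName follows the clustername-0001 convention from the description.
func nodeName(cluster string, idx int) string {
	return fmt.Sprintf("%s-%04d", cluster, idx)
}

// createManagedCluster performs the four steps in order: launch template,
// empty ASG, explicitly named instances, then attachment to the ASG.
func createManagedCluster(c asgClient, cluster, ltName, asgName string, nodes int) error {
	if err := c.CreateLaunchTemplate(ltName); err != nil {
		return err
	}
	// desiredCapacity=0: the ASG starts as a management structure only.
	if err := c.CreateASG(asgName, ltName, 0); err != nil {
		return err
	}
	ids := make([]string, 0, nodes)
	for i := 1; i <= nodes; i++ {
		id, err := c.RunInstance(nodeName(cluster, i))
		if err != nil {
			return err
		}
		ids = append(ids, id)
	}
	return c.AttachInstances(asgName, ids)
}

func main() {
	f := &fakeClient{}
	if err := createManagedCluster(f, "test", "test-lt", "test-us-east-2-asg", 2); err != nil {
		panic(err)
	}
	for _, call := range f.calls {
		fmt.Println(call)
	}
}
```

One detail worth keeping in mind with this design: attaching instances raises the ASG's desired capacity by the number of instances attached, so the group's maximum size has to leave room for them.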

For managed clusters, target groups are attached directly to the ASG using attach-load-balancer-target-groups. This provides:

  • Automatic instance registration: New instances launched or attached to the ASG are automatically registered with the target group
  • Automatic deregistration: Instances removed from the ASG are automatically deregistered
  • Simplified grow operations: No need for manual register-targets calls when adding nodes
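
As a rough sketch of the target group attachment, the Go snippet below builds the argument vector for the AWS CLI subcommand named above. The helper names (attachTargetGroupsArgs, shellLine) are hypothetical and the ARN in main is a made-up placeholder. How the PR itself invokes AWS is not shown here, so treat this as an assumption-laden illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// attachTargetGroupsArgs builds the arguments for
// `aws autoscaling attach-load-balancer-target-groups`.
// Hypothetical helper; not taken from the PR.
func attachTargetGroupsArgs(asgName string, targetGroupARNs []string) ([]string, error) {
	if asgName == "" {
		return nil, fmt.Errorf("ASG name must not be empty")
	}
	if len(targetGroupARNs) == 0 {
		return nil, fmt.Errorf("at least one target group ARN is required")
	}
	args := []string{
		"autoscaling", "attach-load-balancer-target-groups",
		"--auto-scaling-group-name", asgName,
		"--target-group-arns",
	}
	// --target-group-arns accepts a space-separated list of ARNs.
	return append(args, targetGroupARNs...), nil
}

// shellLine renders the arguments as a single command line for display.
func shellLine(args []string) string {
	return "aws " + strings.Join(args, " ")
}

func main() {
	// Placeholder ARN for illustration only.
	arn := "arn:aws:elasticloadbalancing:us-east-2:123456789012:targetgroup/test/0123456789abcdef"
	args, err := attachTargetGroupsArgs("test-us-east-2-asg", []string{arn})
	if err != nil {
		panic(err)
	}
	fmt.Println(shellLine(args))
}
```

Because the attachment is made once at the ASG level, a later grow does not need a per-instance registration step, which matches the "simplified grow operations" point above.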

Resource Naming

  • Launch Template: {cluster}-lt
  • Auto Scaling Group: {cluster}-{region}-asg

Epic: None
Fixes: #153071
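
For reference, the resource naming scheme from this description ({cluster}-lt and {cluster}-{region}-asg) could be expressed in Go as below. The type and function names are hypothetical, chosen for illustration, and the region list in main is an arbitrary example.

```go
package main

import "fmt"

// managedResourceNames groups the names derived for one cluster and region.
// Hypothetical type; not taken from the PR.
type managedResourceNames struct {
	LaunchTemplate string
	ASG            string
}

// namesFor applies the naming scheme from the PR description:
// {cluster}-lt for the launch template and {cluster}-{region}-asg for the ASG.
func namesFor(cluster, region string) managedResourceNames {
	return managedResourceNames{
		LaunchTemplate: fmt.Sprintf("%s-lt", cluster),
		ASG:            fmt.Sprintf("%s-%s-asg", cluster, region),
	}
}

func main() {
	for _, region := range []string{"us-east-2", "us-west-2"} {
		n := namesFor("test", region)
		fmt.Printf("%s: launch template %s, ASG %s\n", region, n.LaunchTemplate, n.ASG)
	}
}
```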

blathers-crl bot commented Jan 19, 2026

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

@cpj2195 cpj2195 marked this pull request as ready for review January 19, 2026 07:54
@cpj2195 cpj2195 requested a review from a team as a code owner January 19, 2026 07:54
@cpj2195 cpj2195 requested review from golgeek and nameisbhaskar and removed request for a team January 19, 2026 07:54
@github-actions github-actions bot added the o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only. label Jan 19, 2026
@cpj2195 cpj2195 force-pushed the roachprod/add_ASG_support_AWS branch from c1f1601 to 87076f4 Compare January 19, 2026 12:39
@cockroachdb cockroachdb deleted a comment from github-actions bot Jan 19, 2026
@cpj2195 cpj2195 force-pushed the roachprod/add_ASG_support_AWS branch 3 times, most recently from 61a5418 to 822179a Compare January 20, 2026 15:38
@github-actions

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps:
Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

@cpj2195 cpj2195 self-assigned this Jan 20, 2026
@cpj2195 cpj2195 marked this pull request as draft January 20, 2026 15:47
Contributor

@nameisbhaskar nameisbhaskar left a comment

Thanks for the change. As discussed, let's break it into smaller, modular functions.

@cpj2195 cpj2195 force-pushed the roachprod/add_ASG_support_AWS branch from 822179a to e0cf40c Compare January 22, 2026 09:02
@cpj2195 cpj2195 marked this pull request as ready for review January 22, 2026 13:08
@cpj2195 cpj2195 requested a review from nameisbhaskar January 22, 2026 13:09
Contributor

@nameisbhaskar nameisbhaskar left a comment

LGTM! Please do test the flows, including failure flows. Thanks!

Contributor

@golgeek golgeek left a comment

Left a few comments here and there, but great job!

How much longer does it take to spin up a managed cluster than a non-managed one?
The idea behind the question is to make grow and shrink something that can be an afterthought: you provision a non-managed cluster, then decide later that you want to grow it.
I assume that creating the launch template and the ASG and then attaching the instances makes this much slower than creating a regular cluster, so we shouldn't go that route?

That said, given the way you implemented managed clusters, we could probably convert a regular cluster to a managed one later on?

In the same general direction, what are your thoughts on using Unmanaged Instance Groups instead of MIGs in GCP?

Collaborator

@herkolategan herkolategan left a comment

Good work! Did a quick pass; nothing jumped out at me that @golgeek hasn't already mentioned.

@cpj2195 cpj2195 force-pushed the roachprod/add_ASG_support_AWS branch from e0cf40c to 211d697 Compare January 29, 2026 08:00
@cpj2195 cpj2195 force-pushed the roachprod/add_ASG_support_AWS branch from 211d697 to 94ce8af Compare January 29, 2026 14:23
Contributor Author

cpj2195 commented Jan 29, 2026

bors r+

Contributor

craig bot commented Jan 29, 2026

@craig craig bot merged commit 73b5aa2 into cockroachdb:master Jan 29, 2026
27 checks passed

Labels

o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only. target-release-26.2.0

Development

Successfully merging this pull request may close these issues.

roachprod: add support for AWS auto scaling groups

5 participants