feat: Add launch-ec2-runner-with-fallback action
#4
danmcp left a comment:
Nice work!
actions/launch-ec2-runner-with-fallback/launch-ec2-runner-with-fallback.md
| Name | Description | Example value |
| --- | --- | --- |
| `try_spot_instance_first` | If set to "true", then the EC2 instance will be launched as a spot instance rather than a dedicated EC2 instance. If a spot instance cannot be launched in any of the desired availability zones (e.g., due to insufficient capacity on AWS), then a dedicated instance will be tried next. (Note: This option is not always desirable for certain instance types.) | `true` |
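As a reading aid, here is a minimal Python sketch of the attempt order this row describes. It assumes the action tries a spot launch in every listed availability zone before falling back to dedicated (non-spot) launches in those same zones. The function name, the `"spot"`/`"dedicated"` labels, and the zone names are hypothetical; this is not the action's actual implementation.

```python
from typing import List, Tuple

# Hypothetical labels for the two purchase options discussed in this PR.
SPOT = "spot"
DEDICATED = "dedicated"


def plan_attempts(
    availability_zones: List[str], try_spot_instance_first: bool
) -> List[Tuple[str, str]]:
    """Return (purchase option, availability zone) pairs in the order they are tried.

    When spot is enabled, a spot launch is attempted in every zone first;
    only after all of those fail does the plan move on to dedicated launches.
    """
    markets = [SPOT, DEDICATED] if try_spot_instance_first else [DEDICATED]
    return [(market, az) for market in markets for az in availability_zones]


if __name__ == "__main__":
    # Example zone names are placeholders, not real AWS zones.
    for market, az in plan_attempts(["zone-a", "zone-b"], True):
        print(market, az)
```

With `try_spot_instance_first` set to false, the plan simply contains the dedicated attempts.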
This is not a request to change this now, but some cases might want the opposite: using spot instances as a fallback. This can be helpful when cost isn't an issue but availability is. In that case, on-demand is preferred, but if spot instances are all that's available, they are better than nothing.
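The alternative suggested here can be sketched by treating the preference as an ordered list of purchase options expanded across all zones. `market_order` and the function name are hypothetical and are not inputs this PR adds; `["on-demand", "spot"]` expresses "on-demand preferred, spot as a fallback".

```python
from typing import List, Sequence, Tuple


def plan_attempts_with_order(
    availability_zones: Sequence[str], market_order: Sequence[str]
) -> List[Tuple[str, str]]:
    """Expand a preference order of purchase options across all zones.

    market_order is a hypothetical input, e.g. ["on-demand", "spot"] for an
    availability-first policy or ["spot", "on-demand"] for a cost-first one.
    """
    if not market_order:
        raise ValueError("market_order must name at least one purchase option")
    if len(set(market_order)) != len(market_order):
        raise ValueError("market_order must not repeat a purchase option")
    # Every zone is tried with the preferred option before moving to the next one.
    return [(market, az) for market in market_order for az in availability_zones]
```

Expressing the policy as an ordered list would cover both the current spot-first behavior and the on-demand-first variant without adding a second boolean flag.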
ktdreyer left a comment:
Some jq notes that do not block merging this:
You don't have to `cat` a file into `jq`, because `jq` can read files directly (`jq FILTER FILE`).
`jq` has a `-r` (`--raw-output`) flag that prints strings to stdout without quotes, so you don't need to pipe the output to `tr -d '"'` at the end.
ktdreyer left a comment:
I think Courtney was going to update those changes I referenced, so I'll rescind my approval so that Mergify does not auto-merge.
This action is used to launch an EC2 instance (either as a spot instance or a dedicated instance) in a desired availability zone. If there is no availability there, backup availability zones will be tried until one is successful.
Signed-off-by: Courtney Pacheco <[email protected]>
Integration Testing Evidence
Link to pull request(s):
Links to passing CI job(s):
try_spot_instance_first=false): https://github.com/instructlab/instructlab/actions/runs/13664883248
Checklist:
Description of this Change
This action is used to launch an EC2 instance (either as a spot instance or a dedicated instance) in a desired availability zone. If there is no availability there, backup availability zones will be tried until one is successful.
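A minimal Python sketch of the fallback loop described above, assuming a hypothetical `launch(market, az)` callable that returns an instance ID or raises a hypothetical `LaunchError` when a zone lacks capacity. The real action delegates launches to `machulav/ec2-runner`; nothing here reflects that action's inputs or code.

```python
from typing import Callable, List, Sequence, Tuple


class LaunchError(Exception):
    """Hypothetical error raised when a launch attempt fails (e.g., no capacity)."""


def launch_with_fallback(
    attempts: Sequence[Tuple[str, str]],
    launch: Callable[[str, str], str],
) -> Tuple[str, str, str]:
    """Try each (purchase option, availability zone) pair in order.

    Returns (market, az, instance_id) for the first attempt that succeeds.
    A failed attempt is recorded and the loop continues rather than aborting;
    RuntimeError is raised only if every attempt fails.
    """
    errors: List[str] = []
    for market, az in attempts:
        try:
            return market, az, launch(market, az)
        except LaunchError as exc:
            errors.append(f"{market} in {az}: {exc}")
    raise RuntimeError("all launch attempts failed:\n" + "\n".join(errors))


if __name__ == "__main__":
    # Fake launcher for illustration: only "zone-b" has capacity.
    def fake_launch(market: str, az: str) -> str:
        if az != "zone-b":
            raise LaunchError("InsufficientInstanceCapacity")
        return "i-0123456789abcdef0"

    # prints ('dedicated', 'zone-b', 'i-0123456789abcdef0')
    print(launch_with_fallback([("dedicated", "zone-a"), ("dedicated", "zone-b")], fake_launch))
```

Collecting the per-zone errors instead of stopping at the first one matches the behavior reported in the testing section, where capacity failures in some zones did not abort the workflow.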
I've tested this action in the `instructlab/instructlab` repo by creating a draft PR there that modifies the Large E2E job workflow to reference this proposed action. (See link above.)

Design Considerations
I thought about creating a "single" CI action that is capable of both launching and stopping instances. After all, the base `machulav/ec2-runner` GitHub action that I leverage under the hood can be used to launch and terminate instances. In the case of that particular GitHub action, users can simply set the `mode` parameter to either `start` or `stop` to indicate whether they want to launch or terminate an instance.

Ultimately, I decided against implementing a `mode` in this PR because going that route would add complexity to the existing code in this PR, increase the required testing surface, etc. I do feel that adding a `mode` would be a useful enhancement for this CI action, but I see it as an enhancement rather than a necessity for the initial work. I can file a follow-up issue to add this enhancement, though.

Integration Testing Results
Dedicated Instance Testing
To test the dedicated instance logic, I set `try_spot_instance_first = false`. This setting tells the CI action not to attempt a spot launch at all. (This is useful if you can't afford to have your instance reclaimed by AWS.)

Below are screenshots from the output of a successful run in which the CI job launches a dedicated EC2 instance. You can see the failures the CI action encounters when it tries to launch in an availability zone that lacks capacity for the desired instance type, and that those failures don't abort the workflow.

I then confirmed that the instance was actually cleaned up in AWS (i.e., no dangling resources were left behind).
Spot Instance Testing
To test the spot instance logic, I set `try_spot_instance_first = true`, which tells the action to try launching an EC2 spot instance before attempting to launch it as a dedicated instance.

Below is a screenshot of the test run. The spot instance was reclaimed by AWS because AWS needed the capacity back, hence the failures:

See the `!` symbols next to the E2E steps, which indicate that the build was aborted. (I forgot to capture screenshot evidence on the AWS side.)

For clarity, the `Stop EC2 Runner` step failed because the spot instance had already been deleted (reclaimed) in AWS, so there was nothing to actually "stop":