feat: mark orphan runners before removing them (#4001)
## Problem
Orphan runners are deleted right after detection. This can clash with
self-terminating (ephemeral) runners. Typically the runner waits a
few seconds before executing a self-termination.
## Solution
In this solution we first mark a runner as orphan, but do not delete the
runner. In the next cycle of the scale-down function, all runners marked
as orphan are terminated first.
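
The two-cycle flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual Lambda code: the `Instance` shape, the `markedOrphan` flag (assumed here to be persisted somewhere between cycles, for example as an instance tag), and the `scaleDownCycle` function are all hypothetical names introduced for this example.

```typescript
// Hypothetical model of an instance; these names are assumptions for
// illustration and are not taken from the project's code.
interface Instance {
  id: string;
  registeredInGitHub: boolean; // assumed to be looked up before the cycle
  markedOrphan: boolean; // assumed to persist between cycles
}

interface CycleResult {
  terminated: string[]; // instances marked in an earlier cycle
  newlyMarked: string[]; // instances marked in this cycle
}

function scaleDownCycle(instances: Instance[]): CycleResult {
  const terminated: string[] = [];
  const newlyMarked: string[] = [];

  // Step 1: terminate everything that was marked orphan in a previous cycle.
  for (const instance of instances) {
    if (instance.markedOrphan) {
      terminated.push(instance.id);
    }
  }

  // Step 2: mark instances without a registered runner, but do not
  // terminate them yet. This leaves an ephemeral runner at least one
  // cycle to finish its own self-termination.
  for (const instance of instances) {
    if (!instance.markedOrphan && !instance.registeredInGitHub) {
      instance.markedOrphan = true;
      newlyMarked.push(instance.id);
    }
  }

  return { terminated, newlyMarked };
}

// Usage: two instances, one of which lost its runner registration.
const fleet: Instance[] = [
  { id: "i-registered", registeredInGitHub: true, markedOrphan: false },
  { id: "i-orphan", registeredInGitHub: false, markedOrphan: false },
];
const first = scaleDownCycle(fleet); // marks i-orphan, terminates nothing
const second = scaleDownCycle(fleet); // terminates i-orphan
```

The sketch only models the marking and termination decisions. Removal of idle runners and the real EC2 and GitHub calls are left out.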
## Improvements
- Improved logging: the main flow is logged only once at info level. All other
logs moved to debug
- Scale-down write permissions limited to the environment
## Todo
- [x] Update docs
- [x] Test default runner deployment
- [x] Test multi runner deployment
## Example of log
- Two instances
- One made orphan by removing the runner from GitHub
- In the log
  - Idle runner got removed
  - Orphan gets marked as orphan
  - Next cycle orphan terminated.
<img width="1283" alt="image"
src="https://github.com/user-attachments/assets/c7cb5372-f32c-4fc4-81bc-8aacec2a483f">
docs/index.md: 2 additions & 2 deletions
@@ -46,7 +46,7 @@ The "Scale Up Runner" Lambda actively monitors the SQS queue, processing incomin
The Lambda first requests a JIT configuration or registration token from GitHub, which is needed later by the runner to register itself. This avoids the case that the EC2 instance, which later in the process will install the agent, needs administration permissions to register the runner. Next, the EC2 spot instance is created via the launch template. The launch template defines the specifications of the required instance and contains a [`user_data`](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html) script. This script will install the required software and configure it. The configuration for the runner is shared via EC2 tags and the parameter store (SSM), from which the user data script will fetch it and delete it once it has been retrieved. Once the user data script is finished, the action runner should be online, and the workflow will start in seconds.
-The current method for scaling down runners employs a straightforward approach: at predefined intervals, the Lambda conducts a thorough examination of each runner (instance) to assess its activity. If a runner is found to be idle, it is deregistered from GitHub, and the associated AWS instance is terminated. For ephemeral runners the the instance is terminated immediately after the workflow is finished. To avoid orphaned runners the scale down lambda is active in this cae as well.
+The current method for scaling down runners employs a straightforward approach: at predefined intervals, the Lambda conducts a thorough examination of each runner (instance) to assess its activity. If a runner is found to be idle, it is deregistered from GitHub, and the associated AWS instance is terminated. For ephemeral runners the the instance is terminated immediately after the workflow is finished. Instances not registered in GitHub as a runner after a minimal boot time will be marked orphan and removed in a next cycle. To avoid orphaned runners the scale down lambda is active in this cae as well.
### Pool
@@ -68,7 +68,7 @@ The AMI cleaner is a lambda that will clean up AMIs that are older than a config
> This feature is Beta, changes will not trigger a major release as long in beta.
-The Instance Termination Watcher is creating log and optional metrics for termination of instances. Currently only spot termination warnings are watched. See [configuration](configuration/) for more details.
+The Instance Termination Watcher is creating log and optional metrics for termination of instances. Currently only spot termination warnings are watched. See [configuration](configuration/) for more details.