Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -148,7 +148,7 @@ Join our discord community via [this invite link](https://discord.gg/bxgXW8jJGh)
| <a name="input_instance_max_spot_price"></a> [instance\_max\_spot\_price](#input\_instance\_max\_spot\_price) | Max price price for spot instances per hour. This variable will be passed to the create fleet as max spot price for the fleet. | `string` | `null` | no |
| <a name="input_instance_profile_path"></a> [instance\_profile\_path](#input\_instance\_profile\_path) | The path that will be added to the instance\_profile, if not set the environment name will be used. | `string` | `null` | no |
| <a name="input_instance_target_capacity_type"></a> [instance\_target\_capacity\_type](#input\_instance\_target\_capacity\_type) | Default lifecycle used for runner instances, can be either `spot` or `on-demand`. | `string` | `"spot"` | no |
| <a name="input_instance_termination_watcher"></a> [instance\_termination\_watcher](#input\_instance\_termination\_watcher) | Configuration for the instance termination watcher. This feature is Beta, changes will not trigger a major release as long in beta.<br/><br/>`enable`: Enable or disable the spot termination watcher.<br/>'features': Enable or disable features of the termination watcher.<br/>`memory_size`: Memory size linit in MB of the lambda.<br/>`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.<br/>`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.<br/>`timeout`: Time out of the lambda in seconds.<br/>`zip`: File location of the lambda zip file. | <pre>object({<br/> enable = optional(bool, false)<br/> features = optional(object({<br/> enable_spot_termination_handler = optional(bool, true)<br/> enable_spot_termination_notification_watcher = optional(bool, true)<br/> }), {})<br/> memory_size = optional(number, null)<br/> s3_key = optional(string, null)<br/> s3_object_version = optional(string, null)<br/> timeout = optional(number, null)<br/> zip = optional(string, null)<br/> })</pre> | `{}` | no |
| <a name="input_instance_termination_watcher"></a> [instance\_termination\_watcher](#input\_instance\_termination\_watcher) | Configuration for the instance termination watcher. This feature is Beta, changes will not trigger a major release as long in beta.<br/><br/>`enable`: Enable or disable the spot termination watcher.<br/>'features': Enable or disable features of the termination watcher.<br/>`memory_size`: Memory size limit in MB of the lambda.<br/>`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.<br/>`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.<br/>`timeout`: Time out of the lambda in seconds.<br/>`zip`: File location of the lambda zip file. | <pre>object({<br/> enable = optional(bool, false)<br/> features = optional(object({<br/> enable_spot_termination_handler = optional(bool, true)<br/> enable_spot_termination_notification_watcher = optional(bool, true)<br/> }), {})<br/> memory_size = optional(number, null)<br/> s3_key = optional(string, null)<br/> s3_object_version = optional(string, null)<br/> timeout = optional(number, null)<br/> zip = optional(string, null)<br/> })</pre> | `{}` | no |
| <a name="input_instance_types"></a> [instance\_types](#input\_instance\_types) | List of instance types for the action runner. Defaults are based on runner\_os (al2023 for linux and Windows Server Core for win). | `list(string)` | <pre>[<br/> "m5.large",<br/> "c5.large"<br/>]</pre> | no |
| <a name="input_job_queue_retention_in_seconds"></a> [job\_queue\_retention\_in\_seconds](#input\_job\_queue\_retention\_in\_seconds) | The number of seconds the job is held in the queue before it is purged. | `number` | `86400` | no |
| <a name="input_job_retry"></a> [job\_retry](#input\_job\_retry) | Experimental! Can be removed / changed without trigger a major release.Configure job retries. The configuration enables job retries (for ephemeral runners). After creating the instances a message will be published to a job retry queue. The job retry check lambda is checking after a delay if the job is queued. If not the message will be published again on the scale-up (build queue). Using this feature can impact the rate limit of the GitHub app.<br/><br/>`enable`: Enable or disable the job retry feature.<br/>`delay_in_seconds`: The delay in seconds before the job retry check lambda will check the job status.<br/>`delay_backoff`: The backoff factor for the delay.<br/>`lambda_memory_size`: Memory size limit in MB for the job retry check lambda.<br/>`lambda_timeout`: Time out of the job retry check lambda in seconds.<br/>`max_attempts`: The maximum number of attempts to retry the job. | <pre>object({<br/> enable = optional(bool, false)<br/> delay_in_seconds = optional(number, 300)<br/> delay_backoff = optional(number, 2)<br/> lambda_memory_size = optional(number, 256)<br/> lambda_timeout = optional(number, 30)<br/> max_attempts = optional(number, 1)<br/> })</pre> | `{}` | no |
Expand Down
33 changes: 16 additions & 17 deletions docs/configuration.md
Original file line number Diff line number Diff line change
Expand Up @@ -171,6 +171,8 @@ This module also allows you to run agents from a prebuilt AMI to gain faster sta

## AMI Configuration

> **Note:** By default, a runner AMI update requires a re-apply of the terraform configuration, as the runner AMI ID is looked up by a terraform data source. To avoid this, you can use or `ami.id_ssm_parameter_arn` to have the scale-up lambda dynamically lookup the runner AMI ID from an SSM parameter at instance launch time. Said SSM parameter is managed outside of this module (e.g. by a runner AMI build workflow).

By default, the module will automatically select appropriate AMI images:

- For Linux x64: Amazon Linux 2023 x86_64
Expand Down Expand Up @@ -203,8 +205,7 @@ ami = {
}
```

> **Note:** The old way of configuring AMIs using individual variables (`ami_filter`, `ami_owners`, `ami_kms_key_arn`) is deprecated and will be removed in a future version. It is recommended to migrate to the new consolidated `ami` object.

> **Note:** The old way of configuring AMIs using individual variables (`ami_filter`, `ami_owners`, `ami_kms_key_arn`, `ami_id_ssm_parameter_arn`, `ami_id_ssm_parameter_name`) is deprecated and will be removed in a future version. It is recommended to migrate to the new consolidated `ami` object. Support for `ami_id_ssm_parameter_name` will be dropped, please specify an arn via `ami.id_ssm_parameter_arn` instead.

## Logging

Expand Down Expand Up @@ -259,11 +260,11 @@ This feature has been disabled by default.

### Multiple runner module in your AWS account

The watcher will act on all spot termination notificatins and log all onses relevant to the runner module. Therefor we suggest to only deploy the watcher once. You can either deploy the watcher by enabling in one of your deployments or deploy the watcher as a stand alone module.
The watcher will act on all spot termination notifications and log the ones relevant to the runner module. Therefor we suggest to only deploy the watcher once. You can either deploy the watcher by enabling in one of your deployments or deploy the watcher as a stand alone module.

## Metrics

The module supports metrics (experimental feature) to monitor the system. The metrics are disabled by default. To enable the metrics set `metrics.enable = true`. If set to true, all module managed metrics are used, you can configure the one by one via the `metrics` object. The metrics are created in the namespace `GitHub Runners`.
The module supports metrics (experimental feature) to monitor the system. The metrics are disabled by default. To enable the metrics set `metrics.enable = true`. If set to true, all module managed metrics are used, you can configure them one by one via the `metrics` object. The metrics are created in the namespace `GitHub Runners`.

### Supported metrics

Expand All @@ -287,28 +288,28 @@ In case the setup does not work as intended, trace the events through this seque

This feature is in early stage and therefore disabled by default. To enable the watcher, set `instance_termination_watcher.enable = true`.

The termination watcher is currently watching for spot terminations. The module is only taken events into account for instances tagged with `ghr:environment` by default when deployment the module as part of one of the main modules (root or multi-runner). The module can also be deployed stand-alone, in this case, the tag filter needs to be tuned.
The termination watcher is currently watching for spot terminations. The module only takes events into account for instances tagged with `ghr:environment` by default, when the module is deployed as part of one of the main modules (root or multi-runner). The module can also be deployed stand-alone, in this case, the tag filter needs to be tuned.

### Termination notification

The watcher is listening for spot termination warnings and create a log message and optionally a metric. The watcher is disabled by default. The feature is enabled once the watcher is enabled, the feature can be disabled explicit by setting `instance_termination_watcher.features.enable_spot_termination_handler = false`.
The watcher is listening for spot termination warnings and creates a log message and optionally a metric. The watcher is disabled by default. The feature is enabled once the watcher is enabled. It can be disabled explicitly by setting `instance_termination_watcher.features.enable_spot_termination_handler = false`.

- Logs: The module will log all termination notifications. For each warning it will look up instance details and log the environment, instance type and time the instance is running. As well some other details.
- Metrics: Metrics are disabled by default, this to avoid costs. Once enabled a metric will be created for each warning with at least dimensions for the environment and instance type. THe metric name space can be configured via the variables. The metric name used is `SpotInterruptionWarning`.
- Logs: The module will log all termination notifications. For each warning it will look up instance details and log the environment, instance type and time the instance is running, as well as some other details.
- Metrics: Metrics are disabled by default, in order to avoid costs. Once enabled a metric will be created for each warning with at least dimensions for the environment and instance type. The metric name space can be configured via the variables. The metric name used is `SpotInterruptionWarning`.

### Termination handler

!!! warning
This feature will only work once the CloudTrail is enabled.
This feature will only work once CloudTrail is enabled.

The termination handler is listening for spot terminations by capture the `BidEvictedEvent` via CloudTrail. The handler will log and optionally create a metric for each termination. The intend is to enhance the logic to inform the user about the termination via the GitHub Job or Workflow run. The feature is disabled by default. The feature is enabled once the watcher is enabled, the feature can be disabled explicit by setting `instance_termination_watcher.features.enable_spot_termination_handler = false`.
The termination handler is listening for spot terminations by capturing the `BidEvictedEvent` via CloudTrail. The handler will log and optionally create a metric for each termination. The intent is to enhance the logic to inform the user about the termination via the GitHub Job or Workflow run. The feature is disabled by default. The feature is enabled once the watcher is enabled. It can be disabled explicitly by setting `instance_termination_watcher.features.enable_spot_termination_handler = false`.

- Logs: The module will log all termination notifications. For each warning it will look up instance details and log the environment, instance type and time the instance is running. As well some other details.
- Metrics: Metrics are disabled by default, this to avoid costs. Once enabled a metric will be created for each termination with at least dimensions for the environment and instance type. THe metric name space can be configured via the variables. The metric name used is `SpotTermination`.
- Logs: The module will log all termination notifications. For each warning it will look up instance details and log the environment, instance type and time the instance is running, as well as some other details.
- Metrics: Metrics are disabled by default, in order to avoid costs. Once enabled a metric will be created for each termination with at least dimensions for the environment and instance type. THe metric name space can be configured via the variables. The metric name used is `SpotTermination`.

### Log example (both warnings and terminations)

Below an example of the the log messages created.
Below is an example of the log messages created.

```
{
Expand All @@ -331,7 +332,7 @@ Below an example of the the log messages created.

### EventBridge

This module can be deployed in using the mode `EventBridge`. The `EventBridge` mode will publish an event to a eventbus. Within the eventbus, there is a target rule set, sending events to the dispatch lambda. The `EventBridge` mode is enabled by default.
This module can be deployed in `EventBridge` mode. The `EventBridge` mode will publish an event to an eventbus. Within the eventbus, there is a target rule set, sending events to the dispatch lambda. The `EventBridge` mode is enabled by default.

Example to extend the EventBridge:

Expand Down Expand Up @@ -389,7 +390,7 @@ resource "aws_iam_role" "event_rule_role" {

data aws_iam_policy_document firehose_stream {
statement {
INSER_YOUR_POIICY_HERE_TO_ACCESS_THE_TARGET
INSERT_YOUR_POLICY_HERE_TO_ACCESS_THE_TARGET
}
}

Expand All @@ -399,5 +400,3 @@ resource "aws_iam_role_policy" "event_rule_firehose_role" {
policy = data.aws_iam_policy_document.firehose_stream.json
}
```

NOTE: By default, a runner AMI update requires a re-apply of this terraform config (the runner AMI ID is looked up by a terraform data source). To avoid this, you can use `ami_id_ssm_parameter_name` to have the scale-up lambda dynamically lookup the runner AMI ID from an SSM parameter at instance launch time. Said SSM parameter is managed outside of this module (e.g. by a runner AMI build workflow).
2 changes: 1 addition & 1 deletion modules/ami-housekeeper/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -103,7 +103,7 @@ No modules.
| <a name="input_aws_partition"></a> [aws\_partition](#input\_aws\_partition) | (optional) partition for the base arn if not 'aws' | `string` | `"aws"` | no |
| <a name="input_cleanup_config"></a> [cleanup\_config](#input\_cleanup\_config) | Configuration for AMI cleanup.<br/><br/> `amiFilters` - Filters to use when searching for AMIs to cleanup. Default filter for images owned by the account and that are available.<br/> `dryRun` - If true, no AMIs will be deregistered. Default false.<br/> `launchTemplateNames` - Launch template names to use when searching for AMIs to cleanup. Default no launch templates.<br/> `maxItems` - The maximum number of AMIs that will be queried for cleanup. Default no maximum.<br/> `minimumDaysOld` - Minimum number of days old an AMI must be to be considered for cleanup. Default 30.<br/> `ssmParameterNames` - SSM parameter names to use when searching for AMIs to cleanup. This parameter should be set when using SSM to configure the AMI to use. Default no SSM parameters. | <pre>object({<br/> amiFilters = optional(list(object({<br/> Name = string<br/> Values = list(string)<br/> })),<br/> [{<br/> Name : "state",<br/> Values : ["available"],<br/> },<br/> {<br/> Name : "image-type",<br/> Values : ["machine"],<br/> }]<br/> )<br/> dryRun = optional(bool, false)<br/> launchTemplateNames = optional(list(string))<br/> maxItems = optional(number)<br/> minimumDaysOld = optional(number, 30)<br/> ssmParameterNames = optional(list(string))<br/> })</pre> | `{}` | no |
| <a name="input_lambda_architecture"></a> [lambda\_architecture](#input\_lambda\_architecture) | AWS Lambda architecture. Lambda functions using Graviton processors ('arm64') tend to have better price/performance than 'x86\_64' functions. | `string` | `"arm64"` | no |
| <a name="input_lambda_memory_size"></a> [lambda\_memory\_size](#input\_lambda\_memory\_size) | Memory size linit in MB of the lambda. | `number` | `256` | no |
| <a name="input_lambda_memory_size"></a> [lambda\_memory\_size](#input\_lambda\_memory\_size) | Memory size limit in MB of the lambda. | `number` | `256` | no |
| <a name="input_lambda_principals"></a> [lambda\_principals](#input\_lambda\_principals) | (Optional) add extra principals to the role created for execution of the lambda, e.g. for local testing. | <pre>list(object({<br/> type = string<br/> identifiers = list(string)<br/> }))</pre> | `[]` | no |
| <a name="input_lambda_runtime"></a> [lambda\_runtime](#input\_lambda\_runtime) | AWS Lambda runtime. | `string` | `"nodejs22.x"` | no |
| <a name="input_lambda_s3_bucket"></a> [lambda\_s3\_bucket](#input\_lambda\_s3\_bucket) | S3 bucket from which to specify lambda functions. This is an alternative to providing local files directly. | `string` | `null` | no |
Expand Down
2 changes: 1 addition & 1 deletion modules/ami-housekeeper/variables.tf
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ variable "lambda_timeout" {
}

variable "lambda_memory_size" {
description = "Memory size linit in MB of the lambda."
description = "Memory size limit in MB of the lambda."
type = number
default = 256
}
Expand Down
Loading
Loading