Add CloudWatch SEARCH() expression support for dynamic metric alarms #143
base: master
Changes from all commits
e2110ca
ba55420
e270a44
efb4049
f274be8
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,120 @@ | ||
| # Search Expression Alarms | ||
|
|
||
| Search expression alarms use CloudWatch [SEARCH()](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/search-expression-syntax.html) to dynamically match metrics instead of targeting a fixed set of dimensions. This is useful when the physical resource ID changes between deployments, such as Auto Scaling Groups that use replacement update policies in CloudFormation. | ||
|
|
||
| ## The Problem | ||
|
|
||
| When a CloudFormation stack replaces an ASG on deployment, the physical ASG name changes (e.g. `my-app-AsgGroup-abc123` becomes `my-app-AsgGroup-xyz789`). Standard alarms use fixed dimensions that reference the exact ASG name, so they break after every deployment until Guardian is recompiled and redeployed with the new name. | ||
|
|
||
| ## How It Works | ||
|
|
||
| Instead of emitting a CloudWatch alarm with fixed `Dimensions`, `MetricName`, `Namespace`, and `Statistic` properties, a search expression alarm emits the CloudFormation `Metrics` property (a list of `MetricDataQuery` objects) with: | ||
|
|
||
| 1. A **SEARCH()** expression that dynamically matches metrics by partial or exact name | ||
| 2. An **aggregation function** (e.g. `MAX`, `AVG`, `SUM`) that reduces the matched metrics to a single time series for threshold evaluation | ||
|
|
||
| ## Configuration | ||
|
|
||
| Add `SearchExpression` and optionally `SearchAggregation` to an alarm template. When `SearchExpression` is set, the `Dimensions`, `MetricName`, `Namespace`, `Statistic`, and `Period` properties are not used since CloudWatch treats these as mutually exclusive with the alarm `Metrics` property. | ||
|
|
||
| ### Properties | ||
|
|
||
| | Property | Required | Default | Description | | ||
| | --- | --- | --- | --- | | ||
| | `SearchExpression` | Yes | - | A CloudWatch SEARCH() expression string. Supports `${Resource::...}` [variables](variables.md). | | ||
| | `SearchAggregation` | No | `MAX` | Aggregation function applied to the search results. Valid values: `MAX`, `MIN`, `AVG`, `SUM`. | | ||
|
|
||
|
Comment on lines +22 to +26
|
||
| ### Overriding Default Alarms | ||
|
|
||
| You can convert existing default alarms to use search expressions by overriding them in the template: | ||
|
|
||
| ```yaml | ||
| Resources: | ||
| AutoScalingGroup: | ||
| - Id: my-app-AsgGroup | ||
|
|
||
| Templates: | ||
| AutoScalingGroup: | ||
| CPUUtilizationHighBase: | ||
| SearchExpression: "SEARCH('{AWS/EC2,AutoScalingGroupName} MetricName=\"CPUUtilization\" \"${Resource::Id}\"', 'Minimum', 60)" | ||
| SearchAggregation: MAX | ||
| StatusCheckFailed: | ||
| SearchExpression: "SEARCH('{AWS/EC2,AutoScalingGroupName} MetricName=\"StatusCheckFailed\" \"${Resource::Id}\"', 'Maximum', 60)" | ||
| SearchAggregation: MAX | ||
| ``` | ||
|
|
||
| In this example the `Id` is the stable prefix of the ASG name. The double quotes around `\"${Resource::Id}\"` inside the SEARCH expression perform an exact substring match, so `my-app-AsgGroup-abc123` and `my-app-AsgGroup-xyz789` both match but unrelated ASGs do not. | ||
|
|
||
| ### Creating New Alarms | ||
|
|
||
| You can also create new search expression alarms that don't override any defaults: | ||
|
|
||
| ```yaml | ||
| Templates: | ||
| AutoScalingGroup: | ||
| NetworkOutHigh: | ||
| SearchExpression: "SEARCH('{AWS/EC2,AutoScalingGroupName} MetricName=\"NetworkOut\" \"${Resource::Id}\"', 'Average', 300)" | ||
| SearchAggregation: SUM | ||
| Threshold: 1000000000 | ||
| ComparisonOperator: GreaterThanThreshold | ||
| EvaluationPeriods: 3 | ||
| AlarmAction: Warning | ||
| ``` | ||
|
|
||
| ### Using With Other Resource Groups | ||
|
|
||
| Search expressions work with any resource group, not just AutoScalingGroup: | ||
|
|
||
| ```yaml | ||
| Resources: | ||
| ECSService: | ||
| - Id: my-service | ||
| Cluster: my-cluster | ||
|
|
||
| Templates: | ||
| ECSService: | ||
| CPUUtilizationHigh: | ||
| SearchExpression: "SEARCH('{AWS/ECS,ServiceName,ClusterName} MetricName=\"CPUUtilization\" \"${Resource::Id}\"', 'Average', 60)" | ||
| SearchAggregation: MAX | ||
| Threshold: 90 | ||
| EvaluationPeriods: 5 | ||
| ``` | ||
|
|
||
| ## Variables | ||
|
|
||
| `${Resource::...}` variables are interpolated inside search expressions the same way as in [dimension variables](variables.md). You can reference any key from the resource definition whose name consists of letters, digits, or underscores: | ||
|
|
||
| ```yaml | ||
| Resources: | ||
| AutoScalingGroup: | ||
| - Id: my-app-AsgGroup | ||
| Environment: production | ||
|
|
||
| Templates: | ||
| AutoScalingGroup: | ||
| CPUUtilizationHighBase: | ||
| SearchExpression: "SEARCH('{AWS/EC2,AutoScalingGroupName} MetricName=\"CPUUtilization\" \"${Resource::Id}\"', 'Minimum', 60)" | ||
| SearchAggregation: MAX | ||
| ``` | ||
|
|
||
| ## CloudWatch SEARCH() Syntax Quick Reference | ||
|
|
||
| The general format is: | ||
|
|
||
| ``` | ||
| SEARCH('{Namespace,DimensionName} SearchTerm', 'Statistic', Period) | ||
| ``` | ||
|
|
||
| - **Partial match**: `my-app` matches any metric with a token `my` or `app` in any dimension value | ||
| - **Exact match**: `"my-app-AsgGroup"` matches only the exact substring `my-app-AsgGroup` | ||
| - **Boolean operators**: `AND`, `OR`, `NOT` can be used to combine terms | ||
| - **Property designators**: `MetricName="CPUUtilization"` restricts matching to the metric name | ||
|
|
||
| See the [CloudWatch search expression syntax documentation](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/search-expression-syntax.html) for full details. | ||
|
|
||
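The general format above can be assembled programmatically. The following is a minimal sketch, not code from this PR: `build_search_expression` and its keyword arguments are hypothetical names, and it assumes the search term should always be double-quoted for an exact match, as in the examples in this document.

```ruby
# Hypothetical helper (not part of this PR): assembles a SEARCH() string
# in the format SEARCH('{Namespace,DimensionName} SearchTerm', 'Statistic', Period).
def build_search_expression(namespace:, dimensions:, metric_name:, term:, statistic: 'Average', period: 300)
  # Schema: namespace followed by the dimension names, e.g. {AWS/EC2,AutoScalingGroupName}
  schema = "{#{[namespace, *dimensions].join(',')}}"
  # Property designator for the metric name plus a double-quoted (exact match) term
  query = %(#{schema} MetricName="#{metric_name}" "#{term}")
  "SEARCH('#{query}', '#{statistic}', #{period})"
end

puts build_search_expression(
  namespace: 'AWS/EC2',
  dimensions: ['AutoScalingGroupName'],
  metric_name: 'CPUUtilization',
  term: 'my-app-AsgGroup',
  statistic: 'Maximum',
  period: 60
)
```

The helper does no escaping, so a term containing a single or double quote would produce an invalid expression; a real implementation would likely need to reject or escape such values.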
| ## Limitations | ||
|
|
||
| - **2-week lookback**: SEARCH() only finds metrics that have reported data within the last 2 weeks | ||
| - **100 metric limit**: A single SEARCH expression can match up to 100 time series | ||
| - **1024 character limit**: The search expression query string cannot exceed 1024 characters | ||
| - **Aggregation always applied**: Since SEARCH can return multiple time series, the aggregation function (`MAX` unless `SearchAggregation` is set) reduces them to a single series for threshold comparison | ||
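To make the "How It Works" description concrete, here is a rough sketch of how the `Metrics` list could be built from a search expression and an aggregation function. This is an illustration under assumptions, not the code this PR emits: the helper name `search_alarm_metrics` and the query Ids `search` and `aggregate` are made up, while the `Id`, `Expression`, and `ReturnData` keys are those of the CloudFormation `MetricDataQuery` type.

```ruby
# Hypothetical helper (names are illustrative, not taken from this PR).
VALID_AGGREGATIONS = %w[MAX MIN AVG SUM].freeze

def search_alarm_metrics(search_expression, aggregation = 'MAX')
  agg = aggregation.to_s.upcase
  unless VALID_AGGREGATIONS.include?(agg)
    raise ArgumentError, "unsupported aggregation #{aggregation.inspect}"
  end

  [
    # The SEARCH() query itself. It may match several time series,
    # so it is not the series the alarm evaluates.
    { 'Id' => 'search', 'Expression' => search_expression, 'ReturnData' => false },
    # The aggregation reduces the matched series to the single series
    # that is compared against the alarm threshold.
    { 'Id' => 'aggregate', 'Expression' => "#{agg}(search)", 'ReturnData' => true }
  ]
end

expression = %q[SEARCH('{AWS/EC2,AutoScalingGroupName} MetricName="CPUUtilization" "my-app-AsgGroup"', 'Maximum', 60)]
search_alarm_metrics(expression, 'max').each { |query| puts query }
```

Only one query in the list has `ReturnData` set to `true`, which reflects the requirement that an alarm evaluates a single time series.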
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -190,9 +190,24 @@ def validate_resources() | |
| @resources.each do |resource| | ||
| case resource.type | ||
| when 'Alarm' | ||
| %w(metric_name namespace).each do |property| | ||
| if resource.send(property).nil? | ||
| @errors << "CfnGuardian::AlarmPropertyError - alarm #{resource.name} for resource #{resource.resource_id} has nil value for property #{property.to_camelcase}. This could be due to incorrect spelling of a default alarm name or missing property #{property.to_camelcase} on a new alarm." | ||
| if resource.search_expression | ||
| if !resource.search_expression.is_a?(String) || resource.search_expression.strip.empty? | ||
| @errors << "CfnGuardian::AlarmPropertyError - alarm #{resource.name} for resource #{resource.resource_id} has an invalid SearchExpression. Must be a non-empty string." | ||
| end | ||
| if resource.search_aggregation | ||
| valid_aggregations = %w(MAX MIN AVG SUM) | ||
| normalized = resource.search_aggregation.to_s.upcase | ||
| if valid_aggregations.include?(normalized) | ||
| resource.search_aggregation = normalized | ||
| else | ||
| @errors << "CfnGuardian::AlarmPropertyError - alarm #{resource.name} for resource #{resource.resource_id} has invalid SearchAggregation '#{resource.search_aggregation}'. Must be one of: #{valid_aggregations.join(', ')}." | ||
| end | ||
|
Comment on lines +193 to +204
|
||
| end | ||
|
Comment on lines +193 to +205
|
||
| else | ||
| %w(metric_name namespace).each do |property| | ||
| if resource.send(property).nil? | ||
| @errors << "CfnGuardian::AlarmPropertyError - alarm #{resource.name} for resource #{resource.resource_id} has nil value for property #{property.to_camelcase}. This could be due to incorrect spelling of a default alarm name or missing property #{property.to_camelcase} on a new alarm." | ||
| end | ||
| end | ||
| end | ||
| when 'Check' | ||
|
|
||
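The validation rules in the diff above can be exercised in isolation. The sketch below restates them as a standalone function so they are easy to test; `validate_search_alarm` is a hypothetical name, the error strings are shortened, and applying the `MAX` default at this point is an assumption, since the diff does not show where the default is set.

```ruby
VALID_SEARCH_AGGREGATIONS = %w[MAX MIN AVG SUM].freeze

# Hypothetical standalone version of the checks in validate_resources.
# Returns [normalized_aggregation, errors].
def validate_search_alarm(search_expression, search_aggregation = nil)
  errors = []

  # SearchExpression must be a non-empty string.
  unless search_expression.is_a?(String) && !search_expression.strip.empty?
    errors << 'SearchExpression must be a non-empty string'
  end

  # Assumption: fall back to MAX when no aggregation is given.
  normalized = (search_aggregation || 'MAX').to_s.upcase
  unless VALID_SEARCH_AGGREGATIONS.include?(normalized)
    errors << "SearchAggregation '#{search_aggregation}' must be one of: " \
              "#{VALID_SEARCH_AGGREGATIONS.join(', ')}"
    normalized = nil
  end

  [normalized, errors]
end

p validate_search_alarm("SEARCH('{AWS/EC2,AutoScalingGroupName} \"my-app\"', 'Maximum', 60)", 'avg')
p validate_search_alarm('   ', 'median')
```

The first call normalizes `avg` to `AVG` with no errors; the second reports both an empty expression and an unsupported aggregation.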
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -110,15 +110,29 @@ def get_alarms(group,overides={}) | |||||
| @alarms.each {|a| a.group = @override_group} | ||||||
| end | ||||||
|
|
||||||
| # String interpolation for alarm dimensions | ||||||
| @alarms.each do |alarm| | ||||||
| next if alarm.dimensions.nil? | ||||||
| alarm.dimensions.each do |k,v| | ||||||
| if v.is_a?(String) && v.match?(/^\${Resource::.*[A-Za-z]}$/) | ||||||
| resource_key = v.tr('${}', '').split('Resource::').last | ||||||
| # String interpolation for alarm dimensions | ||||||
| unless alarm.dimensions.nil? | ||||||
| alarm.dimensions.each do |k,v| | ||||||
| if v.is_a?(String) && v.match?(/^\${Resource::.*[A-Za-z]}$/) | ||||||
| resource_key = v.tr('${}', '').split('Resource::').last | ||||||
| if @resource.has_key?(resource_key) | ||||||
| logger.debug "overriding alarm #{alarm.name} dimension key '#{k}' with value '#{@resource[resource_key]}'" | ||||||
| alarm.dimensions[k] = @resource[resource_key] | ||||||
| end | ||||||
| end | ||||||
| end | ||||||
| end | ||||||
|
|
||||||
| # String interpolation for search expressions | ||||||
| if alarm.search_expression.is_a?(String) | ||||||
| alarm.search_expression = alarm.search_expression.gsub(/\${Resource::([A-Za-z0-9_]+)}/) do | ||||||
| resource_key = Regexp.last_match(1) | ||||||
| if @resource.has_key?(resource_key) | ||||||
| logger.debug "overriding alarm #{alarm.name} dimension key '#{k}' with value '#{@resource[resource_key]}'" | ||||||
| alarm.dimensions[k] = @resource[resource_key] | ||||||
| logger.debug "interpolating search_expression variable '#{resource_key}' with value '#{@resource[resource_key]}' for alarm #{alarm.name}" | ||||||
| @resource[resource_key] | ||||||
|
||||||
| @resource[resource_key] | |
| @resource[resource_key].to_s |
Copilot
AI
Apr 17, 2026
Search expression interpolation only matches `${Resource::...}` keys made of letters (`/\${Resource::([A-Za-z]+)}/`). This is more restrictive than the dimension-variable interpolation above and contradicts the docs that say “any key from the resource definition” (keys may contain digits/underscores). Consider broadening the regex (e.g., allow `[A-Za-z0-9_]+`), and prefer `Regexp.last_match(1)` over `$1` for clarity.
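One further point on the interpolation: `String#gsub` with a block substitutes the block's return value converted with `to_s`, so a block that returns `nil` for an unknown key replaces the placeholder with an empty string. A possible variant that keeps unknown placeholders visible is sketched below; `interpolate_search_expression` is a hypothetical standalone function, not the method in this PR, and it assumes the resource is a Hash with string keys.

```ruby
RESOURCE_VARIABLE = /\$\{Resource::([A-Za-z0-9_]+)\}/

# Hypothetical standalone version of the search expression interpolation.
def interpolate_search_expression(expression, resource)
  expression.gsub(RESOURCE_VARIABLE) do |placeholder|
    key = Regexp.last_match(1)
    # Known keys are substituted as strings; unknown keys keep their
    # placeholder instead of silently becoming an empty string.
    resource.key?(key) ? resource[key].to_s : placeholder
  end
end

resource = { 'Id' => 'my-app-AsgGroup', 'Port_1' => 8080 }
template = %q[SEARCH('{AWS/EC2,AutoScalingGroupName} "${Resource::Id}" ${Resource::Missing}', 'Maximum', 60)]
puts interpolate_search_expression(template, resource)
```

Leaving the placeholder in place makes a misspelled key easier to spot in the compiled template than an empty search term would be; raising an error instead would be a stricter alternative.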
The workflow installs Bundler 2.4.22, but the lockfile was generated with Bundler 2.3.19 (`BUNDLED WITH`). This often produces warnings and may cause unintended lockfile churn. Consider aligning the workflow Bundler version with the lockfile, or update the lockfile’s `BUNDLED WITH` to match the workflow version.