5 changes: 3 additions & 2 deletions .github/workflows/build-gem.yml
@@ -21,8 +21,9 @@ jobs:

- name: rspec
run: |
gem install rspec
rspec
gem install bundler -v 2.4.22
Copilot AI Apr 17, 2026

The workflow installs Bundler 2.4.22, but the lockfile was generated with Bundler 2.3.19 (BUNDLED WITH). This often produces warnings and may cause unintended lockfile churn. Consider aligning the workflow Bundler version with the lockfile, or update the lockfile’s BUNDLED WITH to match the workflow version.

Suggested change
gem install bundler -v 2.4.22
gem install bundler -v 2.3.19
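
One way to keep the two versions aligned is to read the Bundler version from the lockfile instead of hard-coding it in the workflow. The following is a minimal Ruby sketch, assuming the standard `BUNDLED WITH` trailer format; the helper name `bundled_with_version` and the sample lockfile text are hypothetical and not part of this PR.

```ruby
# Hypothetical helper: extract the Bundler version recorded in a lockfile.
# Assumes the standard trailer: a "BUNDLED WITH" line followed by the version
# on the next line (usually indented).
def bundled_with_version(lockfile_text)
  match = lockfile_text.match(/^BUNDLED WITH\n\s*(\S+)/)
  match && match[1]
end

# Illustrative lockfile tail, not the full Gemfile.lock from this PR.
SAMPLE_LOCKFILE = <<~LOCK
  DEPENDENCIES
    rake (~> 13.0)

  BUNDLED WITH
     2.3.19
LOCK

puts bundled_with_version(SAMPLE_LOCKFILE)
```

A workflow step could pass the printed version to `gem install bundler -v`, so the lockfile remains the single source of truth.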

bundle install
bundle exec rspec

- name: build gem
run: |
100 changes: 63 additions & 37 deletions Gemfile.lock
@@ -1,7 +1,7 @@
PATH
remote: .
specs:
cfn-guardian (0.10.4)
cfn-guardian (0.13.0)
aws-sdk-cloudformation (~> 1.76, < 2)
aws-sdk-cloudwatch (~> 1.72, < 2)
aws-sdk-codecommit (~> 1.53, < 2)
@@ -10,71 +10,97 @@ PATH
aws-sdk-rds (~> 1.174, < 2)
aws-sdk-s3 (~> 1.119, < 2)
cfndsl (~> 1.0, < 2)
rexml
rexml (= 3.3.0)
term-ansicolor (~> 1, < 2)
terminal-table (~> 1, < 2)
thor (~> 0.20)
tins (~> 1.42.0)

GEM
remote: https://rubygems.org/
specs:
aws-eventstream (1.2.0)
aws-partitions (1.737.0)
aws-sdk-cloudformation (1.76.0)
aws-sdk-core (~> 3, >= 3.165.0)
aws-sigv4 (~> 1.1)
aws-sdk-cloudwatch (1.72.0)
aws-sdk-core (~> 3, >= 3.165.0)
aws-sigv4 (~> 1.1)
aws-sdk-codecommit (1.53.0)
aws-sdk-core (~> 3, >= 3.165.0)
aws-sigv4 (~> 1.1)
aws-sdk-codepipeline (1.55.0)
aws-sdk-core (~> 3, >= 3.165.0)
aws-sigv4 (~> 1.1)
aws-sdk-core (3.171.0)
aws-eventstream (~> 1, >= 1.0.2)
aws-partitions (~> 1, >= 1.651.0)
aws-eventstream (1.4.0)
aws-partitions (1.1239.0)
aws-sdk-cloudformation (1.150.0)
aws-sdk-core (~> 3, >= 3.244.0)
aws-sigv4 (~> 1.5)
aws-sdk-cloudwatch (1.134.0)
aws-sdk-core (~> 3, >= 3.244.0)
aws-sigv4 (~> 1.5)
aws-sdk-codecommit (1.97.0)
aws-sdk-core (~> 3, >= 3.244.0)
aws-sigv4 (~> 1.5)
aws-sdk-codepipeline (1.113.0)
aws-sdk-core (~> 3, >= 3.244.0)
aws-sigv4 (~> 1.5)
aws-sdk-core (3.244.0)
aws-eventstream (~> 1, >= 1.3.0)
aws-partitions (~> 1, >= 1.992.0)
aws-sigv4 (~> 1.9)
base64
bigdecimal
jmespath (~> 1, >= 1.6.1)
aws-sdk-ec2 (1.371.0)
aws-sdk-core (~> 3, >= 3.165.0)
aws-sigv4 (~> 1.1)
aws-sdk-kms (1.63.0)
aws-sdk-core (~> 3, >= 3.165.0)
aws-sigv4 (~> 1.1)
aws-sdk-rds (1.174.0)
aws-sdk-core (~> 3, >= 3.165.0)
aws-sigv4 (~> 1.1)
aws-sdk-s3 (1.119.2)
aws-sdk-core (~> 3, >= 3.165.0)
logger
aws-sdk-ec2 (1.611.0)
aws-sdk-core (~> 3, >= 3.244.0)
aws-sigv4 (~> 1.5)
aws-sdk-kms (1.123.0)
aws-sdk-core (~> 3, >= 3.244.0)
aws-sigv4 (~> 1.5)
aws-sdk-rds (1.311.0)
aws-sdk-core (~> 3, >= 3.244.0)
aws-sigv4 (~> 1.5)
aws-sdk-s3 (1.219.0)
aws-sdk-core (~> 3, >= 3.244.0)
aws-sdk-kms (~> 1)
aws-sigv4 (~> 1.4)
aws-sigv4 (1.5.2)
aws-sigv4 (~> 1.5)
aws-sigv4 (1.12.1)
aws-eventstream (~> 1, >= 1.0.2)
cfndsl (1.6.0)
base64 (0.3.0)
bigdecimal (4.1.1)
cfndsl (1.7.3)
hana (~> 1.3)
diff-lcs (1.6.2)
hana (1.3.7)
jmespath (1.6.2)
logger (1.7.0)
rake (13.0.6)
rexml (3.2.5)
rexml (3.3.0)
strscan
rspec (3.13.2)
rspec-core (~> 3.13.0)
rspec-expectations (~> 3.13.0)
rspec-mocks (~> 3.13.0)
rspec-core (3.13.6)
rspec-support (~> 3.13.0)
rspec-expectations (3.13.5)
diff-lcs (>= 1.2.0, < 2.0)
rspec-support (~> 3.13.0)
rspec-mocks (3.13.8)
diff-lcs (>= 1.2.0, < 2.0)
rspec-support (~> 3.13.0)
rspec-support (3.13.7)
strscan (3.1.8)
sync (0.5.0)
term-ansicolor (1.7.1)
tins (~> 1.0)
term-ansicolor (1.11.3)
tins (~> 1)
terminal-table (1.8.0)
unicode-display_width (~> 1.1, >= 1.1.1)
thor (0.20.3)
tins (1.32.1)
tins (1.42.0)
bigdecimal
sync
unicode-display_width (1.8.0)

PLATFORMS
aarch64-linux
x86_64-darwin-21

DEPENDENCIES
bundler (~> 2.0)
cfn-guardian!
rake (~> 13.0)
rspec (~> 3.0)

BUNDLED WITH
2.3.19
1 change: 1 addition & 0 deletions cfn-guardian.gemspec
@@ -43,4 +43,5 @@ Gem::Specification.new do |spec|

spec.add_development_dependency "bundler", "~> 2.0"
spec.add_development_dependency "rake", "~> 13.0"
spec.add_development_dependency "rspec", "~> 3.0"
end
3 changes: 2 additions & 1 deletion docs/overview.md
@@ -22,4 +22,5 @@
8. [Composite Alarms](composite_alarms.md)
9. [Alarms for Custom Metrics](custom_metrics.md)
10. [Dimension Variables](variables.md)
11. [Alarm Tags](alarm_tags.md)
11. [Search Expression Alarms](search_expressions.md)
12. [Alarm Tags](alarm_tags.md)
120 changes: 120 additions & 0 deletions docs/search_expressions.md
@@ -0,0 +1,120 @@
# Search Expression Alarms

Search expression alarms use CloudWatch [SEARCH()](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/search-expression-syntax.html) to dynamically match metrics instead of targeting a fixed set of dimensions. This is useful when the physical resource ID changes between deployments, such as Auto Scaling Groups that use replacement update policies in CloudFormation.

## The Problem

When a CloudFormation stack replaces an ASG on deployment, the physical ASG name changes (e.g. `my-app-AsgGroup-abc123` becomes `my-app-AsgGroup-xyz789`). Standard alarms use fixed dimensions that reference the exact ASG name, so they break after every deployment until Guardian is recompiled and redeployed with the new name.

## How It Works

Instead of emitting a CloudWatch alarm with fixed `Dimensions`, `MetricName`, `Namespace`, and `Statistic` properties, a search expression alarm emits the CloudFormation `Metrics` property (a list of `MetricDataQuery` objects) with:

1. A **SEARCH()** expression that dynamically matches metrics by partial or exact name
2. An **aggregation function** (e.g. `MAX`, `AVG`, `SUM`) that reduces the matched metrics to a single time series for threshold evaluation
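
The two parts above can be sketched in Ruby as a small builder for the `Metrics` list. This is an illustration only, not Guardian's actual implementation: the helper name `search_alarm_metrics`, the query id `search_agg`, and the label text are assumptions.

```ruby
require 'json'

# Hypothetical helper (not Guardian's actual code): wrap a SEARCH() expression
# in an aggregation function and return it as a one-element list of
# MetricDataQuery-style hashes for the alarm's Metrics property.
def search_alarm_metrics(search_expression, aggregation = 'MAX')
  [
    {
      'Id' => 'search_agg', # assumed id; CloudWatch ids start with a lowercase letter
      'Expression' => "#{aggregation}(#{search_expression})",
      'Label' => "#{aggregation} of matched metrics",
      'ReturnData' => true # the single series the alarm evaluates
    }
  ]
end

expression = %q(SEARCH('{AWS/EC2,AutoScalingGroupName} MetricName="CPUUtilization" "my-app-AsgGroup"', 'Minimum', 60))
puts JSON.pretty_generate(search_alarm_metrics(expression))
```

Wrapping the search in the aggregation keeps the alarm on one time series regardless of how many metrics the search matches.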

## Configuration

Add `SearchExpression` and optionally `SearchAggregation` to an alarm template. When `SearchExpression` is set, the `Dimensions`, `MetricName`, `Namespace`, `Statistic`, and `Period` properties are not used since CloudWatch treats these as mutually exclusive with the alarm `Metrics` property.

### Properties

| Property | Required | Default | Description |
| --- | --- | --- | --- |
| `SearchExpression` | Yes | - | A CloudWatch SEARCH() expression string. Supports `${Resource::...}` [variables](variables.md). |
| `SearchAggregation` | No | `MAX` | Aggregation function applied to the search results. Valid values: `MAX`, `MIN`, `AVG`, `SUM`. |

Comment on lines +22 to +26
Copilot AI Apr 17, 2026

The markdown table under "Properties" is malformed (lines start with ||), which renders as an empty first column in most markdown parsers. Use standard table syntax with a single leading/trailing | per row.

### Overriding Default Alarms

You can convert existing default alarms to use search expressions by overriding them in the template:

```yaml
Resources:
AutoScalingGroup:
- Id: my-app-AsgGroup

Templates:
AutoScalingGroup:
CPUUtilizationHighBase:
SearchExpression: "SEARCH('{AWS/EC2,AutoScalingGroupName} MetricName=\"CPUUtilization\" \"${Resource::Id}\"', 'Minimum', 60)"
SearchAggregation: MAX
StatusCheckFailed:
SearchExpression: "SEARCH('{AWS/EC2,AutoScalingGroupName} MetricName=\"StatusCheckFailed\" \"${Resource::Id}\"', 'Maximum', 60)"
SearchAggregation: MAX
```

In this example the `Id` is the stable prefix of the ASG name. The double quotes around `\"${Resource::Id}\"` inside the SEARCH expression perform an exact substring match, so `my-app-AsgGroup-abc123` and `my-app-AsgGroup-xyz789` both match but unrelated ASGs do not.

### Creating New Alarms

You can also create new search expression alarms that don't override any defaults:

```yaml
Templates:
AutoScalingGroup:
NetworkOutHigh:
SearchExpression: "SEARCH('{AWS/EC2,AutoScalingGroupName} MetricName=\"NetworkOut\" \"${Resource::Id}\"', 'Average', 300)"
SearchAggregation: SUM
Threshold: 1000000000
ComparisonOperator: GreaterThanThreshold
EvaluationPeriods: 3
AlarmAction: Warning
```

### Using With Other Resource Groups

Search expressions work with any resource group, not just AutoScalingGroup:

```yaml
Resources:
ECSService:
- Id: my-service
Cluster: my-cluster

Templates:
ECSService:
CPUUtilizationHigh:
SearchExpression: "SEARCH('{AWS/ECS,ServiceName,ClusterName} MetricName=\"CPUUtilization\" \"${Resource::Id}\"', 'Average', 60)"
SearchAggregation: MAX
Threshold: 90
EvaluationPeriods: 5
```

## Variables

`${Resource::...}` variables are interpolated inside search expressions the same way as in [dimension variables](variables.md). You can reference any key from the resource definition:

```yaml
Resources:
AutoScalingGroup:
- Id: my-app-AsgGroup
Environment: production

Templates:
AutoScalingGroup:
CPUUtilizationHighBase:
SearchExpression: "SEARCH('{AWS/EC2,AutoScalingGroupName} MetricName=\"CPUUtilization\" \"${Resource::Id}\"', 'Minimum', 60)"
SearchAggregation: MAX
```
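
The interpolation step can be illustrated with a stand-alone Ruby sketch. It assumes the resource is a plain Hash with string keys; the function name `interpolate_search_expression` is hypothetical, and unknown variables are left in place so that typos stay visible in the compiled output.

```ruby
# Sketch of ${Resource::Key} interpolation, assuming a Hash with string keys.
# Keys that are not present on the resource are left untouched.
def interpolate_search_expression(expression, resource)
  expression.gsub(/\$\{Resource::([A-Za-z0-9_]+)\}/) do
    key = Regexp.last_match(1)
    resource.key?(key) ? resource[key].to_s : "${Resource::#{key}}"
  end
end

resource = { 'Id' => 'my-app-AsgGroup', 'Environment' => 'production' }
template = %q(SEARCH('{AWS/EC2,AutoScalingGroupName} MetricName="CPUUtilization" "${Resource::Id}"', 'Minimum', 60))

puts interpolate_search_expression(template, resource)
```

With this resource, `${Resource::Environment}` would resolve to `production` in the same way.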

## CloudWatch SEARCH() Syntax Quick Reference

The general format is:

```
SEARCH('{Namespace,DimensionName} SearchTerm', 'Statistic', Period)
```

- **Partial match**: `my-app` matches any metric with a token `my` or `app` in any dimension value
- **Exact match**: `"my-app-AsgGroup"` matches only the exact substring `my-app-AsgGroup`
- **Boolean operators**: `AND`, `OR`, `NOT` can be used to combine terms
- **Property designators**: `MetricName="CPUUtilization"` restricts matching to the metric name

See the [CloudWatch search expression syntax documentation](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/search-expression-syntax.html) for full details.
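
As a rough illustration of the general format, the parts of a SEARCH() string can be assembled programmatically. The helper `build_search_expression` and its keyword arguments are hypothetical; the sketch only handles the simple case of one metric name and one quoted search term.

```ruby
# Hypothetical helper that assembles a SEARCH() string from its parts,
# following the general format described above. The term is wrapped in
# double quotes, so it is treated as an exact match.
def build_search_expression(namespace:, dimensions:, metric_name:, term:, statistic: 'Average', period: 60)
  schema = "{#{[namespace, *dimensions].join(',')}}"
  query  = %(#{schema} MetricName="#{metric_name}" "#{term}")
  "SEARCH('#{query}', '#{statistic}', #{period})"
end

puts build_search_expression(
  namespace: 'AWS/ECS',
  dimensions: %w[ServiceName ClusterName],
  metric_name: 'CPUUtilization',
  term: 'my-service'
)
```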

## Limitations

- **2-week lookback**: SEARCH() only finds metrics that have reported data within the last 2 weeks
- **100 metric limit**: A single SEARCH expression can match up to 100 time series
- **1024 character limit**: The search expression query string cannot exceed 1024 characters
- **Aggregation required**: Since SEARCH can return multiple time series, the aggregation function reduces them to a single series for threshold comparison
2 changes: 1 addition & 1 deletion docs/variables.md
@@ -1,6 +1,6 @@
## Dimension Variables

variables can be used to reference resource group values such as the resource Id within the dimensions section of an alarm template.
Variables can be used to reference resource group values such as the resource Id within the dimensions section of an alarm template. They are also supported inside [search expressions](search_expressions.md).

For example here we are creating an alarm for a disk usage metric for a group of EC2 instances.

21 changes: 18 additions & 3 deletions lib/cfnguardian/compile.rb
@@ -190,9 +190,24 @@ def validate_resources()
@resources.each do |resource|
case resource.type
when 'Alarm'
%w(metric_name namespace).each do |property|
if resource.send(property).nil?
@errors << "CfnGuardian::AlarmPropertyError - alarm #{resource.name} for resource #{resource.resource_id} has nil value for property #{property.to_camelcase}. This could be due to incorrect spelling of a default alarm name or missing property #{property.to_camelcase} on a new alarm."
if resource.search_expression
if !resource.search_expression.is_a?(String) || resource.search_expression.strip.empty?
@errors << "CfnGuardian::AlarmPropertyError - alarm #{resource.name} for resource #{resource.resource_id} has an invalid SearchExpression. Must be a non-empty string."
end
if resource.search_aggregation
valid_aggregations = %w(MAX MIN AVG SUM)
normalized = resource.search_aggregation.to_s.upcase
if valid_aggregations.include?(normalized)
resource.search_aggregation = normalized
else
@errors << "CfnGuardian::AlarmPropertyError - alarm #{resource.name} for resource #{resource.resource_id} has invalid SearchAggregation '#{resource.search_aggregation}'. Must be one of: #{valid_aggregations.join(', ')}."
end
Comment on lines +193 to +204
Copilot AI Apr 17, 2026

New validation behavior was added for SearchExpression (non-empty string) and SearchAggregation (must be one of MAX/MIN/AVG/SUM and normalized to uppercase), but there are no specs asserting these error cases and normalization. Add RSpec coverage that (1) invalid/blank SearchExpression causes CfnGuardian::ValidationError, and (2) invalid SearchAggregation is rejected while valid lowercase values are normalized.
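
The cases listed above could be exercised along these lines. This is a stand-alone sketch in plain Ruby rather than RSpec, and it reimplements the rules in a hypothetical `validate_search_alarm` function instead of calling Guardian's compiler, so the names and return shape are assumptions.

```ruby
VALID_AGGREGATIONS = %w[MAX MIN AVG SUM].freeze

# Stand-alone sketch of the validation rules described above. Returns
# [errors, normalized_aggregation] instead of mutating an alarm object.
def validate_search_alarm(search_expression, search_aggregation = nil)
  errors = []
  unless search_expression.is_a?(String) && !search_expression.strip.empty?
    errors << 'invalid SearchExpression: must be a non-empty string'
  end

  normalized = search_aggregation&.to_s&.upcase
  if normalized && !VALID_AGGREGATIONS.include?(normalized)
    errors << "invalid SearchAggregation '#{search_aggregation}': must be one of #{VALID_AGGREGATIONS.join(', ')}"
    normalized = nil
  end

  [errors, normalized]
end

# Plain-Ruby checks standing in for the requested specs.
errors, agg = validate_search_alarm("SEARCH('x', 'Average', 60)", 'max')
raise 'lowercase aggregation should normalize' unless errors.empty? && agg == 'MAX'

errors, = validate_search_alarm([], 'MAX')
raise 'non-string expression should be rejected' unless errors.length == 1

errors, = validate_search_alarm('   ', 'median')
raise 'blank expression and bad aggregation should both fail' unless errors.length == 2

puts 'all checks passed'
```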

end
Comment on lines +193 to +205
Copilot AI Apr 17, 2026

resource.search_expression.empty? assumes search_expression is a String. If a user misconfigures YAML (e.g., SearchExpression: []), this will raise a NoMethodError during validation instead of producing a helpful validation message. Consider validating search_expression.is_a?(String) and treating blank/whitespace-only strings as invalid (e.g., strip.empty?) to keep validation robust.

else
%w(metric_name namespace).each do |property|
if resource.send(property).nil?
@errors << "CfnGuardian::AlarmPropertyError - alarm #{resource.name} for resource #{resource.resource_id} has nil value for property #{property.to_camelcase}. This could be due to incorrect spelling of a default alarm name or missing property #{property.to_camelcase} on a new alarm."
end
end
end
when 'Check'
6 changes: 5 additions & 1 deletion lib/cfnguardian/models/alarm.rb
@@ -31,7 +31,9 @@ class BaseAlarm
:unit,
:maintenance_groups,
:additional_notifiers,
:tags
:tags,
:search_expression,
:search_aggregation

def initialize(resource)
@type = 'Alarm'
@@ -60,6 +62,8 @@ def initialize(resource)
@maintenance_groups = []
@additional_notifiers = []
@tags = {}
@search_expression = nil
@search_aggregation = nil
end

def metric_name=(metric_name)
28 changes: 21 additions & 7 deletions lib/cfnguardian/resources/base.rb
@@ -110,15 +110,29 @@ def get_alarms(group,overides={})
@alarms.each {|a| a.group = @override_group}
end

# String interpolation for alarm dimensions
@alarms.each do |alarm|
next if alarm.dimensions.nil?
alarm.dimensions.each do |k,v|
if v.is_a?(String) && v.match?(/^\${Resource::.*[A-Za-z]}$/)
resource_key = v.tr('${}', '').split('Resource::').last
# String interpolation for alarm dimensions
unless alarm.dimensions.nil?
alarm.dimensions.each do |k,v|
if v.is_a?(String) && v.match?(/^\${Resource::.*[A-Za-z]}$/)
resource_key = v.tr('${}', '').split('Resource::').last
if @resource.has_key?(resource_key)
logger.debug "overriding alarm #{alarm.name} dimension key '#{k}' with value '#{@resource[resource_key]}'"
alarm.dimensions[k] = @resource[resource_key]
end
end
end
end

# String interpolation for search expressions
if alarm.search_expression.is_a?(String)
alarm.search_expression = alarm.search_expression.gsub(/\${Resource::([A-Za-z0-9_]+)}/) do
resource_key = Regexp.last_match(1)
if @resource.has_key?(resource_key)
logger.debug "overriding alarm #{alarm.name} dimension key '#{k}' with value '#{@resource[resource_key]}'"
alarm.dimensions[k] = @resource[resource_key]
logger.debug "interpolating search_expression variable '#{resource_key}' with value '#{@resource[resource_key]}' for alarm #{alarm.name}"
@resource[resource_key]
Copilot AI Apr 17, 2026

With the block form of gsub, Ruby converts the block's return value with to_s, so a numeric or boolean resource value (e.g., a Port) is interpolated without raising a TypeError. An explicit conversion (e.g., @resource[resource_key].to_s) is still worthwhile because it makes the conversion visible to readers.

Suggested change
@resource[resource_key]
@resource[resource_key].to_s

else
"${Resource::#{resource_key}}"
end
Comment on lines +127 to 136
Copilot AI Apr 17, 2026

Search expression interpolation only matches ${Resource::...} keys made of letters (/\${Resource::([A-Za-z]+)}/). This is more restrictive than the dimension-variable interpolation above and contradicts the docs that say “any key from the resource definition” (keys may contain digits/underscores). Consider broadening the regex (e.g., allow [A-Za-z0-9_]+), and prefer Regexp.last_match(1) over $1 for clarity.

end
end