
Commit 44fbbbb

feat(alarm): add minSampleCountToEvaluateDatapoint (#453)
Fixes #452

Currently, when `minMetricSamplesToAlarm` is used, the number of samples is evaluated over a different period than the main alarm. This makes monitoring prone to false positives, because a breaching datapoint is not required to have a sufficient number of samples (see #452 for more details). Moreover, the current approach to making alarms respect `minMetricSamplesToAlarm` creates 2 extra alarms: one for `NoSamples` and one top-level composite. Each of these incurs extra cost ($0.10 for the `NoSamples` alarm and $0.50 for the composite; see https://aws.amazon.com/cloudwatch/pricing/ for reference). This means that using `minMetricSamplesToAlarm` increases the cost from $0.10 per alarm to $0.70 per alarm ($0.60 of overhead!).

It's possible to use a Math Expression instead. Rather than adding a separate alarm for `NoSamples`, we can model it as a Sample Count metric, and rather than the composite, we can use a MathExpression that conditionally emits data based on the number of samples. Math Expression-based alarms are charged per metric in the expression, so this comes down to $0.20 per alarm, a cost reduction of roughly 70%. Additionally, it reduces the overall number of alarms, making it easier to stay within the CloudWatch quota and decluttering the UI.

To avoid breaking customers that rely on `minMetricSamplesToAlarm` generating alarms (e.g. #403), this change deprecates it and adds `minSampleCountToEvaluateDatapoint` with the updated behaviour next to it.

---

_By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license_
1 parent 3c1f088 commit 44fbbbb
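The cost figures in the commit message can be checked with a few lines of arithmetic. The sketch below hard-codes the prices quoted there, in US cents to avoid floating-point noise; those prices are assumptions taken from the text and may not match current CloudWatch pricing, and the function names are illustrative only. It works out to a saving of about 71%, which the message rounds to 70%.

```typescript
// Prices quoted in the commit message, in US cents.
// Assumption: taken from the text above; check current CloudWatch pricing.
const STANDARD_ALARM_CENTS = 10; // one standard alarm metric
const COMPOSITE_ALARM_CENTS = 50; // one composite alarm

// Deprecated minMetricSamplesToAlarm: the primary alarm, an extra NoSamples
// alarm and a top-level composite alarm.
function compositeApproachCents(): number {
  return 2 * STANDARD_ALARM_CENTS + COMPOSITE_ALARM_CENTS;
}

// minSampleCountToEvaluateDatapoint: one metric-math alarm, billed per metric
// in the expression (the metric itself plus its SampleCount).
function mathExpressionApproachCents(metricsInExpression = 2): number {
  return metricsInExpression * STANDARD_ALARM_CENTS;
}

const before = compositeApproachCents(); // 70
const after = mathExpressionApproachCents(); // 20
const savingsPercent = Math.round(((before - after) / before) * 100); // 71

console.log(
  `composite: $${(before / 100).toFixed(2)}, math expression: $${(after / 100).toFixed(2)}, saving ~${savingsPercent}%`
);
```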


4 files changed (+182, -20 lines)


API.md

Lines changed: 36 additions & 1 deletion

lib/common/alarm/AlarmFactory.ts

Lines changed: 65 additions & 18 deletions
@@ -7,6 +7,7 @@ import {
   CompositeAlarm,
   HorizontalAnnotation,
   IAlarmRule,
+  MathExpression,
   TreatMissingData,
 } from "aws-cdk-lib/aws-cloudwatch";
 import { Construct } from "constructs";
@@ -186,6 +187,18 @@ export interface AddAlarmProps {
    */
   readonly evaluateLowSampleCountPercentile?: boolean;

+  /**
+   * Specifies how many samples (N) of the metric is needed in a datapoint to be evaluated for alarming.
+   * If this property is specified, your metric will be subject to MathExpression that will add an IF condition
+   * to your metric to make sure that each datapoint is evaluated only if it has sufficient number of samples.
+   * If the number of samples is not sufficient, the datapoint will be treated as missing data and will be evaluated
+   * according to the treatMissingData parameter.
+   * If specified, deprecated minMetricSamplesToAlarm has no effect.
+   *
+   * @default - default behaviour - no condition on sample count will be used
+   */
+  readonly minSampleCountToEvaluateDatapoint?: number;
+
   /**
    * Specifies how many samples (N) of the metric is needed to trigger the alarm.
    * If this property is specified, an artificial composite alarm is created of the following:
@@ -195,6 +208,9 @@ export interface AddAlarmProps {
    * </ul>
    * The newly created composite alarm will be returned as a result, and it will take the original alarm actions.
    * @default - default behaviour - no condition on sample count will be added to the alarm
+   * @deprecated Use minSampleCountToEvaluateDatapoint instead. minMetricSamplesAlarm uses different evaluation
+   * period for its child alarms, so it doesn't guarantee that each datapoint in the evaluation period has
+   * sufficient number of samples
    */
   readonly minMetricSamplesToAlarm?: number;

@@ -511,6 +527,9 @@ export class AlarmFactory {
       props
     );

+    // metric that will be ultimately used to create the alarm
+    let alarmMetric: MetricWithAlarmSupport = adjustedMetric;
+
     // prepare primary alarm properties

     const actionsEnabled = this.determineActionsEnabled(
@@ -549,32 +568,58 @@ export class AlarmFactory {
       );
     }

+    // apply metric math for minimum metric samples
+
+    if (props.minSampleCountToEvaluateDatapoint) {
+      if (adjustedMetric instanceof MathExpression) {
+        throw new Error(
+          "minSampleCountToEvaluateDatapoint is not supported for MathExpressions. " +
+            "If you already use MathExpression, you can extend your expression to evaluate " +
+            "the sample count using IF statement, e.g. IF(sampleCount > X, mathExpression)."
+        );
+      }
+
+      const metricSampleCount = adjustedMetric.with({
+        statistic: MetricStatistic.N,
+        label: "Sample count",
+      });
+
+      alarmMetric = new MathExpression({
+        label: `${adjustedMetric}`,
+        expression: `IF(sampleCount > ${props.minSampleCountToEvaluateDatapoint}, metric)`,
+        usingMetrics: {
+          metric: adjustedMetric,
+          sampleCount: metricSampleCount,
+        },
+      });
+    }
+
     // create primary alarm

-    const primaryAlarm = adjustedMetric.createAlarm(
-      this.alarmScope,
+    const primaryAlarm = alarmMetric.createAlarm(this.alarmScope, alarmName, {
       alarmName,
-      {
-        alarmName,
-        alarmDescription,
-        threshold: props.threshold,
-        comparisonOperator: props.comparisonOperator,
-        treatMissingData: props.treatMissingData,
-        // default value (undefined) means "evaluate"
-        evaluateLowSampleCountPercentile: evaluateLowSampleCountPercentile
-          ? undefined
-          : "ignore",
-        datapointsToAlarm,
-        evaluationPeriods,
-        actionsEnabled,
-      }
-    );
+      alarmDescription,
+      threshold: props.threshold,
+      comparisonOperator: props.comparisonOperator,
+      treatMissingData: props.treatMissingData,
+      // default value (undefined) means "evaluate"
+      evaluateLowSampleCountPercentile: evaluateLowSampleCountPercentile
+        ? undefined
+        : "ignore",
+      datapointsToAlarm,
+      evaluationPeriods,
+      actionsEnabled,
+    });

     let alarm: AlarmBase = primaryAlarm;

     // create composite alarm for min metric samples (if defined)
+    // deprecated in favour of minSampleCountToEvaluateDatapoint

-    if (props.minMetricSamplesToAlarm) {
+    if (
+      !props.minSampleCountToEvaluateDatapoint &&
+      props.minMetricSamplesToAlarm
+    ) {
       const metricSampleCount = adjustedMetric.with({
         statistic: MetricStatistic.N,
       });
@@ -627,6 +672,8 @@ export class AlarmFactory {
       datapointsToAlarm,
       dedupeString,
       minMetricSamplesToAlarm: props.minMetricSamplesToAlarm,
+      minSampleCountToEvaluateDatapoint:
+        props.minSampleCountToEvaluateDatapoint,
       fillAlarmRange: props.fillAlarmRange ?? false,
       overrideAnnotationColor: props.overrideAnnotationColor,
       overrideAnnotationLabel: props.overrideAnnotationLabel,
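The JSDoc for `minSampleCountToEvaluateDatapoint` and the generated `IF(sampleCount > N, metric)` expression describe a per-datapoint filter. The following is a simplified, dependency-free model of that behaviour, not CloudWatch's actual evaluation engine: `applySampleCountGuard` and `isAlarming` are hypothetical names, only the `notBreaching` and `breaching` missing-data modes are modelled, and the sample values are made up. Note that the generated expression uses a strict `>`, so a datapoint with exactly N samples is dropped as well.

```typescript
// One aggregated datapoint for a single alarm period: the statistic value
// (e.g. Average) and the number of raw samples (SampleCount) behind it.
interface Datapoint {
  value: number;
  sampleCount: number;
}

// Mimics IF(sampleCount > minSamples, metric): a datapoint is kept only when its
// own period has strictly more than minSamples samples; otherwise it is dropped
// (undefined here) and the alarm sees missing data for that period.
function applySampleCountGuard(
  series: Datapoint[],
  minSamples: number
): (number | undefined)[] {
  return series.map((dp) => (dp.sampleCount > minSamples ? dp.value : undefined));
}

// Simplified "M out of N" evaluation for a LessThanThreshold alarm over the last
// evaluationPeriods datapoints. Missing datapoints count as breaching or not
// breaching depending on treatMissingAs; CloudWatch's other modes ("ignore",
// "missing") are not modelled.
function isAlarming(
  values: (number | undefined)[],
  threshold: number,
  datapointsToAlarm: number,
  evaluationPeriods: number,
  treatMissingAs: "notBreaching" | "breaching"
): boolean {
  const recent = values.slice(-evaluationPeriods);
  const breachingCount = recent.filter((v) =>
    v === undefined ? treatMissingAs === "breaching" : v < threshold
  ).length;
  return breachingCount >= datapointsToAlarm;
}

// Hypothetical success-rate series: two low-traffic periods, one busy period.
const series: Datapoint[] = [
  { value: 0.5, sampleCount: 3 },
  { value: 0.6, sampleCount: 42 }, // exactly 42 samples: dropped by the strict ">"
  { value: 0.8, sampleCount: 100 },
];
const raw = series.map((dp) => dp.value);
const guarded = applySampleCountGuard(series, 42); // [undefined, undefined, 0.8]

console.log(isAlarming(raw, 0.9, 2, 3, "notBreaching")); // true
console.log(isAlarming(guarded, 0.9, 2, 3, "notBreaching")); // false
console.log(isAlarming(guarded, 0.9, 2, 3, "breaching")); // true
```

With the guard, the two low-traffic periods become missing data, so whether the alarm fires depends on `treatMissingData`, as the JSDoc describes.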

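When both properties are set (as in the first new test, which passes 42 and 55), the new one wins and the deprecated composite path is skipped. A minimal sketch of that precedence, using a hypothetical `selectStrategy` function and a stand-in props type rather than the library's own code: because both conditions in the diff test truthiness, passing `0` for `minSampleCountToEvaluateDatapoint` falls back to the deprecated path.

```typescript
// Hypothetical stand-in for the two relevant AddAlarmProps fields.
interface SampleCountProps {
  minSampleCountToEvaluateDatapoint?: number;
  minMetricSamplesToAlarm?: number;
}

type SampleCountStrategy = "mathExpression" | "compositeAlarm" | "none";

// Mirrors the gating in the diff: the new property takes precedence, and the
// deprecated composite-alarm path runs only when the new property is not set.
// Both checks test truthiness, so 0 behaves exactly like undefined.
function selectStrategy(props: SampleCountProps): SampleCountStrategy {
  if (props.minSampleCountToEvaluateDatapoint) {
    return "mathExpression";
  }
  if (props.minMetricSamplesToAlarm) {
    return "compositeAlarm";
  }
  return "none";
}

console.log(selectStrategy({ minSampleCountToEvaluateDatapoint: 42, minMetricSamplesToAlarm: 55 })); // mathExpression
console.log(selectStrategy({ minMetricSamplesToAlarm: 55 })); // compositeAlarm
console.log(selectStrategy({ minSampleCountToEvaluateDatapoint: 0, minMetricSamplesToAlarm: 55 })); // compositeAlarm
console.log(selectStrategy({})); // none
```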
lib/common/alarm/IAlarmAnnotationStrategy.ts

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ export interface AlarmAnnotationStrategyProps extends AlarmMetadata {
   readonly metric: MetricWithAlarmSupport;
   readonly comparisonOperator: ComparisonOperator;
   readonly minMetricSamplesToAlarm?: number;
+  readonly minSampleCountToEvaluateDatapoint?: number;
   readonly threshold: number;
   readonly datapointsToAlarm: number;
   readonly evaluationPeriods: number;

test/common/alarm/AlarmFactory.test.ts

Lines changed: 80 additions & 1 deletion
@@ -1,9 +1,10 @@
 import { Duration, Stack } from "aws-cdk-lib";
-import { Capture, Template } from "aws-cdk-lib/assertions";
+import { Capture, Match, Template } from "aws-cdk-lib/assertions";
 import {
   Alarm,
   CfnAlarm,
   ComparisonOperator,
+  MathExpression,
   Metric,
   Shading,
   TreatMissingData,
@@ -330,6 +331,84 @@ test("addAlarm: check created alarms when minMetricSamplesToAlarm is used", () =
   });
 });

+test("addAlarm: check created alarms when minSampleCountToEvaluateDatapoint is used", () => {
+  const stack = new Stack();
+  const factory = new AlarmFactory(stack, {
+    globalMetricDefaults,
+    globalAlarmDefaults,
+    localAlarmNamePrefix: "prefix",
+  });
+  factory.addAlarm(metric, {
+    ...props,
+    alarmNameSuffix: "none",
+    comparisonOperator: ComparisonOperator.LESS_THAN_THRESHOLD,
+    minSampleCountToEvaluateDatapoint: 42,
+    minMetricSamplesToAlarm: 55, // not used if minSampleCountToEvaluateDatapoint defined
+  });
+
+  const template = Template.fromStack(stack);
+  template.hasResourceProperties("AWS::CloudWatch::Alarm", {
+    AlarmName: "DummyServiceAlarms-prefix-none",
+    AlarmDescription: "Description",
+    ComparisonOperator: "LessThanThreshold",
+    DatapointsToAlarm: 10,
+    EvaluationPeriods: 10,
+    TreatMissingData: "notBreaching",
+    Metrics: [
+      Match.objectLike({
+        Expression: "IF(sampleCount > 42, metric)",
+        Label: "DummyMetric1",
+      }),
+      {
+        Id: "metric",
+        MetricStat: {
+          Metric: Match.objectLike({
+            MetricName: "DummyMetric1",
+          }),
+          Period: 300,
+          Stat: "Average",
+        },
+        ReturnData: false,
+      },
+      {
+        Id: "sampleCount",
+        MetricStat: {
+          Metric: Match.objectLike({
+            MetricName: "DummyMetric1",
+          }),
+          Period: 300,
+          Stat: "SampleCount",
+        },
+        ReturnData: false,
+      },
+    ],
+  });
+});
+
+test("addAlarm: minSampleCountToEvaluateDatapoint used with Math Expression throws error", () => {
+  const stack = new Stack();
+  const factory = new AlarmFactory(stack, {
+    globalMetricDefaults,
+    globalAlarmDefaults,
+    localAlarmNamePrefix: "prefix",
+  });
+  const mathExpression = new MathExpression({
+    expression: "MAX(metric)",
+    usingMetrics: {
+      metric,
+    },
+  });
+
+  expect(() =>
+    factory.addAlarm(mathExpression, {
+      ...props,
+      minSampleCountToEvaluateDatapoint: 42,
+    })
+  ).toThrow(
+    "minSampleCountToEvaluateDatapoint is not supported for MathExpressions"
+  );
+});
+
 test("addCompositeAlarm: snapshot for operator", () => {
   const stack = new Stack();
   const factory = new AlarmFactory(stack, {
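For metrics that are already MathExpressions, the error message suggests extending the expression by hand with `IF(sampleCount > X, mathExpression)`. A minimal sketch of that rewrite follows, assuming a simplified, hypothetical `MathDefinition` shape rather than aws-cdk-lib's actual `MathExpression` class; `guardBySampleCount` is likewise a hypothetical helper. It wraps the expression and registers a SampleCount variant of one referenced metric under a fresh id. In aws-cdk-lib itself, that variant would come from calling `.with({ statistic: ... })` on the metric, as the diff does with `MetricStatistic.N`.

```typescript
// Simplified, hypothetical shapes for illustration; not aws-cdk-lib types.
interface MetricRef {
  metricName: string;
  statistic: string;
}

interface MathDefinition {
  expression: string;
  usingMetrics: Record<string, MetricRef>;
}

// Hypothetical helper: rewrites the expression as IF(<sampleCountId> > minSamples, <expression>)
// and registers a SampleCount variant of the metric stored under `baseId`.
function guardBySampleCount(
  def: MathDefinition,
  baseId: string,
  minSamples: number,
  sampleCountId = "sampleCount"
): MathDefinition {
  const base = def.usingMetrics[baseId];
  if (base === undefined) {
    throw new Error(`unknown metric id: ${baseId}`);
  }
  // Ids inside one expression must be unique, so refuse to overwrite an existing one.
  if (sampleCountId in def.usingMetrics) {
    throw new Error(`metric id already in use: ${sampleCountId}`);
  }
  return {
    expression: `IF(${sampleCountId} > ${minSamples}, ${def.expression})`,
    usingMetrics: {
      ...def.usingMetrics,
      [sampleCountId]: { ...base, statistic: "SampleCount" },
    },
  };
}

// Hypothetical error-rate expression guarded by the request sample count.
const errorRate: MathDefinition = {
  expression: "100 * errors / requests",
  usingMetrics: {
    errors: { metricName: "Errors", statistic: "Sum" },
    requests: { metricName: "Requests", statistic: "Sum" },
  },
};

const wrapped = guardBySampleCount(errorRate, "requests", 42);
console.log(wrapped.expression); // IF(sampleCount > 42, 100 * errors / requests)
```

The helper returns a new definition instead of mutating its input, so the unguarded expression stays available for dashboards.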
