
Commit ff3471a

feat: add default() evaluate helper. allow empty datadog result. Fixes argoproj#1548 (argoproj#1551)
Signed-off-by: Ryan M Smith <[email protected]>
1 parent 7c77744 commit ff3471a

6 files changed, +254 -79 lines changed


USERS.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ Organizations below are **officially** using Argo Rollouts. Please send a PR wit
 1. [Ambassador Labs](https://www.getambassador.io)
 1. [Ant Group](https://www.antgroup.com/)
 1. [Bucketplace](https://www.bucketplace.co.kr/)
+1. [Calm](https://www.calm.com/)
 1. [Codefresh](https://codefresh.io/)
 1. [Databricks](https://github.com/databricks)
 1. [Devtron Labs](https://github.com/devtron-labs/devtron)

docs/features/analysis.md

Lines changed: 46 additions & 30 deletions
@@ -36,7 +36,7 @@ This example highlights:

 === "Rollout"

-```yaml
+```yaml
 apiVersion: argoproj.io/v1alpha1
 kind: Rollout
 metadata:
@@ -65,7 +65,7 @@ This example highlights:

 === "AnalysisTemplate"

-```yaml
+```yaml
 apiVersion: argoproj.io/v1alpha1
 kind: AnalysisTemplate
 metadata:
@@ -86,7 +86,7 @@ This example highlights:
 query: |
 sum(irate(
 istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
-)) /
+)) /
 sum(irate(
 istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
 ))
@@ -114,7 +114,7 @@ metadata:
 spec:
 ...
 strategy:
-canary:
+canary:
 steps:
 - setWeight: 20
 - pause: {duration: 5m}
@@ -148,13 +148,13 @@ spec:
 query: |
 sum(irate(
 istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
-)) /
+)) /
 sum(irate(
 istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
 ))
 ```

-Multiple measurements can be performed over a longer duration period, by specifying the `count` and
+Multiple measurements can be performed over a longer duration period, by specifying the `count` and
 `interval` fields:

 ```yaml hl_lines="4 5"
@@ -174,8 +174,8 @@ Multiple measurements can be performed over a longer duration period, by specify
 !!! important
 Available since v0.9.0

-A Rollout can reference a Cluster scoped AnalysisTemplate called a
-`ClusterAnalysisTemplate`. This can be useful when you want to share an AnalysisTemplate across multiple Rollouts;
+A Rollout can reference a Cluster scoped AnalysisTemplate called a
+`ClusterAnalysisTemplate`. This can be useful when you want to share an AnalysisTemplate across multiple Rollouts;
 in different namespaces, and avoid duplicating the same template in every namespace. Use the field
 `clusterScope: true` to reference a ClusterAnalysisTemplate instead of an AnalysisTemplate.

@@ -189,7 +189,7 @@ in different namespaces, and avoid duplicating the same template in every namesp
 spec:
 ...
 strategy:
-canary:
+canary:
 steps:
 - setWeight: 20
 - pause: {duration: 5m}
@@ -203,7 +203,7 @@ in different namespaces, and avoid duplicating the same template in every namesp
 ```

 === "ClusterAnalysisTemplate"
-
+
 ```yaml
 apiVersion: argoproj.io/v1alpha1
 kind: ClusterAnalysisTemplate
@@ -223,7 +223,7 @@ in different namespaces, and avoid duplicating the same template in every namesp
 query: |
 sum(irate(
 istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
-)) /
+)) /
 sum(irate(
 istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
 ))
@@ -234,7 +234,7 @@ in different namespaces, and avoid duplicating the same template in every namesp

 ## Analysis with Multiple Templates

-A Rollout can reference multiple AnalysisTemplates when constructing an AnalysisRun. This allows users to compose
+A Rollout can reference multiple AnalysisTemplates when constructing an AnalysisRun. This allows users to compose
 analysis from multiple AnalysisTemplates. If multiple templates are referenced, then the controller will merge the
 templates together. The controller combines the `metrics` and `args` fields of all the templates.

@@ -332,7 +332,7 @@ templates together. The controller combines the `metrics` and `args` fields of a
 query: |
 sum(irate(
 istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
-)) /
+)) /
 sum(irate(
 istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
 ))
@@ -346,13 +346,13 @@ templates together. The controller combines the `metrics` and `args` fields of a
 query: |
 sum(irate(
 istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code=~"5.*"}[5m]
-)) /
+)) /
 sum(irate(
 istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
 ))
-```
+```

-!!! note
+!!! note
 The controller will error when merging the templates if:

 * Multiple metrics in the templates have the same name
@@ -388,12 +388,12 @@ spec:
 successCondition: result == 'true'
 provider:
 web:
-# placeholders are resolved when an AnalysisRun is created
+# placeholders are resolved when an AnalysisRun is created
 url: "{{ args.api-url }}?service={{ args.service-name }}"
 headers:
 - key: Authorization
 value: "Bearer {{ args.api-token }}"
-jsonPath: "{$.results.ok}"
+jsonPath: "{$.results.ok}"
 ```

 Analysis arguments defined in a Rollout are merged with the args from the AnalysisTemplate when the AnalysisRun is created.
@@ -411,7 +411,7 @@ spec:
 templates:
 - templateName: args-example
 args:
-# required value
+# required value
 - name: service-name
 value: guestbook-svc.default.svc.cluster.local
 # override default value
@@ -485,19 +485,19 @@ spec:
 ```

 In this example, the Rollout creates a pre-promotion AnalysisRun once the new ReplicaSet is fully available.
-The Rollout will not switch traffic to the new version until the analysis run finishes successfully.
+The Rollout will not switch traffic to the new version until the analysis run finishes successfully.

 Note: if the`autoPromotionSeconds` field is specified and the Rollout has waited auto promotion seconds amount of time,
 the Rollout marks the AnalysisRun successful and switches the traffic to a new version automatically. If the AnalysisRun
-completes before then, the Rollout will not create another AnalysisRun and wait out the rest of the
+completes before then, the Rollout will not create another AnalysisRun and wait out the rest of the
 `autoPromotionSeconds`.

 ## BlueGreen Post Promotion Analysis

 A Rollout using a BlueGreen strategy can launch an analysis run *after* the traffic switch to the new version using
 post-promotion analysis. If post-promotion Analysis fails or errors, the Rollout enters an aborted state and switches traffic back to the
 previous stable Replicaset. When post-analysis is Successful, the Rollout is considered fully promoted and
-the new ReplicaSet will be marked as stable. The old ReplicaSet will then be scaled down according to
+the new ReplicaSet will be marked as stable. The old ReplicaSet will then be scaled down according to
 `scaleDownDelaySeconds` (default 30 seconds).

 ```yaml
@@ -522,8 +522,8 @@ spec:

 ## Failure Conditions

-`failureCondition` can be used to cause an analysis run to fail. The following example continually polls a prometheus
-server to get the total number of errors every 5 minutes, causing the analysis run to fail if 10 or more errors were
+`failureCondition` can be used to cause an analysis run to fail. The following example continually polls a prometheus
+server to get the total number of errors every 5 minutes, causing the analysis run to fail if 10 or more errors were
 encountered.

 ```yaml hl_lines="4"
@@ -546,7 +546,7 @@ encountered.
 Analysis runs can also be considered `Inconclusive`, which indicates the run was neither successful,
 nor failed. Inconclusive runs causes a rollout to become paused at its current step. Manual
 intervention is then needed to either resume the rollout, or abort. One example of how analysis runs
-could become `Inconclusive`, is when a metric defines no success or failure conditions.
+could become `Inconclusive`, is when a metric defines no success or failure conditions.

 ```yaml
 metrics:
@@ -575,17 +575,17 @@ A use case for having `Inconclusive` analysis runs are to enable Argo Rollouts t
 whether or not measurement value is acceptable and decide to proceed or abort.

 ## Delay Analysis Runs
-If the analysis run does not need to start immediately (i.e give the metric provider time to collect
+If the analysis run does not need to start immediately (i.e give the metric provider time to collect
 metrics on the canary version), Analysis Runs can delay the specific metric analysis. Each metric
-can be configured to have a different delay. In additional to the metric specific delays, the rollouts
+can be configured to have a different delay. In additional to the metric specific delays, the rollouts
 with background analysis can delay creating an analysis run until a certain step is reached

 Delaying a specific analysis metric:
 ```yaml hl_lines="3 4"
 metrics:
 - name: success-rate
 # Do not start this analysis until 5 minutes after the analysis run starts
-initialDelay: 5m
+initialDelay: 5m
 successCondition: result[0] >= 0.90
 provider:
 prometheus:
@@ -602,7 +602,7 @@ metadata:
 name: guestbook
 spec:
 strategy:
-canary:
+canary:
 analysis:
 templates:
 - templateName: success-rate
@@ -642,7 +642,7 @@ spec:
 web:
 headers:
 - key: Authorization
-value: "Bearer {{ args.api-token }}"
+value: "Bearer {{ args.api-token }}"
 ```

 ## Handling Metric Results
@@ -758,6 +758,8 @@ status:

 ### Empty array

+#### Prometheus
+
 Metric providers can sometimes return empty array, e.g., no data returned from prometheus query.

 Here are two examples where a metric result of empty array is considered successful and failed respectively.
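The two Prometheus examples referred to in that sentence sit outside this hunk. Purely as an illustrative sketch (not taken from the diff; the address, query, and threshold are assumptions), a condition that tolerates an empty Prometheus result vector could look like:

```yaml
metrics:
- name: success-rate
  # An empty result array passes; a non-empty result must still meet the threshold.
  successCondition: len(result) == 0 || result[0] >= 0.95
  provider:
    prometheus:
      address: http://prometheus.example.com:9090
      query: sum(rate(http_requests_total{code!~"5.*"}[5m]))
```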
@@ -801,3 +803,17 @@ status:
 phase: Failed
 startedAt: "2021-09-08T19:19:44Z"
 ```
+
+#### Datadog
+
+Datadog queries can return empty results if the query takes place during a time interval with no metrics. The Datadog provider will return a `nil` value yielding an error during the evaluation phase like:
+
+```
+invalid operation: < (mismatched types <nil> and float64)
+```
+
+However, empty query results yielding a `nil` value can be handled using the `default()` function. Here is a succeeding example using the `default()` function:
+
+```yaml
+successCondition: default(result, 0) < 0.05
+```
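For context around the one-line `successCondition` above, here is a hedged sketch of how `default()` could be used in a full Datadog metric; the metric name, threshold, intervals, and query are illustrative assumptions rather than part of this commit:

```yaml
metrics:
- name: error-rate
  interval: 5m
  # default(result, 0) substitutes 0 when Datadog returns no datapoints,
  # so a measurement taken over a quiet interval still evaluates and passes.
  successCondition: default(result, 0) < 0.05
  failureLimit: 3
  provider:
    datadog:
      interval: 5m
      query: |
        sum:requests.error.rate{service:{{args.service-name}}}
```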

metricproviders/datadog/datadog.go

Lines changed: 16 additions & 6 deletions
@@ -142,18 +142,28 @@ func (p *Provider) parseResponse(metric v1alpha1.Metric, response *http.Response
 		return "", v1alpha1.AnalysisPhaseError, fmt.Errorf("Could not parse JSON body: %v", err)
 	}
 
-	if len(res.Series) < 1 || len(res.Series[0].Pointlist) < 1 {
-		return "", v1alpha1.AnalysisPhaseError, fmt.Errorf("Datadog returned no value: %s", string(bodyBytes))
+	// Handle an empty query result
+	if len(res.Series) == 0 || len(res.Series[0].Pointlist) == 0 {
+		var nilFloat64 *float64
+		status, err := evaluate.EvaluateResult(nilFloat64, metric, p.logCtx)
+		seriesBytes, jsonErr := json.Marshal(res.Series)
+		if jsonErr != nil {
+			return "", v1alpha1.AnalysisPhaseError, fmt.Errorf("Failed to marshall JSON empty series: %v", jsonErr)
+		}
+
+		return string(seriesBytes), status, err
 	}
 
+	// Handle a populated query result
 	series := res.Series[0]
 	datapoint := series.Pointlist[len(series.Pointlist)-1]
-	if len(datapoint) < 1 {
-		return "", v1alpha1.AnalysisPhaseError, fmt.Errorf("Datadog returned no value: %s", string(bodyBytes))
+	if len(datapoint) != 2 {
+		return "", v1alpha1.AnalysisPhaseError, fmt.Errorf("Datapoint does not have 2 values")
 	}
 
-	status, err := evaluate.EvaluateResult(datapoint[1], metric, p.logCtx)
-	return strconv.FormatFloat(datapoint[1], 'f', -1, 64), status, err
+	value := datapoint[1]
+	status, err := evaluate.EvaluateResult(value, metric, p.logCtx)
+	return strconv.FormatFloat(value, 'f', -1, 64), status, err
 }
 
 // Resume should not be used the Datadog provider since all the work should occur in the Run method
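The `default()` evaluate helper named in the commit title is added in one of the files not shown in this excerpt. The following is only a rough sketch of what such a helper could look like; the package placement, function names, and nil handling are assumptions, not the actual change:

```go
package evaluate

import "reflect"

// defaultFunc sketches a default(value, fallback) helper for the expr
// environment: it returns fallback when value is nil (including a typed nil
// pointer such as the *float64 passed for an empty Datadog series above),
// otherwise it returns value unchanged.
func defaultFunc(value, fallback interface{}) interface{} {
	if isNilValue(value) {
		return fallback
	}
	return value
}

// isNilValue treats untyped nil and nil pointers, slices, maps, channels,
// funcs, and interfaces as nil.
func isNilValue(v interface{}) bool {
	if v == nil {
		return true
	}
	rv := reflect.ValueOf(v)
	switch rv.Kind() {
	case reflect.Ptr, reflect.Slice, reflect.Map, reflect.Interface, reflect.Chan, reflect.Func:
		return rv.IsNil()
	}
	return false
}
```

With a helper along these lines exposed to conditions, `successCondition: default(result, 0) < 0.05` evaluates against `0` whenever the Datadog series comes back empty, which matches the documentation added above.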
