
Commit 699055a

joepeeples and Tecoddy authored
Change Failure Detection preview (DORA) [DOCS-12660] (#32810)
* set up skeleton
  - create new page
  - add to nav file (main.en.yaml)
* New Change failure detection page
* Updated to include custom rules
* Update _index.md
* Include Revert in custom rules

  Included Revert within custom rules + few improvements
* Updated default rules
* Update _index.md
* Removed incident patterns as default rules
* remove blog_linker.py

  This script should have been gitignored, but it might have gotten added before the .gitignore file was updated.
* editorial style revisions, add tables
* revise headings
* delete alias redirect

  I think I added this by mistake when creating the initial skeleton. There's no need for a redirect because this is a brand-new page, and AFAIK no page ever existed at the alias path.
* add formula syntax
* line edits

---------

Co-authored-by: Teddy Gesbert <teddy.gesbert@datadoghq.com>
Co-authored-by: Teddy Gesbert <94978881+Tecoddy@users.noreply.github.com>
1 parent 40c4489 commit 699055a


3 files changed: +150 -1 lines changed


config/_default/menus/main.en.yaml

Lines changed: 6 additions & 1 deletion
```diff
@@ -5784,11 +5784,16 @@ menu:
     parent: dora_metrics_setup
     identifier: dora_metrics_setup_failures
     weight: 102
+  - name: Change Failure Detection
+    url: dora_metrics/change_failure_detection/
+    parent: dora_metrics
+    identifier: dora_metrics_change_failure_detection
+    weight: 2
   - name: Data Collected
     url: dora_metrics/data_collected/
     parent: dora_metrics
     identifier: dora_metrics_data_collected
-    weight: 2
+    weight: 3
   - name: Feature Flags
     url: feature_flags/
     pre: ci
```
Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@

---
title: Change Failure Detection
description: "Learn how to configure change failure detection in DORA Metrics using rollbacks, revert PRs, and custom PR filters."
further_reading:
- link: '/dora_metrics/'
  tag: 'Documentation'
  text: 'Learn about DORA Metrics'
- link: '/dora_metrics/setup/'
  tag: 'Documentation'
  text: 'Set up data sources for DORA Metrics'
---

{{< jqmath-vanilla >}}

{{< callout url="" btn_hidden="true" header="Join the Preview!" >}}
Change Failure Detection is in Preview.
{{< /callout >}}
## Overview

Datadog Change Failure Detection automatically identifies deployments that remediate previously failed deployments. By linking deployment data with failure events, it gives a fuller picture of delivery performance and helps teams balance release velocity with operational stability.

A **change failure** is a deployment that causes issues in production and requires remediation. Change failures are used to calculate the following metrics:

- Change Failure Rate
: The percentage of deployments causing a failure in production, calculated as follows:

$$\text"Change Failure Rate" = \text"Number of change failures" / (\text"Total deployments" - \text"Rollback deployments")$$

- Failed Deployment Recovery Time
: The median duration between a failed deployment and its remediation, either through a rollback or a rollforward deployment.

Change Failure Detection identifies two types of remediation deployments:
- **Rollbacks**: Automatically detected when a previously deployed version is redeployed
- **Rollforwards**: Detected through custom rules that match metadata patterns (such as revert PRs and hotfix labels)
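To make the two metric definitions concrete, here is a minimal Python sketch that computes them from deployments that have already been classified. The `Deployment` record, its fields, and the function names are hypothetical illustrations, not a Datadog API, and the sketch assumes each change failure carries the finish time of the deployment that remediated it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical, already-classified deployment record (not a Datadog API)."""
    name: str
    finished_at: datetime
    is_change_failure: bool = False
    is_rollback: bool = False
    # When the remediating rollback or rollforward finished (change failures only).
    remediated_at: Optional[datetime] = None


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Change failures / (total deployments - rollback deployments)."""
    failures = sum(d.is_change_failure for d in deployments)
    rollbacks = sum(d.is_rollback for d in deployments)
    denominator = len(deployments) - rollbacks
    return failures / denominator if denominator > 0 else 0.0


def failed_deployment_recovery_time(deployments: list[Deployment]) -> Optional[timedelta]:
    """Median time between each change failure and its remediation."""
    durations = [
        d.remediated_at - d.finished_at
        for d in deployments
        if d.is_change_failure and d.remediated_at is not None
    ]
    return median(durations) if durations else None


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    # V1 -> V2 -> V3 -> V1: V2 is the change failure, the final V1 is a rollback.
    history = [
        Deployment("V1", t0),
        Deployment("V2", t0 + timedelta(hours=1), is_change_failure=True,
                   remediated_at=t0 + timedelta(hours=3)),
        Deployment("V3", t0 + timedelta(hours=2)),
        Deployment("V1", t0 + timedelta(hours=3), is_rollback=True),
    ]
    print(change_failure_rate(history))              # 1 / (4 - 1)
    print(failed_deployment_recovery_time(history))  # 2:00:00
```

Rollback deployments are removed from the denominator because they are remediations rather than independent changes, which is why the example yields 1 failure out of 3 counted deployments.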
## Rollbacks

A rollback occurs when a previously deployed version is redeployed to restore the system after a failed or faulty change.

### How rollback classification works

A deployment is classified as a rollback when it deploys a version that matches a previously deployed version but differs from the immediately preceding deployment.

- If Git metadata is present, the match is based on the commit SHA.
- If Git metadata is not present, the match is based on the version tag.

When a rollback is detected, the change failure is the first deployment after the rollback target (the earlier deployment of the version you rolled back to).
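The classification rule above can be sketched in Python as follows. This is an illustration, not Datadog's implementation: the `Deployment` record and function names are hypothetical, and when a version has been deployed more than once, the sketch assumes the rollback target is the most recent earlier deployment of that version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical deployment record (not a Datadog API)."""
    version: str                      # version tag
    commit_sha: Optional[str] = None  # set only when Git metadata is available


def match_key(d: Deployment) -> str:
    # Compare by commit SHA when Git metadata is present, otherwise by version tag.
    return d.commit_sha if d.commit_sha else d.version


def classify_rollbacks(deployments: list[Deployment]) -> list[tuple[int, int]]:
    """Return (rollback_index, change_failure_index) pairs, in deployment order."""
    pairs: list[tuple[int, int]] = []
    last_seen: dict[str, int] = {}  # match key -> index of its most recent deployment
    for i, d in enumerate(deployments):
        key = match_key(d)
        previous_key = match_key(deployments[i - 1]) if i > 0 else None
        # Rollback: version seen before, but not the immediately preceding deployment.
        if key in last_seen and key != previous_key:
            target = last_seen[key]        # rollback target (assumed: most recent match)
            pairs.append((i, target + 1))  # change failure: first deployment after target
        last_seen[key] = i
    return pairs


print(classify_rollbacks([Deployment(v) for v in ["V1", "V2", "V3", "V1"]]))  # [(3, 1)]
print(classify_rollbacks([Deployment(v) for v in ["V1", "V1"]]))              # []
```

In the first call, index 1 (V2) is the change failure and index 3 (the second V1) is the rollback, matching the V1 → V2 → V3 → V1 example in this section. The second call returns nothing because a back-to-back redeploy is not a rollback.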
### Example: Rollback detection

For the sequence V1 → V2 → V3 → V1, the rollback target is the original V1 deployment, so V2 is marked as the change failure and the second V1 deployment is marked as the rollback.

{{< img src="dora_metrics/rollback_example.png" alt="An example of a detected rollback deployment" style="width:100%;" >}}

**Note**: Redeploying the same version back‑to‑back (for example, V1 → V1) is not considered a rollback.
## Rollforwards

A rollforward occurs when a new deployment is made to fix or override a failed or faulty change. Unlike rollbacks, which redeploy a previous version, rollforwards deploy new code to remediate issues. This includes revert pull requests, which restore previous behavior through a new release.

Rollforwards are detected through custom rules that match deployment metadata patterns. You configure custom rules on the [DORA Metrics settings page][1].
## Custom rules

You can define custom rules to automatically classify rollforward deployments based on repository or release metadata. Rules work in one of two ways:
- **Linking deployments**: Match deployments through shared variable values (for example, PR title, PR number, or version)
- **Static patterns**: Match metadata patterns without variables (for example, labels or branch names)
### Rules linked to failed deployments

Use these rules to identify rollforward deployments that should be linked to a specific earlier failed deployment. These rules use regular expression (regex) patterns with variables to match deployments through shared references.

You can enter regex rules that include one of these variables:

| Variable | Description |
|---------------|-----------------------|
| `$pr_title` | Matches PR titles |
| `$pr_number` | Matches PR numbers |
| `$version` | Matches version tags |
#### How variable-based classification works

When a rule matches a deployment, the following actions occur:
1. The variable value is extracted from the current deployment.
2. The system finds the earlier deployment with the same extracted value.
3. The current deployment is marked as a rollforward linked to that earlier deployment.
4. The earlier deployment is marked as the change failure.

These rules work best when the failed deployment can be identified by a shared PR title, PR number, or version tag.
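The four steps above can be sketched in Python. Several details here are assumptions rather than documented behavior: rules are matched against the titles of PRs in the current deployment, `$pr_title`, `$pr_number`, and `$version` capture values with the hypothetical patterns in `VARIABLE_PATTERNS`, and when several earlier deployments share the value, the most recent one is chosen. All record types and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Variable names come from this page; the capture patterns are assumptions.
VARIABLE_PATTERNS = {
    "pr_title": r"(?P<pr_title>.+)",
    "pr_number": r"(?P<pr_number>\d+)",
    "version": r"(?P<version>\S+)",
}


@dataclass
class PullRequest:
    number: int
    title: str


@dataclass
class Deployment:
    """Hypothetical deployment record (not a Datadog API)."""
    version: str
    pull_requests: list[PullRequest] = field(default_factory=list)


def compile_rule(rule: str) -> tuple[re.Pattern, str]:
    """Turn a rule such as 'Revert "$pr_title"' into a regex with one named group."""
    for name, group in VARIABLE_PATTERNS.items():
        token = "$" + name
        if token in rule:
            before, after = rule.split(token, 1)
            return re.compile(re.escape(before) + group + re.escape(after)), name
    raise ValueError(f"rule has no supported variable: {rule!r}")


def values_for(d: Deployment, variable: str) -> set[str]:
    """Values of the given variable carried by a deployment."""
    if variable == "version":
        return {d.version}
    if variable == "pr_number":
        return {str(pr.number) for pr in d.pull_requests}
    return {pr.title for pr in d.pull_requests}


def find_change_failure(deployments: list[Deployment], index: int, rule: str) -> Optional[int]:
    """Index of the earlier deployment that deployments[index] rolls forward from, if any."""
    pattern, variable = compile_rule(rule)
    current = deployments[index]
    for pr in current.pull_requests:
        match = pattern.fullmatch(pr.title)  # assumption: rules apply to PR titles
        if match is None:
            continue
        value = match.group(variable)  # step 1: extract the variable value
        if value in values_for(current, variable):
            continue  # original change shipped in the same deployment: no classification
        for j in range(index - 1, -1, -1):  # step 2: search earlier deployments, newest first
            if value in values_for(deployments[j], variable):
                return j  # steps 3-4: current is the rollforward, j is the change failure
    return None
```

For example, with the rule `Revert "$pr_title"`, a deployment containing the PR `Revert "Add feature X"` is linked back to the most recent earlier deployment that shipped `Add feature X`. If the original PR shipped in the same deployment, or in no earlier one, the function returns `None`.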
#### Example: Revert pull requests

Revert pull requests are a common recovery pattern. For example, a PR titled `Revert "Add feature X"` references the original PR titled `Add feature X`. To link revert PRs to the deployments that shipped the original changes, use a rule such as:

```
Revert "$pr_title"
```

When a PR title matches this pattern, the following actions occur:
1. The system extracts the original PR title from the revert PR (the value of `$pr_title`).
2. It finds the earlier deployment that includes that original PR title.
3. The current deployment (with the revert) is marked as the rollforward.
4. The earlier deployment is marked as the change failure.

**Note**: If the original PR isn't found in any prior deployment, or if both the original PR and its revert are in the same deployment, no classification is applied.
### Static rules

Static rules classify rollforward deployments based on metadata patterns without using variables. These rules match broad indicators of remediation.

You can define regex rules that match specific types of metadata. The following table shows example patterns; adjust them to fit your processes:

| Metadata Type | Example Regex Pattern | Description |
|------------------|------------------------|-------------------------------------|
| **PR title** | `.*rollforward.*` | Matches PR titles containing `rollforward` |
| **PR label** | `.*hotfix.*` | Matches PR labels containing `hotfix` |
| **PR branch name** | `recovery/.*` | Matches branch names starting with `recovery/`|
| **Commit&nbsp;message** | `^Revert ".*"$` | Matches commit messages of the form `Revert "..."` |
| **Version tag** | `.*_hotfix` | Matches version tags ending with `_hotfix` |

#### How static rule classification works

When a static rule matches a deployment, the following actions occur:
1. The current deployment is marked as a rollforward.
2. The immediately preceding deployment is marked as the change failure.

Use static rules for broad remediation indicators like hotfix labels, branch prefixes, or version tag conventions.
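A minimal Python sketch of static rule classification, using the example patterns from the table in this section. The metadata keys and the `Deployment` record are hypothetical, and the sketch assumes each pattern must match the whole value (Python's `re.fullmatch`), which is consistent with how the table describes patterns like `recovery/.*` and `.*_hotfix`.

```python
import re
from dataclasses import dataclass, field

# Example patterns from the table; the metadata keys are hypothetical names.
STATIC_RULES = [
    ("pr_title", r".*rollforward.*"),
    ("pr_label", r".*hotfix.*"),
    ("pr_branch", r"recovery/.*"),
    ("commit_message", r'^Revert ".*"$'),
    ("version_tag", r".*_hotfix"),
]


@dataclass
class Deployment:
    """Hypothetical record: metadata type -> values seen on this deployment."""
    metadata: dict[str, list[str]] = field(default_factory=dict)


def matches_static_rule(d: Deployment, rules=STATIC_RULES) -> bool:
    """True if any metadata value fully matches the pattern for its metadata type."""
    return any(
        re.fullmatch(pattern, value) is not None
        for key, pattern in rules
        for value in d.metadata.get(key, [])
    )


def classify_static(deployments: list[Deployment]) -> list[tuple[int, int]]:
    """Return (rollforward_index, change_failure_index) pairs."""
    pairs = []
    for i in range(1, len(deployments)):  # the first deployment has no predecessor
        if matches_static_rule(deployments[i]):
            pairs.append((i, i - 1))  # the immediately preceding deployment is the change failure
    return pairs
```

Because static rules carry no variable, the only link the sketch can make is to the immediately preceding deployment, which is why these rules suit broad signals rather than precise attribution.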
### Default rules

Datadog enables the following rules by default:

- **Revert PRs**: PR titles that follow revert naming conventions (for example, "Revert" followed by the title of a prior PR) are treated as rollforwards. The earlier deployment containing the original change is marked as the change failure, using the variable-based linking rules described above.
- **Hotfix indicators**: PR labels, titles, or branch names containing "hotfix" are treated as rollforwards, with the preceding deployment marked as the change failure.

These default rules are fully configurable on the [DORA Metrics settings page][1]. They are opinionated starting points that treat common signals as likely rollforward activity. Adapt the patterns (such as naming conventions, labels, or version tags) to reflect your own workflows and improve accuracy over time.
## Further reading

{{< partial name="whats-next/whats-next.html" >}}

[1]: https://app.datadoghq.com/ci/settings/dora