Commit 2c5f33d

Merge pull request #497 from MateSaary/development-guidelines
Add investigation graduation guideline steps
2 parents 6920b9b + ed0c24d commit 2c5f33d

2 files changed: +26 -9 lines changed

.github/pull_request_template.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 ### Test Coverage
 #### Guidelines for CAD investigations
 - New investgations should be accompanied by unit tests and/or step-by-step manual tests in the investigation README.
-- E2E testing is desired for actioning investigations. See README for more info on investigation graduation process.
+- Actioning investigations should be locally tested in staging, and E2E testing is desired. See [README](https://github.com/openshift/configuration-anomaly-detection/blob/main/README.md#graduating-an-investigation) for more info on investigation graduation process.
 
 #### Test coverage checks
 - [ ] Added tests

README.md

Lines changed: 25 additions & 8 deletions
@@ -11,6 +11,7 @@
 - [Contributing](#contributing)
 - [Building](#building)
 - [Adding a new investigation](#adding-a-new-investigation)
+- [Graduating an investigation](#graduating-an-investigation)
 - [Testing locally](#testing-locally)
 - [Pre-requirements](#pre-requirements)
 - [Running cadctl for an incident ID](#running-cadctl-for-an-incident-id)
@@ -19,7 +20,6 @@
 - [Integrations](#integrations)
 - [Templates](#templates)
 - [Dashboards](#dashboards)
-- [Deployment](#deployment)
 - [Boilerplate](#boilerplate)
 - [PipelinePruner](#pipelinepruner)
 - [Required ENV variables](#required-env-variables)
@@ -71,6 +71,29 @@ To add a new alert investigation:
 - investigation.Resources contain initialized clients for the clusters aws environment, ocm and more. See [Integrations](#integrations)
 - Add test objects or scripts used to recreate the alert symptoms to the `pkg/investigations/$INVESTIGATION_NAME/testing/` directory for future use. Be sure to clearly document the testing procedure under the `Testing` section of the investigation-specific README.md file
 
+### Graduating an investigation
+
+New investigations and their remediation steps should be deployed in advancing stages through a progressive deployment strategy.
+
+1. **Informing Stage (Read-only):**
+The investigation is merely informative through PagerDuty at this stage; remediation _**does not involve any write operations**_. Notes are collected throughout the investigation, and upon the investigation's conclusion are posted to PagerDuty.
+
+**Aim:** Validating the investigation's accuracy and usefulness **without performing any write actions**.
+
+**Validation Criteria:**
+* The investigation successfully carries out each step on it's respective incident type, on both staging and production environments.
+* It provides useful information (equivalent to a manual investigation) to SREs through PagerDuty.
+* The investigation should be accompanied by unit tests and/or step-by-step manual tests in the investigation's testing README, including:
+* A clear step-by-step process to manually test the investigation (e.g. cluster setup, other expected conditions).
+
+2. **Actioning Stage (Read/Write):**
+The investigation's remediation capabilities, including **read and write** operations, are performed on all applicable clusters.
+
+**Validation Criteria:**
+* The investigation is verified to conduct remediations on staging as expected.
+* The investigation should be locally tested in staging against a live alert.
+* E2E testing is desired for actioning investigations; the tests should cover the execution of remediative steps as well as verification of their effectiveness.
+
 ### Integrations
 
 > **Note:** When writing an investiation, you can use them right away.
@@ -180,12 +203,6 @@ Investigation specific documentation can be found in the according investigation
 
 Grafana dashboard configmaps are stored in the [Dashboards](./dashboards/) directory. See app-interface for further documentation on dashboards.
 
-### Deployment
-
-* [Tekton](./deploy/README.md) -- Installation/configuration of Tekton and triggering pipeline runs.
-* [Skip Webhooks](./deploy/skip-webhook/README.md) -- Skipping the eventlistener and creating the pipelinerun directly.
-* [Namespace](./deploy/namespace/README.md) -- Allowing the code to ignore the namespace.
-
 ### Boilerplate
 
 * [Boilerplate](./boilerplate/openshift/osd-container-image/README.md) -- Conventions for OSD containers.
@@ -223,4 +240,4 @@ For Red Hat employees, these environment variables can be found in the SRE-P vau
 
 - `LOG_LEVEL`: refers to the CAD log level, if not set, the default is `info`. See
 
-- `CAD_HCM_AI_TOKEN`: required for requests to the ai model
+- `CAD_HCM_AI_TOKEN`: required for requests to the ai model
