Merge pull request #497 from MateSaary/development-guidelines

openshift-merge-bot[bot] · web-flow · commit 2c5f33d8e84f · 2025-07-10T10:16:25.000Z
Add investigation graduation guideline steps
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
@@ -9,7 +9,7 @@
 ### Test Coverage
 #### Guidelines for CAD investigations
 - New investigations should be accompanied by unit tests and/or step-by-step manual tests in the investigation README.
-- E2E testing is desired for actioning investigations. See README for more info on investigation graduation process.
+- Actioning investigations should be locally tested in staging, and E2E testing is desired. See the [README](https://github.com/openshift/configuration-anomaly-detection/blob/main/README.md#graduating-an-investigation) for more information on the investigation graduation process.
 
 #### Test coverage checks
 - [ ] Added tests
diff --git a/README.md b/README.md
@@ -11,6 +11,7 @@
   - [Contributing](#contributing)
     - [Building](#building)
     - [Adding a new investigation](#adding-a-new-investigation)
+    - [Graduating an investigation](#graduating-an-investigation)
   - [Testing locally](#testing-locally)
     - [Pre-requirements](#pre-requirements)
     - [Running cadctl for an incident ID](#running-cadctl-for-an-incident-id)
@@ -19,7 +20,6 @@
     - [Integrations](#integrations)
     - [Templates](#templates)
     - [Dashboards](#dashboards)
-    - [Deployment](#deployment)
     - [Boilerplate](#boilerplate)
     - [PipelinePruner](#pipelinepruner)
     - [Required ENV variables](#required-env-variables)
@@ -71,6 +71,29 @@ To add a new alert investigation:
 - `investigation.Resources` contains initialized clients for the cluster's AWS environment, OCM, and more. See [Integrations](#integrations)
 - Add test objects or scripts used to recreate the alert symptoms to the `pkg/investigations/$INVESTIGATION_NAME/testing/` directory for future use. Be sure to clearly document the testing procedure under the `Testing` section of the investigation-specific README.md file
 
+### Graduating an investigation
+
+New investigations and their remediation steps should be rolled out in stages, following a progressive deployment strategy.
+
+1. **Informing Stage (Read-only):**
+    At this stage the investigation only reports its findings through PagerDuty and _**does not perform any write operations**_. Notes are collected throughout the investigation and posted to PagerDuty when it concludes.
+
+    **Aim:** Validating the investigation's accuracy and usefulness **without performing any write actions**.
+
+    **Validation Criteria:**
+    * The investigation successfully carries out each step on its respective incident type, in both staging and production environments.
+    * It provides useful information (equivalent to a manual investigation) to SREs through PagerDuty.
+    * The investigation should be accompanied by unit tests and/or step-by-step manual tests in the investigation's testing README, including:
+        * A clear step-by-step process to manually test the investigation (e.g. cluster setup, other expected conditions).
+
+2. **Actioning Stage (Read/Write):**
+    The investigation's remediation capabilities, including **read and write** operations, are performed on all applicable clusters.
+
+    **Validation Criteria:**
+    * The investigation is verified to conduct remediations on staging as expected.
+    * The investigation should be locally tested in staging against a live alert.
+    * E2E testing is desired for actioning investigations; the tests should cover the execution of remediative steps as well as verification of their effectiveness.
+
 ### Integrations
 
 > **Note:** When writing an investigation, you can use them right away.
@@ -180,12 +203,6 @@ Investigation specific documentation can be found in the according investigation
 
 Grafana dashboard configmaps are stored in the [Dashboards](./dashboards/) directory. See app-interface for further documentation on dashboards.
 
-### Deployment
-
-* [Tekton](./deploy/README.md) -- Installation/configuration of Tekton and triggering pipeline runs.
-* [Skip Webhooks](./deploy/skip-webhook/README.md) -- Skipping the eventlistener and creating the pipelinerun directly.
-* [Namespace](./deploy/namespace/README.md) -- Allowing the code to ignore the namespace.
-
 ### Boilerplate
 
 * [Boilerplate](./boilerplate/openshift/osd-container-image/README.md) -- Conventions for OSD containers.
@@ -223,4 +240,4 @@ For Red Hat employees, these environment variables can be found in the SRE-P vau
 
 - `LOG_LEVEL`: refers to the CAD log level, if not set, the default is `info`. See
 
-- `CAD_HCM_AI_TOKEN`: required for requests to the ai model
+- `CAD_HCM_AI_TOKEN`: required for requests to the ai model