You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: README.md
+20-9Lines changed: 20 additions & 9 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -66,8 +66,26 @@ The required investigation is identified by CAD based on the incident and its pa
66
66
As PagerDuty itself does not provide finer granularity for webhooks than service-based, CAD filters out the alerts it should investigate. For more information, please refer to https://support.pagerduty.com/docs/webhooks.
67
67
68
68
To add a new alert investigation:
69
-
- create a mapping for the alert to the `GetInvestigation` function in `mapping.go` and write a corresponding CAD investigation (e.g. `Investigate()` in `chgm.go`).
70
-
- if the alert is not yet routed to CAD, add a webhook to the service your alert fires on. For production, the service should also have an escalation policy that escalates to SRE on CAD automation timeout.
69
+
- Create a mapping for the alert in `registry.go` and write a corresponding CAD investigation (e.g. `Investigate()` in `chgm.go`).
70
+
- investigation.Resources contain initialized clients for the clusters aws environment, ocm and more. See [Integrations](#integrations)
71
+
72
+
### Integrations
73
+
74
+
> **Note:** When writing an investiation, you can use them right away.
75
+
They are initialized for you and passed to the investigation via investigation.Resources.
76
+
77
+
78
+
*[AWS](https://github.com/aws/aws-sdk-go) -- Logging into the cluster, retreiving instance info and AWS CloudTrail events.
79
+
- See `pkg/aws`
80
+
*[PagerDuty](https://github.com/PagerDuty/go-pagerduty) -- Retrieving alert info, esclating or silencing incidents, and adding notes.
81
+
- See `pkg/pagerduty`
82
+
*[OCM](https://github.com/openshift-online/ocm-sdk-go) -- Retrieving cluster info, sending service logs, and managing (post, delete) limited support reasons.
83
+
- See `pkg/ocm`
84
+
- In case of missing permissions to query an ocm resource, add it to the Configuration-Anomaly-Detection role in uhc-account-manager
85
+
*[osd-network-verifier](https://github.com/openshift/osd-network-verifier) -- Tool to verify the pre-configured networking components for ROSA and OSD CCS clusters.
86
+
*[k8sclient](https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/client) -- Interact with clusters kube-api
87
+
- Requires RBAC definitions for your investigation to be added to `metadata.yaml`
88
+
71
89
72
90
## Testing locally
73
91
@@ -98,13 +116,6 @@ Every alert managed by CAD corresponds to an investigation, representing the exe
98
116
99
117
Investigation specific documentation can be found in the according investigation folder, e.g. for [ClusterHasGoneMissing](./pkg/investigations/chgm/README.md).
100
118
101
-
### Integrations
102
-
103
-
*[AWS](https://github.com/aws/aws-sdk-go) -- Logging into the cluster, retreiving instance info and AWS CloudTrail events.
104
-
*[PagerDuty](https://github.com/PagerDuty/go-pagerduty) -- Retrieving alert info, esclating or silencing incidents, and adding notes.
105
-
*[OCM](https://github.com/openshift-online/ocm-sdk-go) -- Retrieving cluster info, sending service logs, and managing (post, delete) limited support reasons.
106
-
*[osd-network-verifier](https://github.com/openshift/osd-network-verifier) -- Tool to verify the pre-configured networking components for ROSA and OSD CCS clusters.
0 commit comments