
Commit 7970719

docs(cpt): add info
1 parent 8d8f38c commit 7970719

File tree: 6 files changed, +137 -23 lines changed


pages/cockpit/how-to/configure-alerts-for-scw-resources.mdx

Lines changed: 137 additions & 23 deletions
@@ -8,44 +8,158 @@ content:
categories:
  - observability cockpit
dates:
  validation: 2025-05-07
  posted: 2023-11-06
---

This page shows you how to create alert rules in Grafana for monitoring Scaleway resources such as Instances, Object Storage, Kubernetes, and Cockpit. It explains how to use the `Scaleway Metrics` data source, interpret metrics, set alert conditions, and activate alerts.

<Message type="important">
  Cockpit does not support Grafana-managed alerting. This means that:
  - Grafana's built-in contact points and notification policies will not trigger any emails or notifications. You **must enable the Scaleway alert manager and create contact points to receive notifications**.
  - You must use the **Switch to data source-managed alert rule** button in Grafana and define your alert conditions with PromQL queries.
</Message>

<Macro id="requirements" />

- A Scaleway account logged into the [console](https://console.scaleway.com)
- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization
- Scaleway resources you can monitor
- [Created Grafana credentials](/cockpit/how-to/retrieve-grafana-credentials/) with the **Editor** role
- [Enabled](/cockpit/how-to/enable-alert-manager/) the Scaleway alert manager
- [Created](/cockpit/how-to/add-contact-points/) at least one contact point **in the Scaleway console**
- Selected the **Scaleway Alerting** alert manager in Grafana

## Use data source-managed alert rules

Data source-managed alert rules allow you to configure alerts that are managed by the data source of your choice, instead of Grafana's managed alerting system, which is not supported by Cockpit.

1. [Log in to Grafana](/cockpit/how-to/access-grafana-and-managed-dashboards/) using your credentials.
2. Click the **Toggle menu**, then click **Alerting**.
3. Click **Alert rules** and **+ New alert rule**.
4. In the **Define query and alert condition** section, scroll to the **Grafana-managed alert rule** information banner and click **Switch to data source-managed alert rule**. You are redirected to the alert creation process.
    <Lightbox src="scaleway-switch-to-managed-alerts-button.webp" alt="" />

## Define your metric and alert conditions
Switch between the tabs below to create alerts for a Scaleway Instance, an Object Storage bucket, a Kubernetes cluster pod, or Cockpit logs.
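All four examples build the same kind of expression: a metric selector filtered by labels, optionally wrapped in a function such as `rate()`, and compared against a threshold. The sketch below illustrates this general shape, reusing the Instance example from the first tab with placeholder label values.

```bash
# General shape of the alerting queries used in the tabs below:
#   <function>(<metric_name>{<label>="<value>", ...}[<time window>]) <comparison> <threshold>
# For example, the Instance alert compares average CPU usage over 5 minutes against 10% of one core:
rate(instance_server_cpu_seconds_total{resource_name="name-of-your-resource"}[5m]) > 0.1
```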
<Tabs id="install">
<TabsTab label="Scaleway Instance">
49+
The steps below explain how to create the metric selection and configure an alert condition that triggers when **your Instance consumes more than 10% of a single CPU core over the past 5 minutes.**
50+
51+
1. Type a name for your alert.
52+
2. Select the data source you want to configure alerts for. For the sake of this documentation, we are choosing the **Scaleway Metrics** data source.
53+
3. Click the **Metrics browser** drop-down.
54+
<Lightbox src="scaleway-metrics-browser.webp" alt="" />
55+
<Lightbox src="scaleway-metrics-displayed.webp" alt="" />
56+
4. Select the metric you want to configure an alert for. For the sake of this documentation, we are choosing the `instance_server_cpu_seconds_total` metric.
57+
<Message type="tip">
58+
The `instance_server_cpu_seconds_total` metric records how many seconds of CPU time your Instance has used in total. It is helpful to detect unexpected CPU usage spikes.
59+
</Message>
60+
5. Select the appropriate labels to filter your metric and target specific resources.
61+
6. Choose values for your selected labels. The **Resulting selector** field displays your final query selector.
62+
<Lightbox src="scaleway-metric-selection.webp" alt="" />
63+
7. Click **Use query** to validate your metric selection. Your selection displays in the query field next to the **Metrics browser** button. This prepares it for use in the alert condition, which we will define in the next steps.
64+
8. In the query field, paste the following query. Make sure that the values for the labels you have selected (for example, `resource_id` and `resource_name`) correspond to those of the target resource.
65+
```bash
66+
rate(instance_server_cpu_seconds_total{resource_id="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",resource_name="name-of-your-resource"}[5m]) > 0.1
67+
```
68+
<Lightbox src="scaleway-instance-grafana-alert.webp" alt="" />
69+
9. In the **Set alert evaluation behavior** field, specify how long the condition must be true before triggering the alert.
70+
<Message type="note">
71+
For example, to wait until the condition has been met continuously for 5 minutes, type `5` and select `minutes` in the drop-down.
72+
</Message>
73+
10. Enter a namespace for your alert in the **Namespace** field and click **Enter**.
74+
11. Enter a name for your alert's group in the **Group** field and click **Enter**.
75+
12. Optionally, add a summary and a description.
76+
13. Click **Save rule** in the top right corner of your screen to save and activate your alert. Once the alert meets the conditions you have configured, your [contact point](/cockpit/concepts/#contact-points) will receive an email informing them that the alert is firing.
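For reference, `rate(instance_server_cpu_seconds_total{...}[5m])` returns the average CPU time consumed per second over the last 5 minutes, so comparing it against `0.1` corresponds to 10% of one CPU core. The sketch below shows how you could adapt the threshold; the label values are placeholders, as in the example above.

```bash
# Same rule with a higher threshold: fire when the Instance uses more than
# 80% of one core on average over the last 5 minutes (0.8 CPU-seconds per second).
rate(instance_server_cpu_seconds_total{resource_id="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",resource_name="name-of-your-resource"}[5m]) > 0.8
```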
</TabsTab>
<TabsTab label="Object Storage bucket">
79+
The steps below explain how to create the metric selection and configure an alert condition that triggers when **the object count in your bucket exceeds a specific threshold**.
80+
81+
1. Type a name for your alert.
82+
2. Select the data source you want to configure alerts for. For the sake of this documentation, we are choosing the **Scaleway Metrics** data source.
83+
3. Click the **Metrics browser** drop-down.
84+
<Lightbox src="scaleway-metrics-browser.webp" alt="" />
85+
<Lightbox src="scaleway-metrics-displayed.webp" alt="" />
86+
4. Select the metric you want to configure an alert for. For the sake of this documentation, we are choosing the `object_storage_bucket_objects_total` metric.
87+
<Message type="tip">
88+
The `object_storage_bucket_objects_total` metric indicates the total number of objects stored in a given Object Storage bucket. It is useful to monitor and control object growth in your bucket and avoid hitting thresholds.
89+
</Message>
90+
5. Select the appropriate labels to filter your metric and target specific resources.
91+
6. Choose values for your selected labels. The **Resulting selector** field displays your final query selector.
92+
7. Click **Use query** to validate your metric selection. Your selection displays in the query field next to the **Metrics browser** button. This prepares it for use in the alert condition, which we will define in the next steps.
93+
8. In the query field, paste the following query. Make sure that the values for the labels you have selected (for example, `resource_id` and `region`) correspond to those of the target resource.
94+
```bash
95+
object_storage_bucket_objects_total{region="fr-par", resource_id="my-bucket"} > 2000
96+
```
97+
9. In the **Set alert evaluation behavior** field, specify how long the condition must be true before triggering the alert.
98+
<Message type="note">
99+
For example, to wait until the condition has been met continuously for 5 minutes, type `5` and select `minutes` in the drop-down.
100+
</Message>
101+
10. Enter a namespace for your alert in the **Namespace** field and click **Enter**.
102+
11. Enter a name for your alert's group in the **Group** field and click **Enter**.
103+
12. Optionally, add a summary and a description.
104+
13. Click **Save rule** in the top right corner of your screen to save and activate your alert. Once the alert meets the conditions you have configured, your [contact point](/cockpit/concepts/#contact-points) will receive an email informing them that the alert is firing.
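The query above fires as soon as the bucket holds more than 2000 objects. If you want a single rule to cover several buckets, you can use a PromQL regular-expression matcher (`=~`) on `resource_id`. A minimal sketch, with hypothetical bucket names and threshold:

```bash
# Fire when any bucket whose name starts with "my-bucket-" holds more than 5000 objects.
# Bucket name pattern and threshold are placeholders; adjust them to your own setup.
object_storage_bucket_objects_total{region="fr-par", resource_id=~"my-bucket-.*"} > 5000
```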
</TabsTab>
<TabsTab label="Kubernetes pod">
107+
The steps below explain how to create the metric selection and configure an alert condition that triggers when **no new pod activity occurs, which could mean your cluster is stuck or unresponsive.**
108+
109+
1. Type a name for your alert.
110+
2. Select the data source you want to configure alerts for. For the sake of this documentation, we are choosing the **Scaleway Metrics** data source.
111+
3. Click the **Metrics browser** drop-down.
112+
<Lightbox src="scaleway-metrics-browser.webp" alt="" />
113+
<Lightbox src="scaleway-metrics-displayed.webp" alt="" />
114+
4. Select the metric you want to configure an alert for. For the sake of this documentation, we are choosing the `kubernetes_cluster_k8s_shoot_nodes_pods_usage_total` metric.
115+
<Message type="tip">
116+
The `kubernetes_cluster_k8s_shoot_nodes_pods_usage_total` metric represents the total number of pods currently running across all nodes in your Kubernetes cluster. It is helpful to monitor current pod consumption per node pool or cluster, and help track resource saturation or unexpected workload spikes.
117+
</Message>
118+
5. Select the appropriate labels to filter your metric and target specific resources.
119+
6. Choose values for your selected labels. The **Resulting selector** field displays your final query selector.
120+
7. Click **Use query** to validate your metric selection. Your selection displays in the query field next to the **Metrics browser** button. This prepares it for use in the alert condition, which we will define in the next steps.
121+
8. In the query field, paste the following query. Make sure that the values for the labels you have selected (for example, `resource_name`) correspond to those of the target resource.
122+
```bash
123+
rate(kubernetes_cluster_k8s_shoot_nodes_pods_usage_total{resource_name="k8s-par-quizzical-chatelet"}[15m]) == 0
124+
```
125+
9. In the **Set alert evaluation behavior** field, specify how long the condition must be true before triggering the alert.
126+
<Message type="note">
127+
For example, to wait until the condition has been met continuously for 5 minutes, type `5` and select `minutes` in the drop-down.
128+
</Message>
129+
10. Enter a namespace for your alert in the **Namespace** field and click **Enter**.
130+
11. Enter a name for your alert's group in the **Group** field and click **Enter**.
131+
12. Optionally, add a summary and a description.
132+
13. Click **Save rule** in the top right corner of your screen to save and activate your alert. Once the alert meets the conditions you have configured, your [contact point](/cockpit/concepts/#contact-points) will receive an email informing them that the alert is firing.
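In the query above, `rate(...[15m]) == 0` means that the pod-usage gauge has not changed at all during the last 15 minutes. A longer window makes the rule less sensitive to short idle periods. A minimal sketch with a 30-minute window; the cluster name is a placeholder, as in the example above.

```bash
# Fire only when pod usage has stayed completely flat for the last 30 minutes.
rate(kubernetes_cluster_k8s_shoot_nodes_pods_usage_total{resource_name="k8s-par-quizzical-chatelet"}[30m]) == 0
```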
</TabsTab>
<TabsTab label="Cockpit logs">
135+
The steps below explain how to create the metric selection and configure an alert condition that triggers when **no logs are stored for 5 minutes, which may indicate your app or system is broken**.
136+
137+
1. Type a name for your alert.
138+
2. Select the data source you want to configure alerts for. For the sake of this documentation, we are choosing the **Scaleway Metrics** data source.
139+
3. Click the **Metrics browser** drop-down.
140+
<Lightbox src="scaleway-metrics-browser.webp" alt="" />
141+
<Lightbox src="scaleway-metrics-displayed.webp" alt="" />
142+
4. Select the metric you want to configure an alert for. For the sake of this documentation, we are choosing the `observability_cockpit_loki_chunk_store_stored_chunks_total:increase5m` metric.
143+
<Message type="tip">
144+
The `observability_cockpit_loki_chunk_store_stored_chunks_total:increase5m` metric represents the number of chunks (log storage blocks) have been written over the last 5 minutes for a specific resource. It is useful to monitor log ingestion activity and detect issues such as crash of the logging agent, or your application not producing logs.
145+
</Message>
146+
5. Select the appropriate labels to filter your metric and target specific resources.
147+
6. Choose values for your selected labels. The **Resulting selector** field displays your final query selector.
148+
7. Click **Use query** to validate your metric selection. Your selection displays in the query field next to the **Metrics browser** button. This prepares it for use in the alert condition, which we will define in the next steps.
149+
8. In the query field, paste the following query. Make sure that the values for the labels you have selected (for example, `resource_name`) correspond to those of the target resource.
150+
```bash
151+
observability_cockpit_loki_chunk_store_stored_chunks_total:increase5m{resource_id="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"} == 0
152+
```
153+
9. In the **Set alert evaluation behavior** field, specify how long the condition must be true before triggering the alert.
154+
<Message type="note">
155+
For example, to wait until the condition has been met continuously for 5 minutes, type `5` and select `minutes` in the drop-down.
156+
</Message>
157+
10. Enter a namespace for your alert in the **Namespace** field and click **Enter**.
158+
11. Enter a name for your alert's group in the **Group** field and click **Enter**.
159+
12. Optionally, add a summary and a description.
160+
13. Click **Save rule** in the top right corner of your screen to save and activate your alert. Once the alert meets the conditions you have configured, your [contact point](/cockpit/concepts/#contact-points) will receive an email informing them that the alert is firing.
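The `:increase5m` suffix indicates a precomputed 5-minute increase, so comparing it against `0` means that no log chunks were written during the last 5 minutes. To avoid firing for resources that never send logs, you could additionally require that ingestion was active earlier. A sketch, assuming a hypothetical one-hour look-back:

```bash
# Fire when no chunks were written in the last 5 minutes,
# but only if the same resource was still writing chunks one hour ago.
observability_cockpit_loki_chunk_store_stored_chunks_total:increase5m{resource_id="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"} == 0
and
observability_cockpit_loki_chunk_store_stored_chunks_total:increase5m{resource_id="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"} offset 1h > 0
```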
</TabsTab>
</Tabs>

<Message type="important">
  You can configure a maximum of 10 alerts for the `Scaleway Metrics` data source.
