Commit cac9f98

docs(cpt): add alert examples
1 parent 7970719

2 files changed: +32 additions, -28 deletions
pages/cockpit/how-to/configure-alerts-for-scw-resources.mdx

Lines changed: 32 additions & 28 deletions
@@ -8,17 +8,14 @@ content:
 categories:
 - observability cockpit
 dates:
-validation: 2025-05-07
+validation: 2025-05-09
 posted: 2023-11-06
 ---

-This page shows you how to create alert rules in Grafana for monitoring Scaleway resources like Instances, Object Storage, Kubernetes, and Cockpit. It explains how to use the `Scaleway Metrics` data source, interpret metrics, set alert conditions, and activate alerts.

-<Message type="important">
-Cockpit does not support Grafana's alerting system. This means that:
-- Grafana's built-in contact points and notification policies will not trigger any emails or notifications. You **must enable the Scaleway alert manager and create contact points to receive notifications**.
-- You must use the **Switch to data source-managed alert rule** button in Grafana, and use PromQL queries for alerting.
-</Message>
+Cockpit does not support Grafana-managed alerting. It integrates with Grafana to visualize metrics, but alerts are managed through the Scaleway alert manager. You should use Grafana only to define alert rules, not to evaluate them or to receive notifications. The Scaleway alert manager evaluates your alert rules and, once their conditions are met, sends notifications to the **contact points you have configured in the Scaleway console**.
+
+This page shows you how to create alert rules in Grafana for monitoring Scaleway resources integrated with Cockpit, such as Instances, Object Storage, and Kubernetes. These alerts rely on Scaleway-provided metrics, which are preconfigured and available in the **Metrics browser** drop-down when using the **Scaleway Metrics data source** in the Grafana interface. It also explains how to use the `Scaleway Metrics` data source, interpret metrics, set alert conditions, and activate alerts.

 <Macro id="requirements" />

@@ -27,17 +24,17 @@ This page shows you how to create alert rules in Grafana for monitoring Scalewa
 - Scaleway resources you can monitor
 - [Created Grafana credentials](/cockpit/how-to/retrieve-grafana-credentials/) with the **Editor** role
 - [Enabled](/cockpit/how-to/enable-alert-manager/) the Scaleway alert manager
-- [Created](/cockpit/how-to/add-contact-points/) at least one contact point **in the Scaleway console**
+- [Created](/cockpit/how-to/add-contact-points/) at least one contact point **in the Scaleway console**; otherwise, alerts will not be delivered
 - Selected the **Scaleway Alerting** alert manager in Grafana

-## Use data source managed alerts rules
+## Switch to data source managed alert rules

 Data source managed alert rules allow you to configure alerts managed by the data source of your choice, instead of using Grafana's managed alerting system, which is not supported by Cockpit.

 1. [Log in to Grafana](/cockpit/how-to/access-grafana-and-managed-dashboards/) using your credentials.
 2. Click the **Toggle menu** then click **Alerting**.
 3. Click **Alert rules** and **+ New alert rule**.
-4. In the **Define query and alert condition** section, scroll to the **Grafana-managed alert rule** information banner and click **Switch to data source-managed alert rule**. You are redirected to the alert creation process.
+4. In the **Define query and alert condition** section, scroll to the **Grafana-managed alert rule** information banner and click **Switch to data source-managed alert rule**. This step is **required** because Cockpit does not support Grafana’s built-in alerting system; it only supports alerts configured and evaluated by the data source itself. You are redirected to the alert creation process.
 <Lightbox src="scaleway-switch-to-managed-alerts-button.webp" alt="" />

 ## Define your metric and alert conditions
@@ -53,7 +50,7 @@ Switch between the tabs below to create alerts for a Scaleway Instance, an Objec
 3. Click the **Metrics browser** drop-down.
 <Lightbox src="scaleway-metrics-browser.webp" alt="" />
 <Lightbox src="scaleway-metrics-displayed.webp" alt="" />
-4. Select the metric you want to configure an alert for. For the sake of this documentation, we are choosing the `instance_server_cpu_seconds_total` metric.
+4. Select the metric you want to configure an alert for. For example, `instance_server_cpu_seconds_total`.
 <Message type="tip">
 The `instance_server_cpu_seconds_total` metric records how many seconds of CPU time your Instance has used in total. It is helpful to detect unexpected CPU usage spikes.
 </Message>
@@ -65,15 +62,15 @@ Switch between the tabs below to create alerts for a Scaleway Instance, an Objec
 ```bash
 rate(instance_server_cpu_seconds_total{resource_id="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",resource_name="name-of-your-resource"}[5m]) > 0.1
 ```
-<Lightbox src="scaleway-instance-grafana-alert.webp" alt="" />
 9. In the **Set alert evaluation behavior** field, specify how long the condition must be true before triggering the alert.
 <Message type="note">
 For example, to wait until the condition has been met continuously for 5 minutes, type `5` and select `minutes` in the drop-down.
 </Message>
-10. Enter a namespace for your alert in the **Namespace** field and click **Enter**.
-11. Enter a name for your alert's group in the **Group** field and click **Enter**.
+10. Enter a namespace in the **Namespace** field to help you categorize and manage your alert, then click **Enter**.
+11. Enter a name in the **Group** field to help you categorize and manage your alert, then click **Enter**.
 12. Optionally, add a summary and a description.
-13. Click **Save rule** in the top right corner of your screen to save and activate your alert. Once the alert meets the conditions you have configured, your [contact point](/cockpit/concepts/#contact-points) will receive an email informing them that the alert is firing.
+13. Click **Save rule** in the top right corner of your screen to save and activate your alert.
+14. Optionally, check that your configuration works by temporarily lowering the threshold. This will trigger the alert and your [contact point](/cockpit/concepts/#contact-points) should receive an email informing them that the alert is firing.
 </TabsTab>
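Note: in the Instance query above, `rate(instance_server_cpu_seconds_total{...}[5m])` returns the average per-second increase of the CPU-time counter over the trailing 5 minutes, so a value above `0.1` roughly corresponds to the Instance using more than 10% of one CPU core on average during that window.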
 <TabsTab label="Object Storage bucket">
 The steps below explain how to create the metric selection and configure an alert condition that triggers when **the object count in your bucket exceeds a specific threshold**.
@@ -83,7 +80,7 @@ Switch between the tabs below to create alerts for a Scaleway Instance, an Objec
 3. Click the **Metrics browser** drop-down.
 <Lightbox src="scaleway-metrics-browser.webp" alt="" />
 <Lightbox src="scaleway-metrics-displayed.webp" alt="" />
-4. Select the metric you want to configure an alert for. For the sake of this documentation, we are choosing the `object_storage_bucket_objects_total` metric.
+4. Select the metric you want to configure an alert for. For example, `object_storage_bucket_objects_total`.
 <Message type="tip">
 The `object_storage_bucket_objects_total` metric indicates the total number of objects stored in a given Object Storage bucket. It is useful to monitor and control object growth in your bucket and avoid hitting thresholds.
 </Message>
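Note: for reference, a threshold condition on this metric could look like the sketch below. The `resource_name` label and the `100000` threshold are illustrative placeholders (the label names are assumed to match those used in the Instance example), not values from the documentation.

```bash
# Illustrative sketch only: fires when the bucket holds more than 100,000 objects.
# "name-of-your-bucket" is a placeholder; adjust the selector and threshold to your bucket.
object_storage_bucket_objects_total{resource_name="name-of-your-bucket"} > 100000
```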
@@ -98,10 +95,11 @@ Switch between the tabs below to create alerts for a Scaleway Instance, an Objec
 <Message type="note">
 For example, to wait until the condition has been met continuously for 5 minutes, type `5` and select `minutes` in the drop-down.
 </Message>
-10. Enter a namespace for your alert in the **Namespace** field and click **Enter**.
-11. Enter a name for your alert's group in the **Group** field and click **Enter**.
+10. Enter a namespace in the **Namespace** field to help you categorize and manage your alert, then click **Enter**.
+11. Enter a name in the **Group** field to help you categorize and manage your alert, then click **Enter**.
 12. Optionally, add a summary and a description.
-13. Click **Save rule** in the top right corner of your screen to save and activate your alert. Once the alert meets the conditions you have configured, your [contact point](/cockpit/concepts/#contact-points) will receive an email informing them that the alert is firing.
+13. Click **Save rule** in the top right corner of your screen to save and activate your alert.
+14. Optionally, check that your configuration works by temporarily lowering the threshold. This will trigger the alert and your [contact point](/cockpit/concepts/#contact-points) should receive an email informing them that the alert is firing.
 </TabsTab>
 <TabsTab label="Kubernetes pod">
 The steps below explain how to create the metric selection and configure an alert condition that triggers when **no new pod activity occurs, which could mean your cluster is stuck or unresponsive.**
@@ -111,7 +109,7 @@ Switch between the tabs below to create alerts for a Scaleway Instance, an Objec
 3. Click the **Metrics browser** drop-down.
 <Lightbox src="scaleway-metrics-browser.webp" alt="" />
 <Lightbox src="scaleway-metrics-displayed.webp" alt="" />
-4. Select the metric you want to configure an alert for. For the sake of this documentation, we are choosing the `kubernetes_cluster_k8s_shoot_nodes_pods_usage_total` metric.
+4. Select the metric you want to configure an alert for. For example, `kubernetes_cluster_k8s_shoot_nodes_pods_usage_total`.
 <Message type="tip">
 The `kubernetes_cluster_k8s_shoot_nodes_pods_usage_total` metric represents the total number of pods currently running across all nodes in your Kubernetes cluster. It is helpful to monitor current pod consumption per node pool or cluster, and to track resource saturation or unexpected workload spikes.
 </Message>
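Note: for reference, a condition matching the "no new pod activity" scenario described above could look like the sketch below. Using `changes()` over a 10-minute window and the `resource_name` label are illustrative assumptions, not the query documented by Scaleway.

```bash
# Illustrative sketch only: fires when the pod count has not changed at all over the last 10 minutes,
# which may indicate a stuck or unresponsive cluster. "name-of-your-cluster" is a placeholder.
changes(kubernetes_cluster_k8s_shoot_nodes_pods_usage_total{resource_name="name-of-your-cluster"}[10m]) == 0
```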
@@ -126,10 +124,11 @@ Switch between the tabs below to create alerts for a Scaleway Instance, an Objec
 <Message type="note">
 For example, to wait until the condition has been met continuously for 5 minutes, type `5` and select `minutes` in the drop-down.
 </Message>
-10. Enter a namespace for your alert in the **Namespace** field and click **Enter**.
-11. Enter a name for your alert's group in the **Group** field and click **Enter**.
+10. Enter a namespace in the **Namespace** field to help you categorize and manage your alert, then click **Enter**.
+11. Enter a name in the **Group** field to help you categorize and manage your alert, then click **Enter**.
 12. Optionally, add a summary and a description.
-13. Click **Save rule** in the top right corner of your screen to save and activate your alert. Once the alert meets the conditions you have configured, your [contact point](/cockpit/concepts/#contact-points) will receive an email informing them that the alert is firing.
+13. Click **Save rule** in the top right corner of your screen to save and activate your alert.
+14. Optionally, check that your configuration works by temporarily lowering the threshold. This will trigger the alert and your [contact point](/cockpit/concepts/#contact-points) should receive an email informing them that the alert is firing.
 </TabsTab>
 <TabsTab label="Cockpit logs">
 The steps below explain how to create the metric selection and configure an alert condition that triggers when **no logs are stored for 5 minutes, which may indicate your app or system is broken**.
@@ -139,7 +138,7 @@ Switch between the tabs below to create alerts for a Scaleway Instance, an Objec
 3. Click the **Metrics browser** drop-down.
 <Lightbox src="scaleway-metrics-browser.webp" alt="" />
 <Lightbox src="scaleway-metrics-displayed.webp" alt="" />
-4. Select the metric you want to configure an alert for. For the sake of this documentation, we are choosing the `observability_cockpit_loki_chunk_store_stored_chunks_total:increase5m` metric.
+4. Select the metric you want to configure an alert for. For example, `observability_cockpit_loki_chunk_store_stored_chunks_total:increase5m`.
 <Message type="tip">
 The `observability_cockpit_loki_chunk_store_stored_chunks_total:increase5m` metric represents the number of chunks (log storage blocks) that have been written over the last 5 minutes for a specific resource. It is useful to monitor log ingestion activity and detect issues such as a crash of the logging agent or your application not producing logs.
 </Message>
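Note: since this metric is already expressed as a 5-minute increase, the "no logs stored for 5 minutes" condition described above could be written as the sketch below. The `resource_name` label is an illustrative placeholder assumed to match the Instance example.

```bash
# Illustrative sketch only: fires when no log chunks have been written for the resource in the last 5 minutes.
observability_cockpit_loki_chunk_store_stored_chunks_total:increase5m{resource_name="name-of-your-resource"} == 0
```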
@@ -154,15 +153,20 @@ Switch between the tabs below to create alerts for a Scaleway Instance, an Objec
 <Message type="note">
 For example, to wait until the condition has been met continuously for 5 minutes, type `5` and select `minutes` in the drop-down.
 </Message>
-10. Enter a namespace for your alert in the **Namespace** field and click **Enter**.
-11. Enter a name for your alert's group in the **Group** field and click **Enter**.
+10. Enter a namespace in the **Namespace** field to help you categorize and manage your alert, then click **Enter**.
+11. Enter a name in the **Group** field to help you categorize and manage your alert, then click **Enter**.
 12. Optionally, add a summary and a description.
-13. Click **Save rule** in the top right corner of your screen to save and activate your alert. Once the alert meets the conditions you have configured, your [contact point](/cockpit/concepts/#contact-points) will receive an email informing them that the alert is firing.
+13. Click **Save rule** in the top right corner of your screen to save and activate your alert. The rule is then evaluated based on the conditions you have defined.
+14. Optionally, check that your configuration works by temporarily lowering the threshold. This will trigger the alert and your [contact point](/cockpit/concepts/#contact-points) should receive an email informing them that the alert is firing.
 </TabsTab>
 </Tabs>

+You can view your firing alerts in the **Alert rules** section of Grafana (Home > Alerting > Alert rules).
+
+<Lightbox src="scaleway-alerts-firing.webp" alt="" />
+
 <Message type="important">
-You can configure up to a maximum of 10 alerts for the `Scaleway Metrics` data source.
+You can configure up to a **maximum of 10 alerts** for the `Scaleway Metrics` data source.
 </Message>

 <Message type="tip">
