This page shows you how to create alert rules in Grafana for monitoring Scaleway resources like Instances, Object Storage, Kubernetes, and Cockpit. It explains how to use the `Scaleway Metrics` data source, interpret metrics, set alert conditions, and activate alerts.

<Message type="important">
Cockpit does not support Grafana's alerting system. This means that:
- Grafana's built-in contact points and notification policies will not trigger any emails or notifications. You **must enable the Scaleway alert manager and create contact points to receive notifications**.
- You must use the **Switch to data source-managed alert rule** button in Grafana, and use PromQL queries for alerting.
</Message>

<Macro id="requirements" />

- A Scaleway account logged into the [console](https://console.scaleway.com)
- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization
- Scaleway resources you can monitor
- [Created Grafana credentials](/cockpit/how-to/retrieve-grafana-credentials/) with the **Editor** role
- [Enabled](/cockpit/how-to/enable-alert-manager/) the Scaleway alert manager
- [Created](/cockpit/how-to/add-contact-points/) at least one contact point **in the Scaleway console**
- Selected the **Scaleway Alerting** alert manager in Grafana

## Use data source-managed alert rules

Data source-managed alert rules allow you to configure alerts managed by the data source of your choice, instead of using Grafana's managed alerting system, which is not supported by Cockpit.

1. [Log in to Grafana](/cockpit/how-to/access-grafana-and-managed-dashboards/) using your credentials.
2. Click the **Toggle menu**, then click **Alerting**.
3. Click **Alert rules** and **+ New alert rule**.
4. In the **Define query and alert condition** section, scroll to the **Grafana-managed alert rule** information banner and click **Switch to data source-managed alert rule**. You are redirected to the alert creation process.

Switch between the tabs below to create alerts for a Scaleway Instance, an Object Storage bucket, a Kubernetes cluster pod, or Cockpit logs.

<Tabs id="install">
<TabsTab label="Scaleway Instance">
The steps below explain how to create the metric selection and configure an alert condition that triggers when **your Instance consumes more than 10% of a single CPU core over the past 5 minutes.**
1. Type a name for your alert.
2. Select the data source you want to configure alerts for. For the sake of this documentation, we are choosing the **Scaleway Metrics** data source.
3. Click the **Metrics browser** drop-down to display the available metrics.
4. Select the metric you want to configure an alert for. For the sake of this documentation, we are choosing the `instance_server_cpu_seconds_total` metric.
<Message type="tip">
The `instance_server_cpu_seconds_total` metric records how many seconds of CPU time your Instance has used in total. It is helpful to detect unexpected CPU usage spikes.
</Message>
5. Select the appropriate labels to filter your metric and target specific resources.
6. Choose values for your selected labels. The **Resulting selector** field displays your final query selector.
7. Click **Use query** to validate your metric selection. Your selection displays in the query field next to the **Metrics browser** button. This prepares it for use in the alert condition, which we will define in the next steps.
8. In the query field, paste the following query. Make sure that the values for the labels you have selected (for example, `resource_id` and `resource_name`) correspond to those of the target resource.
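    The example below is one way to express the condition described at the top of this tab (CPU usage above 10% of one core, averaged over 5 minutes). The label values are placeholders to replace with those of your own Instance.

    ```
    # Example values: replace the label values with those of your own Instance.
    # Fires when the Instance uses more than 10% of one CPU core, averaged over the last 5 minutes.
    rate(instance_server_cpu_seconds_total{resource_id="11111111-1111-1111-1111-111111111111", resource_name="my-instance"}[5m]) > 0.10
    ```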
9. In the **Set alert evaluation behavior** field, specify how long the condition must be true before triggering the alert.
<Message type="note">
For example, to wait until the condition has been met continuously for 5 minutes, type `5` and select `minutes` in the drop-down.
</Message>
10. Enter a namespace for your alert in the **Namespace** field and press **Enter**.
11. Enter a name for your alert's group in the **Group** field and press **Enter**.
12. Optionally, add a summary and a description.
13. Click **Save rule** in the top right corner of your screen to save and activate your alert. Once the alert meets the conditions you have configured, your [contact point](/cockpit/concepts/#contact-points) will receive an email informing them that the alert is firing.
</TabsTab>
<TabsTab label="Object Storage bucket">
The steps below explain how to create the metric selection and configure an alert condition that triggers when **the object count in your bucket exceeds a specific threshold**.
1. Type a name for your alert.
2. Select the data source you want to configure alerts for. For the sake of this documentation, we are choosing the **Scaleway Metrics** data source.
3. Click the **Metrics browser** drop-down to display the available metrics.
4. Select the metric you want to configure an alert for. For the sake of this documentation, we are choosing the `object_storage_bucket_objects_total` metric.
<Message type="tip">
The `object_storage_bucket_objects_total` metric indicates the total number of objects stored in a given Object Storage bucket. It is useful to monitor and control object growth in your bucket and avoid hitting thresholds.
</Message>
5. Select the appropriate labels to filter your metric and target specific resources.
6. Choose values for your selected labels. The **Resulting selector** field displays your final query selector.
7. Click **Use query** to validate your metric selection. Your selection displays in the query field next to the **Metrics browser** button. This prepares it for use in the alert condition, which we will define in the next steps.
8. In the query field, paste the following query. Make sure that the values for the labels you have selected (for example, `resource_id` and `region`) correspond to those of the target resource.
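    The example below is one way to express the condition described at the top of this tab. Both the threshold (100,000 objects) and the label values are placeholders to adjust to your own bucket.

    ```
    # Example values: replace the label values and the threshold with your own.
    # Fires when the bucket contains more than 100,000 objects.
    object_storage_bucket_objects_total{resource_id="my-bucket", region="fr-par"} > 100000
    ```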
9. In the **Set alert evaluation behavior** field, specify how long the condition must be true before triggering the alert.
<Message type="note">
For example, to wait until the condition has been met continuously for 5 minutes, type `5` and select `minutes` in the drop-down.
</Message>
10. Enter a namespace for your alert in the **Namespace** field and press **Enter**.
11. Enter a name for your alert's group in the **Group** field and press **Enter**.
12. Optionally, add a summary and a description.
13. Click **Save rule** in the top right corner of your screen to save and activate your alert. Once the alert meets the conditions you have configured, your [contact point](/cockpit/concepts/#contact-points) will receive an email informing them that the alert is firing.
</TabsTab>
<TabsTab label="Kubernetes pod">
The steps below explain how to create the metric selection and configure an alert condition that triggers when **no new pod activity occurs, which could mean your cluster is stuck or unresponsive.**
1. Type a name for your alert.
2. Select the data source you want to configure alerts for. For the sake of this documentation, we are choosing the **Scaleway Metrics** data source.
3. Click the **Metrics browser** drop-down to display the available metrics.
4. Select the metric you want to configure an alert for. For the sake of this documentation, we are choosing the `kubernetes_cluster_k8s_shoot_nodes_pods_usage_total` metric.
<Message type="tip">
The `kubernetes_cluster_k8s_shoot_nodes_pods_usage_total` metric represents the total number of pods currently running across all nodes in your Kubernetes cluster. It is helpful to monitor current pod consumption per node pool or cluster, and to track resource saturation or unexpected workload spikes.
</Message>
5. Select the appropriate labels to filter your metric and target specific resources.
6. Choose values for your selected labels. The **Resulting selector** field displays your final query selector.
7. Click **Use query** to validate your metric selection. Your selection displays in the query field next to the **Metrics browser** button. This prepares it for use in the alert condition, which we will define in the next steps.
8. In the query field, paste the following query. Make sure that the values for the labels you have selected (for example, `resource_name`) correspond to those of the target resource.
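    The example below is one way to express the condition described at the top of this tab, using a 15-minute window. Both the window and the label value are placeholders to adjust to your own cluster.

    ```
    # Example values: replace the label value and the window with your own.
    # Fires when the total pod count across the cluster has not changed over the last 15 minutes.
    changes(kubernetes_cluster_k8s_shoot_nodes_pods_usage_total{resource_name="my-cluster"}[15m]) == 0
    ```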
9. In the **Set alert evaluation behavior** field, specify how long the condition must be true before triggering the alert.
<Message type="note">
For example, to wait until the condition has been met continuously for 5 minutes, type `5` and select `minutes` in the drop-down.
</Message>
10. Enter a namespace for your alert in the **Namespace** field and press **Enter**.
11. Enter a name for your alert's group in the **Group** field and press **Enter**.
12. Optionally, add a summary and a description.
13. Click **Save rule** in the top right corner of your screen to save and activate your alert. Once the alert meets the conditions you have configured, your [contact point](/cockpit/concepts/#contact-points) will receive an email informing them that the alert is firing.
</TabsTab>
<TabsTab label="Cockpit logs">
The steps below explain how to create the metric selection and configure an alert condition that triggers when **no logs are stored for 5 minutes, which may indicate your app or system is broken**.
1. Type a name for your alert.
2. Select the data source you want to configure alerts for. For the sake of this documentation, we are choosing the **Scaleway Metrics** data source.
3. Click the **Metrics browser** drop-down to display the available metrics.
4. Select the metric you want to configure an alert for. For the sake of this documentation, we are choosing the `observability_cockpit_loki_chunk_store_stored_chunks_total:increase5m` metric.
<Message type="tip">
The `observability_cockpit_loki_chunk_store_stored_chunks_total:increase5m` metric represents the number of chunks (log storage blocks) that have been written over the last 5 minutes for a specific resource. It is useful to monitor log ingestion activity and detect issues such as a crash of the logging agent or your application not producing logs.
</Message>
5. Select the appropriate labels to filter your metric and target specific resources.
6. Choose values for your selected labels. The **Resulting selector** field displays your final query selector.
7. Click **Use query** to validate your metric selection. Your selection displays in the query field next to the **Metrics browser** button. This prepares it for use in the alert condition, which we will define in the next steps.
8. In the query field, paste the following query. Make sure that the values for the labels you have selected (for example, `resource_name`) correspond to those of the target resource.
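    The example below is one way to express the condition described at the top of this tab. The label value is a placeholder to replace with the name of your own log source.

    ```
    # Example value: replace the label value with your own.
    # Fires when no new log chunks have been written over the last 5 minutes.
    observability_cockpit_loki_chunk_store_stored_chunks_total:increase5m{resource_name="my-log-source"} == 0
    ```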
9. In the **Set alert evaluation behavior** field, specify how long the condition must be true before triggering the alert.
<Message type="note">
For example, to wait until the condition has been met continuously for 5 minutes, type `5` and select `minutes` in the drop-down.
</Message>
10. Enter a namespace for your alert in the **Namespace** field and press **Enter**.
11. Enter a name for your alert's group in the **Group** field and press **Enter**.
12. Optionally, add a summary and a description.
13. Click **Save rule** in the top right corner of your screen to save and activate your alert. Once the alert meets the conditions you have configured, your [contact point](/cockpit/concepts/#contact-points) will receive an email informing them that the alert is firing.
</TabsTab>
</Tabs>

<Message type="important">
You can configure a maximum of 10 alerts for the `Scaleway Metrics` data source.