This page shows you how to configure alerts for Scaleway Serverless Jobs using Scaleway Cockpit and Grafana.
- Scaleway resources you can monitor
- [Created Grafana credentials](/cockpit/how-to/retrieve-grafana-credentials/) with the **Editor** role
- [Enabled](/cockpit/how-to/enable-alert-manager/) the alert manager
- [Added](/cockpit/how-to/add-contact-points/) at least one contact in the Scaleway console or contact points in Grafana
- Selected the **Scaleway Alerting** alert manager in Grafana

1. [Log in to Grafana](/cockpit/how-to/access-grafana-and-managed-dashboards/) using your credentials.
2. Click the Grafana icon in the top left corner of your screen to open the menu.
3. Click the arrow next to **Alerting** on the left-side menu, then click **Alert rules**.
4. Click **+ New alert rule**.
5. Enter a name for your alert.
6. In the **Define query and alert condition** section, toggle **Advanced options**.
    <Lightbox image={AdvancedOptionsGrafana} alt="" />
7. Select the data source you want to configure alerts for. For the sake of this documentation, we are choosing the **Scaleway Metrics** data source.
8. In the **Rule type** subsection, click the **Data source-managed** tab.
    <Lightbox image={DataSourceManaged} alt="" />

<Message type="important">
Data source-managed alert rules are configured and evaluated by the data source of your choice, instead of by Grafana's managed alerting system, **which is not supported by Cockpit**.
This step is **mandatory**: Cockpit only supports alerts that are configured and evaluated by the data source itself.
</Message>

9. In the query field next to the **Loading metrics... >** button, select the metric you want to configure an alert for. Refer to the table below for details on each alert for Serverless Jobs.

**AnyJobError**

Pending period
: 5s

Summary
: Job run `{{ $labels.resource_id }}` is in error.

Query and alert condition
: `(serverless_job_run:state_failed == 1) or (serverless_job_run:state_internal_error == 1)`

Description
: Job run `{{ $labels.resource_id }}` from the job definition `{{ $labels.resource_name }}` finished in error. Check the Scaleway console to find out the error message.

**JobError**

Pending period
: 5s

Summary
: Job run `{{ $labels.resource_id }}` is in error.

Query and alert condition
: `(serverless_job_run:state_failed{resource_name="your-job-name-here"} == 1) or (serverless_job_run:state_internal_error{resource_name="your-job-name-here"} == 1)`

Description
: Job run `{{ $labels.resource_id }}` from the job definition `{{ $labels.resource_name }}` finished in error. Check the Scaleway console to find out the error message.

**AnyJobHighCPUUsage**

Pending period
: 10s

Summary
: High CPU usage for job run `{{ $labels.resource_id }}`.

Description
: Job run `{{ $labels.resource_id }}` from the job definition `{{ $labels.resource_name }}` has been using more than `{{ printf "%.0f" $value }}`% of its available CPU for 10s.

**JobHighCPUUsage**

Pending period
: 10s

Summary
: High CPU usage for job run `{{ $labels.resource_id }}`.

Description
: Job run `{{ $labels.resource_id }}` from the job definition `{{ $labels.resource_name }}` has been using more than `{{ printf "%.0f" $value }}`% of its available CPU for 10s.

**AnyJobHighMemoryUsage**

Pending period
: 10s

Summary
: High memory usage for job run `{{ $labels.resource_id }}`.

Description
: Job run `{{ $labels.resource_id }}` from the job definition `{{ $labels.resource_name }}` has been using more than `{{ printf "%.0f" $value }}`% of its available RAM for 10s.

**JobHighMemoryUsage**

Pending period
: 10s

Summary
: High memory usage for job run `{{ $labels.resource_id }}`.

Description
: Job run `{{ $labels.resource_id }}` from the job definition `{{ $labels.resource_name }}` has been using more than `{{ printf "%.0f" $value }}`% of its available RAM for 10s.

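Data source-managed rules are stored and evaluated by the data source rather than by Grafana. The following is a minimal sketch that assumes the **Scaleway Metrics** data source accepts rules in the standard Prometheus rule-group layout (`groups`, `rules`, `alert`, `expr`, `for`, `annotations`). Under that assumption, the **AnyJobError** and **JobError** alerts above could be written out as shown. The group name `serverless-jobs-alerts` and the helper `job_error_rule` are hypothetical. The queries, pending periods and annotations are copied from the table. The script prints JSON, which is also valid YAML.

```python
import json

# Placeholder from the JobError query: replace it with the name of your job definition.
JOB_NAME = "your-job-name-here"


def job_error_rule(job_name=None):
    """Build one Prometheus-style alerting rule for job runs that end in error.

    Without a job name, the rule matches every job definition (AnyJobError).
    With a job name, it only matches that job definition (JobError).
    """
    # "{{" and "}}" in an f-string produce literal braces, giving the selector
    # {resource_name="..."}; an empty selector matches all job definitions.
    selector = f'{{resource_name="{job_name}"}}' if job_name else ""
    expr = (
        f"(serverless_job_run:state_failed{selector} == 1)"
        f" or (serverless_job_run:state_internal_error{selector} == 1)"
    )
    return {
        "alert": "JobError" if job_name else "AnyJobError",
        "expr": expr,
        "for": "5s",  # pending period from the alert table
        # Plain (non-f) strings, so the {{ ... }} template placeholders stay as-is.
        "annotations": {
            "summary": "Job run {{ $labels.resource_id }} is in error.",
            "description": (
                "Job run {{ $labels.resource_id }} from the job definition "
                "{{ $labels.resource_name }} finished in error. "
                "Check the Scaleway console to find out the error message."
            ),
        },
    }


# Hypothetical group name for this sketch.
rule_group = {
    "groups": [
        {
            "name": "serverless-jobs-alerts",
            "rules": [job_error_rule(), job_error_rule(JOB_NAME)],
        }
    ]
}

if __name__ == "__main__":
    print(json.dumps(rule_group, indent=2))
```

The only difference between the two rules is the `resource_name` selector, so a single helper covers both.
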

10. Make sure that the values for the labels you have selected correspond to those of the target resource.
11. In the **Set alert evaluation behavior** section, specify how long the condition must be met before the alert triggers. This duration is the pending period.
12. Enter a name in the **Namespace** and **Group** fields to categorize and manage your alert rules. Rules in the same group share the same configuration, including the evaluation interval, which determines how often the rule is evaluated (every minute by default). You can modify this interval later in the group settings.
<Message type="note">
The evaluation interval is different from the pending period set in step 11. The evaluation interval controls how often the rule is checked, while the pending period defines how long the condition must be continuously met before the alert fires.
</Message>
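The following simplified model shows how an alert moves between the inactive, pending and firing states. It assumes Prometheus-style semantics: the condition is only checked at each evaluation, and the alert fires once the condition has held at every evaluation for at least the pending period. The function `simulate` and its parameters are hypothetical. Real evaluation timestamps are also not aligned to zero.

```python
def simulate(condition_at, evaluation_interval, pending_period, duration):
    """Return (time, state) pairs for each evaluation, times in seconds.

    condition_at(t) tells whether the alert query returns a result at time t.
    """
    results = []
    active_since = None  # time of the first evaluation where the condition held
    t = 0
    while t <= duration:
        if condition_at(t):
            if active_since is None:
                active_since = t
            # Fire once the condition has held continuously for the pending period.
            held_for = t - active_since
            state = "firing" if held_for >= pending_period else "pending"
        else:
            # Any evaluation where the condition is false resets the alert.
            active_since = None
            state = "inactive"
        results.append((t, state))
        t += evaluation_interval
    return results


if __name__ == "__main__":
    # Condition true between t=60s and t=300s, evaluated every minute,
    # with a pending period of 2 minutes.
    for t, state in simulate(lambda t: 60 <= t <= 300, 60, 120, 360):
        print(f"t={t:>3}s  {state}")
```

In this model, the alert fires at t=180s, two minutes after the condition was first observed. The model also shows why the evaluation interval matters for short pending periods. With a 5s pending period and the default 1-minute interval, the alert can only fire at the next evaluation, roughly one minute after the condition is first observed.
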
13. In the **Configure labels and notifications** section, click **+ Add labels**. A pop-up appears.
14. Enter a label name and value, then click **Save**. You can skip this step if you want your alerts to be sent to the contacts you created in the Scaleway console.
<Message type="note">
In Grafana, notifications are sent by matching alerts to notification policies based on labels. This step is about deciding how alerts will reach you or your team (Slack, email, etc.) based on the labels you attach to them. You can then set up rules that define who receives which notifications on the **Notification policies** page.
Find out how to [configure notification policies in Grafana](/tutorials/configure-slack-alerting/#configuring-a-notification-policy).
</Message>
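The label matching described in the note above can be sketched as follows. This is a simplified model with three assumptions: matchers only check equality, policies are checked in order and the first match wins, and unmatched alerts fall back to the default policy. Real Grafana notification policies also support nesting, regular-expression matchers and continuing to later policies. The label names, values and contact point names are hypothetical.

```python
def route(alert_labels, policies, default_contact_point):
    """Return the contact point that should receive an alert.

    policies is an ordered list of (matchers, contact_point) pairs, where
    matchers maps label names to the exact value they must have.
    """
    for matchers, contact_point in policies:
        # A policy matches when every one of its labels is present with the same value.
        if all(alert_labels.get(name) == value for name, value in matchers.items()):
            return contact_point
    # No specific policy matched: the default policy handles the alert.
    return default_contact_point


# Hypothetical labels and contact point names.
POLICIES = [
    ({"team": "data"}, "slack-data-team"),
    ({"severity": "critical"}, "email-on-call"),
]

if __name__ == "__main__":
    print(route({"alertname": "AnyJobError", "team": "data"}, POLICIES, "default-contacts"))
```

An alert without any of the custom labels reaches the default destination. In this sketch, that case corresponds to skipping the label step and relying on the contacts created in the Scaleway console.
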
15. Click **Save rule and exit** in the top right corner of your screen to save and activate your alert. Once your alert meets the conditions you have configured, you will receive a notification, such as an email, informing you that your alert has been triggered.
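The **Summary** and **Description** values in the alert table are templates. `{{ $labels.<name> }}` is replaced by one of the alert's labels, and `{{ printf "%.0f" $value }}` is replaced by the query result rounded to zero decimals. Assuming your notification includes these annotations, the sketch below previews the rendered text by emulating just these two constructs in Python. It is not a Go template engine, and the label values are hypothetical.

```python
import re

# Matches {{ $labels.<name> }} and captures <name>.
LABEL_RE = re.compile(r"\{\{\s*\$labels\.(\w+)\s*\}\}")
# Matches {{ printf "%.0f" $value }}.
VALUE_RE = re.compile(r'\{\{\s*printf\s+"%\.0f"\s+\$value\s*\}\}')


def render(template, labels, value):
    """Expand the two template constructs used in the alert table.

    Missing labels are replaced by an empty string in this sketch.
    """
    text = VALUE_RE.sub(lambda m: f"{value:.0f}", template)
    return LABEL_RE.sub(lambda m: labels.get(m.group(1), ""), text)


DESCRIPTION = (
    "Job run `{{ $labels.resource_id }}` from the job definition "
    "`{{ $labels.resource_name }}` has been using more than "
    '`{{ printf "%.0f" $value }}`% of its available CPU for 10s.'
)

if __name__ == "__main__":
    print(render(DESCRIPTION, {"resource_id": "run-123", "resource_name": "nightly-export"}, 87.6))
```
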