
Commit 42977dc

docs(fix): update with new grafana version

1 parent 7e60bd6 · commit 42977dc

File tree

3 files changed: +84 −75 lines

Two new image assets (141 KB and 197 KB) and one updated page:

pages/serverless-jobs/how-to/configure-alerts-jobs.mdx

Lines changed: 84 additions & 75 deletions
@@ -3,10 +3,12 @@ title: How to configure alerts for a job
 description: Learn how to add monitoring alerts to Serverless Jobs with Scaleway.
 tags: jobs alerts grafana threshold monitoring cockpit
 dates:
-  validation: 2025-09-02
+  validation: 2025-09-19
   posted: 2025-02-10
 ---
 import Requirements from '@macros/iam/requirements.mdx'
+import AdvancedOptionsGrafana from './assets/scaleway-advanced-options.webp'
+import DataSourceManaged from './assets/scaleway-datasource-managed.webp'

 This page shows you how to configure alerts for Scaleway Serverless Jobs using Scaleway Cockpit and Grafana.

@@ -17,137 +19,144 @@ This page shows you how to configure alerts for Scaleway Serverless Jobs using Scaleway Cockpit and Grafana.
 - Scaleway resources you can monitor
 - [Created Grafana credentials](/cockpit/how-to/retrieve-grafana-credentials/) with the **Editor** role
 - [Enabled](/cockpit/how-to/enable-alert-manager/) the alert manager
-- [Created](/cockpit/how-to/add-contact-points/) at least one contact point
+- [Added](/cockpit/how-to/add-contact-points/) at least one contact in the Scaleway console, or contact points in Grafana
 - Selected the **Scaleway Alerting** alert manager in Grafana

 1. [Log in to Grafana](/cockpit/how-to/access-grafana-and-managed-dashboards/) using your credentials.
-2. Click the **Toggle menu**, then click **Alerting**.
-3. Click **Alert rules** and **+ New alert rule**.
-4. Scroll down to the **Define query and alert condition** section and click **Switch to data source-managed alert rule**.
+2. Click the Grafana icon in the top left corner of your screen to open the menu.
+3. Click the arrow next to **Alerting** in the left-side menu, then click **Alert rules**.
+4. Click **+ New alert rule**.
+5. Enter a name for your alert.
+6. In the **Define query and alert condition** section, toggle **Advanced options**.
+    <Lightbox image={AdvancedOptionsGrafana} alt="" />
+7. Select the data source you want to configure alerts for. For the sake of this documentation, we are choosing the **Scaleway Metrics** data source.
+8. In the **Rule type** subsection, click the **Data source-managed** tab.
+    <Lightbox image={DataSourceManaged} alt="" />
    <Message type="important">
-      This allows you to configure alert rules managed by the data source of your choice, instead of using Grafana's managed alert rules.
+      Data source-managed alert rules let you configure alerts managed by the data source of your choice, instead of using Grafana's managed alerting system, **which is not supported by Cockpit**.
+      This step is **mandatory**: Cockpit only supports alerts configured and evaluated by the data source itself.
    </Message>
+9. In the query field next to the **Loading metrics... >** button, select the metric you want to configure an alert for. Refer to the table below for details on each alert for Serverless Jobs.

-5. Type in a name for your alert.
-6. Select the data source you want to configure alerts for. For the sake of this documentation, we are choosing the **Scaleway Metrics** data source.
-7. In the **Metrics browser** drop-down, select the metric you want to configure an alert for. Refer to the table below for details on each alert for Serverless Jobs.
 **AnyJobError**

 Pending period
 : 5s

 Summary
 : Job run `{{ $labels.resource_id }}` is in error.

 Query and alert condition
 : `(serverless_job_run:state_failed == 1)` OR `(serverless_job_run:state_internal_error == 1)`

 Description
 : Job run `{{ $labels.resource_id }}` from the job definition `{{ $labels.resource_name }}` finished in error. Check the console for the error message.

 **JobError**

 Pending period
 : 5s

 Summary
 : Job run `{{ $labels.resource_id }}` is in error.

 Query and alert condition
 : `(serverless_job_run:state_failed{resource_name="your-job-name-here"} == 1)` OR `(serverless_job_run:state_internal_error{resource_name="your-job-name-here"} == 1)`

 Description
 : Job run `{{ $labels.resource_id }}` from the job definition `{{ $labels.resource_name }}` finished in error. Check the console for the error message.

 **AnyJobHighCPUUsage**

 Pending period
 : 10s

 Summary
 : High CPU usage for job run `{{ $labels.resource_id }}`.

 Query and alert condition
 : `serverless_job_run:cpu_usage_seconds_total:rate30s / serverless_job_run:cpu_limit * 100 > 90`

 Description
 : Job run `{{ $labels.resource_id }}` from the job definition `{{ $labels.resource_name }}` has been using more than `{{ printf "%.0f" $value }}`% of its available CPU for 10s.

 **JobHighCPUUsage**

 Pending period
 : 10s

 Summary
 : High CPU usage for job run `{{ $labels.resource_id }}`.

 Query and alert condition
 : `serverless_job_run:cpu_usage_seconds_total:rate30s{resource_name="your-job-name-here"} / serverless_job_run:cpu_limit{resource_name="your-job-name-here"} * 100 > 90`

 Description
 : Job run `{{ $labels.resource_id }}` from the job definition `{{ $labels.resource_name }}` has been using more than `{{ printf "%.0f" $value }}`% of its available CPU for 10s.

 **AnyJobHighMemoryUsage**

 Pending period
 : 10s

 Summary
 : High memory usage for job run `{{ $labels.resource_id }}`.

 Query and alert condition
 : `(serverless_job_run:memory_usage_bytes / serverless_job_run:memory_limit_bytes) * 100 > 80`

 Description
 : Job run `{{ $labels.resource_id }}` from the job definition `{{ $labels.resource_name }}` has been using more than `{{ printf "%.0f" $value }}`% of its available RAM for 10s.

 **JobHighMemoryUsage**

 Pending period
 : 10s

 Summary
 : High memory usage for job run `{{ $labels.resource_id }}`.

 Query and alert condition
 : `(serverless_job_run:memory_usage_bytes{resource_id="your-job-name-here"} / serverless_job_run:memory_limit_bytes{resource_id="your-job-name-here"}) * 100 > 80`

 Description
 : Job run `{{ $labels.resource_id }}` from the job definition `{{ $labels.resource_name }}` has been using more than `{{ printf "%.0f" $value }}`% of its available RAM for 10s.
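The per-job queries above filter on a single, exact `resource_name`. As a sketch only (the `nightly-.*` job name pattern below is illustrative, not part of this page), the same label matchers also accept regular expressions, so one data source-managed rule can cover several job definitions at once:

```
# Hypothetical variant of JobHighCPUUsage: a regex matcher (=~) covers every job
# definition whose name starts with "nightly-", instead of one exact resource_name.
  serverless_job_run:cpu_usage_seconds_total:rate30s{resource_name=~"nightly-.*"}
/ serverless_job_run:cpu_limit{resource_name=~"nightly-.*"}
* 100 > 90
```

The threshold and pending period behave exactly as in the table above; only the set of matched job runs changes.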

-8. Select labels that apply to the metric you have selected in the previous step, to target your desired resources and fine-tune your alert.
-9. Select one or more values for your labels.
-10. Click **Use query** to generate your alert based on the conditions you have defined.
-11. Select a folder to store your rule, or create a new one. Folders allow you to easily manage your different rules.
-12. Select an evaluation group to add your rule to. Rules within the same group are evaluated sequentially over the same time interval.
-13. In the **Set alert evaluation behavior** field, configure the amount of time during which the alert can be in breach of the condition(s) you have defined until it triggers.
-    <Message type="note">
-      For example, if you wish to be alerted after your alert has been in breach of the condition for 2 minutes without interruption, type `2` and select `minutes` in the drop-down.
-    </Message>
-14. Optionally, add a summary and a description.
-15. Click **Save rule** at the top right corner of your screen to save your alert. Once your alert meets the requirements you have configured, you will receive an email to inform you that your alert has been triggered.
+10. Make sure that the values for the labels you have selected correspond to those of the target resource.
+11. In the **Set alert evaluation behavior** section, specify how long the condition must be met before the alert triggers.
+12. Enter a name in the **Namespace** and **Group** fields to categorize and manage your alert rules. Rules that share the same group use the same configuration, including the evaluation interval, which determines how often the rule is evaluated (by default, every 1 minute). You can modify this interval later in the group settings.
+    <Message type="note">
+      The evaluation interval is different from the pending period set in the previous step. The evaluation interval controls how often the rule is checked, while the pending period defines how long the condition must be continuously met before the alert fires.
+    </Message>
+13. In the **Configure labels and notifications** section, click **+ Add labels**. A pop-up appears.
+14. Enter a label name and value, then click **Save**. You can skip this step if you want your alerts to be sent to the contacts you may already have created in the Scaleway console.
+    <Message type="note">
+      In Grafana, notifications are sent by matching alerts to notification policies based on labels. This step determines how alerts reach you or your team (Slack, email, etc.) based on the labels you attach to them. You can then define who receives notifications on the **Notification policies** page.
+      Find out how to [configure notification policies in Grafana](/tutorials/configure-slack-alerting/#configuring-a-notification-policy).
+    </Message>
+15. Click **Save rule and exit** in the top right corner of your screen to save and activate your alert. Once your alert meets the conditions you have configured, you will receive an email informing you that it has been triggered.
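Before saving the rule, you can run the expression against the **Scaleway Metrics** data source in Grafana's **Explore** view to check that it returns the series you expect. As an illustration only (the job name `my-job` is a placeholder and this combined rule is not one of the alerts in the table above), a single condition can also watch both CPU and memory for one job definition:

```
# Hypothetical combined alert condition: fires when the job run exceeds either
# its CPU threshold (90%) or its memory threshold (80%). "my-job" is a placeholder
# and both halves filter on resource_name, as in the CPU examples above.
(
    serverless_job_run:cpu_usage_seconds_total:rate30s{resource_name="my-job"}
  / serverless_job_run:cpu_limit{resource_name="my-job"} * 100 > 90
)
or
(
    serverless_job_run:memory_usage_bytes{resource_name="my-job"}
  / serverless_job_run:memory_limit_bytes{resource_name="my-job"} * 100 > 80
)
```

The pending period, labels, and notification routing for such a rule are configured in the same way as in the steps above.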
