Commit fb0ca08

RHDEVDOCS-4430 - w/ peer rev & qe feedback
1 parent 8c11823 commit fb0ca08

7 files changed: 202 additions, 0 deletions

logging/v5_6/logging-5-6-configuration.adoc

Lines changed: 2 additions & 0 deletions
@@ -9,3 +9,5 @@ toc::[]
 include::snippets/logging-crs-by-operator-snip.adoc[]
 
 include::snippets/logging-supported-config-snip.adoc[]
+
+include::modules/logging-loki-retention.adoc[leveloffset=+1]

logging/v5_7/logging-5-7-configuration.adoc

Lines changed: 8 additions & 0 deletions
@@ -9,3 +9,11 @@ toc::[]
 include::snippets/logging-crs-by-operator-snip.adoc[]
 
 include::snippets/logging-supported-config-snip.adoc[]
+
+include::modules/logging-loki-retention.adoc[leveloffset=+1]
+
+//include::modules/logging-loki-alerts.adoc[leveloffset=+1]
+
+//[role="_additional-resources"]
+//.Additional resources
+//* xref:../../monitoring/enabling-alert-routing-for-user-defined-projects.html#enabling-a-separate-alertmanager-instance-for-user-defined-alert-routing_enabling-alert-routing-for-user-defined-projects[Enabling a separate alertmanager instance]

modules/logging-loki-alerts.adoc

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+// Module included in the following assemblies:
+// logging-5-7-configuration
+
+:_content-type: PROCEDURE
+[id="logging-loki-alerts_{context}"]
+= Enabling log-based alerts with Loki
+Loki alerting rules use link:https://grafana.com/docs/loki/latest/logql/[LogQL] and follow link:https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/#recording-rules[Prometheus formatting]. You can set log-based alerts by creating an `AlertingRule` custom resource (CR). `AlertingRule` CRs may be created for `application`, `audit`, or `infrastructure` tenants.
+
+[options="header"]
+|================================================
+| Tenant type | Valid namespaces
+| application |
+| audit | `openshift-logging`
+| infrastructure | `openshift-\*`, `kube-\*`, `default`
+|================================================
+
+Application, Audit, and Infrastructure alerts are sent to the Cluster Monitoring Operator (CMO) Alertmanager in the `openshift-monitoring` namespace by default unless you have disabled the local `Alertmanager` instance.
+
+Application alerts are not sent to the CMO Alertmanager in the `openshift-user-workload-monitoring` namespace by default unless you have enabled a separate `Alertmanager` instance.
+
+The `AlertingRule` CR contains a set of specifications and webhook validation definitions to declare groups of alerting rules for a single LokiStack instance. In addition, the webhook validation definition provides support for rule validation conditions:
+
+* If an `AlertingRule` CR includes an invalid `interval` period, it is an invalid alerting rule.
+* If an `AlertingRule` CR includes an invalid `for` period, it is an invalid alerting rule.
+* If an `AlertingRule` CR includes an invalid LogQL `expr`, it is an invalid alerting rule.
+* If an `AlertingRule` CR includes two groups with the same name, it is an invalid alerting rule.
+* If none of the above applies, an `AlertingRule` is considered a valid alerting rule.
+
+.Prerequisites
+
+* {logging-title-uc} Operator 5.7 and later
+* {product-title} 4.13 and later
+
+.Procedure
+
+1. Create an AlertingRule CR:
+
+--
+include::snippets/logging-create-apply-cr-snip.adoc[lines=9..12]
+--
+
+2. Populate your AlertingRule CR using the appropriate example below:
+
+--
+include::snippets/logging-alertingrule-inf-callouts-snip.adoc[]
+--
+
+--
+include::snippets/logging-alertingrule-app-callouts-snip.adoc[]
+--
+
+3. Apply the CR.
+
+--
+include::snippets/logging-create-apply-cr-snip.adoc[lines=14..17]
+--
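
Reviewer note: the validation conditions listed in this module can be illustrated with a minimal Python sketch. It is not the Loki Operator webhook. `is_duration` and `validation_errors` are hypothetical helpers, the spec is assumed to be an already-parsed dict, durations are assumed to follow the Prometheus style (for example `10s` or `1m`), and LogQL `expr` parsing is deliberately left out.

```python
import re

# Assumption: durations use the Prometheus style, e.g. "10s", "1m", "1h30m".
_DURATION = re.compile(
    r"^(?:\d+y)?(?:\d+w)?(?:\d+d)?(?:\d+h)?(?:\d+m)?(?:\d+s)?(?:\d+ms)?$"
)


def is_duration(value):
    """Return True for a non-empty Prometheus-style duration string."""
    return isinstance(value, str) and value != "" and bool(_DURATION.match(value))


def validation_errors(spec):
    """Collect reasons why a hypothetical AlertingRule spec dict is invalid.

    Covers the interval, for, and duplicate-group-name conditions only;
    checking the LogQL expression would need a real LogQL parser.
    """
    errors = []
    seen = set()
    for group in spec.get("groups", []):
        name = group.get("name")
        if name in seen:
            errors.append(f"duplicate group name: {name}")
        seen.add(name)
        # Only check `interval` when the group sets it.
        if "interval" in group and not is_duration(group["interval"]):
            errors.append(f"invalid interval in group {name}")
        for rule in group.get("rules", []):
            if "for" in rule and not is_duration(rule["for"]):
                errors.append(f"invalid for period in group {name}")
    return errors
```

An empty list from `validation_errors` corresponds to the last bullet: none of the checked conditions applies, so the rule is treated as valid by this sketch.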
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+.Example application AlertingRule CR
+[source,yaml]
+----
+apiVersion: loki.grafana.com/v1
+kind: AlertingRule
+metadata:
+  name: app-user-workload
+  namespace: app-ns <1>
+  labels: <2>
+    openshift.io/cluster-monitoring: "true"
+spec:
+  tenantID: "application"
+  groups:
+    - name: AppUserWorkloadHighError
+      rules:
+        - alert:
+          expr: | <3>
+            sum(rate({kubernetes_namespace_name="app-ns", kubernetes_pod_name=~"podName.*"} |= "error" [1m])) by (job)
+          for: 10s
+          labels:
+            severity: critical <4>
+          annotations:
+            summary: <5>
+            description: <6>
+----
+<1> The `namespace` where this AlertingRule is created must have a label matching the LokiStack `spec.rules.namespaceSelector` definition.
+<2> The `labels` block must match the LokiStack `spec.rules.selector` definition.
+<3> Value for `kubernetes_namespace_name:` must match the value for `metadata.namespace`.
+<4> Mandatory field. Must be `critical`, `warning`, or `info`.
+<5> Mandatory field. Summary of the rule.
+<6> Mandatory field. Detailed description of the rule.
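
Reviewer note: callouts 3 through 6 describe constraints on each rule that can be checked before the CR is applied. The following is a rough Python sketch under stated assumptions, not part of the operator: `rule_problems` is a hypothetical helper, the rule is assumed to be an already-parsed dict, and the namespace is extracted from the expression with a simple regular expression rather than a LogQL parser.

```python
import re

# Matches exact-equality matchers such as kubernetes_namespace_name="app-ns".
_NS_MATCHER = re.compile(r'kubernetes_namespace_name\s*=\s*"([^"]*)"')
ALLOWED_SEVERITIES = {"critical", "warning", "info"}


def rule_problems(namespace, rule):
    """List the callout constraints that a single rule dict violates."""
    problems = []
    # Every namespace named in the expression must equal metadata.namespace.
    namespaces = set(_NS_MATCHER.findall(rule.get("expr", "")))
    if namespaces != {namespace}:
        problems.append("expr namespace does not match metadata.namespace")
    labels = rule.get("labels") or {}
    if labels.get("severity") not in ALLOWED_SEVERITIES:
        problems.append("severity must be critical, warning, or info")
    annotations = rule.get("annotations") or {}
    for key in ("summary", "description"):
        # Empty or missing values both count as missing.
        if not annotations.get(key):
            problems.append(f"missing annotation: {key}")
    return problems
```

With this sketch, the application example as shown would be reported as incomplete until `summary` and `description` are filled in.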
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+.Example infrastructure AlertingRule CR
+[source,yaml]
+----
+apiVersion: loki.grafana.com/v1
+kind: AlertingRule
+metadata:
+  name: loki-operator-alerts
+  namespace: openshift-operators-redhat <1>
+  labels: <2>
+    openshift.io/cluster-monitoring: "true"
+spec:
+  tenantID: "infrastructure" <3>
+  groups:
+    - name: LokiOperatorHighReconciliationError
+      rules:
+        - alert: HighPercentageError
+          expr: | <4>
+            sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"} |= "error" [1m])) by (job)
+              /
+            sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"}[1m])) by (job)
+              > 0.01
+          for: 10s
+          labels:
+            severity: critical <5>
+          annotations:
+            summary: High Loki Operator Reconciliation Errors <6>
+            description: High Loki Operator Reconciliation Errors <7>
+----
+<1> The `namespace` where this AlertingRule is created must have a label matching the LokiStack `spec.rules.namespaceSelector` definition.
+<2> The `labels` block must match the LokiStack `spec.rules.selector` definition.
+<3> AlertingRules for `infrastructure` tenants are only supported in the `openshift-\*`, `kube-\*`, or `default` namespaces.
+<4> Value for `kubernetes_namespace_name:` must match the value for `metadata.namespace`.
+<5> Mandatory field. Must be `critical`, `warning`, or `info`.
+<6> Mandatory field.
+<7> Mandatory field.
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+.Example AlertingRule CR
+[source,yaml]
+----
+apiVersion: loki.grafana.com/v1
+kind: AlertingRule
+metadata:
+  name: loki-operator-alerts
+  namespace: openshift-operators-redhat <1>
+  labels: <2>
+    openshift.io/cluster-monitoring: "true"
+spec:
+  tenantID: "infrastructure" <3> <4> <5>
+  groups:
+    - name: LokiOperatorHighReconciliationError
+      rules:
+        - alert: HighPercentageError
+          expr: | <6>
+            sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"} |= "error" [1m])) by (job)
+              /
+            sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"}[1m])) by (job)
+              > 0.01
+          for: 10s
+          labels:
+            severity: critical <7>
+          annotations:
+            summary: High Loki Operator Reconciliation Errors <8>
+            description: High Loki Operator Reconciliation Errors <9>
+----
+<1> The `namespace` where this AlertingRule is created must have a label matching the LokiStack `spec.rules.namespaceSelector` definition.
+<2> The `labels` block must match the LokiStack `spec.rules.selector` definition.
+<3> Must be `application`, `infrastructure`, or `audit`.
+<4> AlertingRules for `infrastructure` tenants are only supported in the `openshift-\*`, `kube-\*`, or `default` namespaces.
+<5> AlertingRules for `audit` tenants are only supported in the `openshift-logging` namespace.
+<6> Value for `kubernetes_namespace_name:` must match the value for `metadata.namespace`.
+<7> Mandatory field. Must be `critical`, `warning`, or `info`.
+<8> Mandatory field.
+<9> Mandatory field.
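
Reviewer note: callouts 3 through 5 tie the `tenantID` value to the namespace of the CR. A minimal Python sketch of that mapping follows. `ALLOWED_NAMESPACES` and `namespace_allowed` are hypothetical names, and the `application` entry is an assumption: the callouts state no namespace restriction for that tenant, so the sketch accepts any namespace.

```python
from fnmatch import fnmatchcase

# Namespace patterns per tenant, taken from callouts 4 and 5.
# Assumption: application tenants are accepted in any namespace.
ALLOWED_NAMESPACES = {
    "application": ["*"],
    "audit": ["openshift-logging"],
    "infrastructure": ["openshift-*", "kube-*", "default"],
}


def namespace_allowed(tenant_id, namespace):
    """Return True if the namespace fits the pattern list for the tenant."""
    patterns = ALLOWED_NAMESPACES.get(tenant_id)
    if patterns is None:
        # Callout 3: only the three tenant IDs above are accepted.
        raise ValueError(f"unknown tenantID: {tenant_id}")
    return any(fnmatchcase(namespace, pattern) for pattern in patterns)
```

For the example above, `namespace_allowed("infrastructure", "openshift-operators-redhat")` is true because the namespace matches `openshift-*`.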
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+// Text snippet included in the following assemblies:
+// Text snippet included in the following modules:
+
+:_content-type: SNIPPET
+
+.Example AlertingRule CR
+[source,yaml]
+----
+apiVersion: loki.grafana.com/v1
+kind: AlertingRule
+metadata:
+  name: loki-operator-alerts
+  namespace: openshift-operators-redhat
+  labels:
+    openshift.io/cluster-monitoring: "true"
+spec:
+  tenantID: "infrastructure"
+  groups:
+    - name: LokiOperatorHighReconciliationError
+      rules:
+        - alert: HighPercentageError
+          expr: |
+            sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"} |= "error" [1m])) by (job)
+              /
+            sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"}[1m])) by (job)
+              > 0.01
+          for: 10s
+          labels:
+            severity: critical
+          annotations:
+            summary: High Loki Operator Reconciliation Errors
+            description: High Loki Operator Reconciliation Errors
+----
