Commit 8012330

Merge pull request #63276 from skrthomas/OSDOCS-6754
OSDOCS-6754: Network Observability easier configuration
2 parents 8241acc + ea323a5 commit 8012330

7 files changed, with 93 additions and 7 deletions

modules/network-observability-flowcollector-view.adoc

Lines changed: 4 additions & 2 deletions
@@ -57,14 +57,15 @@ spec:
         type: configmap
         name: loki-gateway-ca-bundle
         certFile: service-ca.crt
+        namespace: loki-namespace # <5>
   consolePlugin:
     register: true
     logLevel: info
     portNaming:
       enable: true
       portNames:
         "3100": loki
-    quickFilters: <5>
+    quickFilters: <6>
     - name: Applications
       filter:
         src_namespace!: 'openshift-,netobserv'
@@ -87,4 +88,5 @@ spec:
 <2> You can set the Sampling specification, `spec.agent.ebpf.sampling`, to manage resources. Lower sampling values might consume a large amount of computational, memory, and storage resources. You can mitigate this by specifying a sampling ratio value. A value of 100 means 1 flow every 100 is sampled. A value of 0 or 1 means all flows are captured. The lower the value, the greater the number of returned flows and the accuracy of derived metrics. By default, eBPF sampling is set to a value of 50, so 1 flow every 50 is sampled. Note that more sampled flows also means more storage needed. It is recommended to start with default values and refine empirically, to determine which setting your cluster can manage.
 <3> The optional specifications `spec.processor.logTypes`, `spec.processor.conversationHeartbeatInterval`, and `spec.processor.conversationEndTimeout` can be set to enable conversation tracking. When enabled, conversation events are queryable in the web console. The values for `spec.processor.logTypes` are as follows: `FLOWS`, `CONVERSATIONS`, `ENDED_CONVERSATIONS`, or `ALL`. Storage requirements are highest for `ALL` and lowest for `ENDED_CONVERSATIONS`.
 <4> The Loki specification, `spec.loki`, specifies the Loki client. The default values match the Loki install paths mentioned in the Installing the Loki Operator section. If you used another installation method for Loki, specify the appropriate client information for your install.
-<5> The `spec.quickFilters` specification defines filters that show up in the web console. The `Application` filter keys, `src_namespace` and `dst_namespace`, are negated (`!`), so the `Application` filter shows all traffic that _does not_ originate from, or have a destination to, any `openshift-` or `netobserv` namespaces. For more information, see Configuring quick filters below.
+<5> The original certificates are copied to the Network Observability instance namespace and watched for updates. When not provided, the namespace defaults to the value of `spec.namespace`. If you chose to install Loki in a different namespace, you must specify it in the `spec.loki.tls.caCert.namespace` field. Similarly, the `spec.exporters.kafka.tls.caCert.namespace` field is available for Kafka installed in a different namespace.
+<6> The `spec.quickFilters` specification defines filters that show up in the web console. The `Application` filter keys, `src_namespace` and `dst_namespace`, are negated (`!`), so the `Application` filter shows all traffic that _does not_ originate from, or have a destination to, any `openshift-` or `netobserv` namespaces. For more information, see Configuring quick filters below.
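
For reference, callouts <2>, <3>, and <5> above could translate into a `FlowCollector` fragment along the following lines. This is a minimal sketch rather than the sample resource from this commit: the `apiVersion`, the resource name `cluster`, and the two conversation timing values are assumptions chosen for illustration. The field paths themselves are the ones named in the callouts.

```yaml
# Sketch only: apiVersion, metadata.name, and the timing values are assumptions.
apiVersion: flows.netobserv.io/v1beta1   # assumed API version
kind: FlowCollector
metadata:
  name: cluster                          # assumed resource name
spec:
  namespace: netobserv
  agent:
    ebpf:
      sampling: 50                       # default per callout <2>: 1 flow in every 50 is sampled
  processor:
    logTypes: ENDED_CONVERSATIONS        # lowest storage requirement per callout <3>
    conversationHeartbeatInterval: 30s   # illustrative value
    conversationEndTimeout: 10s          # illustrative value
  loki:
    tls:
      caCert:
        type: configmap
        name: loki-gateway-ca-bundle
        certFile: service-ca.crt
        namespace: loki-namespace        # needed only when Loki runs outside spec.namespace
```

Lowering `sampling` toward 1 captures more flows and increases storage use, so starting from the default and adjusting empirically matches the guidance in callout <2>.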

modules/network-observability-lokistack-create.adoc

Lines changed: 4 additions & 3 deletions
@@ -20,7 +20,7 @@ It is recommended to deploy the LokiStack in the same namespace referenced by th
 kind: LokiStack
 metadata:
   name: loki
-  namespace: netobserv
+  namespace: netobserv <1>
 spec:
   size: 1x.small
   storage:
@@ -30,11 +30,12 @@ It is recommended to deploy the LokiStack in the same namespace referenced by th
     secret:
       name: loki-s3
       type: s3
-  storageClassName: gp3 <1>
+  storageClassName: gp3 <2>
   tenants:
     mode: openshift-network
 ----
-<1> Use a storage class name that is available on the cluster for `ReadWriteOnce` access mode. You can use `oc get storageclasses` to see what is available on your cluster.
+<1> The installation examples in this documentation use the same namespace, `netobserv`, across all components. You can optionally use a different namespace.
+<2> Use a storage class name that is available on the cluster for `ReadWriteOnce` access mode. You can use `oc get storageclasses` to see what is available on your cluster.
 +
 [IMPORTANT]
 ====
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+//
+// network_observability/configuring-operator.adoc
+
+:_content-type: CONCEPT
+[id="network-observability-netobserv-dashboard-rate-limit-alerts_{context}"]
+= Creating Loki rate limit alerts for the NetObserv dashboard
+You can create custom rules for the *NetObserv* dashboard metrics to trigger alerts when Loki rate limits have been reached.
+
+An example of an alerting rule configuration YAML file is as follows:
+[source,yaml]
+----
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: loki-alerts
+  namespace: openshift-operators-redhat
+spec:
+  groups:
+  - name: LokiRateLimitAlerts
+    rules:
+    - alert: LokiTenantRateLimit
+      annotations:
+        message: |-
+          {{ $labels.job }} {{ $labels.route }} is experiencing 429 errors.
+        summary: "One or more requests are receiving the rate limit error code."
+      expr: sum(irate(loki_request_duration_seconds_count{status_code="429"}[1m])) by (job, namespace, route) / sum(irate(loki_request_duration_seconds_count[1m])) by (job, namespace, route) * 100 > 0
+      for: 10s
+      labels:
+        severity: warning
+----
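
The `expr` above computes the percentage of Loki requests per `job`, `namespace`, and `route` that returned status code 429 over the last minute, and fires as soon as that percentage is above 0 for 10 seconds. If that proves too sensitive, the same expression could be given a higher threshold and a longer `for` window. The following is a hypothetical variant, not part of this commit: the resource name, the alert name `LokiTenantRateLimitSustained`, the 5% threshold, and the 5m window are all assumptions for illustration.

```yaml
# Hypothetical variant: names, threshold, and duration are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: loki-alerts-sustained              # hypothetical name
  namespace: openshift-operators-redhat
spec:
  groups:
  - name: LokiRateLimitAlerts
    rules:
    - alert: LokiTenantRateLimitSustained  # hypothetical alert name
      annotations:
        summary: "More than 5% of Loki requests are receiving the rate limit error code."
      # Same ratio as the original rule; only the comparison value changes.
      expr: sum(irate(loki_request_duration_seconds_count{status_code="429"}[1m])) by (job, namespace, route) / sum(irate(loki_request_duration_seconds_count[1m])) by (job, namespace, route) * 100 > 5
      for: 5m
      labels:
        severity: warning
```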
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+// Module included in the following assemblies:
+
+// * networking/network_observability/troubleshooting-network-observability.adoc
+
+:_content-type: PROCEDURE
+[id="network-observability-troubleshooting-loki-tenant-rate-limit_{context}"]
+= LokiStack rate limit errors
+A rate limit placed on the Loki tenant can result in a temporary loss of data and a 429 error: `Per stream rate limit exceeded (limit:xMB/sec) while attempting to ingest for stream`. You might consider having an alert set to notify you of this error. For more information, see "Creating Loki rate limit alerts for the NetObserv dashboard" in the Additional resources of this section.
+
+You can update the LokiStack CRD with the `perStreamRateLimit` and `perStreamRateLimitBurst` specifications, as shown in the following procedure.
+
+.Procedure
+. Navigate to *Operators* -> *Installed Operators*, viewing *All projects* from the *Project* dropdown.
+. Look for *Loki Operator*, and select the *LokiStack* tab.
+. Create or edit an existing *LokiStack* instance using the *YAML view* to add the `perStreamRateLimit` and `perStreamRateLimitBurst` specifications:
++
+[source, yaml]
+----
+apiVersion: loki.grafana.com/v1
+kind: LokiStack
+metadata:
+  name: loki
+  namespace: netobserv
+spec:
+  limits:
+    global:
+      ingestion:
+        perStreamRateLimit: 6 <1>
+        perStreamRateLimitBurst: 30 <2>
+  tenants:
+    mode: openshift-network
+  managementState: Managed
+----
+<1> The default value for `perStreamRateLimit` is `3`.
+<2> The default value for `perStreamRateLimitBurst` is `15`.
+
+. Click *Save*.
+
+.Verification
+After you update the `perStreamRateLimit` and `perStreamRateLimitBurst` specifications, the pods in your cluster restart and the 429 rate-limit error no longer occurs.

networking/network_observability/installing-operators.adoc

Lines changed: 4 additions & 0 deletions
@@ -22,6 +22,10 @@ include::modules/network-observability-without-loki.adoc[leveloffset=+1]
 
 include::modules/network-observability-loki-install.adoc[leveloffset=+1]
 include::modules/network-observability-loki-secret.adoc[leveloffset=+2]
+[role="_additional-resources"]
+.Additional resources
+* For more information about the option to use different namespaces for the separate components, see the `spec.loki.tls.caCert.namespace` specification in the xref:../network_observability/flowcollector-api.adoc#network-observability-flowcollector-api-specifications_network_observability[Flow Collector API Reference] and callout number 5 in the xref:../network_observability/configuring-operator.adoc#network-observability-flowcollector-view_network_observability[Flow Collector sample resource].
+
 include::modules/network-observability-lokistack-create.adoc[leveloffset=+2]
 include::modules/network-observability-lokistack-ingestion-query.adoc[leveloffset=+2]
 include::modules/network-observability-auth-multi-tenancy.adoc[leveloffset=+1]

networking/network_observability/network-observability-operator-monitoring.adoc

Lines changed: 6 additions & 1 deletion
@@ -10,4 +10,9 @@ You can use the web console to monitor alerts related to the health of the Netwo
 
 
 include::modules/network-observability-viewing-alerts.adoc[leveloffset=+1]
-include::modules/network-observability-disabling-health-alerts.adoc[leveloffset=+2]
+include::modules/network-observability-disabling-health-alerts.adoc[leveloffset=+2]
+include::modules/network-observability-rate-limit-alert.adoc[leveloffset=+1]
+
+[role="_additional-resources"]
+.Additional resources
+* For more information about creating alerts that you can see on the dashboard, see xref:../../monitoring/managing-alerts.adoc#creating-alerting-rules-for-user-defined-projects_managing-alerts[Creating alerting rules for user-defined projects].

networking/network_observability/troubleshooting-network-observability.adoc

Lines changed: 5 additions & 1 deletion
@@ -16,4 +16,8 @@ include::modules/troubleshooting-network-observability-flowlogs-pipeline-kafka.a
 
 include::modules/troubleshooting-network-observability-network-flow.adoc[leveloffset=+1]
 
-include::modules/troubleshooting-network-observability-controller-manager-pod-out-of-memory.adoc[leveloffset=+1]
+include::modules/troubleshooting-network-observability-controller-manager-pod-out-of-memory.adoc[leveloffset=+1]
+
+== Resource troubleshooting
+
+include::modules/troubleshooting-network-observability-loki-tenant-rate-limit.adoc[leveloffset=+1]
