Commit 6a5579b

Merge pull request #77522 from skrthomas/no-1.6-integration-ocpdoc-main
No 1.6 integration ocpdoc main
2 parents 885bac2 + b0b7617 commit 6a5579b

45 files changed: +2505 additions, −275 deletions

_topic_maps/_topic_map.yml

Lines changed: 15 additions & 2 deletions
@@ -2913,9 +2913,22 @@ Topics:
   File: metrics-alerts-dashboards
 - Name: Monitoring the Network Observability Operator
   File: network-observability-operator-monitoring
-- Name: API reference
+- Name: Scheduling resources
+  File: network-observability-scheduling-resources
+- Name: Network Observability CLI
+  Dir: netobserv_cli
+  Topics:
+  - Name: Installing the Network Observability CLI
+    File: netobserv-cli-install
+  - Name: Using the Network Observability CLI
+    File: netobserv-cli-using
+  - Name: Network Observability CLI reference
+    File: netobserv-cli-reference
+- Name: FlowCollector API reference
   File: flowcollector-api
-- Name: JSON flows format reference
+- Name: FlowMetric API reference
+  File: flowmetric-api
+- Name: Flows format reference
   File: json-flows-format-reference
 - Name: Troubleshooting Network Observability
   File: troubleshooting-network-observability

modules/network-observability-RTT-overview.adoc

Lines changed: 8 additions & 8 deletions
@@ -5,24 +5,24 @@
 :_mod-docs-content-type: CONCEPT
 [id="network-observability-RTT-overview_{context}"]
 = Round-Trip Time
-You can use TCP handshake Round-Trip Time (RTT) to analyze network flows. You can use RTT captured from the `fentry/tcp_rcv_established` eBPF hookpoint to read SRTT from the TCP socket to help with the following:
+You can use TCP smoothed Round-Trip Time (sRTT) to analyze network flow latencies. You can use RTT captured from the `fentry/tcp_rcv_established` eBPF hookpoint to read sRTT from the TCP socket to help with the following:
 
-* Network Monitoring: Gain insights into TCP handshakes, helping
+* Network Monitoring: Gain insights into TCP latencies, helping
 network administrators identify unusual patterns, potential bottlenecks, or
 performance issues.
 * Troubleshooting: Debug TCP-related issues by tracking latency and identifying
 misconfigurations.
 
-By default, when RTT is enabled, you can see the following TCP handshake RTT metrics represented in the *Overview*:
+By default, when RTT is enabled, you can see the following TCP RTT metrics represented in the *Overview*:
 
-* Top X 90th percentile TCP handshake Round Trip Time with overall
-* Top X average TCP handshake Round Trip Time with overall
-* Bottom X minimum TCP handshake Round Trip Time with overall
+* Top X 90th percentile TCP Round Trip Time with overall
+* Top X average TCP Round Trip Time with overall
+* Bottom X minimum TCP Round Trip Time with overall
 
 Other RTT panels can be added in *Manage panels*:
 
-* Top X maximum TCP handshake Round Trip Time with overall
-* Top X 99th percentile TCP handshake Round Trip Time with overall
+* Top X maximum TCP Round Trip Time with overall
+* Top X 99th percentile TCP Round Trip Time with overall
 
 See the _Additional Resources_ in this section for more information about enabling and working with this view.

modules/network-observability-RTT.adoc

Lines changed: 0 additions & 1 deletion
@@ -23,7 +23,6 @@ metadata:
   name: cluster
 spec:
   namespace: netobserv
-  deploymentModel: Direct
   agent:
     type: eBPF
     ebpf:
Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
//Module included in the following assemblies:
//
// observability/network_observability/netobserv_cli/netobserv-cli-using.adoc

:_mod-docs-content-type: PROCEDURE
[id="network-observability-cli-capturing-flows_{context}"]
= Capturing flows

You can capture flows and filter on any resource or zone in the data to solve use cases, such as displaying Round-Trip Time (RTT) between two zones. Table visualization in the CLI provides viewing and flow search capabilities.

.Prerequisites
* Install the {oc-first}.
* Install the Network Observability CLI (`oc netobserv`) plugin.

.Procedure
. Capture flows with filters enabled by running the following command:
+
[source,terminal]
----
$ oc netobserv flows --enable_filter=true --action=Accept --cidr=0.0.0.0/0 --protocol=TCP --port=49051
----
. Add filters to the `live table filter` prompt in the terminal to further refine the incoming flows. For example:
+
[source,terminal]
----
live table filter: [SrcK8S_Zone:us-west-1b] press enter to match multiple regular expressions at once
----
. To stop capturing, press kbd:[Ctrl+C]. The data that was captured is written to two separate files in an `./output` directory located in the same path used to install the CLI.
. View the captured data in the `./output/flow/<capture_date_time>.json` JSON file, which contains JSON arrays of the captured data.
+
.Example JSON file
[source,json]
----
{
  "AgentIP": "10.0.1.76",
  "Bytes": 561,
  "DnsErrno": 0,
  "Dscp": 20,
  "DstAddr": "f904:ece9:ba63:6ac7:8018:1e5:7130:0",
  "DstMac": "0A:58:0A:80:00:37",
  "DstPort": 9999,
  "Duplicate": false,
  "Etype": 2048,
  "Flags": 16,
  "FlowDirection": 0,
  "IfDirection": 0,
  "Interface": "ens5",
  "K8S_FlowLayer": "infra",
  "Packets": 1,
  "Proto": 6,
  "SrcAddr": "3e06:6c10:6440:2:a80:37:b756:270f",
  "SrcMac": "0A:58:0A:80:00:01",
  "SrcPort": 46934,
  "TimeFlowEndMs": 1709741962111,
  "TimeFlowRttNs": 121000,
  "TimeFlowStartMs": 1709741962111,
  "TimeReceived": 1709741964
}
----
. You can use SQLite to inspect the `./output/flow/<capture_date_time>.db` database file. For example:
.. Open the file by running the following command:
+
[source,terminal]
----
$ sqlite3 ./output/flow/<capture_date_time>.db
----
.. Query the data by running a SQLite `SELECT` statement, for example:
+
[source,terminal]
----
sqlite> SELECT DnsLatencyMs, DnsFlagsResponseCode, DnsId, DstAddr, DstPort, Interface, Proto, SrcAddr, SrcPort, Bytes, Packets FROM flow WHERE DnsLatencyMs >10 LIMIT 10;
----
+
.Example output
[source,terminal]
----
12|NoError|58747|10.128.0.63|57856||17|172.30.0.10|53|284|1
11|NoError|20486|10.128.0.52|56575||17|169.254.169.254|53|225|1
11|NoError|59544|10.128.0.103|51089||17|172.30.0.10|53|307|1
13|NoError|32519|10.128.0.52|55241||17|169.254.169.254|53|254|1
12|NoError|32519|10.0.0.3|55241||17|169.254.169.254|53|254|1
15|NoError|57673|10.128.0.19|59051||17|172.30.0.10|53|313|1
13|NoError|35652|10.0.0.3|46532||17|169.254.169.254|53|183|1
32|NoError|37326|10.0.0.3|52718||17|169.254.169.254|53|169|1
14|NoError|14530|10.0.0.3|58203||17|169.254.169.254|53|246|1
15|NoError|40548|10.0.0.3|45933||17|169.254.169.254|53|174|1
----
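Editor's aside, not part of the documented procedure: the capture database can also be post-processed programmatically, for example to aggregate the `TimeFlowRttNs` field shown in the JSON example above. The following sketch uses Python's standard-library `sqlite3` module; the table name (`flow`) matches the query above, but the inserted rows are made-up samples standing in for a real `./output/flow/<capture_date_time>.db` file.

```python
import sqlite3

# Build an in-memory stand-in for a capture database. In practice you would
# connect to ./output/flow/<capture_date_time>.db instead. The column names
# (SrcAddr, DstAddr, TimeFlowRttNs) mirror the fields in the JSON example;
# the rows below are fabricated sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flow (SrcAddr TEXT, DstAddr TEXT, TimeFlowRttNs INTEGER)")
conn.executemany(
    "INSERT INTO flow VALUES (?, ?, ?)",
    [
        ("10.0.1.76", "10.0.1.80", 121000),
        ("10.0.1.76", "10.0.1.80", 181000),
        ("10.0.1.77", "10.0.1.90", 90000),
    ],
)

# Average RTT in milliseconds per destination address (RTT is captured in
# nanoseconds, so divide by 1e6 to get milliseconds).
rows = conn.execute(
    "SELECT DstAddr, AVG(TimeFlowRttNs) / 1e6 AS avg_rtt_ms "
    "FROM flow GROUP BY DstAddr ORDER BY avg_rtt_ms DESC"
).fetchall()
for dst, avg_rtt_ms in rows:
    print(f"{dst}: {avg_rtt_ms:.3f} ms")
```

With the sample rows above, this prints `10.0.1.80: 0.151 ms` followed by `10.0.1.90: 0.090 ms`.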
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
//Module included in the following assemblies:
//
// observability/network_observability/netobserv_cli/netobserv-cli-using.adoc

:_mod-docs-content-type: PROCEDURE
[id="network-observability-cli-capturing-packets_{context}"]
= Capturing packets
You can capture packets using the Network Observability CLI.

.Prerequisites
* Install the {oc-first}.
* Install the Network Observability CLI (`oc netobserv`) plugin.

.Procedure
. Run the packet capture with filters enabled:
+
[source,terminal]
----
$ oc netobserv packets --filter=tcp,80
----
. Add filters to the `live table filter` prompt in the terminal to refine the incoming packets. An example filter is as follows:
+
[source,terminal]
----
live table filter: [SrcK8S_Zone:us-west-1b] press enter to match multiple regular expressions at once
----
. To stop capturing, press kbd:[Ctrl+C].
. View the captured data, which is written to a single file in an `./output/pcap` directory located in the same path that was used to install the CLI:
.. The `./output/pcap/<capture_date_time>.pcap` file can be opened with Wireshark.
Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
// Module included in the following assemblies:
//
// network_observability/metrics-alerts-dashboards.adoc

:_mod-docs-content-type: PROCEDURE
[id="network-observability-configuring-custom-metrics_{context}"]
= Configuring custom metrics by using FlowMetric API
You can configure the `FlowMetric` API to create custom metrics by using flowlogs data fields as Prometheus labels. You can add multiple `FlowMetric` resources to a project to see multiple dashboard views.

.Procedure

. In the web console, navigate to *Operators* -> *Installed Operators*.
. In the *Provided APIs* heading for the *NetObserv Operator*, select *FlowMetric*.
. In the *Project:* dropdown list, select the project of the Network Observability Operator instance.
. Click *Create FlowMetric*.
. Configure the `FlowMetric` resource, similar to the following sample configurations:
+
.Generate a metric that tracks ingress bytes received from cluster external sources
[%collapsible]
====
[source,yaml]
----
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-traffic
  namespace: netobserv <1>
spec:
  metricName: cluster_external_ingress_bytes_total <2>
  type: Counter <3>
  valueField: Bytes
  direction: Ingress <4>
  labels: [DstK8S_HostName,DstK8S_Namespace,DstK8S_OwnerName,DstK8S_OwnerType] <5>
  filters: <6>
  - field: SrcSubnetLabel
    matchType: Absence
----
<1> The `FlowMetric` resources need to be created in the namespace defined in the `FlowCollector` `spec.namespace`, which is `netobserv` by default.
<2> The name of the Prometheus metric, which in the web console appears with the prefix `netobserv-<metricName>`.
<3> The `type` specifies the type of metric. The `Counter` `type` is useful for counting bytes or packets.
<4> The direction of traffic to capture. If not specified, both ingress and egress are captured, which can lead to duplicated counts.
<5> Labels define what the metrics look like and the relationship between the different entities and also define the metrics cardinality. For example, `SrcK8S_Name` is a high cardinality metric.
<6> Refines results based on the listed criteria. In this example, selecting only the cluster external traffic is done by matching only flows where `SrcSubnetLabel` is absent. This assumes the subnet labels feature is enabled (via `spec.processor.subnetLabels`), which is done by default.

.Verification
. Once the pods refresh, navigate to *Observe* -> *Metrics*.
. In the *Expression* field, type the metric name to view the corresponding result. You can also enter an expression, such as `topk(5, sum(rate(netobserv_cluster_external_ingress_bytes_total{DstK8S_Namespace="my-namespace"}[2m])) by (DstK8S_HostName, DstK8S_OwnerName, DstK8S_OwnerType))`
====
+
.Show RTT latency for cluster external ingress traffic
[%collapsible]
====
[source,yaml]
----
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-rtt
  namespace: netobserv <1>
spec:
  metricName: cluster_external_ingress_rtt_seconds
  type: Histogram <2>
  valueField: TimeFlowRttNs
  direction: Ingress
  labels: [DstK8S_HostName,DstK8S_Namespace,DstK8S_OwnerName,DstK8S_OwnerType]
  filters:
  - field: SrcSubnetLabel
    matchType: Absence
  - field: TimeFlowRttNs
    matchType: Presence
  divider: "1000000000" <3>
  buckets: [".001", ".005", ".01", ".02", ".03", ".04", ".05", ".075", ".1", ".25", "1"] <4>
----
<1> The `FlowMetric` resources need to be created in the namespace defined in the `FlowCollector` `spec.namespace`, which is `netobserv` by default.
<2> The `type` specifies the type of metric. The `Histogram` `type` is useful for a latency value (`TimeFlowRttNs`).
<3> Since the Round-Trip Time (RTT) is provided in nanoseconds in flows, use a divider of 1 billion to convert it into seconds, which is standard in Prometheus guidelines.
<4> The custom buckets specify precision on RTT, with optimal precision ranging between 5ms and 250ms.

.Verification
. Once the pods refresh, navigate to *Observe* -> *Metrics*.
. In the *Expression* field, you can type the metric name to view the corresponding result.
====
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
// Module included in the following assemblies:
2+
//
3+
// network_observability/metrics-alerts-dashboards.adoc
4+
5+
:_mod-docs-content-type: CONCEPT
6+
[id="network-observability-custom-metrics_{context}"]
7+
= Custom metrics
8+
You can create custom metrics out of the flowlogs data using the `FlowMetric` API. In every flowlogs data that is collected, there are a number of fields labeled per log, such as source name and destination name. These fields can be leveraged as Prometheus labels to enable the customization of cluster information on your dashboard.

modules/network-observability-dns-tracking.adoc

Lines changed: 1 addition & 2 deletions
@@ -27,7 +27,6 @@ metadata:
   name: cluster
 spec:
   namespace: netobserv
-  deploymentModel: Direct
   agent:
     type: eBPF
     ebpf:
@@ -36,7 +35,7 @@ spec:
       sampling: 1 <2>
 ----
 <1> You can set the `spec.agent.ebpf.features` parameter list to enable DNS tracking of each network flow in the web console.
-<2> You can set `sampling` to a value of `1` for more accurate metrics.
+<2> You can set `sampling` to a value of `1` for more accurate metrics and to capture *DNS latency*. For a `sampling` value greater than 1, you can observe flows with *DNS Response Code* and *DNS Id*, and it is unlikely that *DNS Latency* can be observed.
 
 . When you refresh the *Network Traffic* page, there are new DNS representations you can choose to view in the *Overview* and *Traffic Flow* views and new filters you can apply.
 .. Select new DNS choices in *Manage panels* to display graphical visualizations and DNS metrics in the *Overview*.
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
// Module included in the following assemblies:
// * network_observability/network-observability-operator-monitoring.adoc

:_mod-docs-content-type: PROCEDURE
[id="network-observability-netobserv-dashboard-ebpf-agent-alerts_{context}"]
= Using the eBPF agent alert

An alert, `NetObservAgentFlowsDropped`, is triggered when the Network Observability eBPF agent hashmap table is full or when the capacity limiter is triggered. If you see this alert, consider increasing the `cacheMaxFlows` value in the `FlowCollector`, as shown in the following example.

[NOTE]
====
Increasing the `cacheMaxFlows` value might increase the memory usage of the eBPF agent.
====

.Procedure

. In the web console, navigate to *Operators* -> *Installed Operators*.

. Under the *Provided APIs* heading for the *Network Observability Operator*, select *Flow Collector*.

. Select *cluster*, and then select the *YAML* tab.

. Increase the `spec.agent.ebpf.cacheMaxFlows` value, as shown in the following YAML sample:
+
[source,yaml]
----
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  deploymentModel: Direct
  agent:
    type: eBPF
    ebpf:
      cacheMaxFlows: 200000 <1>
----
<1> Increase the `cacheMaxFlows` value from its value at the time of the `NetObservAgentFlowsDropped` alert.
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
// Module included in the following assemblies:
//
// network_observability/observing-network-traffic.adoc

:_mod-docs-content-type: CONCEPT
[id="network-observability-ebpf-flow-rule-filter_{context}"]
= eBPF flow rule filter
You can use rule-based filtering to control the volume of packets cached in the eBPF flow table. For example, a filter can specify that only packets coming from port 100 should be recorded. Then only the packets that match the filter are cached and the rest are not cached.

[id="ingress-and-egress-traffic-filtering_{context}"]
== Ingress and egress traffic filtering
CIDR notation efficiently represents IP address ranges by combining the base IP address with a prefix length. For both ingress and egress traffic, the source IP address is first used to match filter rules configured with CIDR notation. If there is a match, then the filtering proceeds. If there is no match, then the destination IP is used to match filter rules configured with CIDR notation.

After matching either the source IP or the destination IP CIDR, you can pinpoint specific endpoints by using `peerIP` to differentiate the destination IP address of the packet. Based on the provisioned action, the flow data is either cached in the eBPF flow table or not cached.

[id="dashboard-and-metrics-integrations_{context}"]
== Dashboard and metrics integrations
When this option is enabled, the *Netobserv/Health* dashboard for *eBPF agent statistics* now has the *Filtered flows rate* view. Additionally, in *Observe* -> *Metrics* you can query `netobserv_agent_filtered_flows_total` to observe metrics with the reason in *FlowFilterAcceptCounter*, *FlowFilterNoMatchCounter*, or *FlowFilterRejectCounter*.
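Editor's aside: a rule like the port-100 example above would be expressed in the `FlowCollector` resource. The following fragment is a sketch only; the `flowFilter` field names reflect my understanding of the FlowCollector `v1beta2` API, so verify them against the CRD installed in your cluster before use.

[source,yaml]
----
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  agent:
    type: eBPF
    ebpf:
      flowFilter:
        enable: true        # turn on rule-based filtering
        action: Accept      # cache matching flows; non-matching flows are not cached
        cidr: 0.0.0.0/0     # matched first against the source IP, then the destination IP
        protocol: TCP
        sourcePorts: "100"  # record only packets coming from port 100
        peerIP: 10.10.10.10 # optional: pinpoint a specific peer endpoint
----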
