OSDOCS-5543: Adding performance and resource recommendations

skrthomas · skrthomas · commit a6e6923e55cd · 2023-08-09T17:50:27.000-04:00
diff --git a/modules/network-observability-resource-recommendations.adoc b/modules/network-observability-resource-recommendations.adoc
@@ -0,0 +1,20 @@
+//module included in the following assemblies:
+// * network_observability/configuring_operator.adoc
+
+:_content-type: REFERENCE
+[id="network-observability-resource-recommendations_{context}"]
+= Resource management and performance considerations
+
+The amount of resources required by Network Observability depends on the size of your cluster and on how much observability data you need the cluster to ingest and store. To manage resources and set performance criteria for your cluster, consider configuring the following settings. Tuning them can help you reach a setup that meets your observability needs.
+
+The following settings can help you manage resources and performance from the outset:
+
+eBPF Sampling:: You can set the sampling specification, `spec.agent.ebpf.sampling`, to manage resources. A value of `100` means that 1 flow in every 100 is sampled. A value of `0` or `1` means that all flows are captured. Smaller values increase the number of returned flows and the accuracy of derived metrics, but they might consume a large amount of computational, memory, and storage resources. You can mitigate this by specifying a larger sampling ratio value. By default, eBPF sampling is set to a value of `50`, so 1 flow in every 50 is sampled. Note that more sampled flows also means that more storage is needed. Consider starting with the default value and refining it empirically to determine which setting your cluster can manage.
+
+Restricting or excluding interfaces:: Reduce the overall observed traffic by setting the values for `spec.agent.ebpf.interfaces` and `spec.agent.ebpf.excludeInterfaces`. By default, the agent fetches all the interfaces in the system, except `lo` (the local interface) and the ones listed in `excludeInterfaces`. Note that the interface names might vary according to the Container Network Interface (CNI) used.
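+
+As a rough illustration, the two settings above might appear in a `FlowCollector` resource as in the following sketch. This is a minimal example under stated assumptions, not a recommended configuration: the `apiVersion`, the resource name `cluster`, the `type: EBPF` field, and the interface name `eth0` are assumptions that do not come from this module. Check them against the API version and the CNI in use on your cluster.
+
+[source,yaml]
+----
+apiVersion: flows.netobserv.io/v1beta1  # assumption: use the version served by your cluster
+kind: FlowCollector
+metadata:
+  name: cluster                         # assumption: resource name
+spec:
+  agent:
+    type: EBPF                          # assumption: agent type selector
+    ebpf:
+      sampling: 50                      # default: 1 flow in every 50 is sampled
+      excludeInterfaces:                # interfaces that the agent ignores
+      - lo
+      # interfaces:                     # alternatively, restrict collection to named interfaces
+      # - eth0                          # hypothetical name; depends on the CNI
+----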
+
+The following settings can be used to fine-tune performance after Network Observability has been running for a while:
+
+Resource requirements and limits:: Adapt the resource requirements and limits to the load and memory usage you expect on your cluster by using the `spec.agent.ebpf.resources` and `spec.processor.resources` specifications. The default memory limits of 800Mi might be sufficient for most medium-sized clusters.
+
+Cache max flows timeout:: Control how often flows are reported by the agents by using the eBPF agent's `spec.agent.ebpf.cacheMaxFlows` and `spec.agent.ebpf.cacheActiveTimeout` specifications. Larger values result in less traffic being generated by the agents, which correlates with a lower CPU load. However, larger values lead to slightly higher memory consumption and might add latency to the flow collection.
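+
+The fine-tuning settings above could be combined in the `FlowCollector` spec along the following lines. Treat this as a sketch only. It assumes that the `resources` fields follow the usual Kubernetes `requests` and `limits` layout. The `800Mi` limits mirror the defaults mentioned in this module. The cache values are illustrative placeholders, not tested recommendations.
+
+[source,yaml]
+----
+spec:
+  agent:
+    ebpf:
+      cacheActiveTimeout: 5s      # illustrative value: how long flows are cached before being reported
+      cacheMaxFlows: 100000       # illustrative value: flows buffered before a report is forced
+      resources:
+        limits:
+          memory: 800Mi           # default limit mentioned in this module
+  processor:
+    resources:
+      limits:
+        memory: 800Mi             # default limit mentioned in this module
+----
+
+Raising either cache value trades slightly more agent memory and collection latency for fewer reports and lower CPU load, so adjust one setting at a time and observe the effect before changing the other.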
diff --git a/modules/network-observability-resources-table.adoc b/modules/network-observability-resources-table.adoc
@@ -0,0 +1,32 @@
+// Module included in the following assemblies:
+// * network_observability/configuring_operator.adoc
+
+:_content-type: REFERENCE
+[id="network-observability-resources-table_{context}"]
+= Resource considerations
+The following table outlines examples of resource considerations for clusters with certain workload sizes. 
+
+[IMPORTANT]
+====
+The examples outlined in the table demonstrate scenarios that are tailored to specific workloads. Consider each example only as a baseline from which adjustments can be made to accommodate your workload needs.
+====
+
+.Resource recommendations
+[options="header"]
+|===
+|                                     | Extra small (10 nodes) | Small (25 nodes)  | Medium (65 nodes) ^[2]^ | Large (120 nodes) ^[2]^
+| *Worker Node vCPU and memory*       | 4 vCPUs\| 16GiB mem ^[1]^ | 16 vCPUs\| 64GiB mem ^[1]^ | 16 vCPUs\| 64GiB mem ^[1]^ | 16 vCPUs\| 64GiB mem ^[1]^
+| *LokiStack size*                    | `1x.extra-small`         | `1x.small`          | `1x.small`           | `1x.medium`
+| *Network Observability controller memory limit* | 400Mi (default)        | 400Mi (default)   | 400Mi (default)    | 800Mi                
+| *eBPF sampling rate*                | 50 (default)           | 50 (default)      | 50 (default)       | 50 (default)
+| *eBPF memory limit*                 | 800Mi (default)        | 800Mi (default)   | 2000Mi             | 800Mi (default) 
+| *FLP memory limit*                     | 800Mi (default)        | 800Mi (default)   | 800Mi (default)    | 800Mi (default)         
+| *FLP Kafka partitions*              | N/A                    | 48                | 48                 | 48            
+| *Kafka consumer replicas*           | N/A                    | 24                | 24                 | 24
+| *Kafka brokers*                     | N/A                    | 3 (default)       | 3 (default)        | 3 (default)
+|===
+[.small]
+--
+1. Tested with AWS M6i instances.
+2. In addition to this worker and its controller, 3 infra nodes (size `M6i.12xlarge`) and 1 workload node (size `M6i.8xlarge`) were tested. 
+--
diff --git a/networking/network_observability/configuring-operator.adoc b/networking/network_observability/configuring-operator.adoc
@@ -11,4 +11,6 @@ You can update the Flow Collector API resource to configure the Network Observab
 include::modules/network-observability-flowcollector-view.adoc[leveloffset=+1]
 include::modules/network-observability-flowcollector-kafka-config.adoc[leveloffset=+1]
 include::modules/network-observability-configuring-FLP-sampling.adoc[leveloffset=+1]
-include::modules/network-observability-configuring-quickfilters-flowcollector.adoc[leveloffset=+1]
+include::modules/network-observability-configuring-quickfilters-flowcollector.adoc[leveloffset=+1]
+include::modules/network-observability-resource-recommendations.adoc[leveloffset=+1]
+include::modules/network-observability-resources-table.adoc[leveloffset=+2]