## Flows v2: An improved version of Netobserv eBPF Agent

### What Changed?
In the eBPF/TC code, v1 used a ringbuffer to export flow records to the userspace program.
Based on our measurements, the ringbuffer can become a bottleneck: a record for each packet in the data path needs to be sent to userspace, which eventually results in lost records.
Additionally, this leads to high CPU utilization, since the userspace program is constantly active handling callback events on a per-packet basis.
Refer to the [Measurements slide-deck](./measurements.pptx) for performance measurements.
To tackle this and achieve 100% monitoring coverage, the v2 eBPF/TC code uses a per-CPU hash map to aggregate flow records in the eBPF data path, and proactively sends the records to userspace upon flow termination. The detailed logic is below:

#### eBPF Data-path Logic:
1) Store flow information in a per-CPU hash map. The key of the map is the flow identification
(addresses/ports, protocol, etc.) and the value is the flow metrics (packet count, byte count, and start/end timestamps).
As a higher-level note, we still need to check whether increasing the map size (and hence the cost of hash computation) affects throughput.
2) Upon packet arrival, a lookup is performed on the map.
   * If the lookup is successful, update the packet count, byte count, and the current timestamp.
   * If the lookup is unsuccessful, try creating a new entry in the map.
3) If entry creation fails because the map is full, send the record to the userspace program via the ringbuffer (see the sketch after this list).
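
The following is a minimal sketch of this data-path logic in eBPF C. It is not the agent's actual source: all struct, map, and function names here (`flow_id`, `flow_metrics`, `aggregated_flows`, `direct_flows`, `record_flow`) are illustrative assumptions, and header parsing is elided.

```c
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

// Hypothetical flow key; the real agent keys on more fields.
struct flow_id {
    __u32 src_ip;
    __u32 dst_ip;
    __u16 src_port;
    __u16 dst_port;
    __u8  protocol;
};

struct flow_metrics {
    __u64 packets;
    __u64 bytes;
    __u64 start_time_ns;
    __u64 end_time_ns;
};

struct flow_record {
    struct flow_id id;
    struct flow_metrics metrics;
};

// 1) Per-CPU hash map aggregating one partial flow per CPU.
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
    __uint(max_entries, 1 << 16);
    __type(key, struct flow_id);
    __type(value, struct flow_metrics);
} aggregated_flows SEC(".maps");

// Fallback channel towards userspace, used only when the map is full.
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 24);
} direct_flows SEC(".maps");

static __always_inline void record_flow(struct flow_id *id, __u64 pkt_bytes)
{
    __u64 now = bpf_ktime_get_ns();

    // 2) Lookup: on a hit, update counters and the end timestamp in place.
    struct flow_metrics *m = bpf_map_lookup_elem(&aggregated_flows, id);
    if (m) {
        m->packets += 1;
        m->bytes += pkt_bytes;
        m->end_time_ns = now;
        return;
    }

    // On a miss, try to create a new entry.
    struct flow_metrics fresh = {
        .packets = 1,
        .bytes = pkt_bytes,
        .start_time_ns = now,
        .end_time_ns = now,
    };
    if (bpf_map_update_elem(&aggregated_flows, id, &fresh, BPF_NOEXIST) != 0) {
        // 3) Map full (or the update raced): fall back to the ringbuffer
        // so the record is not lost.
        struct flow_record rec;
        __builtin_memset(&rec, 0, sizeof(rec));
        rec.id = *id;
        rec.metrics = fresh;
        bpf_ringbuf_output(&direct_flows, &rec, sizeof(rec), 0);
    }
}

SEC("tc")
int flow_monitor(struct __sk_buff *skb)
{
    struct flow_id id = {};
    // Header parsing elided for brevity: fill `id` from the packet here.
    record_flow(&id, skb->len);
    return TC_ACT_OK; // never drop packets, we only observe them
}

char __license[] SEC("license") = "GPL";
```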

##### Flow collisions
A downside of the eBPF per-CPU hash map implementation is that memory is not zeroed when an entry
is removed. As a consequence, when an entry is removed and then re-added (either the same flow, or
another flow that hashes to the same bucket), the new flow metrics are written to the slot
corresponding to the CPU that captured the packet, but the slots of the other CPUs might still
contain data from old flows.

To deal with this, we discard old flow entries (those whose end time precedes the last
flow eviction time) when we aggregate them in userspace.
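
As a sketch of this discard rule (the field names follow the illustrative `flow_metrics` struct above; the rule itself is the one stated in this section):

```c
#include <stdbool.h>
#include <stdint.h>

// A per-CPU slot is considered stale (leftover from an old flow) if its
// flow ended before the last time the aggregated map was evicted.
static bool slot_is_stale(uint64_t slot_end_time_ns, uint64_t last_eviction_ns)
{
    return slot_end_time_ns < last_eviction_ns;
}
```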

#### User-space program Logic: (Refer to [tracer.go](../pkg/ebpf/tracer.go))

The userspace program has two active threads (sketched after this list):

* **Periodically evict the aggregated flows map**. Every period (defined by the `CACHE_ACTIVE_TIMEOUT`
  configuration variable), the eBPF map that is updated from the kernel space is completely read
  and its entries are removed, then sent to FlowLogs-Pipeline (or any other ingestion service).

* **Listen to the flows ringbuffer**. When flows are received from the ringbuffer, they are aggregated
  in user space before being forwarded periodically to the ingestion service.
  - Receiving a flow from the ringbuffer means that the eBPF aggregated map is full, so it also
    automatically triggers an eviction of the eBPF map to free space and minimize the usage
    of the ringbuffer (which, as explained before, is slower).
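
The actual agent implements these loops in Go (see [tracer.go](../pkg/ebpf/tracer.go)); the following illustrative libbpf C sketch only shows their shape. The structs come from the data-path sketch above, and `forward_flow()`, `accumulate_flow()`, and `request_eviction()` are assumed hooks into the export pipeline.

```c
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include <stdbool.h>
#include <stdlib.h>
#include "flows.h" // hypothetical header with the flow_id/flow_metrics structs

// Assumed hooks into the rest of the agent (export pipeline, scheduling).
void forward_flow(const struct flow_id *id, const struct flow_metrics *slots, int ncpus);
void accumulate_flow(const void *data, size_t len);
void request_eviction(void);

// Thread 1: every CACHE_ACTIVE_TIMEOUT, read and empty the aggregated map.
static void evict_aggregated_flows(int map_fd)
{
    int ncpus = libbpf_num_possible_cpus();
    // A lookup on a per-CPU map returns one flow_metrics value per possible CPU.
    struct flow_metrics *slots = calloc(ncpus, sizeof(*slots));
    struct flow_id key, next_key;

    bool more = (bpf_map_get_next_key(map_fd, NULL, &next_key) == 0);
    while (more) {
        key = next_key;
        // Fetch the next key before deleting the current one, so the
        // iteration is not disturbed by the deletion.
        more = (bpf_map_get_next_key(map_fd, &key, &next_key) == 0);
        if (bpf_map_lookup_elem(map_fd, &key, slots) == 0)
            forward_flow(&key, slots, ncpus); // aggregate + send (see below)
        bpf_map_delete_elem(map_fd, &key);
    }
    free(slots);
}

// Thread 2: callback invoked by libbpf for each ringbuffer record.
static int on_ringbuf_flow(void *ctx, void *data, size_t len)
{
    // A ringbuffer record means the aggregated map was full: collect the
    // flow in userspace and trigger an early eviction to free map space.
    accumulate_flow(data, len);
    request_eviction();
    return 0;
}

static int listen_ringbuf(int ringbuf_fd)
{
    struct ring_buffer *rb = ring_buffer__new(ringbuf_fd, on_ringbuf_flow, NULL, NULL);
    if (!rb)
        return -1;
    while (ring_buffer__poll(rb, /* timeout_ms */ 100) >= 0)
        ; // keep polling until an error occurs
    ring_buffer__free(rb);
    return 0;
}
```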

##### Flow Collision handling in user-space

Since the per-CPU hash map stores one aggregated flow per CPU, we need to aggregate all the
partial flow entries in user space before sending the complete flow, discarding the flow entries
that might belong to old flow measurements (as explained in the kernel-side
[flow collisions](#flow-collisions) section). A sketch of this aggregation follows.
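
A hedged sketch of this merge step, reusing the illustrative `flow_metrics` struct and the `slot_is_stale()` helper from the sections above (the agent's real aggregation lives in the Go code):

```c
// Merge the per-CPU slots of one map entry into a single flow record,
// skipping slots whose data predates the last eviction (stale collisions).
// `last_eviction_ns` is assumed to be tracked by the eviction thread.
static struct flow_metrics aggregate_slots(const struct flow_metrics *slots,
                                           int ncpus, uint64_t last_eviction_ns)
{
    struct flow_metrics total = {0};
    for (int cpu = 0; cpu < ncpus; cpu++) {
        const struct flow_metrics *m = &slots[cpu];
        if (m->packets == 0 || slot_is_stale(m->end_time_ns, last_eviction_ns))
            continue; // empty slot, or leftover data from an old flow
        total.packets += m->packets;
        total.bytes   += m->bytes;
        // The flow starts at the earliest start and ends at the latest end.
        if (total.start_time_ns == 0 || m->start_time_ns < total.start_time_ns)
            total.start_time_ns = m->start_time_ns;
        if (m->end_time_ns > total.end_time_ns)
            total.end_time_ns = m->end_time_ns;
    }
    return total;
}
```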