Commit bf41a71

Author: Mario Macias
Modified eBPF implementation notes (#50)
1 parent c27bd51 commit bf41a71

File tree

2 files changed: +48 −50 lines


bpf/README.md

Lines changed: 0 additions & 50 deletions
This file was deleted.

docs/ebpf_implementation.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
## Flows v2: An improved version of the Netobserv eBPF Agent

### What Changed?
In the eBPF/TC code, v1 used a ringbuffer to export flow records to the userspace program.
Based on our measurements, the ringbuffer can become a bottleneck: a record for each packet in the data path needs to be sent to userspace, which eventually results in lost records.
Additionally, this leads to high CPU utilization, since the userspace program is constantly active handling callback events on a per-packet basis.
Refer to the [Measurements slide-deck](./measurements.pptx) for performance measurements.

To tackle this and achieve 100% monitoring coverage, the v2 eBPF/TC code uses a per-CPU hash map to aggregate flow records in the eBPF data path, and proactively sends the records to userspace upon flow termination. The detailed logic is described below.
#### eBPF Data-path Logic:

1) Store flow information in a per-CPU hash map. The key of the map is the flow identification
(addresses/ports, protocol, etc.) and the value holds the flow metrics (packet count, byte count, and start/end times).
As a higher-level note, we still need to check whether increasing the map size (and thus the hash computation cost) affects throughput.
2) Upon packet arrival, a lookup is performed on the map:
    * If the lookup is successful, update the packet count, byte count, and the current timestamp.
    * If the lookup is unsuccessful, try creating a new entry in the map.
3) If the entry creation fails because the map is full, send the record to the userspace program via the ringbuffer (see the sketch below).
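
As a rough illustration of steps 1–3, a minimal eBPF C sketch could look like the following. The struct layouts, map names, and sizes here are assumptions made for the example, not necessarily the agent's actual definitions:

```c
// Illustrative sketch of the v2 data path; struct fields, map names,
// and sizes are assumptions, not the agent's actual definitions.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct flow_id {
    __u32 src_ip;
    __u32 dst_ip;
    __u16 src_port;
    __u16 dst_port;
    __u8  protocol;
};

struct flow_metrics {
    __u64 packets;
    __u64 bytes;
    __u64 start_ts;
    __u64 end_ts;
};

struct flow_record {
    struct flow_id id;
    struct flow_metrics metrics;
};

// 1) Per-CPU hash map: for each flow key, one metrics slot per CPU.
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
    __uint(max_entries, 1 << 17);
    __type(key, struct flow_id);
    __type(value, struct flow_metrics);
} aggregated_flows SEC(".maps");

// Fallback ringbuffer, used only when the hash map is full.
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 24);
} direct_flows SEC(".maps");

static __always_inline int record_packet(struct flow_id *id, __u64 len)
{
    __u64 now = bpf_ktime_get_ns();
    struct flow_metrics *m = bpf_map_lookup_elem(&aggregated_flows, id);

    if (m) {
        // 2) Hit: update counters and the last-seen timestamp.
        m->packets++;
        m->bytes += len;
        m->end_ts = now;
        return 0;
    }

    // 2) Miss: try creating a new entry in the map.
    struct flow_metrics fresh = {
        .packets = 1, .bytes = len, .start_ts = now, .end_ts = now,
    };
    if (bpf_map_update_elem(&aggregated_flows, id, &fresh, BPF_NOEXIST) == 0)
        return 0;

    // 3) Creation failed (e.g. the map is full): fall back to the ringbuffer.
    struct flow_record *rec =
        bpf_ringbuf_reserve(&direct_flows, sizeof(*rec), 0);
    if (!rec)
        return 0; // ringbuffer also full: the record is dropped
    rec->id = *id;
    rec->metrics = fresh;
    bpf_ringbuf_submit(rec, 0);
    return 0;
}
```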

##### Flow collisions

A downside of the eBPF per-CPU hash map implementation is that memory is not zeroed when an entry is
removed. As a consequence, after an entry is removed, if the same flow is added again (or any other flow
that hashes into the same bucket), the new flow metrics are written to the slot of the CPU that
captured it, but the slots of the other CPUs might still contain data from old flows.

To deal with this, we need to discard stale flow entries (those whose end time precedes the last
flow eviction time) when we aggregate them in userspace.

#### User-space program Logic: (refer to [tracer.go](../pkg/ebpf/tracer.go))

The userspace program has two active threads:

* **Periodically evict the aggregated flows map**. Every period (defined by the `CACHE_ACTIVE_TIMEOUT`
  configuration variable), the eBPF map that is updated from the kernel space is completely read
  and its entries are removed, then sent to FlowLogs-Pipeline (or any other ingestion service).
* **Listen to the flows ringbuffer**. When flows are received from the ringbuffer, they are aggregated
  in user space before being forwarded periodically to the ingestion service.
    - Receiving a flow from the ringbuffer means that the eBPF aggregated map is full, so it also
      automatically triggers the eviction of the eBPF map to free space and minimize the usage
      of the ringbuffer (which, as explained before, is slower). A sketch of the eviction step follows this list.
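
The agent implements this in Go (see [tracer.go](../pkg/ebpf/tracer.go)); purely as an illustration, the eviction step could be sketched in C with libbpf as follows, reusing the types from the data-path sketch above. `forward_to_pipeline` is a hypothetical sink, not an actual function of the agent:

```c
// Illustrative libbpf sketch of the periodic eviction; the actual
// agent does this in Go (pkg/ebpf/tracer.go).
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include <stdlib.h>

// Hypothetical sink that forwards one flow to the ingestion service.
void forward_to_pipeline(const struct flow_id *id,
                         const struct flow_metrics *per_cpu, int ncpus);

void evict_flows(int map_fd)
{
    // For per-CPU maps, a userspace lookup returns one value slot
    // per possible CPU, all belonging to the same key.
    int ncpus = libbpf_num_possible_cpus();
    struct flow_metrics *values = calloc(ncpus, sizeof(*values));
    struct flow_id key, next_key;
    void *cursor = NULL;

    while (bpf_map_get_next_key(map_fd, cursor, &next_key) == 0) {
        if (bpf_map_lookup_elem(map_fd, &next_key, values) == 0)
            forward_to_pipeline(&next_key, values, ncpus);
        // Caveat: deleting the key the cursor points at can make
        // bpf_map_get_next_key restart the iteration; a production
        // version would collect all keys first, then delete them.
        bpf_map_delete_elem(map_fd, &next_key);
        key = next_key;
        cursor = &key;
    }
    free(values);
}
```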

##### Flow collision handling in user-space

Since the per-CPU hash map stores one aggregated flow entry per CPU, we need to aggregate all the
partial flow entries in the user space before sending the complete flow, discarding the entries
that might belong to old flow measurements (as explained in the kernel-side
[flow collisions](#flow-collisions) section).
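
Continuing the same illustrative C sketch, the per-CPU slots for one key can be summed while skipping stale slots, i.e. those whose end time precedes the last eviction timestamp:

```c
// Illustrative aggregation of the per-CPU slots for one flow key.
// A slot whose end_ts predates the last eviction belongs to an old
// flow that previously occupied the same bucket, so it is skipped
// (empty slots, with end_ts == 0, are skipped by the same check).
struct flow_metrics aggregate_flow(const struct flow_metrics *per_cpu,
                                   int ncpus, __u64 last_eviction_ts)
{
    struct flow_metrics total = {0};

    for (int cpu = 0; cpu < ncpus; cpu++) {
        const struct flow_metrics *m = &per_cpu[cpu];
        if (m->end_ts <= last_eviction_ts)
            continue; // stale slot from an old flow: discard
        total.packets += m->packets;
        total.bytes += m->bytes;
        if (total.start_ts == 0 || m->start_ts < total.start_ts)
            total.start_ts = m->start_ts;
        if (m->end_ts > total.end_ts)
            total.end_ts = m->end_ts;
    }
    return total;
}
```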
