Performance optimization using eBPF aggregation (#39)
* Add ebpf flows with per-cpu hashmap
* Add flags to perform eviction
* Add protocol to the evicted entry
* Add new headers to record
* tracer looks up the map on evicted entry
* Handle tcp reset flag
* Handle rst flag
* Enable timeout based eviction of ingress/egress maps
* ongoing performance measurements
* Add in logic in README
* Formatting changes to readme
* Format changes
* Correct the byte count
* Minor edits for tracer
* ebpf code with right byte count
* Update README.md
* Update README.md
* Latest measurements with multiflow and cpu,mem
* Add chart for throughput
* Cosmetic changes to measurements
* v6 support
* Logic for time calculation
* Cleanup of ebpf code
* Remove prints and cleanup
* Minor comments
* Remove unused lines
* bpf binary minor edit
* Alignment corrections
* Remove extra comments
Co-authored-by: Mario Macias <[email protected]>
* Refactor my_flow_id to id
* bug: Add direction variable while export
* Remove printf
* Handle hash collisions and improper deletions
* Remove stray debug entry
* Correct lint errors
* Add monotime module for CLOCK_MONOTONIC
* Use monotime instead of cgo
* Add monotime to go.sum
* Add monotime package to vendor
* tidy imports
* EvictionTimeout as duration
* fixed testmain
* Fixed tests
* fixing getPingFlows verification
* modify comment
* fix e2e test version
* Fix flow timing issue with reference time
* Tidy to fix lint errors
* fix errors in merge
* Remove TCP flag based eviction
* Remove TCP FIN/RST handling
* Remove redundant fields
* Simplify eBPF code optimizations (#49)
* simplifying ebpf agent
* version not really working well
* almost-working tests but I suspect that monotonic time could be doing bad stuff there
* not-yet-100% working tests
* reusing same bpfObjects for all the interfaces. That should decrease memory usage
* define max_entries at userspace
* avoid if/elses in C code map operations
* fixed compilation of unit tests
* wip: re-enable ring-buffer flows
* moved accounter inside tracer
* one single tracer for all the qdiscs and filters
* evict flows on ringbuffer
* Minor changes
* properly tested (and fixed) userspace accounter
* move eBPF system setup to flowtracer creation
* Discard old flows from being aggregated
* Fix zero-valued aggregated flows
* fix timestamp checking for flow discarding
* Unify ingress/egress maps
* Fixed build and test
* Updated generated eBPF binaries
* fix bug that caused that first flow could have start time == 0
Co-authored-by: Pravein <Pravein Govindan Kannan>
Co-authored-by: Mario Macias <[email protected]>
## Flows v2: An improved version of Netobserv eBPF Agent
### What Changed?
In the eBPF/TC code, v1 used a ringbuffer to export flow records to the userspace program.
Based on our measurements, the ringbuffer can become a bottleneck, since a record for each packet in the data-path needs to be sent to userspace, which eventually results in lost records.
Additionally, this leads to high CPU utilization, since the userspace program is constantly active handling callback events on a per-packet basis.
Refer to the [Measurements slide-deck](../docs/measurements.pptx) for performance measurements.
To tackle this and achieve 100% monitoring coverage, the v2 eBPF/TC code uses a per-CPU hash map to aggregate flow records in the eBPF data-path and proactively sends them to userspace upon flow termination. The detailed logic is below:
#### eBPF Data-path Logic:
1) Store flow information in a per-CPU hash map. Separate per-CPU hash maps are maintained for ingress and egress to avoid performance bottlenecks (the key/value layout is sketched just after this list).
One design choice that still needs to be settled with performance measurements is whether IPv4 and IPv6 addresses should be maintained in the same map or in separate ones.
At a higher level, we also need to check whether increasing the map size (and hence the hash computation) affects throughput.
2) Upon packet arrival, a lookup is performed on the map.
* If the lookup is successful, update the packet count, the byte count, and the timestamp of the last seen packet.
* If the lookup is unsuccessful, try to create a new entry in the map.
3) If entry creation fails because the map is full, the entry is sent to the userspace program via the ringbuffer.
4) Upon flow completion (a TCP FIN/RST event), the flow-id is sent to userspace via the ringbuffer.
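
To make this concrete, here is a minimal sketch of how the flow-id (map key) and the per-flow metrics (one per-CPU value slot) might be mirrored on the Go side for use with cilium/ebpf. All field names and widths here are illustrative assumptions, not the agent's actual record layout.

```go
package main

import (
	"fmt"
	"unsafe"
)

// FlowID is an assumed mirror of the eBPF map key (the flow-id).
// The authoritative layout lives in the eBPF C code and must be
// matched exactly, including padding.
type FlowID struct {
	EthProtocol uint16   // L3 protocol (IPv4 or IPv6)
	Direction   uint8    // ingress or egress
	SrcIP       [16]byte // IPv6 address, or IPv4 mapped into IPv6
	DstIP       [16]byte
	SrcPort     uint16
	DstPort     uint16
	Protocol    uint8 // L4 protocol (TCP, UDP, ...)
}

// FlowMetrics is an assumed mirror of one per-CPU value slot. The full
// flow-id is stored again inside the value so that hash collisions can
// be detected later (see the "Hash collisions" sections).
type FlowMetrics struct {
	ID          FlowID // full key embedded in the value
	Packets     uint32
	Bytes       uint64
	StartMonoNs uint64 // bpf_ktime_get_ns() of the first packet
	EndMonoNs   uint64 // bpf_ktime_get_ns() of the last packet
	Flags       uint16 // e.g. "collision" or "flow completed" markers
}

func main() {
	// With a per-CPU hash map, one FlowID maps to one FlowMetrics slot
	// per CPU; userspace sees them as a []FlowMetrics on lookup.
	fmt.Println("key size:", unsafe.Sizeof(FlowID{}),
		"value size:", unsafe.Sizeof(FlowMetrics{}))
}
```
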
##### Hash collisions
One downside of using a hash-based map is that, when flows are hashed into the per-CPU map, hash collisions can occur, causing multiple different flows to map to the same entry and leading to inaccurate flow records. To handle hash collisions we do the following:
1) In each flow entry, we additionally maintain the full key/id.
2) Before a packet's counters are updated in the map, the stored key is compared against the packet's id to check whether a different flow already resides in that entry.
3) If a different flow is found, we do not want to update the entry incorrectly. Hence, we send the new packet's entry directly to userspace via the ringbuffer, with a flag set to indicate the collision.
Collisions can still slip past this per-CPU check, so they are detected and handled again in user-space, as described below.
#### User-space program Logic (refer to [tracer.go](../pkg/ebpf/tracer.go))
The userspace program has three active threads:
1) **Trace**:
a) If the flow-id received from the eBPF data-path via the ringbuffer indicates flow completion (via the flags), the thread does the following:
* ScrubFlow: performs a lookup of the flow-id in the ingress/egress map and aggregates the metrics from the per-CPU counters, then deletes the entry corresponding to the flow-id from the map.
* Exports the aggregated flow record to the accounter pipeline.
b) If the received flow-id is not a flow-completion event, the record is simply forwarded to the accounter pipeline, where it will be aggregated later upon flow completion (a rough sketch of this loop follows).
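
A rough Go sketch of that loop, using the cilium/ebpf ringbuffer reader, is shown below. The record layout, the flag value, and the scrubFlow helper are assumptions for illustration and do not mirror tracer.go exactly.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"log"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/ringbuf"
)

// rawRecord is an assumed mirror of an entry pushed by the eBPF code to
// the ringbuffer: the flow-id plus its metrics and eviction flags.
type rawRecord struct {
	SrcIP, DstIP     [16]byte
	SrcPort, DstPort uint16
	Protocol         uint8
	Direction        uint8
	Flags            uint16 // assumed: bit 0x1 marks flow completion (FIN/RST)
	Packets          uint32
	Bytes            uint64
	StartNs, EndNs   uint64
}

const flagFlowCompleted = 0x1 // assumed flag value

// trace reads evicted entries from the ringbuffer. Completed flows are
// scrubbed from the per-CPU flow map first; every record is then
// forwarded to the accounter pipeline.
func trace(rb, flows *ebpf.Map, accounter chan<- rawRecord) error {
	rd, err := ringbuf.NewReader(rb)
	if err != nil {
		return err
	}
	defer rd.Close()
	for {
		ev, err := rd.Read()
		if errors.Is(err, ringbuf.ErrClosed) {
			return nil
		} else if err != nil {
			log.Printf("reading ringbuffer: %v", err)
			continue
		}
		var rec rawRecord
		if err := binary.Read(bytes.NewReader(ev.RawSample), binary.LittleEndian, &rec); err != nil {
			log.Printf("parsing record: %v", err)
			continue
		}
		if rec.Flags&flagFlowCompleted != 0 {
			// ScrubFlow: aggregate the per-CPU counters for this flow-id
			// and delete the map entry (see the aggregation sketch below).
			scrubFlow(flows, &rec)
		}
		accounter <- rec
	}
}

// scrubFlow is a placeholder for the per-CPU lookup, aggregation and
// deletion described above.
func scrubFlow(flows *ebpf.Map, rec *rawRecord) {}

func main() {} // sketch only: maps come from the loaded eBPF objects
```
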
2) **MonitorIngress**:
This is a periodic thread that wakes up every n seconds and does the following:
a) Creates a map iterator and iterates over each entry in the map.
b) Evicts an entry if the following condition is met:
* The last packet of the flow was seen more than m seconds ago.
* Other eviction policies could be implemented, for example based on the packets/bytes observed, or a more aggressive eviction when the map is k% full. These are further improvements that can be made to fine-tune map usage for a given scenario and use-case.
c) The evicted entry is aggregated into a flow record and forwarded to the accounter pipeline (a sketch of this loop follows).
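
A minimal sketch of this eviction loop with cilium/ebpf is shown below; the key/value types, the monotonic-clock helper, and the export callback are assumptions for illustration.

```go
package main

import (
	"log"
	"time"

	"github.com/cilium/ebpf"
)

// flowKey and flowValue are assumed mirrors of the eBPF map layout;
// looking up a key in a per-CPU map yields one flowValue slot per CPU.
type flowKey struct {
	SrcIP, DstIP     [16]byte
	SrcPort, DstPort uint16
	Protocol         uint8
}

type flowValue struct {
	Packets        uint32
	Bytes          uint64
	StartNs, EndNs uint64 // CLOCK_MONOTONIC, from bpf_ktime_get_ns()
}

// monitor wakes up every `period`, iterates the ingress (or egress)
// per-CPU flow map and evicts entries whose last packet was seen more
// than `timeout` ago. Evicted entries are handed to `export`, which
// aggregates them into a flow record for the accounter pipeline.
func monitor(flows *ebpf.Map, period, timeout time.Duration,
	nowMonoNs func() uint64, export func(flowKey, []flowValue)) {
	for range time.Tick(period) {
		var (
			key     flowKey
			perCPU  []flowValue
			evicted []flowKey
		)
		it := flows.Iterate()
		for it.Next(&key, &perCPU) {
			var lastSeen uint64
			for _, v := range perCPU {
				if v.EndNs > lastSeen {
					lastSeen = v.EndNs
				}
			}
			if nowMonoNs()-lastSeen > uint64(timeout.Nanoseconds()) {
				// Copy the slots: the iterator may reuse the slice.
				export(key, append([]flowValue(nil), perCPU...))
				evicted = append(evicted, key)
			}
		}
		if err := it.Err(); err != nil {
			log.Printf("iterating flow map: %v", err)
		}
		// Delete after iterating to avoid invalidating the iterator.
		for _, k := range evicted {
			if err := flows.Delete(&k); err != nil {
				log.Printf("deleting flow: %v", err)
			}
		}
	}
}

func main() {} // sketch only: the map comes from the loaded eBPF objects
```
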
3) **MonitorEgress**:
This is a periodic thread that performs the same task as MonitorIngress, but on the egress map.
##### Hash Collision handling in user-space
In spite of handling hash collisions in the eBPF data-path, there is still a chance of multiple flows mapping to the same entry, since the per-CPU map maintains separate entries per CPU. It is therefore possible for flows from different CPUs to map to the same entry while sitting in different per-CPU slots. Hence, during aggregation we check the key before merging the per-CPU entries of a flow. When such a colliding entry is detected, it is exported to the accounter on its own: since the full flow key is stored along with each entry, these collided entries can be recovered and sent to the accounter.
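
As a rough Go illustration of that per-CPU aggregation with the key check, consider the sketch below; the struct layout and the embedded id are assumptions, as in the sketches above.

```go
package main

import "fmt"

// flowID and cpuSlot are assumed layouts: each per-CPU slot carries the
// full flow-id next to its counters so collisions can be detected.
type flowID struct {
	SrcIP, DstIP     [16]byte
	SrcPort, DstPort uint16
	Protocol         uint8
}

type cpuSlot struct {
	ID             flowID // full key embedded in the value
	Packets        uint32
	Bytes          uint64
	StartNs, EndNs uint64
}

// aggregate merges the per-CPU slots that really belong to `id`. Slots
// whose embedded key differs collided with this entry on another CPU;
// they are returned separately so they can be exported to the accounter
// as their own flows instead of being merged incorrectly.
func aggregate(id flowID, perCPU []cpuSlot) (total cpuSlot, collided []cpuSlot) {
	total.ID = id
	for _, s := range perCPU {
		if s.Packets == 0 && s.Bytes == 0 {
			continue // CPU slot never touched by this flow
		}
		if s.ID != id {
			collided = append(collided, s) // a different flow: recover, don't merge
			continue
		}
		total.Packets += s.Packets
		total.Bytes += s.Bytes
		if total.StartNs == 0 || s.StartNs < total.StartNs {
			total.StartNs = s.StartNs
		}
		if s.EndNs > total.EndNs {
			total.EndNs = s.EndNs
		}
	}
	return total, collided
}

func main() {
	id := flowID{SrcPort: 443, DstPort: 51000, Protocol: 6}
	other := flowID{SrcPort: 53, DstPort: 40000, Protocol: 17}
	perCPU := []cpuSlot{
		{ID: id, Packets: 3, Bytes: 1800, StartNs: 100, EndNs: 300},
		{ID: other, Packets: 1, Bytes: 90, StartNs: 250, EndNs: 250}, // collision on another CPU
		{ID: id, Packets: 2, Bytes: 1200, StartNs: 150, EndNs: 400},
	}
	total, collided := aggregate(id, perCPU)
	fmt.Println(total.Packets, total.Bytes, len(collided)) // 5 3000 1
}
```
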