-
Notifications
You must be signed in to change notification settings - Fork 50
WIP improve perfs #824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
WIP improve perfs #824
Conversation
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test ebpf-node-density-heavy-25nodes |
cd0eadf to
53d49f8
Compare
|
Add a python script to compare perfs: 53d49f8
I was expecting better performances improvments here but it still handle more flows and the ratio is showing improvments in terms of memory. WDYT @jotak ? |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #824 +/- ##
==========================================
+ Coverage 29.74% 30.00% +0.25%
==========================================
Files 49 49
Lines 5355 4519 -836
==========================================
- Hits 1593 1356 -237
+ Misses 3645 3046 -599
Partials 117 117
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
Comparing to last 3 runs shows better results: dda38e5
|
| // Interface already seen -> skip | ||
| return 0; | ||
|
|
||
| // Fast path: unroll loop for small array sizes (most common cases) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we must measure that to see how much it improves CPU. The downside I see is that the code is less intuitive / readable, and also it's error prone if we decide to increase MAX_OBSERVED_INTERFACES (we'd need to add new "unrolled" blocks, which can easily be missed)
But optimizations often come with tradeoff so that might be ok, depending on the measured improvement
| flow_metrics new_flow = { | ||
| .if_index_first_seen = skb->ifindex, | ||
| .direction_first_seen = direction, | ||
| .packets = 1, | ||
| .bytes = len, | ||
| .eth_protocol = eth_protocol, | ||
| .start_mono_time_ts = pkt.current_ts, | ||
| .end_mono_time_ts = pkt.current_ts, | ||
| .flags = pkt.flags, | ||
| .dscp = pkt.dscp, | ||
| .sampling = flow_sampling, | ||
| .nb_observed_intf = 0 // Explicitly zero for clarity | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we used to do that previously, and switched to individual assignments, iirc @msherif1234 found cases where that didn't work as intended, but can't remember what exactly. @msherif1234 do you remember?


Performance improvements
1. Lock-free flow updates (
bpf/flows.c,bpf/types.h)bpf_spin_lockfrom theflow_metricsstructure.__sync_fetch_and_add()forpacketsandbytes.Why: Reduces lock contention in the hot path; updates are safe with atomics.
2. Loop unrolling optimizations
add_observed_intf()function (bpf/flows.c)Why: Removes loop overhead; most flows see 1–2 interfaces.
md_already_exists()function (bpf/network_events_monitoring.h)Why: Eliminates loop overhead in network event checking.
3. Early IP filtering (
bpf/flows_filter.h,bpf/flows.c,bpf/utils.h)early_ip_filter_check()for IP-only rejection without L4 parsing.fill_ethhdr_l3only()— parses L2+L3 onlyparse_l4_after_l3()— parses L4 separatelyfill_iphdr_l3only()/fill_ip6hdr_l3only()— L3-only variantsWhy: Skips L4 parsing when IP-based filtering can reject packets early, reducing work.
4. Memory initialization optimizations (
bpf/flows.c)__builtin_memset()with explicit field initialization.flow_metrics new_flow = { ... }) and selective initialization.Why: Avoids unnecessary zeroing; compiler can optimize better.
5. Generated Go code updates
pkg/ebpf/bpf_*_bpfel.go(all architectures) to remove theLockfield.pkg/model/record_test.goto reflect the removed lock field in binary encoding tests.Overall impact
These changes target high-frequency paths:
Together, these reduce CPU cycles per packet, which should improve throughput in a high-traffic eBPF flow monitoring agent.
Dependencies
n/a
Checklist
If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.
To run a perfscale test, comment with:
/test ebpf-node-density-heavy-25nodes