WIP improve perfs #824

jpinsonneau · 2025-11-03T14:08:51Z

Performance improvements

1. Lock-free flow updates (`bpf/flows.c`, `bpf/types.h`)

Removed bpf_spin_lock from the flow_metrics structure and replaced it with atomic operations:
- __sync_fetch_and_add() for the packets and bytes counters.
- Direct writes for the other fields, where the update is idempotent or an occasional race is acceptable.

Why: Removes lock contention from the hot path; the counters stay correct because each increment is atomic.
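To make the idea concrete, here is a minimal user-space sketch of the update pattern described above. The `flow_counters` struct and `update_flow()` helper are hypothetical, trimmed-down stand-ins and not the code from `bpf/flows.c`; only `__sync_fetch_and_add()` is taken from the description.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for flow_metrics; the real struct
 * in bpf/types.h has many more fields. */
struct flow_counters {
    uint64_t bytes;
    uint64_t end_mono_time_ts;
    uint32_t packets;
    uint16_t flags;
};

/* Update an existing flow entry without taking a lock. */
static inline void update_flow(struct flow_counters *f, uint64_t len,
                               uint64_t now, uint16_t flags) {
    /* Counters: concurrent increments must not be lost, so use atomics. */
    __sync_fetch_and_add(&f->packets, 1);
    __sync_fetch_and_add(&f->bytes, len);

    /* Plain store: a racing writer would store a near-identical timestamp. */
    f->end_mono_time_ts = now;

    /* Plain read-modify-write: a racing update can lose a bit, which is
     * the kind of occasional race the description accepts. */
    f->flags |= flags;
}
```

The split is the point of the design: only the fields where a lost update would skew the metrics pay for an atomic instruction.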

2. Loop unrolling optimizations

`add_observed_intf()` function (`bpf/flows.c`)

Unrolled the loop for up to 6 interfaces.
Direct index comparisons instead of a loop.
Early exits for common cases (0–3 interfaces).

Why: Removes loop overhead; most flows see 1–2 interfaces.
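As an illustration only, an unrolled duplicate check could look like the sketch below. The `observed` struct, the return convention, and the value 6 for `MAX_OBSERVED_INTERFACES` are assumptions made for the example (6 comes from the description above); the real function in `bpf/flows.c` may differ.

```c
#include <stdint.h>

#define MAX_OBSERVED_INTERFACES 6 /* assumed from "up to 6 interfaces" */

/* Hypothetical container for the per-flow interface list. */
struct observed {
    uint8_t  nb_observed_intf;
    uint32_t observed_intf[MAX_OBSERVED_INTERFACES];
};

/* Returns 1 if ifindex was appended, 0 if it was already present or the
 * list is full. Each slot is compared by direct index, with no loop. */
static inline int add_observed_intf(struct observed *o, uint32_t ifindex) {
    uint8_t n = o->nb_observed_intf;

    if (n > 0 && o->observed_intf[0] == ifindex) return 0;
    if (n > 1 && o->observed_intf[1] == ifindex) return 0;
    if (n > 2 && o->observed_intf[2] == ifindex) return 0;
    if (n > 3 && o->observed_intf[3] == ifindex) return 0;
    if (n > 4 && o->observed_intf[4] == ifindex) return 0;
    if (n > 5 && o->observed_intf[5] == ifindex) return 0;

    if (n >= MAX_OBSERVED_INTERFACES) return 0; /* no free slot */

    o->observed_intf[n] = ifindex;
    o->nb_observed_intf = (uint8_t)(n + 1);
    return 1;
}
```

With one or two interfaces already recorded, the remaining comparisons short-circuit on the `n > k` guard, so only the populated slots are read.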

`md_already_exists()` function (`bpf/network_events_monitoring.h`)

Unrolled the loop for the 4-element array.
Direct comparisons for all positions.

Why: Eliminates loop overhead in network event checking.

3. Early IP filtering (`bpf/flows_filter.h`, `bpf/flows.c`, `bpf/utils.h`)

Added early_ip_filter_check() for IP-only rejection without L4 parsing.
Split parsing into:
- fill_ethhdr_l3only() — parses L2+L3 only
- parse_l4_after_l3() — parses L4 separately
- fill_iphdr_l3only() / fill_ip6hdr_l3only() — L3-only variants

Why: Skips L4 parsing when IP-based filtering can reject packets early, reducing work.
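The two-phase flow can be sketched in plain user-space C as follows. Everything here is hypothetical: the function names, the structs, the IPv4-only parsing, and the single-CIDR rule are simplifications for illustration and do not reproduce the signatures or filter semantics of `early_ip_filter_check()` or the `*_l3only()` helpers.

```c
#include <stddef.h>
#include <stdint.h>

struct l3_info { uint32_t src_ip, dst_ip; uint8_t proto; size_t l4_off; };
struct l4_info { uint16_t src_port, dst_port; };
struct cidr_rule { uint32_t net, mask; }; /* hypothetical "keep" rule */

static uint16_t rd16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static uint32_t rd32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Phase 1: Ethernet + IPv4 headers only. Returns 0 on success. */
static int parse_l3_only(const uint8_t *d, size_t len, struct l3_info *out) {
    if (len < 14 + 20 || rd16(d + 12) != 0x0800) return -1;
    size_t ihl = (size_t)(d[14] & 0x0f) * 4;
    if (ihl < 20 || len < 14 + ihl) return -1;
    out->proto  = d[14 + 9];
    out->src_ip = rd32(d + 14 + 12);
    out->dst_ip = rd32(d + 14 + 16);
    out->l4_off = 14 + ihl;
    return 0;
}

/* Early check: 1 when the addresses alone rule the packet out. */
static int ip_filter_rejects(const struct l3_info *l3, const struct cidr_rule *r) {
    return (l3->src_ip & r->mask) != r->net && (l3->dst_ip & r->mask) != r->net;
}

/* Phase 2: TCP/UDP ports, only reached if phase 1 did not reject. */
static int parse_l4(const uint8_t *d, size_t len, const struct l3_info *l3,
                    struct l4_info *out) {
    if (l3->proto != 6 && l3->proto != 17) return -1;
    if (len < l3->l4_off + 4) return -1;
    out->src_port = rd16(d + l3->l4_off);
    out->dst_port = rd16(d + l3->l4_off + 2);
    return 0;
}

/* Returns 1 to keep the packet, 0 to drop it. *l4_parsed reports whether
 * phase 2 ran, to make the skipped work visible. */
static int process_packet(const uint8_t *d, size_t len, const struct cidr_rule *r,
                          struct l4_info *l4, int *l4_parsed) {
    struct l3_info l3;
    *l4_parsed = 0;
    if (parse_l3_only(d, len, &l3) != 0) return 0;
    if (ip_filter_rejects(&l3, r)) return 0; /* L4 never touched */
    *l4_parsed = 1;
    return parse_l4(d, len, &l3, l4) == 0;
}
```

The saving only materializes when a meaningful share of traffic is rejected on addresses alone; packets that pass the early check do the same total work as before.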

4. Memory initialization optimizations (`bpf/flows.c`)

Replaced __builtin_memset() with explicit field initialization.
Uses designated initializers (flow_metrics new_flow = { ... }) and selective initialization.
Only the necessary fields are set explicitly; the compiler zero-initializes any member not named in the initializer.

Why: Avoids zeroing the whole struct and then overwriting most of it; the compiler has more freedom to optimize the stores.
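A small sketch comparing the two styles, using a hypothetical `flow_sample` struct rather than the real `flow_metrics`. Both helpers produce the same member values, because C zero-initializes members omitted from a designated initializer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in; field names only loosely follow flow_metrics. */
struct flow_sample {
    uint64_t start_ts;
    uint64_t end_ts;
    uint64_t bytes;
    uint32_t packets;
    uint16_t flags;
    uint8_t  nb_observed_intf;
};

/* Old style: zero everything, then assign field by field. */
static struct flow_sample new_flow_memset(uint64_t ts, uint64_t len) {
    struct flow_sample f;
    memset(&f, 0, sizeof(f));
    f.start_ts = ts;
    f.end_ts = ts;
    f.bytes = len;
    f.packets = 1;
    return f;
}

/* New style: designated initializer; flags and nb_observed_intf are not
 * named, so the compiler sets them to zero. */
static struct flow_sample new_flow_designated(uint64_t ts, uint64_t len) {
    struct flow_sample f = {
        .start_ts = ts,
        .end_ts = ts,
        .bytes = len,
        .packets = 1,
    };
    return f;
}
```

One caveat worth keeping in mind: the C standard does not guarantee that padding bytes are zeroed by a designated initializer, whereas `memset()` clears them. That can matter when the struct is copied or compared as raw bytes.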

5. Generated Go code updates

Updated pkg/ebpf/bpf_*_bpfel.go (all architectures) to remove the Lock field.
Updated pkg/model/record_test.go to reflect the removed lock field in binary encoding tests.

Overall impact

These changes target high-frequency paths:

- Lock-free updates: reduce contention.
- Loop unrolling: removes loop overhead.
- Early filtering: skips unnecessary L4 parsing.
- Better initialization: fewer unnecessary memory operations.

Together, these reduce CPU cycles per packet, which should improve throughput in a high-traffic eBPF flow monitoring agent.

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

To run a perfscale test, comment with: /test ebpf-node-density-heavy-25nodes

openshift-ci · 2025-11-03T14:08:55Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci · 2025-11-03T14:09:00Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mariomac for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jpinsonneau · 2025-11-03T14:09:01Z

/test ebpf-node-density-heavy-25nodes

jpinsonneau · 2025-11-04T10:27:11Z

Added a Python script to compare performance: 53d49f8

I was expecting a bigger performance improvement here, but it still handles more flows, and the ratio shows an improvement in terms of memory.

WDYT @jotak ?
cc @msherif1234 you may be interested too 😸

codecov · 2025-11-04T10:32:53Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 30.00%. Comparing base (e9ebab7) to head (dda38e5).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #824      +/-   ##
==========================================
+ Coverage   29.74%   30.00%   +0.25%     
==========================================
  Files          49       49              
  Lines        5355     4519     -836     
==========================================
- Hits         1593     1356     -237     
+ Misses       3645     3046     -599     
  Partials      117      117

Flag	Coverage Δ
unittests	`30.00% <ø> (+0.25%)`	⬆️

Files with missing lines	Coverage Δ
pkg/ebpf/bpf_x86_bpfel.go	`0.00% <ø> (ø)`

... and 47 files with indirect coverage changes

jpinsonneau · 2025-11-04T11:00:56Z

Comparing against the last 3 runs shows better results: dda38e5

jotak · 2025-11-04T11:42:58Z

bpf/flows.c

-            // Interface already seen -> skip
-            return 0;
+
+    // Fast path: unroll loop for small array sizes (most common cases)


I think we must measure this to see how much it improves CPU usage. The downside I see is that the code is less intuitive and readable, and it is error-prone if we decide to increase MAX_OBSERVED_INTERFACES: we would need to add new "unrolled" blocks, which can easily be missed.
But optimizations often come with tradeoffs, so that might be OK, depending on the measured improvement.
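One possible guard against a missed block, sketched here as an assumption rather than something proposed in the PR: tie the number of hand-written comparisons to the constant with a compile-time check. `UNROLLED_INTF_SLOTS` is a hypothetical name, and the value 6 is taken from the PR description.

```c
#define MAX_OBSERVED_INTERFACES 6 /* value assumed from the PR description */

/* Hypothetical: number of hand-unrolled comparisons in add_observed_intf().
 * Bump this only after adding the matching unrolled block. */
#define UNROLLED_INTF_SLOTS 6

_Static_assert(MAX_OBSERVED_INTERFACES == UNROLLED_INTF_SLOTS,
               "MAX_OBSERVED_INTERFACES changed: update the unrolled "
               "comparisons in add_observed_intf()");
```

With this in place, raising MAX_OBSERVED_INTERFACES alone would fail the build instead of silently leaving the new slots unchecked.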

jotak · 2025-11-04T11:47:37Z

bpf/flows.c

+        flow_metrics new_flow = {
+            .if_index_first_seen = skb->ifindex,
+            .direction_first_seen = direction,
+            .packets = 1,
+            .bytes = len,
+            .eth_protocol = eth_protocol,
+            .start_mono_time_ts = pkt.current_ts,
+            .end_mono_time_ts = pkt.current_ts,
+            .flags = pkt.flags,
+            .dscp = pkt.dscp,
+            .sampling = flow_sampling,
+            .nb_observed_intf = 0 // Explicitly zero for clarity
+        };


We used to do that previously and then switched to individual assignments. IIRC @msherif1234 found cases where that didn't work as intended, but I can't remember what exactly. @msherif1234, do you remember?

openshift-ci bot added the do-not-merge/work-in-progress label Nov 3, 2025

jpinsonneau added 2 commits November 3, 2025 16:25

- improve perfs (f4c9649)
- add perf comparaison script (53d49f8)

jpinsonneau force-pushed the perfs_improvments branch from cd0eadf to 53d49f8 on November 4, 2025 10:25

- compare to last x runs (dda38e5)

jotak reviewed Nov 4, 2025

- kernel stats (45b2534)
