Skip to content

Conversation

jotak
Copy link
Member

@jotak jotak commented Jan 15, 2025

Observed interfaces is a structure managed from the TC hook, so it is elligible for being written in the main map instead of the additional metrics map. Moreover, it is very frequently created (the vast majority of flows being observed from at least two interfaces), so having it in the additional maps sort of defeats one of its purpose, that was allowing to reduce the memory used by separating commonly flow data from more uncommon data.

In other words, when this is set into the addtional map, the size map becomes similar to main map, whereas if removed the size map can be shrinked to a much lower number.

Also:

  • Increase Max Interfaces
  • ignore direction for duplicate interface checks (tradeoff to minimize loss of data in case of max reached)
  • Discard non IP flows (as they all end up in the same map bucket, they make the max interface quickly reached)

Description

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist.
  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

bpf/flows.c Outdated
bpf_printk("error creating new observed_intf: %d\n", ret);
}
}
add_observed_intf(aggregate_flow, skb->ifindex, direction);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to acquire lock here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make interface update part of update_existing_flow()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, but I still take the lock in add_observed_intf for 2 reasons:

  • we can't call a function while having a lock (so for instance the call to increase_counter isn't allowed)
  • we can't read map value while having the lock, so all the for-loop to check for existing intf+direction isn't allowed during locking
    In other words we need to keep lock as short time as possible

@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jan 16, 2025
Copy link

New image:
quay.io/netobserv/netobserv-ebpf-agent:6970084

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=6970084 make set-agent-image

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jan 16, 2025
@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jan 20, 2025
Copy link

New image:
quay.io/netobserv/netobserv-ebpf-agent:b0d6deb

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=b0d6deb make set-agent-image

@jotak jotak requested a review from msherif1234 January 20, 2025 10:18
u8 dscp;
u8 nb_observed_intf;
u8 observed_direction[MAX_OBSERVED_INTERFACES];
u32 observed_intf[MAX_OBSERVED_INTERFACES];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why u changed this instead of having array of struct ?

Copy link
Member Author

@jotak jotak Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's padding optimization. With a struct there was a 3 bytes padding per struct instance, so an extra 18 bytes in total. Separated arrays reduce it to a minimum.

bpf/flows.c Outdated
if (nb_observed_intf < MAX_OBSERVED_INTERFACES) {
for (u8 i = 0; i < nb_observed_intf; i++) {
if (value->observed_intf[i] == if_index) {
// Interface already seen -> skip (we don't check for direction to avoid reaching the max too frequently, as a tradeoff)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the impact of ignoring direction ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed that in slack. The result is not marking twice the same interface when it has a different direction. E.g. if a flow shows eth0/egress then eth0/ingress, only eth0/egress is going to be stored.
We may document that somewhere as a limitation ...

Copy link
Member Author

@jotak jotak Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as said in slack an alternative could be to introduce a new direction enum "BOTH", if we think it's important. It would take best of both worlds, ie. avoid reaching the max interface and still telling everything about what was observed. But it needs changes in the console plugin for display. I'll do that as a follow-up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

=> https://issues.redhat.com/browse/NETOBSERV-2067
(I added it as a 1.8 tech-debt, although we'll see if it can make into 1.8 or be postponed to 1.9)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this introduce different user experience compared to what we shipped prev. i.e we need doc and qe awareness for automation updates etc ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mostly theoretical, I don't think it involve any real loss in user experience - in fact, when I was seeing this the other day, I'm now pretty sure it was due to the non-IP flows that we're not capturing anymore.

Anyway, it's not a big deal to manage "both" directions so I've added that in the last commit. And on console side: netobserv/network-observability-console-plugin#695 (I generated fake data to be able to see it in console)

static __always_inline void update_existing_flow(flow_metrics *aggregate_flow, pkt_info *pkt,
u64 len, u32 sampling, u32 if_index,
u8 direction) {
// Count only packets seen from the same interface as previously to avoid duplicate counts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think u need to grap the lock here and free after the else and remove it from add_observred_intf to make sure not reading any value while it can be updated from another thead on different core

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not possible, that's what I said here #509 (comment)

  • when lock is acquired, no call function is allowed. Currently there's two call functions: add_observed_intf and increase_counter
  • reading map value isn't allowed while lock is acquired. For instance, the line if (value->observed_intf[i] == if_index) in add_observed_intf generates an error

If we really wanted to, we could work that around, inlining everything and creating temporary copies of values, but I think it would just make it worse.
The rule of thumb is to keep the lock as little time as possible, which is what I think I'm doing here. If the value is read from another thread before the lock is acquired, I don't think it's a problem, it's just going to get the value before it's updated, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even if u use always_inline, the risk I see assume one thread added one interface and inc the count while another thread already looped over the prev counter and we end up overwritting it, is the error compiler err or verification err can u share ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in the verifier, on startup:
Without inlining:

...ES) {\n\t1528: (25) if r1 > 0x5 goto pc+100 1629: frame1: R1=scalar(umin=6,umax=255,var_off=(0x0; 0xff)) R3=0 R5=1 R6=map_value(id=5,off=48,ks=40,vs=96,imm=0) R7=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R8=0 R9=map_value(id=5,off=0,ks=40,vs=96,imm=0) R10=fp0
 fp-32=????mm?? fp-40=00000000 fp-48= fp-56=00000000 fp-64=00000000 fp-72=mmmmmmmm fp-80=fp fp-88=mmmmmmmm fp-96=mmmmmmmm fp-104=mmmmmmmm fp-112=mmmmmmmm fp-120=mmmmmmmm fp-272=ctx fp-280=pkt fp-288=00000000 fp-296= fp-304=1 fp-312=00000000\n\t; bpf_printk(\"couldn't reserve space in the ringbuf. Dropping flow\");\n\t1629: (b7) r1 = 9
                     ; frame1: R1_w=9\n\t1630: (63) *(u32 *)(r10 -264) = r1    ; frame1: R1_w=9 R10=fp0 fp-264=9\n\t1631: (b7) r8 = 1                     ; frame1: R8_w=1\n\t; u32 initVal = 1;\n\t1632: (63) *(u32 *)(r10 -28) = r8     ; frame1: R8_w=1 R10=fp0 fp-32=mmmmmm??\n\t1633: (bf) r2 = r10
                   ; frame1: R2_w=fp0 R10=fp0\n\t;\n\t1634: (07) r2 += -264                 ; frame1: R2_w=fp-264\n\t; error_counter_p = bpf_map_lookup_elem(&global_counters, &key);\n\t1635: (18) r1 = 0xffff8b2008172c00    ; frame1: R1_w=map_ptr(off=0,ks=4,vs=4,imm=0)\n
\t1637: (85) call bpf_map_lookup_elem#1\n\tfunction calls are not allowed while holding a lock\n\tprocessed 482 insns (limit 1000000) max_states_per_insn 2 total_states 29 peak_states 29 mark_read 15" component=ebpf.FlowFetcher
time="2025-01-21T12:34:34Z" level=fatal msg="can't instantiate NetObserv eBPF Agent" error="loading and assigning BPF objects: field TcEgressFlowParse: program tc_egress_flow_parse: load program: invalid argument: function calls are not allowed while holding a lock (280 line(s) omitted)"

=> function calls are not allowed while holding a lock

So we need to refactor and/or use always_inline ... That's actually tricky because add_observed_intf includes increase_counter which contains helper calls (bpf_map_lookup_elem) that cannot be inlined, so in any case that would require some refactoring.

I mentioned a second error that I saw last time, but I don't reproduce it today, so not sure what it was exactly....

So ... let me try to clean up all that and see how that works...

Copy link
Contributor

@msherif1234 msherif1234 Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u can make add_intf return an error and move the counter outside at the caller to avoid nested functions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I don't understand why I had an error on reading the value previously, while holding the lock. It doesn't happen this time. => 61cf8a6

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jan 21, 2025
if (value->observed_direction[i] != direction &&
value->observed_direction[i] != OBSERVED_DIRECTION_BOTH) {
// Same interface seen on a different direction => mark as both directions
value->observed_direction[i] = OBSERVED_DIRECTION_BOTH;
Copy link
Contributor

@msherif1234 msherif1234 Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FLP doesn't use the interface direction to figure out the node direction, it does that only by comparing agent IP versus src/dest node IP
I don't think there's anything that reads the IfDirections field, except in the console for display...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the impact of using direction both when we filter flows on direction ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no impact precisely because I keep it separated from the direction enum that is used in filters.
The filters don't check anything in the observed_direction array of stored flows. So it's not impacted by this, and will continue to work simply with egress/ingress directions for each packet filtered individually.

#define MAX_NETWORK_EVENTS 4
#define MAX_OBSERVED_INTERFACES 4
#define MAX_OBSERVED_INTERFACES 6
#define OBSERVED_DIRECTION_BOTH 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have enum used to define possible directions https://github.com/netobserv/netobserv-ebpf-agent/blob/main/bpf/types.h#L75
so u can add both there and move MAX to 3

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to change the enum to not mess up anything with the filters, which uses that enum. This value is very specific to the "observed direction" thing

@msherif1234
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jan 21, 2025
@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jan 22, 2025
Copy link

New images:
quay.io/netobserv/ebpf-bytecode:c741005
quay.io/netobserv/netobserv-ebpf-agent:c741005

These will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=c741005 make set-agent-image

@memodi
Copy link
Member

memodi commented Jan 23, 2025

@jotak It's hard to tell deterministically whether we're seeing improvements because these errors are still expected:

Screenshot 2025-01-23 at 1 24 09 PM

in above I am using query: sum(rate(netobserv_agent_errors_total[1m])*60) by (error)

@jotak
Copy link
Member Author

jotak commented Jan 24, 2025

@memodi weird .. when I tested it, they almost disappeared. There might have been something regressed since my first commits, let me re-investigate

jotak added 2 commits January 24, 2025 08:52
Observed interfaces is a structure managed from the TC hook, so it is
elligible for being written in the main map instead of the additional
metrics map. Moreover, it is very frequently created (the vast majority
of flows being observed from at least two interfaces), so having it in
the additional maps sort of defeats one of its purpose, that was
allowing to reduce the memory used by separating commonly flow data from
more uncommon data.

In other words, when this is set into the addtional
map, the size map becomes similar to main map, whereas if removed the
size map can be shrinked to a much lower number.

lock for observed interface update

Increase max interfaces..

... and ignore direction for duplicate interface checks (tradeoff to
minimize loss of data in case of max reached)

Also some padding optimisation
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jan 24, 2025
@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jan 24, 2025
@jotak jotak requested a review from msherif1234 January 24, 2025 10:15
Copy link

New images:
quay.io/netobserv/ebpf-bytecode:c97c4e6
quay.io/netobserv/netobserv-ebpf-agent:c97c4e6

These will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=c97c4e6 make set-agent-image

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jan 24, 2025
@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jan 24, 2025
Copy link

New images:
quay.io/netobserv/ebpf-bytecode:ab76445
quay.io/netobserv/netobserv-ebpf-agent:ab76445

These will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=ab76445 make set-agent-image

bpf/flows.c Outdated
"proto=%d, sport=%d, dport=%d\n",
if_index, aggregate_flow->eth_protocol, pkt->id->transport_protocol,
pkt->id->src_port, pkt->id->dst_port);
if (pkt->id->transport_protocol != 0 &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

icmpv4/v6 won't work with this check ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, anything with both ports = 0 won't report that as an error, it will limit the interfaces silently - I'm trying to place the cursor between making this issue too visible and too hidden

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't protocol != 0 should be good enough ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO there's good chances that we'll see the error again in high density clusters. Now that I've added a severity label, maybe that's ok, @memodi can filter out low severity errors if that makes regression tests too flaky

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done
quick tests on my cluster show that it's sufficient indeed, we'll see if that's good enough for @memodi too

@jotak jotak changed the title Simplifying observed interfaces - moving it to main map NETOBSERV-2075: Simplifying observed interfaces - moving it to main map Jan 24, 2025
@openshift-ci-robot
Copy link
Collaborator

openshift-ci-robot commented Jan 24, 2025

@jotak: This pull request references NETOBSERV-2075 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.19.0" version, but no target version was set.

In response to this:

Observed interfaces is a structure managed from the TC hook, so it is elligible for being written in the main map instead of the additional metrics map. Moreover, it is very frequently created (the vast majority of flows being observed from at least two interfaces), so having it in the additional maps sort of defeats one of its purpose, that was allowing to reduce the memory used by separating commonly flow data from more uncommon data.

In other words, when this is set into the addtional map, the size map becomes similar to main map, whereas if removed the size map can be shrinked to a much lower number.

Also:

  • Increase Max Interfaces
  • ignore direction for duplicate interface checks (tradeoff to minimize loss of data in case of max reached)
  • Discard non IP flows (as they all end up in the same map bucket, they make the max interface quickly reached)

Description

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist.
  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
  • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
  • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
  • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
  • Standard QE validation, with pre-merge tests unless stated otherwise.
  • Regression tests only (e.g. refactoring with no user-facing change).
  • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jan 24, 2025
@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jan 24, 2025
Copy link

New images:
quay.io/netobserv/ebpf-bytecode:26f4cae
quay.io/netobserv/netobserv-ebpf-agent:26f4cae

These will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=26f4cae make set-agent-image

@msherif1234
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jan 24, 2025
@memodi
Copy link
Member

memodi commented Jan 24, 2025

thanks @jotak , with latest image seeing huge improvements, I'll also make change in perf testing to only look for {severity!=low} agent errors

image

@memodi
Copy link
Member

memodi commented Jan 24, 2025

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved QE has approved this pull request label Jan 24, 2025
@openshift-ci-robot
Copy link
Collaborator

openshift-ci-robot commented Jan 24, 2025

@jotak: This pull request references NETOBSERV-2075 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.19.0" version, but no target version was set.

In response to this:

Observed interfaces is a structure managed from the TC hook, so it is elligible for being written in the main map instead of the additional metrics map. Moreover, it is very frequently created (the vast majority of flows being observed from at least two interfaces), so having it in the additional maps sort of defeats one of its purpose, that was allowing to reduce the memory used by separating commonly flow data from more uncommon data.

In other words, when this is set into the addtional map, the size map becomes similar to main map, whereas if removed the size map can be shrinked to a much lower number.

Also:

  • Increase Max Interfaces
  • ignore direction for duplicate interface checks (tradeoff to minimize loss of data in case of max reached)
  • Discard non IP flows (as they all end up in the same map bucket, they make the max interface quickly reached)

Description

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist.
  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
  • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
  • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
  • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
  • Standard QE validation, with pre-merge tests unless stated otherwise.
  • Regression tests only (e.g. refactoring with no user-facing change).
  • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jotak
Copy link
Member Author

jotak commented Jan 24, 2025

/approve

Copy link

openshift-ci bot commented Jan 24, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jotak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 33abbd2 into netobserv:main Jan 24, 2025
11 of 12 checks passed
msherif1234 pushed a commit to msherif1234/netobserv-ebpf-agent that referenced this pull request Oct 20, 2025
…ap (netobserv#509)

* Improve performances by moving observed interfaces to main map

Observed interfaces is a structure managed from the TC hook, so it is
elligible for being written in the main map instead of the additional
metrics map. Moreover, it is very frequently created (the vast majority
of flows being observed from at least two interfaces), so having it in
the additional maps sort of defeats one of its purpose, that was
allowing to reduce the memory used by separating commonly flow data from
more uncommon data.

In other words, when this is set into the addtional
map, the size map becomes similar to main map, whereas if removed the
size map can be shrinked to a much lower number.

lock for observed interface update

Increase max interfaces..

... and ignore direction for duplicate interface checks (tradeoff to
minimize loss of data in case of max reached)

Also some padding optimisation

* Discard non-IP flows

* Include all updates in single lock block

* Manage case with Both directions

format

* Do not report errors on proto/port 0

* Add severity on errors

* Still report errors on 0-ports (e.g. icmp)
msherif1234 pushed a commit to msherif1234/netobserv-ebpf-agent that referenced this pull request Oct 20, 2025
…ap (netobserv#509)

* Improve performances by moving observed interfaces to main map

Observed interfaces is a structure managed from the TC hook, so it is
elligible for being written in the main map instead of the additional
metrics map. Moreover, it is very frequently created (the vast majority
of flows being observed from at least two interfaces), so having it in
the additional maps sort of defeats one of its purpose, that was
allowing to reduce the memory used by separating commonly flow data from
more uncommon data.

In other words, when this is set into the addtional
map, the size map becomes similar to main map, whereas if removed the
size map can be shrinked to a much lower number.

lock for observed interface update

Increase max interfaces..

... and ignore direction for duplicate interface checks (tradeoff to
minimize loss of data in case of max reached)

Also some padding optimisation

* Discard non-IP flows

* Include all updates in single lock block

* Manage case with Both directions

format

* Do not report errors on proto/port 0

* Add severity on errors

* Still report errors on 0-ports (e.g. icmp)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved jira/valid-reference lgtm ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. qe-approved QE has approved this pull request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants