Skip to content

NETOBSERV-2118 add missing sampling and exclude interfaces options#209

Merged
openshift-merge-bot[bot] merged 5 commits intonetobserv:mainfrom
jpinsonneau:2118
May 21, 2025
Merged

NETOBSERV-2118 add missing sampling and exclude interfaces options#209
openshift-merge-bot[bot] merged 5 commits intonetobserv:mainfrom
jpinsonneau:2118

Conversation

@jpinsonneau
Copy link
Member

Description

Add sampling and exclude_interfaces options

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

@codecov
Copy link

codecov bot commented Mar 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 22.60%. Comparing base (f3bd130) to head (d2408e3).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #209   +/-   ##
=======================================
  Coverage   22.60%   22.60%           
=======================================
  Files          14       14           
  Lines        1451     1451           
=======================================
  Hits          328      328           
  Misses       1099     1099           
  Partials       24       24           
Flag Coverage Δ
unittests 22.60% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

scripts/help.sh Outdated
echo " --enable_udn_mapping: enable User Defined Network mapping (default: false)"
echo " --enable_udn_mapping: enable User Defined Network mapping (default: false)"
echo " --get-subnets: get subnets information (default: false)"
echo " --sampling: rate at which packets should be sampled (default: 1)"
Copy link
Member

@jotak jotak Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure how much text we can write here without being too verbose, but when referring to sampling I like to be more accurate about what it really does, because it's not obvious for newcomers. E.g: "a value of 50 means one packet every 50 is sampled".

Also: it would be better to talk about sampling ratio rather than rate (but this remark applies to everywhere, not just here/CLI) - see here for the difference. A sampling rate would generally mean how many packets per second are sampled, whereas a sampling ratio is what we want here - the proportion of packets that we keep.
I'll open PRs in other parts to do this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That description should remain short but we can include this option in an example and put the full sentence to explain it 👌

In that case we should write the ratio divisor as the dividend is forced to 1 in our usage ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That description should remain short but we can include this option in an example and put the full sentence to explain it 👌

In that case we should write the ratio divisor as the dividend is forced to 1 in our usage ?

Technically true but that makes the sentence quite more complicated! I think we can make it easy to understand without being too pedantic (that's why a short example helps IMO). Another way to put it would be: "value that defines the ratio of packets being sampled".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in ba6ec93

@memodi
Copy link
Member

memodi commented May 5, 2025

/ok-to-test

@jpinsonneau tests are failing here, could you PTAL? I am assuming this is ready for testing.

@github-actions
Copy link

github-actions bot commented May 5, 2025

New image:
quay.io/netobserv/network-observability-cli:910ce9b

It will expire after two weeks.

To use this build, update your commands using:

USER=netobserv VERSION=910ce9b make commands

or download the updated commands.

@jpinsonneau
Copy link
Member Author

/ok-to-test

@jpinsonneau tests are failing here, could you PTAL? I am assuming this is ready for testing.

@memodi I had some issues during rebase
0c2275f should fix it 🤞

@memodi
Copy link
Member

memodi commented May 8, 2025

/ok-to-test

@github-actions
Copy link

github-actions bot commented May 8, 2025

New image:
quay.io/netobserv/network-observability-cli:b1ce1f3

It will expire after two weeks.

To use this build, update your commands using:

USER=netobserv VERSION=b1ce1f3 make commands

or download the updated commands.

Copy link
Member

@memodi memodi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpinsonneau about exclude_interfaces, when I have I exclude certain interface, I still see flows being collected for that interface:

for e.g.: here, I am excluding ens5 interface and there are still flows from that interface.

Screenshot 2025-05-08 at 3 42 18 PM

@jotak
Copy link
Member

jotak commented May 12, 2025

/lgtm

@memodi
Copy link
Member

memodi commented May 12, 2025

/needs-changes

@memodi
Copy link
Member

memodi commented May 12, 2025

/label needs-changes

@openshift-ci
Copy link

openshift-ci bot commented May 12, 2025

@memodi: The label(s) /label needs-changes cannot be applied. These labels are supported: acknowledge-critical-fixes-only, platform/aws, platform/azure, platform/baremetal, platform/google, platform/libvirt, platform/openstack, ga, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, px-approved, docs-approved, qe-approved, ux-approved, no-qe, downstream-change-needed, rebase/manual, cluster-config-api-changed, approved, backport-risk-assessed, bugzilla/valid-bug, cherry-pick-approved, jira/valid-bug, stability-fix-approved, staff-eng-approved. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

Details

In response to this:

/label needs-changes

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@memodi memodi added the needs-changes To be added to denote PR needs changes or some questions/comments to be addressed label May 12, 2025
@jpinsonneau
Copy link
Member Author

@jpinsonneau about exclude_interfaces, when I have I exclude certain interface, I still see flows being collected for that interface:

for e.g.: here, I am excluding ens5 interface and there are still flows from that interface.

Screenshot 2025-05-08 at 3 42 18 PM

I doubt that's coming from this PR but I will retry as soon as quay is up 😸
I feel it's more an issue with the eBPF implementation. @jotak do you know if we still show the excluded interfaces in the flow when multiple interfaces are involved ?

@jpinsonneau
Copy link
Member Author

@memodi I'm not reproducing the behavior you describe locally using --exclude_interfaces=eth0 🤔

Could you please ensure the eBPF agents have the exclude interfaces env variable set properly ?

$ oc describe pod netobserv-cli-5zskz -n netobserv-cli
Name:             netobserv-cli-5zskz
Namespace:        netobserv-cli
...
Containers:
  netobserv-cli:
...
    Image:          quay.io/netobserv/netobserv-ebpf-agent:main
...
    Environment:
...
      EXCLUDE_INTERFACES:                eth0
...

Else maybe there is something particular with ens5 interface or the cluster itself

@openshift-ci
Copy link

openshift-ci bot commented May 13, 2025

New changes are detected. LGTM label has been removed.

@jpinsonneau jpinsonneau requested a review from memodi May 13, 2025 09:33
@memodi
Copy link
Member

memodi commented May 13, 2025

@memodi I'm not reproducing the behavior you describe locally using --exclude_interfaces=eth0 🤔

Could you please ensure the eBPF agents have the exclude interfaces env variable set properly ?

$ oc describe pod netobserv-cli-5zskz -n netobserv-cli
Name:             netobserv-cli-5zskz
Namespace:        netobserv-cli
...
Containers:
  netobserv-cli:
...
    Image:          quay.io/netobserv/netobserv-ebpf-agent:main
...
    Environment:
...
      EXCLUDE_INTERFACES:                eth0
...

Else maybe there is something particular with ens5 interface or the cluster itself

the configuration happens fine in ebpf-agent, I checked that too, but I was expecting the flows with those interfaces not to appear for any matching Src/Dest. I am not sure if this is broken in ebpf-agent, don't think we have a regression test that's covering it.

@jpinsonneau
Copy link
Member Author

/retest

@github-actions
Copy link

New image:
quay.io/netobserv/network-observability-cli:056fe4f

It will expire after two weeks.

To use this build, update your commands using:

USER=netobserv VERSION=056fe4f make commands

or download the updated commands.

@jpinsonneau
Copy link
Member Author

jpinsonneau commented May 14, 2025

@memodi & @jotak looking at the eBPF logs, I confirm that setting the exclude interface params on ens5 remove the log line:

time="2025-05-14T08:37:04Z" level=info msg="interface detected. trying to attach TCX hook" component=agent.Flows interface="{ens5 2 NS(36: 4, 4026531840) 1a7d7268-11f5-435e-a895-4f3e246997b1}"

However, the collected flows still contains the related interface 🤔
This happens using any 1.6+ ebpf images

The only exclude interface that truely works is the default one lo which appears back when you filter on something else


On parallel, I tried back the INTERFACES param and found an issue on that one too. Having:

      INTERFACES: ens5
      EXCLUDE_INTERFACES: lo

Results in both ens5 and eth0 flows 😮


This issue happens when the agent is privileged. On CLI, it's always true to keep things simple.
See https://issues.redhat.com/browse/NETOBSERV-2257

@jotak
Copy link
Member

jotak commented May 14, 2025

interesting, thanks for opening a bug

@memodi
Copy link
Member

memodi commented May 16, 2025

/label qe-approved

@jpinsonneau thanks for investigating this, to have this feature useful I wonder if it makes sense to introduce auto-detect of privileged setting based on the feature (I know we discussed this for Operator), probably it would be good to introduce in CLI so that exclude_interfaces feature could be useful in CLI.

@jpinsonneau
Copy link
Member Author

/label qe-approved

@jpinsonneau thanks for investigating this, to have this feature useful I wonder if it makes sense to introduce auto-detect of privileged setting based on the feature (I know we discussed this for Operator), probably it would be good to introduce in CLI so that exclude_interfaces feature could be useful in CLI.

I can check that in parallel indeed but the best would be to fix the eBPF issue

@memodi
Copy link
Member

memodi commented May 19, 2025

I can check that in parallel indeed but the best would be to fix the eBPF issue

Agreed. Filed a story https://issues.redhat.com/browse/NETOBSERV-2262

@openshift-ci
Copy link

openshift-ci bot commented May 21, 2025

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jpinsonneau
Copy link
Member Author

Rebased without changes

@github-actions
Copy link

New image:
quay.io/netobserv/network-observability-cli:21cae8a

It will expire after two weeks.

To use this build, update your commands using:

USER=netobserv VERSION=21cae8a make commands

or download the updated commands.

@openshift-merge-bot openshift-merge-bot bot merged commit 27fa748 into netobserv:main May 21, 2025
13 checks passed
@jotak jotak removed the needs-review Tells that the PR needs a review label May 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants