@leandroberetta
Contributor

@leandroberetta leandroberetta commented Oct 27, 2025

Description

Add the ability to create recording rules instead of alerts for the Network Health feature.

Testing instructions are in the console plugin PR listed under Dependencies.

Dependencies

netobserv/network-observability-console-plugin#1163

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Is this PR backed by a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed by a JIRA ticket, especially if they bring user-facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labeled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing (e.g., configuration changes, environment setup)?
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

@openshift-ci-robot
Collaborator

openshift-ci-robot commented Oct 27, 2025

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Description

Add the ability to create recording rules instead of alerts for the Network Health feature.

Dependencies

n/a

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci

openshift-ci bot commented Oct 27, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with `/test all`.

@codecov

codecov bot commented Nov 28, 2025

Codecov Report

❌ Patch coverage is 58.30116% with 108 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.03%. Comparing base (8b929d7) to head (2090cbb).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| api/flowcollector/v1beta2/zz_generated.deepcopy.go | 6.52% | 42 Missing and 1 partial ⚠️ |
| internal/pkg/metrics/alerts/builder.go | 65.55% | 27 Missing and 4 partials ⚠️ |
| ...lector/v1beta2/flowcollector_validation_webhook.go | 58.13% | 16 Missing and 2 partials ⚠️ |
| internal/pkg/metrics/alerts/alerts.go | 76.31% | 8 Missing and 1 partial ⚠️ |
| internal/pkg/metrics/alerts/promql.go | 61.11% | 6 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2112      +/-   ##
==========================================
- Coverage   73.24%   73.03%   -0.21%     
==========================================
  Files          82       82              
  Lines        9339     9431      +92     
==========================================
+ Hits         6840     6888      +48     
- Misses       2075     2115      +40     
- Partials      424      428       +4     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 73.03% <58.30%> (-0.21%) ⬇️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...flowcollector/v1beta2/flowcollector_alert_types.go | 97.46% <100.00%> (ø) |
| api/flowcollector/v1beta2/flowcollector_types.go | 100.00% <ø> (ø) |
| internal/pkg/metrics/alerts/promql.go | 84.50% <61.11%> (-8.48%) ⬇️ |
| internal/pkg/metrics/alerts/alerts.go | 90.68% <76.31%> (-3.88%) ⬇️ |
| ...lector/v1beta2/flowcollector_validation_webhook.go | 72.69% <58.13%> (-2.48%) ⬇️ |
| internal/pkg/metrics/alerts/builder.go | 81.51% <65.55%> (-13.94%) ⬇️ |
| api/flowcollector/v1beta2/zz_generated.deepcopy.go | 38.09% <6.52%> (ø) |

... and 4 files with indirect coverage changes

@codecov

codecov bot commented Dec 4, 2025

Codecov Report

❌ Patch coverage is 66.79842% with 84 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.66%. Comparing base (a73d332) to head (c1a990d).
⚠️ Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| api/flowcollector/v1beta2/zz_generated.deepcopy.go | 2.27% | 42 Missing and 1 partial ⚠️ |
| internal/pkg/metrics/alerts/builder.go | 75.36% | 13 Missing and 4 partials ⚠️ |
| internal/pkg/metrics/alerts/alerts.go | 79.54% | 8 Missing and 1 partial ⚠️ |
| internal/pkg/metrics/alerts/promql.go | 61.11% | 6 Missing and 1 partial ⚠️ |
| ...lector/v1beta2/flowcollector_validation_webhook.go | 81.25% | 5 Missing and 1 partial ⚠️ |
| .../controller/consoleplugin/consoleplugin_objects.go | 91.66% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2112      +/-   ##
==========================================
+ Coverage   73.53%   73.66%   +0.12%     
==========================================
  Files          88       88              
  Lines        9841     9944     +103     
==========================================
+ Hits         7237     7325      +88     
- Misses       2162     2177      +15     
  Partials      442      442              
| Flag | Coverage Δ |
| --- | --- |
| unittests | 73.66% <66.79%> (+0.12%) ⬆️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...flowcollector/v1beta2/flowcollector_alert_types.go | 97.46% <100.00%> (ø) |
| api/flowcollector/v1beta2/flowcollector_types.go | 100.00% <ø> (ø) |
| internal/controller/consoleplugin/config/config.go | 75.00% <ø> (ø) |
| .../controller/consoleplugin/consoleplugin_objects.go | 85.84% <91.66%> (-0.16%) ⬇️ |
| ...lector/v1beta2/flowcollector_validation_webhook.go | 76.43% <81.25%> (+0.65%) ⬆️ |
| internal/pkg/metrics/alerts/promql.go | 87.32% <61.11%> (-9.17%) ⬇️ |
| internal/pkg/metrics/alerts/alerts.go | 92.41% <79.54%> (-2.81%) ⬇️ |
| internal/pkg/metrics/alerts/builder.go | 85.14% <75.36%> (-4.86%) ⬇️ |
| api/flowcollector/v1beta2/zz_generated.deepcopy.go | 37.21% <2.27%> (ø) |

... and 8 files with indirect coverage changes

@openshift-ci-robot
Collaborator

openshift-ci-robot commented Dec 9, 2025

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Description

Add the ability to create recording rules instead of alerts for the Network Health feature.

Dependencies

n/a


@openshift-ci-robot
Collaborator

openshift-ci-robot commented Dec 9, 2025

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Description

Add the ability to create recording rules instead of alerts for the Network Health feature.

To test this feature, apply the following command. It configures all default health rules (formerly alerts) in recording mode:

kubectl patch flowcollector cluster --type=merge -n netobserv -p '
spec:
 agent:
   ebpf:
     privileged: true
     features:
     - PacketDrop
     - DNSTracking
     - FlowRTT
     - NetworkEvents
     - IPSec
 processor:
   metrics:
     healthRules:
     - template: PacketDropsByKernel
       mode: recording
       variants:
       - lowVolumeThreshold: "5"
         groupBy: Namespace
       - groupBy: Node
     - template: PacketDropsByDevice
       mode: recording
       variants:
       - groupBy: Node
     - template: IPsecErrors
       mode: recording
       variants:
       - {}
       - groupBy: Node
     - template: DNSErrors
       mode: recording
       variants:
       - {}
       - groupBy: Namespace
     - template: NetpolDenied
       mode: recording
       variants:
       - groupBy: Namespace
     - template: LatencyHighTrend
       mode: recording
       variants:
       - groupBy: Namespace
         trendOffset: 20m
         trendDuration: 20m
'
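
The command above switches every default template at once. As a smaller sketch, the shell function below renders a merge patch for a single template, using only the `template`, `mode`, `variants`, and `groupBy` fields that appear in the command above. `build_health_rule_patch` and the `APPLY` variable are hypothetical names introduced for illustration; they are not part of NetObserv. The script prints the patch for review and calls `kubectl` only when `APPLY=1` is set.

```shell
#!/bin/sh
# Hypothetical helper: render a merge patch that puts ONE health rule
# template in recording mode, grouped by the given dimension.
build_health_rule_patch() {
  template="$1"   # e.g. DNSErrors (template names taken from the command above)
  group_by="$2"   # e.g. Namespace or Node
  cat <<EOF
spec:
  processor:
    metrics:
      healthRules:
      - template: ${template}
        mode: recording
        variants:
        - groupBy: ${group_by}
EOF
}

# Print the patch so it can be reviewed before anything is changed.
patch="$(build_health_rule_patch DNSErrors Namespace)"
printf '%s\n' "$patch"

# Apply only on explicit request (APPLY is an assumption of this sketch).
if [ "${APPLY:-0}" = "1" ]; then
  kubectl patch flowcollector cluster --type=merge -p "$patch"
fi
```

Keep in mind that with `--type=merge` a list such as `healthRules` is replaced as a whole rather than merged entry by entry, so applying this smaller patch should leave only the one rule in the list.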

Let's generate DNS errors as an example:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
 name: dns-error-generator
 namespace: default
spec:
 containers:
 - name: dns-errors
   image: busybox:latest
   command:
   - /bin/sh
   - -c
   - |
     while true; do
       nslookup nonexistent-domain-12345.invalid
       nslookup another-fake-domain-67890.invalid
       nslookup error-test-domain.notreal
       nslookup invalid-dns-query.fake
       sleep 2
     done
 restartPolicy: Always
EOF
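
Once the generator pod is running, one way to confirm that rules were created in recording mode is to look for `record:` entries in the generated rule objects. The sketch below assumes the operator publishes its rules as `PrometheusRule` resources in the `netobserv` namespace; the PR description does not say where they live, so adjust the namespace as needed. `count_recording_rules` is a hypothetical helper that only counts `record:` keys in YAML read from stdin.

```shell
#!/bin/sh
# Hypothetical helper: count "record:" keys (recording rules) in YAML on stdin.
# Alerting rules use "alert:" instead, so they are not counted.
count_recording_rules() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps the status clean.
  grep -cE '^[[:space:]]*(-[[:space:]]+)?record:' || true
}

# Query the cluster only when kubectl is available.
# The namespace "netobserv" is an assumption of this sketch.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get prometheusrules -n netobserv -o yaml 2>/dev/null | count_recording_rules
fi

# Clean up the test pod when done:
#   kubectl delete pod dns-error-generator -n default
```

A non-zero count suggests that at least some rules were generated as recording rules; a count of zero may simply mean the rules live in a different namespace.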

UI: (screenshot not included)

Dependencies

n/a



// They are expressed as a percentage of errors above which the alert is triggered. They must be parsable as floats.
// +required
Thresholds AlertThresholds `json:"thresholds,omitempty"`
// Required for alert mode, optional for recording mode.
Member

Maybe something I don't get here: how can it be optional for recording mode, if we still want them to appear in the health dashboard, associated with a severity?

Contributor Author

I originally thought of the recording rule as just a value, without the notion of a severity, but now that you mention it, we can display the value with severities. You're right. Thanks for the feedback.
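
To illustrate the point above: if recording rules keep their thresholds, a consumer such as the health dashboard could map a recorded value to a severity. The following is only a minimal sketch. The `thresholds` type, its `Info`, `Warning` and `Critical` fields, and `severityFor` are hypothetical names; the source only states that thresholds are percentages that must be parsable as floats.

```go
package main

import (
	"fmt"
	"strconv"
)

// thresholds is a hypothetical stand-in for AlertThresholds; the field names
// are assumptions. Values are percentages stored as strings.
type thresholds struct {
	Info, Warning, Critical string
}

// severityFor returns the highest severity whose threshold the value exceeds,
// or "" when no threshold is exceeded. Empty thresholds are skipped.
func severityFor(value float64, t thresholds) (string, error) {
	levels := []struct{ name, raw string }{
		{"critical", t.Critical},
		{"warning", t.Warning},
		{"info", t.Info},
	}
	for _, l := range levels {
		if l.raw == "" {
			continue
		}
		limit, err := strconv.ParseFloat(l.raw, 64)
		if err != nil {
			return "", fmt.Errorf("threshold %s=%q is not a float: %w", l.name, l.raw, err)
		}
		if value > limit {
			return l.name, nil
		}
	}
	return "", nil
}

func main() {
	sev, err := severityFor(12.5, thresholds{Info: "5", Warning: "10", Critical: "20"})
	if err != nil {
		panic(err)
	}
	fmt.Println(sev)
}
```

Checking from the highest severity down means a value that exceeds several thresholds is reported once, at its most severe level.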

// More information on health rules: https://github.com/netobserv/network-observability-operator/blob/main/docs/Alerts.md
// +optional
- DisableAlerts []AlertTemplate `json:"disableAlerts"`
+ DisableHealthRules []HealthRuleTemplate `json:"disableHealthRules"`
Member

I think for disabling, we should keep the existing API and continue to only affect alerts. Disabling is mostly meant to help users who have too much alerting noise, and it's not as necessary for recording rules.
Also, we should not rename this field: it already existed before the TP feature, so renaming it would be a breaking change.

Comment on lines 18 to 19
HealthRuleNoFlows HealthRuleTemplate = "NetObservNoFlows"
HealthRuleLokiError HealthRuleTemplate = "NetObservLokiError"
Member

"NoFlows" and "LokiError" are different from the others, I think we should keep referring to them as alerts only; unlike the others, they are not health items for the dashboard, they're only for alerting.
We should probably add a validation check that they are not used in "recording" mode
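
A validation check along those lines could look like the sketch below. It is an assumption-laden illustration, not the operator's code: `healthRule`, `HealthRuleMode`, the `ModeAlert` value and `validateModes` are hypothetical, while the two template constants and the `recording` mode value come from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// HealthRuleTemplate and the two constants mirror the names in the diff above.
type HealthRuleTemplate string

const (
	HealthRuleNoFlows   HealthRuleTemplate = "NetObservNoFlows"
	HealthRuleLokiError HealthRuleTemplate = "NetObservLokiError"
)

// HealthRuleMode is a hypothetical type; "alert" is an assumed value.
type HealthRuleMode string

const (
	ModeAlert     HealthRuleMode = "alert"
	ModeRecording HealthRuleMode = "recording"
)

// healthRule is a simplified, hypothetical stand-in for a healthRules entry.
type healthRule struct {
	Template HealthRuleTemplate
	Mode     HealthRuleMode
}

// validateModes rejects alert-only templates configured in recording mode.
// It returns nil when every rule is acceptable.
func validateModes(rules []healthRule) error {
	var errs []error
	for i, r := range rules {
		if r.Mode != ModeRecording {
			continue
		}
		switch r.Template {
		case HealthRuleNoFlows, HealthRuleLokiError:
			errs = append(errs, fmt.Errorf(
				"healthRules[%d]: template %q is alert-only and cannot use mode %q",
				i, r.Template, r.Mode))
		}
	}
	return errors.Join(errs...)
}

func main() {
	err := validateModes([]healthRule{
		{Template: "DNSErrors", Mode: ModeRecording},
		{Template: HealthRuleNoFlows, Mode: ModeRecording},
	})
	fmt.Println(err)
}
```

Collecting all violations with `errors.Join` (Go 1.20+) reports every misconfigured entry at once instead of stopping at the first.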


func (rb *ruleBuilder) additionalDescription() string {
- return fmt.Sprintf("You can turn off this alert by adding '%s' to spec.processor.metrics.disableAlerts in FlowCollector, or reconfigure it via spec.processor.metrics.alerts.", rb.template)
+ return fmt.Sprintf("You can turn off this health rule by adding '%s' to spec.processor.metrics.disableHealthRules in FlowCollector, or reconfigure it via spec.processor.metrics.healthRules.", rb.template)
Member

(just FYI: we will remove this added description, as it will be moved to the runbooks; I was about to say something about this message with respect to disabled alerts, but we don't care)


// buildRecordingRuleName builds recording rule name following the convention:
// netobserv:health:<template>:<groupby>:<side>:rate2m
func (rb *ruleBuilder) buildRecordingRuleName() string {
Member

@jotak, Dec 15, 2025

I think I would prefer if the recording rule name was passed to createRule directly from each alert (in alerts.go); maybe it's less smart, but it's more explicit and makes it easier if there's a specific name or something that we want to change for any reason.

Actually, this function can surely be kept, but the acronyms and toSnakeCase can IMO be replaced with an explicit string per alert?

Member

I may follow up later regarding this comment.

Contributor Author

Sorry I thought I addressed this one. I will take it into account for a follow-up PR.
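
For that follow-up, the "explicit string per alert" idea might be sketched roughly as follows. This is only an illustration under assumptions: the `recordingNames` map, its name segments, the `side` values, and the choice to skip empty segments are all hypothetical; only the `netobserv:health:<template>:<groupby>:<side>:rate2m` convention and the template names come from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// recordingNames maps each template to an explicit name segment, replacing
// acronym handling and snake-case conversion. The strings are illustrative.
var recordingNames = map[string]string{
	"PacketDropsByKernel": "packet_drops_by_kernel",
	"DNSErrors":           "dns_errors",
	"IPsecErrors":         "ipsec_errors",
}

// buildRecordingRuleName follows the convention
// netobserv:health:<template>:<groupby>:<side>:rate2m.
// Skipping empty groupBy or side segments is an assumption of this sketch.
func buildRecordingRuleName(template, groupBy, side string) (string, error) {
	segment, ok := recordingNames[template]
	if !ok {
		return "", fmt.Errorf("no recording rule name registered for template %q", template)
	}
	parts := []string{"netobserv", "health", segment}
	for _, p := range []string{groupBy, side} {
		if p != "" {
			parts = append(parts, strings.ToLower(p))
		}
	}
	parts = append(parts, "rate2m")
	return strings.Join(parts, ":"), nil
}

func main() {
	name, err := buildRecordingRuleName("DNSErrors", "Namespace", "src")
	if err != nil {
		panic(err)
	}
	fmt.Println(name) // netobserv:health:dns_errors:namespace:src:rate2m
}
```

An unknown template returns an error rather than a derived name, which keeps every published rule name deliberate and easy to change in one place.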

recordName := rb.buildRecordingRuleName()
return &monitoringv1.Rule{
Record: recordName,
// Note: Recording rules cannot have annotations in Prometheus
Member

really? So we cannot pass any metadata?

Contributor Author

Only labels: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/

That would be an option, although it is not as good as annotations for this kind of information.
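
As a rough sketch of the labels-only option: the `rule` struct below is a local, simplified stand-in for a Prometheus rule entry (record, expr, labels), and the label names, the `newRecordingRule` helper, the record name and the PromQL expression are all hypothetical placeholders, not taken from the operator.

```go
package main

import "fmt"

// rule is a simplified local stand-in for a Prometheus recording rule entry.
// Recording rules accept labels but have no annotations field.
type rule struct {
	Record string
	Expr   string
	Labels map[string]string
}

// newRecordingRule attaches metadata as labels, since annotations are not
// available. The label names "template" and "group_by" are assumptions.
func newRecordingRule(record, expr, template, groupBy string) rule {
	labels := map[string]string{"template": template}
	if groupBy != "" {
		labels["group_by"] = groupBy
	}
	return rule{Record: record, Expr: expr, Labels: labels}
}

func main() {
	r := newRecordingRule(
		"netobserv:health:example:rate2m",         // placeholder record name
		"sum(rate(example_errors_total[2m]))",     // placeholder expression
		"DNSErrors", "Namespace")
	fmt.Println(r.Record, r.Labels["template"], r.Labels["group_by"])
}
```

One trade-off to keep in mind: labels set on a recording rule are added to every series it produces, so they are better suited to short, low-cardinality values than to long descriptions.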

@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jan 8, 2026
@github-actions

github-actions bot commented Jan 8, 2026

New images:

  • quay.io/netobserv/network-observability-operator:cd6dd59
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-cd6dd59
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-cd6dd59

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:cd6dd59 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-cd6dd59

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-cd6dd59
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jan 9, 2026
@leandroberetta leandroberetta requested a review from jotak January 13, 2026 15:19
@leandroberetta leandroberetta added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jan 13, 2026
@github-actions

New images:

  • quay.io/netobserv/network-observability-operator:44ff6a7
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-44ff6a7
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-44ff6a7

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:44ff6a7 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-44ff6a7

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-44ff6a7
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jan 13, 2026
Contributor

@jpinsonneau left a comment

LGTM, thanks @leandroberetta !

Member

@jotak left a comment

LGTM

@openshift-ci

openshift-ci bot commented Jan 15, 2026

New changes are detected. LGTM label has been removed.

@openshift-ci

openshift-ci bot commented Jan 15, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from jotak. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
