
Conversation

@leandroberetta (Contributor) commented Dec 9, 2025

Description

This PR adds support for recording rules in the Network Health view. Recording rules are Prometheus rules that pre-compute and store health metrics as new series, complementing the existing alerting functionality.

Recording Rules Feature

Recording rules appear alongside alerts in the Network Health view with the following capabilities:

  • Display recording rule violations organized by global, namespace, and node scopes
  • Show severity levels (critical, warning, info) based on configured thresholds
  • Include direction indicators (Src/Dst) when metrics are directional
  • Integrate with the health summary to reflect overall network status
  • Provide direct navigation to the query browser for metric exploration
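
For illustration, the severity resolution above can be pictured as picking the highest threshold the current value reaches. A minimal TypeScript sketch, assuming threshold values are strings as in the FlowCollector examples below (resolveSeverity and its types are hypothetical names, not the plugin's actual helpers):

// Hypothetical helper, not the plugin's actual code: returns the highest
// severity whose configured threshold the current metric value reaches.
type Severity = 'critical' | 'warning' | 'info';
type Thresholds = Partial<Record<Severity, string>>; // CR values are strings, e.g. "0.5"

function resolveSeverity(value: number, thresholds: Thresholds): Severity | undefined {
  const order: Severity[] = ['critical', 'warning', 'info']; // most to least severe
  for (const sev of order) {
    const t = thresholds[sev];
    if (t !== undefined && value >= parseFloat(t)) {
      return sev; // first match wins
    }
  }
  return undefined; // below all thresholds: no violation to report
}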

Implementation

UI Components

  • Recording rule cards display in the same gallery as alerts with unified selection behavior
  • Details table shows template name, severity, current value, threshold, and direction
  • Kebab menu provides quick access to view metrics in the query browser
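
As a sketch of what the kebab action can boil down to, assuming the standard console route /monitoring/query-browser (the helper name and the example metric name are made up):

// Hypothetical link builder: the OpenShift console query browser accepts
// a PromQL expression through the query0 search parameter.
function queryBrowserLink(promql: string): string {
  return '/monitoring/query-browser?query0=' + encodeURIComponent(promql);
}

// Usage (illustrative metric name): queryBrowserLink('netobserv_example_recorded_metric')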

Data Flow

  • Fetches recording rules from the Prometheus API, filtered by the netobserv label
  • Queries current metric values for each recording rule
  • Processes metrics using health rule metadata from FlowCollector configuration
  • Groups rules by resource (global, namespace, node) and severity
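
A minimal sketch of that flow, assuming the standard Prometheus rules API with type=record and a NetObserv marker label whose exact key is an assumption here (this is not the plugin's actual fetch layer):

// The /api/v1/rules response has shape { data: { groups: [{ rules: [...] }] } }.
type PromRecordingRule = {
  name: string;
  labels?: Record<string, string>;
};

async function fetchNetobservRecordingRules(promBaseUrl: string): Promise<PromRecordingRule[]> {
  // type=record restricts the response to recording rules
  const res = await fetch(promBaseUrl + '/api/v1/rules?type=record');
  const body = await res.json();
  const rules: PromRecordingRule[] = body.data.groups.flatMap(
    (g: { rules: PromRecordingRule[] }) => g.rules
  );
  // keep only rules carrying the NetObserv marker label (label key assumed)
  return rules.filter(r => r.labels !== undefined && 'netobserv' in r.labels);
}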

Health Summary

  • Aggregates recording rule counts across all scopes
  • Contributes to overall health status determination
  • Displays alongside alert counts in the network health summary
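
The aggregation can be pictured as a simple count per severity; a hypothetical sketch, not the actual health-helper.ts logic:

type Violation = { severity: 'critical' | 'warning' | 'info' };

// Counts violations per severity across all scopes; the overall health
// status then follows from the highest severity with a non-zero count.
function countBySeverity(violations: Violation[]): Record<Violation['severity'], number> {
  const counts = { critical: 0, warning: 0, info: 0 };
  for (const v of violations) {
    counts[v.severity]++;
  }
  return counts;
}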

Configuration

Recording rules are configured in the FlowCollector CR under processor.metrics.healthRules with mode: Recording. The operator generates the corresponding PrometheusRule resources with the appropriate metric names and evaluation rules.
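
As a rough TypeScript view of that knob, inferred from the example patch in the Testing section below (this is not the authoritative CRD schema; fields beyond those shown there may exist):

// Inferred from the example patch; not generated from the CRD.
type HealthRuleVariant = {
  groupBy?: string; // 'Namespace' in the example; the per-node scope suggests 'Node' as well
  thresholds: { info?: string; warning?: string; critical?: string };
};

type HealthRule = {
  template: string; // e.g. 'DNSNxDomain', 'PacketDropsByKernel'
  mode: 'Alert' | 'Recording';
  variants: HealthRuleVariant[];
};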

[screenshot]

Testing

To test this feature with both alerts and recording rules, use the provided test configurations.

Configure DNSNxDomain as an alert and PacketDropsByKernel as a recording rule:

kubectl patch flowcollector cluster --type=merge --patch '
spec:
  agent:
    ebpf:
      privileged: true
      features:
      - "PacketDrop"
      - "DNSTracking"
  processor:
    metrics:
      healthRules:
      - template: DNSNxDomain
        mode: Alert
        variants:
        - groupBy: Namespace
          thresholds:
            info: "10"
            warning: "50"
            critical: "80"
      - template: PacketDropsByKernel
        mode: Recording
        variants:
        - thresholds:
            info: "0.5"
            warning: "2"
            critical: "5"
'
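
Once the patch is applied, the operator should generate the corresponding PrometheusRule resources; a quick sanity check is kubectl get prometheusrules -A to confirm the NetObserv rules exist before generating traffic.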

Generate DNS errors (for alerts)

oc apply -f - <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: dns-test
---
apiVersion: v1
kind: Pod
metadata:
  name: dns-error-generator
  namespace: dns-test
spec:
  containers:
  - name: dns-client
    image: registry.access.redhat.com/ubi9/ubi:latest
    command:
    - /bin/bash
    - -c
    - |
      echo "Installing DNS utilities..."
      dnf install -y bind-utils iputils
      echo "Starting DNS error generator..."
      while true; do
        # Try to resolve non-existent domains (generates NXDOMAIN)
        for i in {1..20}; do
          nslookup "nonexistent-domain-${RANDOM}.invalid" || true
          nslookup "fake-${RANDOM}.test" || true
          nslookup "does-not-exist-${RANDOM}.local" || true
        done
        echo "Generated 60 DNS NXDOMAIN errors at $(date)"
        sleep 5
      done
  restartPolicy: Always
EOF

Generate packet drops (for recording rules)

oc apply -f - <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: packet-drop-test
---
apiVersion: v1
kind: Service
metadata:
  name: udp-sink
  namespace: packet-drop-test
spec:
  selector:
    app: udp-sink
  ports:
  - port: 9999
    protocol: UDP
    targetPort: 9999
---
# Receiver pod with small buffer (will saturate and drop packets)
apiVersion: v1
kind: Pod
metadata:
  name: udp-sink
  namespace: packet-drop-test
  labels:
    app: udp-sink
spec:
  containers:
  - name: sink
    image: registry.access.redhat.com/ubi9/ubi:latest
    command:
    - /bin/bash
    - -c
    - |
      echo "Starting UDP sink with small buffer (will cause packet drops)..."
      while true; do
        nc -ul -p 9999 > /dev/null 2>&1
      done
    resources:
      requests:
        memory: "32Mi"
        cpu: "50m"
      limits:
        memory: "64Mi"
        cpu: "100m"
---
# Generator pod that floods UDP to cause packet drops
apiVersion: v1
kind: Pod
metadata:
  name: packet-drop-generator
  namespace: packet-drop-test
  labels:
    app: packet-drop-generator
spec:
  containers:
  - name: flood-gen
    image: registry.access.redhat.com/ubi9/ubi:latest
    command:
    - /bin/bash
    - -c
    - |
      echo "Starting UDP flood generator to cause kernel packet drops..."
      echo "Waiting for UDP sink service..."
      sleep 10

      while true; do
        echo "=== Flooding UDP at $(date) ==="

        # Generate MASSIVE UDP flood in parallel
        # This will saturate network buffers and cause REAL kernel packet drops
        for i in {1..50}; do
          (
            # Each subprocess sends 5000 UDP packets as fast as possible
            for j in {1..5000}; do
              echo "DATA_${i}_${j}_$(date +%s%N)" | nc -u -w 0 udp-sink.packet-drop-test.svc.cluster.local 9999 2>/dev/null
            done
          ) &
        done

        # Wait for all background processes
        wait

        echo "Sent ~250,000 UDP packets in burst (kernel should drop many)"
        echo "Waiting 10 seconds before next flood..."
        echo ""

        sleep 10
      done
    resources:
      requests:
        memory: "128Mi"
        cpu: "200m"
      limits:
        memory: "256Mi"
        cpu: "1000m"
  restartPolicy: Always
EOF
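
After the generators run for a few minutes, the DNSNxDomain alert and the PacketDropsByKernel recording rule violations should both show up in the Network Health view.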

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

@openshift-ci-robot (Collaborator) commented Dec 9, 2025

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci bot commented Dec 9, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci bot commented Dec 9, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mffiedler for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

codecov bot commented Dec 11, 2025

Codecov Report

❌ Patch coverage is 7.44681% with 87 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.35%. Comparing base (9f6f672) to head (d51a426).

Files with missing lines Patch % Lines
web/src/components/health/health-helper.ts 7.44% 87 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1163      +/-   ##
==========================================
- Coverage   52.73%   52.35%   -0.39%     
==========================================
  Files         210      210              
  Lines       11026    11117      +91     
  Branches     1398     1423      +25     
==========================================
+ Hits         5815     5820       +5     
- Misses       4658     4744      +86     
  Partials      553      553              
Flag Coverage Δ
uitests 54.38% <7.44%> (-0.55%) ⬇️
unittests 46.57% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
pkg/config/config.go 47.36% <ø> (ø)
web/src/model/config.ts 100.00% <ø> (ø)
web/src/components/health/health-helper.ts 19.76% <7.44%> (-7.78%) ⬇️

@openshift-ci-robot (Collaborator) commented Jan 5, 2026

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@leandroberetta marked this pull request as ready for review January 5, 2026 16:03
@leandroberetta requested a review from jotak January 5, 2026 16:28
@openshift-ci-robot (Collaborator) commented Jan 5, 2026

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot (Collaborator) commented Jan 6, 2026

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot (Collaborator) commented Jan 6, 2026

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot (Collaborator) commented Jan 6, 2026

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot (Collaborator) commented Jan 6, 2026

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jpinsonneau (Contributor):

Seems like the build / lint job is failing:

diff --git a/web/locales/en/plugin__netobserv-plugin.json b/web/locales/en/plugin__netobserv-plugin.json
index 9b84d73..83ea64d 100644
--- a/web/locales/en/plugin__netobserv-plugin.json
+++ b/web/locales/en/plugin__netobserv-plugin.json
@@ -252,6 +252,7 @@
   "Description": "Description",
   "Navigate to network traffic": "Navigate to network traffic",
   "Global": "Global",
+  "Alert": "Alert",
   "critical issues": "critical issues",
   "warnings": "warnings",
   "minor issues": "minor issues",

you need to run make fmt and make i18n 😉

@leandroberetta (Contributor, Author):


ouch, I thought it passed, I'll run it asap, thanks and sorry.

@jpinsonneau (Contributor):


No worries!

@jotak added the ok-to-test label Jan 8, 2026
@openshift-ci-robot (Collaborator) commented Jan 8, 2026

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

github-actions bot commented Jan 8, 2026

New image:
quay.io/netobserv/network-observability-console-plugin:d8d22c0

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=d8d22c0 make set-plugin-image

<Badge isRead style={{ marginLeft: '0.5rem' }}>
  {t('Recording')}
</Badge>
{direction && <Badge style={{ marginLeft: '0.5rem' }}>{direction}</Badge>}
Contributor:
I'm introducing icons you may be interested in here:

export const SourceIcon: React.FC<{ size?: string | number; className?: string }> = ({ size, className }) => (
  <IconWrapper icon={TbUpload} size={size} className={className} />
);
// Destination: download icon represents data target/endpoint
export const DestinationIcon: React.FC<{ size?: string | number; className?: string }> = ({ size, className }) => (
  <IconWrapper icon={TbDownload} size={size} className={className} />
);
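
For illustration, a minimal sketch of how the direction indicator could pick these up instead of a plain text badge. The ./icons import path and the DirectionIndicator component are assumptions for the sketch, not code from this PR:

import * as React from 'react';
// Assumed local module exposing the icons from the snippet above
import { SourceIcon, DestinationIcon } from './icons';

// Renders the Src/Dst indicator with an upload/download icon next to the label
export const DirectionIndicator: React.FC<{ direction?: 'Src' | 'Dst' }> = ({ direction }) => {
  if (!direction) {
    return null;
  }
  return (
    <span style={{ marginLeft: '0.5rem' }}>
      {direction === 'Src' ? <SourceIcon size={16} /> : <DestinationIcon size={16} />} {direction}
    </span>
  );
};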

Contributor Author

This will need your new PR, right? I think we can improve it in a follow-up PR.

Contributor

yeah sure don't wait for mine if you want to merge here 😉

@github-actions github-actions bot removed the ok-to-test label (to be set manually when a PR is safe to test; triggers image build on PR) Jan 13, 2026
@openshift-ci-robot
Collaborator

openshift-ci-robot commented Jan 14, 2026

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


drawerRef.current && drawerRef.current.focus();
};

const handleSelectItem = (item?: ByResource | RecordingRulesByResource) => {
Member

handleSelectItem seems to do exactly the same thing as setSelectedItem, no? We could remove it and use setSelectedItem directly?
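
For instance, a minimal sketch, assuming the state hook is typed with the same union as the callback parameter (the import path and component name are illustrative):

import * as React from 'react';
// Assumed import path; both types come from this PR
import { ByResource, RecordingRulesByResource } from './health-models';

type Item = ByResource | RecordingRulesByResource;

// setSelectedItem already has the signature (item?: Item) => void, so it can be
// passed directly wherever handleSelectItem was used, e.g. onSelect={setSelectedItem}
const NetworkHealthPanel: React.FC = () => {
  const [selectedItem, setSelectedItem] = React.useState<Item | undefined>();
  return (
    <button onClick={() => setSelectedItem(undefined)}>
      {selectedItem ? 'clear selection' : 'no selection'}
    </button>
  );
};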

Comment on lines +89 to +90
const aScore = a.critical.length * 3 + a.warning.length * 2 + a.other.length;
const bScore = b.critical.length * 3 + b.warning.length * 2 + b.other.length;
Member

There is a score number for recording rules, like for alerts, right? Can't we just sort by score, like the alerts?
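
e.g. a one-line sketch, assuming each entry exposes a numeric score field the way alert items do (the field name is assumed):

// Sort descending by the precomputed score instead of re-weighting
// the critical/warning/other buckets inline
function sortByScore<T extends { score: number }>(items: T[]): T[] {
  return [...items].sort((a, b) => b.score - a.score);
}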

}
>
<DrawerContentBody>
<HealthGallery
Member

looks like health-gallery is now unused, so health-gallery.tsx can be deleted?
