
Conversation

@leandroberetta (Contributor) commented Dec 9, 2025

Description

This PR adds support for recording rules in the Network Health view. Recording rules are Prometheus rules that pre-compute and store health metrics as new series, complementing the existing alerting functionality.

Recording Rules Feature

Recording rules appear alongside alerts in the Network Health view with the following capabilities:

  • Display recording rule violations organized by global, namespace, and node scopes
  • Show severity levels (critical, warning, info) based on configured thresholds
  • Include direction indicators (Src/Dst) when metrics are directional
  • Integrate with the health summary to reflect overall network status
  • Provide direct navigation to the query browser for metric exploration
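
For illustration, the severity resolution above can be pictured as picking the highest threshold the current value reaches. A minimal TypeScript sketch, assuming threshold values are strings as in the FlowCollector examples below (resolveSeverity and its types are hypothetical names, not the plugin's actual helpers):

// Hypothetical helper, not the plugin's actual code: returns the highest
// severity whose configured threshold the current metric value reaches.
type Severity = 'critical' | 'warning' | 'info';
type Thresholds = Partial<Record<Severity, string>>; // CR values are strings, e.g. "0.5"

function resolveSeverity(value: number, thresholds: Thresholds): Severity | undefined {
  const order: Severity[] = ['critical', 'warning', 'info']; // most to least severe
  for (const sev of order) {
    const t = thresholds[sev];
    if (t !== undefined && value >= parseFloat(t)) {
      return sev; // first match wins
    }
  }
  return undefined; // below all thresholds: no violation to report
}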

Implementation

UI Components

  • Recording rule cards display in the same gallery as alerts with unified selection behavior
  • Details table shows template name, severity, current value, threshold, and direction
  • Kebab menu provides quick access to view metrics in the query browser
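
As a sketch of what the kebab action can boil down to, assuming the standard console route /monitoring/query-browser (the helper name and the example metric name are made up):

// Hypothetical link builder: the OpenShift console query browser accepts
// a PromQL expression through the query0 search parameter.
function queryBrowserLink(promql: string): string {
  return '/monitoring/query-browser?query0=' + encodeURIComponent(promql);
}

// Usage (illustrative metric name): queryBrowserLink('netobserv_example_recorded_metric')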

Data Flow

  • Fetches recording rules from the Prometheus API, filtered by the netobserv label
  • Queries current metric values for each recording rule
  • Processes metrics using health rule metadata from FlowCollector configuration
  • Groups rules by resource (global, namespace, node) and severity
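
A minimal sketch of that flow, assuming the standard Prometheus rules API with type=record and a NetObserv marker label whose exact key is an assumption here (this is not the plugin's actual fetch layer):

// The /api/v1/rules response has shape { data: { groups: [{ rules: [...] }] } }.
type PromRecordingRule = {
  name: string;
  labels?: Record<string, string>;
};

async function fetchNetobservRecordingRules(promBaseUrl: string): Promise<PromRecordingRule[]> {
  // type=record restricts the response to recording rules
  const res = await fetch(promBaseUrl + '/api/v1/rules?type=record');
  const body = await res.json();
  const rules: PromRecordingRule[] = body.data.groups.flatMap(
    (g: { rules: PromRecordingRule[] }) => g.rules
  );
  // keep only rules carrying the NetObserv marker label (label key assumed)
  return rules.filter(r => r.labels !== undefined && 'netobserv' in r.labels);
}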

Health Summary

  • Aggregates recording rule counts across all scopes
  • Contributes to overall health status determination
  • Displays alongside alert counts in the network health summary
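
The aggregation can be pictured as a simple count per severity; a hypothetical sketch, not the actual health-helper.ts logic:

type Violation = { severity: 'critical' | 'warning' | 'info' };

// Counts violations per severity across all scopes; the overall health
// status then follows from the highest severity with a non-zero count.
function countBySeverity(violations: Violation[]): Record<Violation['severity'], number> {
  const counts = { critical: 0, warning: 0, info: 0 };
  for (const v of violations) {
    counts[v.severity]++;
  }
  return counts;
}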

Configuration

Recording rules are configured in the FlowCollector CR under processor.metrics.healthRules with mode: Recording. The operator generates the corresponding PrometheusRule resources with the appropriate metric names and evaluation rules.
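
As a rough TypeScript view of that knob, inferred from the example patch in the Testing section below (this is not the authoritative CRD schema; fields beyond those shown there may exist):

// Inferred from the example patch; not generated from the CRD.
type HealthRuleVariant = {
  groupBy?: string; // 'Namespace' in the example; the per-node scope suggests 'Node' as well
  thresholds: { info?: string; warning?: string; critical?: string };
};

type HealthRule = {
  template: string; // e.g. 'DNSNxDomain', 'PacketDropsByKernel'
  mode: 'Alert' | 'Recording';
  variants: HealthRuleVariant[];
};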

[screenshot]

Testing

To test this feature with both alerts and recording rules, use the provided test configurations.

Configure DNSNxDomain as an alert and PacketDropsByKernel as a recording rule:

kubectl patch flowcollector cluster --type=merge --patch '
spec:
  agent:
    ebpf:
      privileged: true
      features:
      - "PacketDrop"
      - "DNSTracking"
  processor:
    metrics:
      healthRules:
      - template: DNSNxDomain
        mode: Alert
        variants:
        - groupBy: Namespace
          thresholds:
            info: "10"
            warning: "50"
            critical: "80"
      - template: PacketDropsByKernel
        mode: Recording
        variants:
        - thresholds:
            info: "0.5"
            warning: "2"
            critical: "5"
'
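
Once the patch is applied, the operator should generate the corresponding PrometheusRule resources; a quick sanity check is kubectl get prometheusrules -A to confirm the NetObserv rules exist before generating traffic.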

Generate DNS errors (for alerts)

oc apply -f - <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: dns-test
---
apiVersion: v1
kind: Pod
metadata:
  name: dns-error-generator
  namespace: dns-test
spec:
  containers:
  - name: dns-client
    image: registry.access.redhat.com/ubi9/ubi:latest
    command:
    - /bin/bash
    - -c
    - |
      echo "Installing DNS utilities..."
      dnf install -y bind-utils iputils
      echo "Starting DNS error generator..."
      while true; do
        # Try to resolve non-existent domains (generates NXDOMAIN)
        for i in {1..20}; do
          nslookup "nonexistent-domain-${RANDOM}.invalid" || true
          nslookup "fake-${RANDOM}.test" || true
          nslookup "does-not-exist-${RANDOM}.local" || true
        done
        echo "Generated 60 DNS NXDOMAIN errors at $(date)"
        sleep 5
      done
  restartPolicy: Always
EOF

Generate packet drops (for recording rules)

oc apply -f - <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: packet-drop-test
---
apiVersion: v1
kind: Service
metadata:
  name: udp-sink
  namespace: packet-drop-test
spec:
  selector:
    app: udp-sink
  ports:
  - port: 9999
    protocol: UDP
    targetPort: 9999
---
# Receiver pod with small buffer (will saturate and drop packets)
apiVersion: v1
kind: Pod
metadata:
  name: udp-sink
  namespace: packet-drop-test
  labels:
    app: udp-sink
spec:
  containers:
  - name: sink
    image: registry.access.redhat.com/ubi9/ubi:latest
    command:
    - /bin/bash
    - -c
    - |
      echo "Starting UDP sink with small buffer (will cause packet drops)..."
      while true; do
        nc -ul -p 9999 > /dev/null 2>&1
      done
    resources:
      requests:
        memory: "32Mi"
        cpu: "50m"
      limits:
        memory: "64Mi"
        cpu: "100m"
---
# Generator pod that floods UDP to cause packet drops
apiVersion: v1
kind: Pod
metadata:
  name: packet-drop-generator
  namespace: packet-drop-test
  labels:
    app: packet-drop-generator
spec:
  containers:
  - name: flood-gen
    image: registry.access.redhat.com/ubi9/ubi:latest
    command:
    - /bin/bash
    - -c
    - |
      echo "Starting UDP flood generator to cause kernel packet drops..."
      echo "Waiting for UDP sink service..."
      sleep 10

      while true; do
        echo "=== Flooding UDP at $(date) ==="

        # Generate MASSIVE UDP flood in parallel
        # This will saturate network buffers and cause REAL kernel packet drops
        for i in {1..50}; do
          (
            # Each subprocess sends 5000 UDP packets as fast as possible
            for j in {1..5000}; do
              echo "DATA_${i}_${j}_$(date +%s%N)" | nc -u -w 0 udp-sink.packet-drop-test.svc.cluster.local 9999 2>/dev/null
            done
          ) &
        done

        # Wait for all background processes
        wait

        echo "Sent ~250,000 UDP packets in burst (kernel should drop many)"
        echo "Waiting 10 seconds before next flood..."
        echo ""

        sleep 10
      done
    resources:
      requests:
        memory: "128Mi"
        cpu: "200m"
      limits:
        memory: "256Mi"
        cpu: "1000m"
  restartPolicy: Always
EOF
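
After the generators run for a few minutes, the DNSNxDomain alert and the PacketDropsByKernel recording rule violations should both show up in the Network Health view.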

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

@openshift-ci-robot (Collaborator) commented Dec 9, 2025

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci bot commented Dec 9, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci bot commented Dec 9, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mffiedler for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

codecov bot commented Dec 11, 2025

Codecov Report

❌ Patch coverage is 7.44681% with 87 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.35%. Comparing base (9f6f672) to head (d51a426).

Files with missing lines Patch % Lines
web/src/components/health/health-helper.ts 7.44% 87 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1163      +/-   ##
==========================================
- Coverage   52.73%   52.35%   -0.39%     
==========================================
  Files         210      210              
  Lines       11026    11117      +91     
  Branches     1398     1423      +25     
==========================================
+ Hits         5815     5820       +5     
- Misses       4658     4744      +86     
  Partials      553      553              
Flag Coverage Δ
uitests 54.38% <7.44%> (-0.55%) ⬇️
unittests 46.57% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
pkg/config/config.go 47.36% <ø> (ø)
web/src/model/config.ts 100.00% <ø> (ø)
web/src/components/health/health-helper.ts 19.76% <7.44%> (-7.78%) ⬇️

@openshift-ci-robot (Collaborator) commented Jan 5, 2026

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@leandroberetta marked this pull request as ready for review January 5, 2026 16:03
@leandroberetta requested a review from jotak January 5, 2026 16:28
@openshift-ci-robot (Collaborator) commented Jan 5, 2026

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot (Collaborator) commented Jan 6, 2026

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot (Collaborator) commented Jan 6, 2026

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot (Collaborator) commented Jan 6, 2026

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot (Collaborator) commented Jan 6, 2026

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jpinsonneau (Contributor):

Seems like the build / lint job is failing:

diff --git a/web/locales/en/plugin__netobserv-plugin.json b/web/locales/en/plugin__netobserv-plugin.json
index 9b84d73..83ea64d 100644
--- a/web/locales/en/plugin__netobserv-plugin.json
+++ b/web/locales/en/plugin__netobserv-plugin.json
@@ -252,6 +252,7 @@
   "Description": "Description",
   "Navigate to network traffic": "Navigate to network traffic",
   "Global": "Global",
+  "Alert": "Alert",
   "critical issues": "critical issues",
   "warnings": "warnings",
   "minor issues": "minor issues",

you need to run make fmt and make i18n 😉

@leandroberetta (Contributor, Author):


ouch, I thought it passed, I'll run it asap, thanks and sorry.

@jpinsonneau (Contributor):


No worries!

@jotak added the ok-to-test label Jan 8, 2026
@openshift-ci-robot (Collaborator) commented Jan 8, 2026

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

github-actions bot commented Jan 8, 2026

New image:
quay.io/netobserv/network-observability-console-plugin:d8d22c0

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=d8d22c0 make set-plugin-image

<Badge isRead style={{ marginLeft: '0.5rem' }}>
  {t('Recording')}
</Badge>
{direction && <Badge style={{ marginLeft: '0.5rem' }}>{direction}</Badge>}
Contributor:
I'm introducing icons you may be interested in here:

export const SourceIcon: React.FC<{ size?: string | number; className?: string }> = ({ size, className }) => (
  <IconWrapper icon={TbUpload} size={size} className={className} />
);
// Destination: download icon represents data target/endpoint
export const DestinationIcon: React.FC<{ size?: string | number; className?: string }> = ({ size, className }) => (
  <IconWrapper icon={TbDownload} size={size} className={className} />
);
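
For illustration, a minimal sketch of how the direction indicator could pick these up instead of a plain text badge. The ./icons import path and the DirectionIndicator component are assumptions for the sketch, not code from this PR:

import * as React from 'react';
// Assumed local module exposing the icons from the snippet above
import { SourceIcon, DestinationIcon } from './icons';

// Renders the Src/Dst indicator with an upload/download icon next to the label
export const DirectionIndicator: React.FC<{ direction?: 'Src' | 'Dst' }> = ({ direction }) => {
  if (!direction) {
    return null;
  }
  return (
    <span style={{ marginLeft: '0.5rem' }}>
      {direction === 'Src' ? <SourceIcon size={16} /> : <DestinationIcon size={16} />} {direction}
    </span>
  );
};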

Contributor Author

This will need your new PR, right? I think we can improve it in a follow-up PR.

Contributor

yeah sure don't wait for mine if you want to merge here 😉

@github-actions github-actions bot removed the ok-to-test label (to be set manually when a PR is safe to test; triggers image build on PR) Jan 13, 2026
@openshift-ci-robot
Collaborator

openshift-ci-robot commented Jan 14, 2026

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


drawerRef.current && drawerRef.current.focus();
};

const handleSelectItem = (item?: ByResource | RecordingRulesByResource) => {
Member

handleSelectItem seems to do exactly the same thing as setSelectedItem, no? We could remove it and use setSelectedItem directly?
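
For instance, a minimal sketch, assuming the state hook is typed with the same union as the callback parameter (the import path and component name are illustrative):

import * as React from 'react';
// Assumed import path; both types come from this PR
import { ByResource, RecordingRulesByResource } from './health-models';

type Item = ByResource | RecordingRulesByResource;

// setSelectedItem already has the signature (item?: Item) => void, so it can be
// passed directly wherever handleSelectItem was used, e.g. onSelect={setSelectedItem}
const NetworkHealthPanel: React.FC = () => {
  const [selectedItem, setSelectedItem] = React.useState<Item | undefined>();
  return (
    <button onClick={() => setSelectedItem(undefined)}>
      {selectedItem ? 'clear selection' : 'no selection'}
    </button>
  );
};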

Comment on lines +89 to +90
const aScore = a.critical.length * 3 + a.warning.length * 2 + a.other.length;
const bScore = b.critical.length * 3 + b.warning.length * 2 + b.other.length;
Member

There is a score number for recording rules, like for alerts, right? Can't we just sort by score, like the alerts?
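
e.g. a one-line sketch, assuming each entry exposes a numeric score field the way alert items do (the field name is assumed):

// Sort descending by the precomputed score instead of re-weighting
// the critical/warning/other buckets inline
function sortByScore<T extends { score: number }>(items: T[]): T[] {
  return [...items].sort((a, b) => b.score - a.score);
}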

}
>
<DrawerContentBody>
<HealthGallery
Member

looks like health-gallery is now unused, so health-gallery.tsx can be deleted?
