
feat: support the system notifications and improve the README.md #7357


Open · wants to merge 1 commit into main from konflux-system-alerts

Conversation

Contributor

@testcara testcara commented Jul 29, 2025

For https://issues.redhat.com/browse/KFLUXUI-592

The 'System Notification' chapter in README.md introduces:

  • the label and JSON format needed for the system notification configmaps.
  • UI behaviors
  • Notification CI validation

With the ConfigMaps deployed to the relevant clusters, the UI shows the notifications as:

[image: system notifications shown in the UI]

Additionally, the notification drawer expands only when the badge in the header is clicked, and the UI sorts notifications from newest to oldest.
We also provide an optional 'activeTimestamp' field so you can determine when the notification should be shown.
See the UI video for more details:

system-notification.mp4
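For illustration, a notification payload using that optional field might look like the following (a sketch; the ISO 8601 timestamp format is an assumption and is not specified in this PR):

```json
{
  "summary": "Builds are delayed due to maintenance",
  "type": "warning",
  "activeTimestamp": "2025-07-29T10:00:00Z"
}
```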

Contributor

Code Review by Gemini

Here's a review of the provided code changes:

components/konflux-info/README.md

  • Bug: The numbering for "Usage Scenarios & Examples" is incorrect after the first example.

    • Line to change: components/konflux-info/README.md:128

    • Suggested change:

      --- a/components/konflux-info/README.md
      +++ b/components/konflux-info/README.md
      @@ -125,21 +125,21 @@ Example of a `banner-content.yaml` with multiple banners (first active one is sh
         # No timezone is needed when you expect it's UTC.

      -#### ✅ 2. One-Time Banner
      +#### 2.3.2. 2. One-Time Banner

      For a single event on a specific date:

      - summary: "Scheduled downtime on July 25"
        type: "warning"
        year: 2025
        month: 7
        dayOfMonth: 25
        startTime: "10:00"
      @@ -149,21 +149,21 @@ For a single event on a specific date:
      
      For a single event in today
      
      ```yaml
      - summary: "Scheduled downtime on July 25"
        type: "warning"
        startTime: "10:00"
        endTime: "14:00"

      -#### ✅ 2. Weekly Recurring Banner
      +#### 2.3.3. 3. Weekly Recurring Banner

      For an event that repeats every week:

      - summary: "Maintenance every Sunday"
        type: "info"
        dayOfWeek: 0
        startTime: "02:00"
        endTime: "04:00"

      -#### ✅ 3. Monthly Recurring Banner
      +#### 2.3.4. 4. Monthly Recurring Banner

      For an event that happens on the same day each month:

      - summary: "Patch release on 1st of every month"
        type: "info"
        dayOfMonth: 1
        startTime: "01:00"
        endTime: "03:00"
        timeZone: "Asia/Shanghai"

      -#### ✅ 4. Always-On Banner
      +#### 2.3.5. 5. Always-On Banner

      For an event that requires immediate notification:

      - summary: "New feature: Pipeline Insights is live!"
        type: "info"

      -#### ✅ 5. Empty Banner
      +#### 2.3.6. 6. Empty Banner

      When there are no events to announce:

      []
      
      
      
  • Bug: The auto-alert-schema.json file is deleted, but it's still referenced in the "Directory Structure" section.

    • Line to change: components/konflux-info/README.md:40
    • Suggested change:
      --- a/components/konflux-info/README.md
      +++ b/components/konflux-info/README.md
      @@ -37,7 +37,6 @@
       The `KONFLUX-INFO` directory contains:
       
       ```bash
       .
      -├── auto-alert-schema.json  # JSON shema definition for auto-alert-content.yaml
       ├── base/                   # Common resources (e.g., RBAC)
       ├── production/             # Production cluster configurations
       ├── staging/                # Staging cluster configurations
  • Bug: In the "Notification JSON Format" section, the allowed values for type are info, warning, or danger (lowercase), but the example system-alert-1.yaml uses Info (capitalized). This is inconsistent.

    • Line to change: components/konflux-info/README.md:190
    • Suggested change:
      --- a/components/konflux-info/README.md
      +++ b/components/konflux-info/README.md
      @@ -187,7 +187,7 @@
       
         - <strong>title (optional)</strong>: A short heading. <strong>Defaults to `component.metadata.name` if omitted.</strong>
         - <strong>summary (required)</strong>: A brief, user-facing message displayed as the notification content.
      -  - <strong>type (required)</strong>: It sets the bell icon. Allowed values: `info`, `warning`, or `danger`.
      +  - <strong>type (required)</strong>: It sets the bell icon. Allowed values: `info`, `warning`, or `danger` (case-sensitive).
       
         - Payload structure:
  • Bug: In the "Example Notification ConfigMap", the labels and data keys are inconsistent with the new naming conventions described in the text.

    • Line to change: components/konflux-info/README.md:205
    • Suggested change:
      --- a/components/konflux-info/README.md
      +++ b/components/konflux-info/README.md
      @@ -202,10 +202,10 @@
       kind: ConfigMap
       metadata:
      -  name: konflux-auto-alert-xyz
      +  name: konflux-system-notification-xyz
         namespace: konflux-info
         labels:
      -    konflux-auto-alert: "true" # Required. UI filter alerts out by this label.
      +    konflux.system.notification: "true"
       data:
      -  auto-alert-content.yaml: |
      +  notification-content.json: |-
           {
             "summary": "Builds are delayed due to maintenance",
             "type": "warning"

components/konflux-info/base/rbac.yaml

  • Improvement: The change removes specific resourceNames from the konflux-public-info-view-role, allowing it to get any configmaps in the konflux-info namespace. This is a broader permission than necessary if the UI only needs to read specific configmaps (banner and system notifications). It's best practice to follow the principle of least privilege.
    • Line to change: components/konflux-info/base/rbac.yaml:9
    • Suggested change:
      --- a/components/konflux-info/base/rbac.yaml
      +++ b/components/konflux-info/base/rbac.yaml
      @@ -6,9 +6,10 @@
           apiGroups:
             - ''
           resourceNames:
             - konflux-public-info
             - konflux-banner-configmap
      +      - konflux-system-notification-* # Or a more specific label selector if possible
           resources:
             - configmaps
      Note: If konflux-system-notification-* is not a valid `resourceName` pattern, consider adding a `resourceNames` entry for each expected notification ConfigMap name, or using a `selector` if the API supports it for roles.
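      As a side note, RBAC matches `resourceNames` exactly (wildcards are not supported) and Role rules have no label-selector mechanism, so explicit names are required. A quick way to sanity-check the effective permissions (a sketch; the ServiceAccount name is hypothetical):

        # Should be allowed by the role:
        kubectl auth can-i get configmaps/konflux-public-info \
          -n konflux-info --as=system:serviceaccount:konflux-info:example-sa
        # Should be denied if least privilege is preserved:
        kubectl auth can-i get configmaps/some-other-configmap \
          -n konflux-info --as=system:serviceaccount:konflux-info:example-sa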

components/konflux-info/staging/stone-stage-p01/system-alerts/system-alert-1.yaml

  • Bug: The konflux-auto-alert label is still used, but the README specifies konflux.system.notification.

    • Line to change: components/konflux-info/staging/stone-stage-p01/system-alerts/system-alert-1.yaml:10
    • Suggested change:
      --- a/components/konflux-info/staging/stone-stage-p01/system-alerts/system-alert-1.yaml
      +++ b/components/konflux-info/staging/stone-stage-p01/system-alerts/system-alert-1.yaml
      @@ -7,7 +7,7 @@
       }
       kind: ConfigMap
       metadata:
         labels:
      -    konflux-auto-alert: "true"
      +    konflux.system.notification: "true"
         name: konflux-autoalert-configmap-1
         namespace: konflux-info
  • Bug: The data key is alert-content.json, but the README specifies notification-content.json.

    • Line to change: components/konflux-info/staging/stone-stage-p01/system-alerts/system-alert-1.yaml:3
    • Suggested change:
      --- a/components/konflux-info/staging/stone-stage-p01/system-alerts/system-alert-1.yaml
      +++ b/components/konflux-info/staging/stone-stage-p01/system-alerts/system-alert-1.yaml
      @@ -1,7 +1,7 @@
       apiVersion: v1
       data:
      -  alert-content.json: |
      +  notification-content.json: |
           {
             "summary": "There is an example system alert!",
             "type": "Info"
           }
  • Bug: The type field uses Info (capital 'I'), but the README specifies info (lowercase).

    • Line to change: components/konflux-info/staging/stone-stage-p01/system-alerts/system-alert-1.yaml:6
    • Suggested change:
      --- a/components/konflux-info/staging/stone-stage-p01/system-alerts/system-alert-1.yaml
      +++ b/components/konflux-info/staging/stone-stage-p01/system-alerts/system-alert-1.yaml
      @@ -3,7 +3,7 @@
       data:
         alert-content.json: |
           {
             "summary": "There is an example system alert!",
      -      "type": "Info"
      +      "type": "info"
           }
       kind: ConfigMap
       metadata:

components/mintmaker/production/*/kustomization.yaml

  • Improvement: The changes to mintmaker image tags appear unrelated to the commit message "feat: improve the Readme.md for system alerts and yaml files". It's generally best practice to keep commits focused on a single logical change.
    • Suggestion: Consider reverting these changes and submitting them in a separate pull request, or update the commit message and body to reflect that this PR also includes a mintmaker image update and explain why it's part of this change.

Contributor

Code Review by Gemini

### components/konflux-info/README.md

**Improvements:**

*   **Line 103**: The numbering for "Weekly Recurring Banner" is incorrect.
    ```diff
    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -100,7 +100,7 @@
     - summary: "Maintenance every Sunday"
       type: "info"
       dayOfWeek: 0
       startTime: "02:00"
       endTime: "04:00"
       # No timezone is needed when you expect it's UTC.
     ```
    -#### ✅ **2. Weekly Recurring Banner**
    +#### 2.3.3. <a name='3.WeeklyRecurringBanner'></a>✅ **3. Weekly Recurring Banner**
    
     For an event that repeats every week:
    
    ```

*   **Line 111**: The numbering for "Monthly Recurring Banner" is incorrect.
    ```diff
    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -108,7 +108,7 @@
       endTime: "04:00"
     ```
     
    -#### ✅ **3. Monthly Recurring Banner**
    +#### 2.3.4. <a name='4.MonthlyRecurringBanner'></a>✅ **4. Monthly Recurring Banner**
     
     For an event that happens on the same day each month:
     
    ```

*   **Line 119**: The numbering for "Always-On Banner" is incorrect.
    ```diff
    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -116,7 +116,7 @@
       timeZone: "Asia/Shanghai"
     ```
     
    -#### ✅ **4. Always-On Banner**
    +#### 2.3.5. <a name='5.Always-OnBanner'></a>✅ **5. Always-On Banner**
     
     For an event that requires immediate notification:
     
    ```

*   **Line 124**: The numbering for "Empty Banner" is incorrect.
    ```diff
    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -121,7 +121,7 @@
     - summary: "New feature: Pipeline Insights is live!"
       type: "info"
     ```
     
    -#### ✅ **5. Empty Banner**
    +#### 2.3.6. <a name='6.EmptyBanner'></a>✅ **6. Empty Banner**
     
     When there are no events to announce:
     
    ```

*   **Line 132**: The heading for "How to submit a PR for Banner" is missing its section number and anchor.
    ```diff
    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -129,7 +129,7 @@
     ```
     
     ---
     
    -## 📝 How to submit a PR for Banner
    +### 2.4. <a name='HowtosubmitaPRforBanner'></a>📝 How to submit a PR for Banner
     
     1. Locate the target cluster directory:
     
    ```

*   **Line 154**: The heading for "Frequently Asked Questions" is missing its section number and anchor.
    ```diff
    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -151,7 +151,7 @@
       Type: New banner
       Purpose: Release announcement for Konflux 1.2
       ```
     
    -## ❓ Frequently Asked Questions
    +### 2.6. <a name='FrequentlyAskedQuestions'></a>❓ Frequently Asked Questions
     
     - Why is only one banner shown even when multiple are configured?
     
    ```

*   **Line 190**: The description for the JSON payload fields could be clearer.
    ```diff
    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -187,7 +187,7 @@
     Each ConfigMap contains notification data in the `notification-content.json` field as JSON.
     
     <strong>Key points:</strong>
     
    -- The JSON payload supports these fields for each notification object:
    +- The JSON payload for each notification object supports these fields:
     
       - <strong>title (optional)</strong>: A short heading. <strong>Defaults to `component.metadata.name` if omitted.</strong>
       - <strong>summary (required)</strong>: A brief, user-facing message displayed as the notification content.
    ```
    *   **Note**: The `title` field description "Defaults to `component.metadata.name` if omitted" implies `component.metadata.name` is part of the JSON payload, but it's likely a UI-side default based on the ConfigMap's metadata. Consider clarifying this if it's a UI behavior.

### components/konflux-info/base/rbac.yaml

**Bugs:**

*   **Lines 10-12**: The removal of `resourceNames` makes the role overly permissive, allowing it to `get` any ConfigMap in the `konflux-info` namespace. This is a security regression. It should be restricted to specific ConfigMaps (e.g., `konflux-public-info`, `konflux-banner-configmap`, and ConfigMaps with the `konflux.system.notification: "true"` label).
    ```diff
    --- a/components/konflux-info/base/rbac.yaml
    +++ b/components/konflux-info/base/rbac.yaml
    @@ -7,9 +7,6 @@
     apiVersion: rbac.authorization.k8s.io/v1
     metadata:
       name: konflux-public-info-view-role
     rules:
       - verbs:
           - get
         apiGroups:
           - ''
    -    resourceNames:
    -      - konflux-public-info
    -      - konflux-banner-configmap
         resources:
           - configmaps
     ---
    ```
    **Suggested Change:** Re-add `resourceNames` or use `resourceNames` with `selector` if supported, or define a `resourceName` for the new system notifications. For example, to allow access to specific ConfigMaps:
    ```yaml
    ---
    kind: Role
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: konflux-public-info-view-role
    rules:
      - verbs:
          - get
        apiGroups:
          - ''
        resourceNames:
          - konflux-public-info
          - konflux-banner-configmap
          # Add resource names for system notifications if they are fixed,
          # or consider a label selector if the API supports it for roles.
          # If not, a separate role for system notifications might be needed
          # or rely on the UI to filter by label after getting all.
        resources:
          - configmaps
    ---
    ```
    Alternatively, if the intent is to allow getting *all* configmaps and filter by label in the UI, this should be explicitly documented and accepted as a design choice, but it's generally less secure.
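    For reference, RBAC rules cannot select objects by label; filtering by label happens at query time via a list request, which needs the `list` verb rather than just `get`. A sketch of such a query (the label key follows the convention used elsewhere in this PR):
    ```bash
    # Fetch only notification ConfigMaps; requires 'list' on configmaps.
    kubectl get configmaps -n konflux-info -l konflux-system-notification=true -o name
    ```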

### components/konflux-info/staging/stone-stage-p01/kustomization.yaml

**Bugs:**

*   **Line 6**: The directory name `system-alerts` is inconsistent with the `README.md` and `staging/stone-stg-rh01` which use `system-notifications`.
    ```diff
    --- a/components/konflux-info/staging/stone-stage-p01/kustomization.yaml
    +++ b/components/konflux-info/staging/stone-stage-p01/kustomization.yaml
    @@ -3,7 +3,7 @@
     resources:
       - ../../base
    -  - auto-alerts
    +  - system-alerts
     
     generatorOptions:
    - disableNameSuffixHash: true
    +  disableNameSuffixHash: true
    ```
    **Suggested Change:**
    ```diff
    --- a/components/konflux-info/staging/stone-stage-p01/kustomization.yaml
    +++ b/components/konflux-info/staging/stone-stage-p01/kustomization.yaml
    @@ -3,7 +3,7 @@
     resources:
       - ../../base
    -  - system-alerts
    +  - system-notifications
     
     generatorOptions:
       disableNameSuffixHash: true
    ```

### components/konflux-info/staging/stone-stage-p01/system-alerts/kustomization.yaml

**Bugs:**

*   **Line 4**: The resource name `system-alert-1.yaml` is inconsistent with the `README.md` and `staging/stone-stg-rh01` which use `system-notification-1.yaml`.
    ```diff
    --- a/components/konflux-info/production/kflux-prd-rh02/auto-alerts/kustomization.yaml
    +++ b/components/konflux-info/staging/stone-stage-p01/system-alerts/kustomization.yaml
    @@ -1,4 +1,4 @@
     apiVersion: kustomize.config.k8s.io/v1beta1
     kind: Kustomization
     resources:
    - - auto-alert-1.yaml
    +  - system-alert-1.yaml
    ```
    **Suggested Change:**
    ```diff
    --- a/components/konflux-info/staging/stone-stage-p01/system-alerts/kustomization.yaml
    +++ b/components/konflux-info/staging/stone-stage-p01/system-alerts/kustomization.yaml
    @@ -1,4 +1,4 @@
     apiVersion: kustomize.config.k8s.io/v1beta1
     kind: Kustomization
     resources:
    -  - system-alert-1.yaml
    +  - system-notification-1.yaml
    ```

### components/konflux-info/staging/stone-stage-p01/system-alerts/system-alert-1.yaml

**Bugs:**

*   **Line 4**: The data key `alert-content.json` is inconsistent with the `README.md` and `staging/stone-stg-rh01` which use `notification-content.json`.
*   **Line 7**: The `type` value `Info` uses a capital 'I', but the `README.md` specifies `info` (lowercase). This might cause issues if the UI expects lowercase.
*   **Line 10**: The label `konflux-auto-alert: "true"` is the old label. The `README.md` specifies `konflux.system.notification: "true"`.
*   **Line 11**: The name `konflux-autoalert-configmap-1` is inconsistent with the new naming convention.
    ```diff
    --- a/components/konflux-info/production/kflux-prd-rh03/auto-alerts/auto-alert-1.yaml
    +++ b/components/konflux-info/staging/stone-stage-p01/system-alerts/system-alert-1.yaml
    @@ -1,12 +1,13 @@
     apiVersion: v1
     data:
    -  auto-alert-content.yaml: |-
    -    enable: false
    -    type: warning
    -    summary: pipeline run cannot run
    +  alert-content.json: |
    +    {
    +      "summary": "There is an example system alert!",
    +      "type": "Info"
    +    }
     kind: ConfigMap
     metadata:
       labels:
         konflux-auto-alert: "true"
       name: konflux-autoalert-configmap-1
       namespace: konflux-info
    ```
    **Suggested Change:**
    ```diff
    --- a/components/konflux-info/staging/stone-stage-p01/system-alerts/system-alert-1.yaml
    +++ b/components/konflux-info/staging/stone-stage-p01/system-alerts/system-alert-1.yaml
    @@ -1,13 +1,13 @@
     apiVersion: v1
     data:
    -  alert-content.json: |
    +  notification-content.json: |
         {
           "summary": "There is an example system alert!",
    -      "type": "Info"
    +      "type": "info"
         }
     kind: ConfigMap
     metadata:
       labels:
    -    konflux-auto-alert: "true"
    +    konflux.system.notification: "true"
    -  name: konflux-autoalert-configmap-1
    +  name: konflux-system-notification-configmap-1
       namespace: konflux-info
    ```

### components/konflux-info/staging/stone-stg-rh01/system-notifications/system-notification-1.yaml

**Bugs:**

*   **Line 7**: The `type` value `Info` uses a capital 'I', but the `README.md` specifies `info` (lowercase). This might cause issues if the UI expects lowercase.
*   **Line 11**: The name `konflux-autoalert-configmap-1` is inconsistent with the new naming convention.
    ```diff
    --- /dev/null
    +++ b/components/konflux-info/staging/stone-stg-rh01/system-notifications/system-notification-1.yaml
    @@ -0,0 +1,13 @@
    +apiVersion: v1
    +data:
    +  notification-content.json: |
    +    {
    +      "summary": "There is an example system notification!",
    +      "type": "Info"
    +    }
    +kind: ConfigMap
    +metadata:
    +  labels:
    +    konflux-system-notification: "true"
    +  name: konflux-autoalert-configmap-1
    +  namespace: konflux-info
    ```
    **Suggested Change:**
    ```diff
    --- a/components/konflux-info/staging/stone-stg-rh01/system-notifications/system-notification-1.yaml
    +++ b/components/konflux-info/staging/stone-stg-rh01/system-notifications/system-notification-1.yaml
    @@ -4,9 +4,9 @@
     data:
       notification-content.json: |
         {
           "summary": "There is an example system notification!",
    -      "type": "Info"
    +      "type": "info"
         }
     kind: ConfigMap
     metadata:
       labels:
         konflux-system-notification: "true"
    -  name: konflux-autoalert-configmap-1
    +  name: konflux-system-notification-configmap-1
       namespace: konflux-info
    ```

### components/mintmaker/production/kflux-ocp-p01/kustomization.yaml
### components/mintmaker/production/kflux-osp-p01/kustomization.yaml
### components/mintmaker/production/kflux-prd-rh02/kustomization.yaml
### components/mintmaker/production/kflux-prd-rh03/kustomization.yaml
### components/mintmaker/production/kflux-rhel-p01/kustomization.yaml
### components/mintmaker/production/stone-prod-p01/kustomization.yaml
### components/multi-platform-controller/base/kustomization.yaml

**Note:** These files contain image tag updates. Assuming these are routine updates and not directly related to the `konflux-info` changes, no specific review comments are provided for them.

@testcara testcara force-pushed the konflux-system-alerts branch from ed261ce to b44df3a on August 11, 2025 09:31
Contributor

Code Review by Gemini

## Code Review

### `components/konflux-info/README.md`

*   **Suggestion**: Consider removing the `<!-- vscode-markdown-toc -->` comments. While useful for VS Code users, they are tool-specific annotations that clutter the raw Markdown file and are not standard Markdown.
    *   **Line 1-10, 40-43**

*   **Issue**: The README states that the `title` field for system notifications is optional and "Defaults to `component.metadata.name` if omitted." However, `component.metadata.name` refers to the Kubernetes ConfigMap's metadata, not a field within the `notification-content.json` payload itself. The example `notification-content.json` does not include a `title` field. This could be confusing.
    *   **Suggestion**: Clarify how `component.metadata.name` is used as a default title. If the UI is expected to extract the ConfigMap's `metadata.name` as a fallback, explicitly state that. If `title` is meant to be an optional field *within* the JSON, the example should reflect that.
    *   **Line 260**: `title (optional): A short heading. Defaults to component.metadata.name if omitted.`

*   **Issue**: The `README.md` explicitly notes "no CI validation for the `notification-content.json` payload" (Line 290). This is a regression compared to the previous `auto-alert-schema.json` which provided validation for the `summary` field (including `minLength`, `maxLength`, and `pattern`). Without validation, malformed JSON or potentially unsafe content (e.g., XSS if the UI doesn't sanitize) could be introduced.
    *   **Suggestion**: Reintroduce schema validation for the `notification-content.json` payload. This could be a new JSON schema file and a corresponding CI check, similar to how `banner-content.yaml` is validated. This is crucial for data integrity and security.
    *   **Line 290**: `⚠️ Note: There is currently no CI validation for the notification-content.json payload, so please ensure proper formatting.`

### `components/konflux-info/auto-alert-schema.json`

*   **Issue**: This file was deleted, which removes the schema validation for the old `auto-alerts`. As noted above, no new schema validation was introduced for the `system-notifications` JSON payload.
    *   **Suggestion**: Create a new JSON schema for `notification-content.json` and implement CI validation for it. This schema should at least include validation for `summary` (e.g., `minLength`, `maxLength`, and a `pattern` to restrict characters) and `type` (enum values).

### `components/konflux-info/base/rbac.yaml`

*   **Issue**: The `konflux-public-info-view-role` has been broadened to allow `get` access to *all* `configmaps` in the `konflux-info` namespace, by removing specific `resourceNames`. This is a significant permission escalation. If other sensitive ConfigMaps are ever placed in this namespace, this role would grant access to them.
    *   **Suggestion**: Revert the change to `resourceNames` and explicitly list all ConfigMaps that this role should have access to. For example, if `konflux-public-info`, `konflux-banner-configmap`, and the new `konflux-system-notification-configmap-*` are the only intended ConfigMaps, list them explicitly. Alternatively, if Kubernetes version supports it, use a `labelSelector` to only allow access to ConfigMaps with specific labels (e.g., `konflux.system.notification: "true"` or `konflux-banner: "true"`).
    *   **Line 9-11**:
        ```diff
        --- a/components/konflux-info/base/rbac.yaml
        +++ b/components/konflux-info/base/rbac.yaml
        @@ -7,9 +7,6 @@
         apiVersion: rbac.authorization.k8s.io/v1
         metadata:
           name: konflux-public-info-view-role
         rules:
           - verbs:
               - get
             apiGroups:
               - ''
        -    resourceNames:
        -      - konflux-public-info
        -      - konflux-banner-configmap
             resources:
               - configmaps
        ```

### `components/mintmaker/production/base/manager_patch.yaml`

*   **Issue**: The `GOMEMLIMIT` environment variable has been removed. This variable helps Go applications manage their memory usage within a container's memory limits. Removing it might lead to the Go process attempting to use more memory than allocated, potentially resulting in Out-Of-Memory (OOM) kills, especially if the application's memory consumption is close to the container's `memory` limit (which is 8Gi).
    *   **Suggestion**: Re-evaluate the necessity of `GOMEMLIMIT`. If it was previously set to prevent OOMs, its removal could introduce instability. If the application's memory profile has changed significantly or other memory management strategies are in place, this change might be acceptable, but it should be explicitly justified.
    *   **Line 13-15**:
        ```diff
        --- a/components/mintmaker/production/base/manager_patch.yaml
        +++ b/components/mintmaker/production/base/manager_patch.yaml
        @@ -10,6 +10,3 @@
           requests:
             cpu: 100m
             memory: 8Gi
        -        env:
        -        - name: GOMEMLIMIT
        -          value: "7GiB"
        ```
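    If the limit is still wanted, the usual pattern is to keep `GOMEMLIMIT` slightly below the container memory limit so the Go GC tightens before the kernel OOM-killer acts. A sketch using the values from the diff (the container name is hypothetical):
    ```yaml
    spec:
      containers:
        - name: manager  # hypothetical container name
          resources:
            limits:
              memory: 8Gi
          env:
            # ~10% headroom below the 8Gi container limit
            - name: GOMEMLIMIT
              value: "7GiB"
    ```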

@testcara testcara force-pushed the konflux-system-alerts branch from b44df3a to 209fea9 on August 11, 2025 10:21

openshift-ci bot commented Aug 11, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: testcara
Once this PR has been reviewed and has the lgtm label, please assign ksprinkl for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot removed the approved label Aug 11, 2025
Contributor

Code Review by Gemini

## Code Review

Here's a review of the provided code changes, focusing on potential bugs and areas for improvement.

### 1. `.github/workflows/validate-notification.yaml`

*   **Bug/Improvement**: The paths to `system-notifications` are relative to the repository root, but the `BANNER_DIR` environment variable is defined. It's better to use the environment variable for consistency and clarity.
    ```diff
    --- a/.github/workflows/validate-notification.yaml
    +++ b/.github/workflows/validate-notification.yaml
    @@ -17,7 +17,7 @@
 
       - name: Check konflux-system-notification label present
         run: |
    -      for file in system-notifications/*.yaml; do
    +      for file in "${BANNER_DIR}/system-notifications/*.yaml"; do
             if [[ "$(basename "$file")" == "kustomization.yaml" ]]; then
               continue
             fi
    @@ -33,7 +33,7 @@
           node-version: "18"
 
       - name: Install ajv-cli for JSON Schema validation
    -    run: npm install -g ajv-cli
    +    run: |
    +      npm install -g ajv-cli
 
       - name: Validate notification-content.json files
         run: |
    -      for file in system-notifications/*.yaml; do
    +      for file in "${BANNER_DIR}/system-notifications/*.yaml"; do
             file_name=$(basename "$file")  # no spaces around =
             if [[ "$file_name" == "kustomization.yaml" ]]; then
               continue
    ```

*   **Bug**: The `TMP_DIR` (`.tmp/notifications`) is used to store temporary JSON files, but the directory is not created before `yq` attempts to write to it. This will cause the workflow to fail if the directory doesn't exist.
    ```diff
    --- a/.github/workflows/validate-notification.yaml
    +++ b/.github/workflows/validate-notification.yaml
    @@ -38,6 +38,7 @@
 
       - name: Validate notification-content.json files
         run: |
    +      mkdir -p "${TMP_DIR}"
           for file in system-notifications/*.yaml; do
             file_name=$(basename "$file")  # no spaces around =
             if [[ "$file_name" == "kustomization.yaml" ]]; then
    ```

### 2. `components/konflux-info/README.md`

*   **Bug**: The `auto-alert-schema.json` file is being deleted, so its reference in the directory structure should also be removed.
    ```diff
    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -17,7 +17,6 @@
     The `KONFLUX-INFO` directory contains:
 
     ```bash
     .
    -├── auto-alert-schema.json  # JSON shema definition for auto-alert-content.yaml
     ├── base/                   # Common resources (e.g., RBAC)
     ├── production/             # Production cluster configurations
     ├── staging/                # Staging cluster configurations
    ```

*   **Bug**: The `auto-alerts` directory is being replaced by `system-notifications`. The `README.md` should reflect this change in the directory structure and kustomization.
    ```diff
    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -26,9 +25,9 @@
 
     ```bash
     .
    -├── auto-alerts # The directory manages auto-generated alerts content shown in the UI
    +├── system-notifications # The directory manages auto-generated notifications content shown in the UI
     ├── banner-content.yaml # The banner content shown in the UI
     ├── info.json # Metadata about the cluster
    -└── kustomization.yaml # Kustomize configuration for this cluster, including base, auto-alerts, and other configs
    +└── kustomization.yaml # Kustomize configuration for this cluster, including base, system-notifications, and other configs
 
     ```
    ```

*   **Bug**: The example ConfigMap for system notifications still uses the old `auto-alert` format and label. It should be updated to match the new `notification-content.json` format and the `konflux-system-notification` label used in the new `system-notification-1.yaml` and `validate-notification.yaml` workflow.
    ```diff
    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -195,14 +195,14 @@
     kind: ConfigMap
     metadata:
       name: konflux-auto-alert-xyz
       namespace: konflux-info
       labels:
    -    konflux-auto-alert: "true" # Required. UI filter alerts out by this label.
    +    konflux-system-notification: "true"
     data:
    -  auto-alert-content.yaml: |
    -    enable: true
    -    summary: "Builds are delayed due to maintenance"
    -    type: "warning"
    +  notification-content.json: |-
    +    {
    +      "summary": "Builds are delayed due to maintenance",
    +      "type": "warning",
    +      "title": "From Builds Service"
    +    }
     ```

*   **Bug**: The folder structure example still refers to `auto-alerts`.
    ```diff
    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -238,11 +238,11 @@
 
     ### 3.4. <a name='FolderStructure'></a>✅ **Folder Structure**
 
     Notifications are organized under the `system-notifications/` directory:
 
     ```bash
    -auto-alerts/   # Alert ConfigMaps (one file = one alert)
    +system-notifications/   # Notification ConfigMaps (one file = one notification)
     .
    -├── alert-1.yaml           # Fully valid ConfigMap YAML
    -├── alert-2.yaml
    -└── kustomization.yaml     # Auto-generated, includes all alert YAMLs
    +├── notification-1.yaml  # A ConfigMap representing one notification
    +├── notification-2.yaml
    +└── kustomization.yaml   # Auto-generated, includes all notifications YAMLs
 
     ```
    ```

### 3. `components/konflux-info/base/rbac.yaml`

*   **Bug/Security Concern**: The change removes specific `resourceNames` for `configmaps`, effectively granting `get` access to *all* ConfigMaps in the `konflux-info` namespace to the `konflux-public-info-view-role`. This is a potential security regression. It's best practice to grant the least privilege necessary. If the new system notification ConfigMaps are intended to be publicly viewable, they should be explicitly added to the `resourceNames` list.
    ```diff
    --- a/components/konflux-info/base/rbac.yaml
    +++ b/components/konflux-info/base/rbac.yaml
    @@ -6,9 +6,9 @@
     rules:
       - verbs:
           - get
         apiGroups:
           - ''
    -    resourceNames:
    -      - konflux-public-info
    -      - konflux-banner-configmap
         resources:
           - configmaps
     ```
    **Suggested Change**: Revert the removal of `resourceNames` and add the new system notification ConfigMap names if they are meant to be public. For example:
    ```yaml
    ---
    kind: Role
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: konflux-public-info-view-role
    rules:
      - verbs:
          - get
        apiGroups:
          - ''
        resourceNames:
          - konflux-public-info
          - konflux-banner-configmap
          - konflux-system-notification-configmap-1 # Add specific names for system notifications
          # Add other system notification configmap names as needed
        resources:
          - configmaps
    ```
    Alternatively, if there are many system notification ConfigMaps and they all follow a naming convention, consider using a label selector in the UI to filter them, and keep the RBAC broad if the namespace is exclusively for public info. However, explicit `resourceNames` is generally safer. Given the `konflux-info` namespace is specifically for this purpose, granting `get` on all `configmaps` might be acceptable if all configmaps in this namespace are indeed public. Please confirm this design decision.

### 4. `components/konflux-info/staging/stone-stage-p01/system-notifications/system-notification-1.yaml`

*   **Improvement**: The `konflux-system-notification` label is used here, which is consistent with the workflow. However, the `README.md` example uses `konflux.system.notification`. Ensure consistency across all files. The current label in this file is good, so the `README.md` should be updated to match it.

---

Overall, the changes are well-structured and aim to improve the system. The main points to address are consistency in naming/labels across documentation and code, and the potential security implication in the RBAC definition.

@testcara testcara force-pushed the konflux-system-alerts branch from 209fea9 to 72f70ea on August 11, 2025 10:30
Contributor

Code Review by Gemini

Here's a review of the provided code changes:

Issues and Suggestions

1. Consistency Mismatch in Labels

  • Issue: The README.md example for system notifications uses the label konflux.system.notification: "true", but the new validation workflow (validate-notification.yaml) and the example notification ConfigMaps (system-notification-1.yaml) use konflux-system-notification: "true". This inconsistency can lead to confusion and misconfiguration; a quick consistency check is sketched after the diff below.
  • File: components/konflux-info/README.md
  • Lines to change:
    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -300,7 +300,7 @@
     kind: ConfigMap
     metadata:
       name: konflux-system-notification-xyz
       namespace: konflux-info
       labels:
    -    konflux.system.notification: "true"
    +    konflux-system-notification: "true"
     data:
       notification-content.json: |-
         {
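  A quick consistency check in the spirit of the workflow's label step (a sketch; the glob path is illustrative):

    for file in components/konflux-info/*/*/system-notifications/*.yaml; do
      [[ "$(basename "$file")" == "kustomization.yaml" ]] && continue
      label=$(yq e '.metadata.labels."konflux-system-notification"' "$file")
      if [[ "$label" != "true" ]]; then
        echo "ERROR: $file is missing konflux-system-notification: \"true\""
        exit 1
      fi
    done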

2. Incorrect Numbering in README.md

  • Issue: The "Usage Scenarios & Examples" section under "Konflux Banner" has incorrect sub-numbering. 2.3.2. ✅ 2. One-Time Banner appears twice, and subsequent numbers are off.

  • File: components/konflux-info/README.md

  • Lines to change:

    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -122,7 +122,7 @@
     -  # No timezone is needed when you expect it's UTC.

    -#### ✅ 2. One-Time Banner
    +#### 2.3.2. 2. One-Time Banner

    For a single event on a specific date:

    @@ -139,7 +139,7 @@
    endTime: "14:00"

    
    -#### ✅ **2. Weekly Recurring Banner**
    +#### 2.3.3. <a name='3.WeeklyRecurringBanner'></a>✅ **3. Weekly Recurring Banner**
    
    For an event that repeats every week:
    
    @@ -150,7 +150,7 @@
      endTime: "04:00"
    

    -#### ✅ 3. Monthly Recurring Banner
    +#### 2.3.4. 4. Monthly Recurring Banner

    For an event that happens on the same day each month:

    @@ -161,7 +161,7 @@
    timeZone: "Asia/Shanghai"

    
    -#### ✅ **4. Always-On Banner**
    +#### 2.3.5. <a name='5.Always-OnBanner'></a>✅ **5. Always-On Banner**
    
    For an event that requires immediate notification:
    
    @@ -170,7 +170,7 @@
      type: "info"
    

    -#### ✅ 5. Empty Banner
    +#### 2.3.6. 6. Empty Banner

    When there are no events to announce:

    
    

3. Production Kustomization Missing System Notifications

  • Issue: The kustomization.yaml files for production clusters (components/konflux-info/production/*/kustomization.yaml) remove the auto-alerts reference but do not add system-notifications. This means system notifications will not be deployed to production environments. If this is intentional (e.g., for a phased rollout), it should be noted. If not, system-notifications should be added.
  • File: components/konflux-info/production/*/kustomization.yaml (all production cluster kustomizations)
  • Lines to change (example for kflux-ocp-p01):
    --- a/components/konflux-info/production/kflux-ocp-p01/kustomization.yaml
    +++ b/components/konflux-info/production/kflux-ocp-p01/kustomization.yaml
    @@ -1,7 +1,7 @@
     apiVersion: kustomize.config.k8s.io/v1beta1
     kind: Kustomization
     resources:
       - ../../base
    -  - auto-alerts
    +  - system-notifications
     
     generatorOptions:
    (Apply similar changes to kflux-prd-rh02, kflux-prd-rh03, kflux-rhel-p01, stone-prd-rh01, stone-prod-p01, stone-prod-p02)

4. Unrelated Changes

  • Issue: The changes to components/mintmaker and components/multi-platform-controller (removing GOMEMLIMIT and updating image tags) are unrelated to the commit message "feat: improve the Readme.md for system alerts and yaml files".
  • Suggestion: It is best practice to keep pull requests focused on a single logical change. These unrelated modifications should be moved to a separate pull request to maintain clarity and simplify future reviews and rollbacks.
  • Files:
    • components/mintmaker/production/base/manager_patch.yaml
    • components/mintmaker/production/*/kustomization.yaml
    • components/multi-platform-controller/base/kustomization.yaml

@testcara testcara force-pushed the konflux-system-alerts branch from 72f70ea to 9630e54 on August 11, 2025 10:34
Contributor

Code Review by Gemini

## Code Review

This pull request introduces a new GitHub Actions workflow for validating system notifications, updates the `README.md` to reflect the new "System Notifications" concept (replacing "Auto Alerts"), and refactors related Kubernetes configurations.

### Issues and Suggestions

1.  **Bug: `rbac.yaml` - Permission Escalation**
    *   **File:** `components/konflux-info/base/rbac.yaml`
    *   **Lines:** 10-13
    *   **Issue:** The `konflux-public-info-view-role` previously had `get` access to specific `ConfigMap` resources (`konflux-public-info`, `konflux-banner-configmap`). With this change, the `resourceNames` field is removed, granting the role `get` access to *all* `ConfigMaps` within the `konflux-info` namespace. This is a significant permission escalation and could expose sensitive `ConfigMaps` if they exist in that namespace.
    *   **Suggestion:** Revert the change to `rbac.yaml` and ensure the role only has access to the specific `ConfigMaps` it needs, including the new system notification ConfigMaps if they are intended to be publicly viewable. If the new system notification ConfigMaps are also meant to be public, their names should be explicitly added to the `resourceNames` list.

    ```diff
    --- a/components/konflux-info/base/rbac.yaml
    +++ b/components/konflux-info/base/rbac.yaml
    @@ -7,10 +7,8 @@
     rules:
       - verbs:
           - get
         apiGroups:
           - ''
    -    resourceNames:
    -      - konflux-public-info
    -      - konflux-banner-configmap
         resources:
           - configmaps
     ---
    ```
    *   **Proposed Change (Example, assuming new notification ConfigMaps should also be public):**
        ```yaml
        ---
        kind: Role
        apiVersion: rbac.authorization.k8s.io/v1
        metadata:
          name: konflux-public-info-view-role
        rules:
          - verbs:
              - get
            apiGroups:
              - ''
            resourceNames:
              - konflux-public-info
              - konflux-banner-configmap
              - konflux-system-notification-configmap-1 # Add specific names for system notifications if needed
            resources:
              - configmaps
        ---
        ```
        *Note: If system notification ConfigMaps have dynamic names, a different approach might be needed, but granting access to all ConfigMaps is generally not recommended.*

2.  **Bug: `README.md` - JSON Example Syntax Error**
    *   **File:** `components/konflux-info/README.md`
    *   **Lines:** 109-110
    *   **Issue:** The example JSON for `notification-content.json` is missing a comma between the `type` and `title` fields. This will result in invalid JSON.
    *   **Suggestion:** Add a comma after the `type` field.

    ```diff
    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -106,7 +106,7 @@
     {
       "summary": "Builds are delayed due to maintenance",
       "type": "warning"
    -      "title": "From Builds Service"
    +      ,"title": "From Builds Service"
     }
     ```
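    For reference, the fully corrected payload, with the comma in the conventional position:
    ```json
    {
      "summary": "Builds are delayed due to maintenance",
      "type": "warning",
      "title": "From Builds Service"
    }
    ```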

3.  **Potential Issue: Deployment Inconsistency for System Notifications**
    *   **Files:**
        *   `components/konflux-info/production/*/kustomization.yaml`
        *   `components/konflux-info/staging/*/kustomization.yaml`
    *   **Issue:** The `auto-alerts` directory references have been removed from *all* production cluster `kustomization.yaml` files, but the new `system-notifications` directory references have only been added to *staging* cluster `kustomization.yaml` files. This means that after this change, production clusters will no longer deploy any system notifications.
    *   **Suggestion:** Confirm if this is the intended behavior.
        *   If production clusters are *not* meant to have system notifications, then this is fine, but it's a significant change in functionality that might warrant a note in the PR description or a separate discussion.
        *   If production clusters *should* have system notifications, then the `system-notifications` directory and its reference in `kustomization.yaml` should also be added to the production cluster configurations, similar to how it's done for staging.

### Minor Improvements

1.  **`validate-notification.yaml` - Redundant Comment**
    *   **File:** `.github/workflows/validate-notification.yaml`
    *   **Line:** 49
    *   **Suggestion:** The comment `# no spaces around =` is redundant as `file_name=$(basename "$file")` is standard shell syntax and doesn't require this specific note. It can be removed for cleaner code.

    ```diff
    --- a/.github/workflows/validate-notification.yaml
    +++ b/.github/workflows/validate-notification.yaml
    @@ -46,7 +46,7 @@
           mkdir -p ${{ env.TMP_DIR }}
           for file in  ${{ env.NOTIFICATION_DIR }}/**/**/system-notifications/*.yaml; do
             file_name=$(basename "$file")  # no spaces around =
             if [[ "$file_name" == "kustomization.yaml" ]]; then
               continue
             fi
    ```
    *   **Proposed Change:**
        ```yaml
        - name: Validate notification-content.json files
          run: |
            mkdir -p ${{ env.TMP_DIR }}
            for file in  ${{ env.NOTIFICATION_DIR }}/**/**/system-notifications/*.yaml; do
              file_name=$(basename "$file")
              if [[ "$file_name" == "kustomization.yaml" ]]; then
                continue
              fi
              # Extract JSON content from ConfigMap
              yq e '.data."notification-content.json"' "$file" > "${TMP_DIR}/${file_name}.json"
              ajv validate -s "${SCHEMA_FILE}" -d "${TMP_DIR}/${file_name}.json" --errors=text
              if [ $? -ne 0 ]; then
                echo "ERROR: notification-content.json in ${file} does not validate against schema"
                exit 1
              fi
            done
        ```

@testcara testcara changed the title from "feat: improve the Readme.md for system alerts and yaml files" to "feat: support the system notifications and improve the README.md" Aug 11, 2025
@testcara testcara force-pushed the konflux-system-alerts branch from 9630e54 to b3be250 on August 11, 2025 10:38
Contributor

Code Review by Gemini

## Code Review

### 1. `base/rbac.yaml`

**Issue:** Potential Security Vulnerability - Overly Broad Permissions

The change to `base/rbac.yaml` removes the `resourceNames` field, which previously restricted the `konflux-public-info-view-role` to only `konflux-public-info` and `konflux-banner-configmap` ConfigMaps. By removing this, the role now grants `get` access to *all* ConfigMaps within the `konflux-info` namespace. If there are other ConfigMaps in this namespace that contain sensitive information, this change could expose them.

**Suggested Change:**

Restrict the `get` permission to only the necessary ConfigMaps, including the new system notification ConfigMaps.

```diff
--- a/components/konflux-info/base/rbac.yaml
+++ b/components/konflux-info/base/rbac.yaml
@@ -1,12 +1,11 @@
 ---
 kind: Role
 apiVersion: rbac.authorization.k8s.io/v1
 metadata:
   name: konflux-public-info-view-role
 rules:
   - verbs:
       - get
     apiGroups:
       - ''
-    resourceNames:
-      - konflux-public-info
-      - konflux-banner-configmap
     resources:
       - configmaps
+    resourceNames:
+      - konflux-public-info
+      - konflux-banner-configmap
+      - '*' # Or list specific system notification ConfigMap names/patterns if known and limited
```

Explanation for `resourceNames: - '*'`:
Note that RBAC matches `resourceNames` exactly; wildcards are not supported, so `'*'` would only match a ConfigMap literally named `*`. If system notification ConfigMaps are dynamically named (e.g., konflux-system-notification-xyz), the realistic options are to enumerate the expected names or to omit `resourceNames` entirely, which grants access to every ConfigMap in the namespace. If the konflux-info namespace is only for public information and notifications, that may be acceptable, but it's crucial to confirm this assumption.

2. components/konflux-info/README.md

Improvement: Clarify UI Behavior for Multiple Banners

The "Important Behavior" section for banners is now under "2.2.2. Required and Optional Fields for Each Banner". It might be clearer to have this as a separate sub-section under "2. Konflux Banner" or integrate it more directly into the "UI Behavior" section. The current placement feels a bit out of place.

Suggested Change:

Move the "Important Behavior" subsection content to "2.5. UI Behavior".

```diff
--- a/components/konflux-info/README.md
+++ b/components/konflux-info/README.md
@@ -49,13 +49,6 @@
 
 The file must contain a **YAML list** where each item represents a banner configuration.
 
----
-
-### **Important Behavior**
-
-- The <strong style="color: red;">UI displays only the first valid active banner</strong> from the list, based on current date, time, and optional recurrence settings.
-- If multiple banners are configured, <strong style="color: red;">order matters</strong>.
-
----
-
-### **Required and Optional Fields for Each Banner**
+#### 2.2.2. <a name='RequiredandOptionalFieldsforEachBanner'></a>**Required and Optional Fields for Each Banner**
 
 📎 For the full schema used in CI validation, see banner-schema.json. This table is a human-friendly reference for banner authors.
 
@@ -190,6 +183,8 @@
 
 - The <strong style="color: red;">UI displays only the first valid active banner</strong> from the list, based on current date, time, and optional recurrence settings.
 - If multiple banners are configured, <strong style="color: red;">order matters</strong>.
+- <strong style="color: red;">Time-related fields like `startTime` and `endTime` are not shown in the UI</strong>; they only control when the banner is active.
+- <strong style="color: red;">The type and summary fields are displayed directly in the UI</strong>. If you want to communicate duration or timing details, please include them within the summary.
 - We enjoyed leveraging the [PatternFly Banner component (v5)](https://v5-archive.patternfly.org/components/banner/) to implement the UI, following its design principles for clarity and consistency.
 
 ### 2.6. <a name='FrequentlyAskedQuestions'></a>❓ Frequently Asked Questions
```

3. components/konflux-info/README.md

Improvement: Consistency in Markdown for "Required and Optional Fields"

The "Required and Optional Fields for Each Banner" section uses ⚠️* for startTime and endTime in the table, but the explanation below the table uses ⚠️ **If any of ... is specified, both startTime and endTime are required.**. For consistency, it would be better to use the same markdown formatting for the asterisk in the table.

Suggested Change:

```diff
--- a/components/konflux-info/README.md
+++ b/components/konflux-info/README.md
@@ -60,8 +60,8 @@
 | Field        | Type   | Required | Description                                                               |
 | ------------ | ------ | -------- | ------------------------------------------------------------------------- |
 | `summary`    | string | ✅       | Banner text (5–500 chars). **Supports Markdown** (e.g., bold, links).     |
 | `type`       | string | ✅       | Banner type: `info`, `warning`, or `danger`.                              |
-| `startTime`  | string | ⚠️\*     | Start time in `HH:mm` (24-hour). Required if date-related fields are set. |
-| `endTime`    | string | ⚠️\*     | End time in `HH:mm` (24-hour). Required if date-related fields are set.   |
+| `startTime`  | string | ⚠️\*     | Start time in `HH:mm` (24-hour). Required if date-related fields are set. |
+| `endTime`    | string | ⚠️\*     | End time in `HH:mm` (24-hour). Required if date-related fields are set.   |
 | `timeZone`   | string | ❌       | Optional IANA time zone (e.g., Asia/Shanghai). Omit for UTC (default).    |
 | `year`       | number | ❌       | Year (1970–9999) for one-time banners.                                    |
| `month`      | number | ❌       | Month (1–12).                                                             |
```

(Note: The diff shows no change, but the suggestion is to ensure the * is part of the markdown for consistency, e.g., ⚠️\* or ⚠️** if it's meant to be bold.)

4. components/konflux-info/README.md

Improvement: Typo in "2.3.2. ✅ 2. One-Time Banner"

There's a repeated section heading "2. One-Time Banner". The second instance should likely be "3. Weekly Recurring Banner" to match the content.

Suggested Change:

```diff
--- a/components/konflux-info/README.md
+++ b/components/konflux-info/README.md
@@ -126,7 +126,7 @@
   endTime: "14:00"

-#### ✅ 2. Weekly Recurring Banner
+#### 2.3.3. 3. Weekly Recurring Banner

For an event that repeats every week:
```


### 5. `components/konflux-info/README.md`

**Improvement:** Typo in "2.3.2. ✅ **2. One-Time Banner**" (continued)

Following the previous point, the subsequent headings are also off by one.

**Suggested Change:**

```diff
--- a/components/konflux-info/README.md
+++ b/components/konflux-info/README.md
@@ -135,7 +135,7 @@
   endTime: "04:00"

-#### ✅ 3. Monthly Recurring Banner
+#### 2.3.4. 4. Monthly Recurring Banner

For an event that happens on the same day each month:

@@ -147,7 +147,7 @@
timeZone: "Asia/Shanghai"


-#### ✅ **4. Always-On Banner**
+#### 2.3.5. <a name='5.Always-OnBanner'></a>✅ **5. Always-On Banner**

For an event that requires immediate notification:

@@ -156,7 +156,7 @@
  type: "info"

-#### ✅ 5. Empty Banner
+#### 2.3.6. 6. Empty Banner

When there are no events to announce:
```


### 6. `components/konflux-info/README.md`

**Improvement:** Clarify "auto-generated" for `kustomization.yaml`

In the "Folder Structure" section for System Notifications, it states `kustomization.yaml` is "Auto-generated, includes all notifications YAMLs". This might be misleading if it's not *always* auto-generated (e.g., if manual edits are sometimes needed). It's better to clarify that it *should* be kept up-to-date with automation.

**Suggested Change:**

```diff
--- a/components/konflux-info/README.md
+++ b/components/konflux-info/README.md
@@ -269,7 +269,7 @@
 .
 ├── notification-1.yaml  # A ConfigMap representing one notification
 ├── notification-2.yaml
-└── kustomization.yaml   # Auto-generated, includes all notifications YAMLs
+└── kustomization.yaml   # Should be kept up-to-date, includes all notifications YAMLs
```
 

### 7. `components/konflux-info/README.md`

**Improvement:** Consistency in `metadata.name` reference

In "3.5. ✅ **UI Behavior**", it states: "If the `title` field is omitted in the JSON, it falls back to using component.metadata.name as the default in the UI." This should probably be `ConfigMap.metadata.name` for clarity, as the notification itself is within a ConfigMap.

**Suggested Change:**

```diff
--- a/components/konflux-info/README.md
+++ b/components/konflux-info/README.md
@@ -284,7 +284,7 @@
 - When multiple notifications exist, the UI lists them ordered by their creation time (`creationTimestamp`), <strong>showing the most recent first</strong>.
 - The `type` field control the notification icon shown before the title in the Notification Drawer.
-- If the `title` field is omitted in the JSON, it falls back to using component.metadata.name as the default in the UI.
+- If the `title` field is omitted in the JSON, it falls back to using ConfigMap.metadata.name as the default in the UI.
 - We leveraged [PatternFly Notification Drawer (v5)](https://v5-archive.patternfly.org/components/notification-drawer/html/#ws-core-c-notification-drawer-basic) and [Notification Badge (v5)](https://v5-archive.patternfly.org/components/notification-badge) components to implement the UI, following their design principles for consistency and usability.
 - All notifications are always shown as unread. There is no backend tracking for notification state, so <strong>`read/unread` functionality is not supported</strong>.
```
 

@testcara testcara marked this pull request as ready for review August 11, 2025 10:49
@openshift-ci openshift-ci bot requested review from JoaoPedroPP and sahil143 August 11, 2025 10:49
@testcara testcara force-pushed the konflux-system-alerts branch from b3be250 to 88d81d1 on August 12, 2025 04:46
Contributor

Code Review by Gemini

Here's a review of the provided code changes:

.github/workflows/validate-notification.yaml

  • Improvement:
    • Line 14: While wget directly to /usr/bin/yq works, for CI/CD, it's generally more robust to use a package manager (if available for yq on ubuntu-latest) or download to a local directory and add it to PATH. However, for this specific use case, it's acceptable.

components/konflux-info/README.md

  • Bug:
    • Line 144: The heading "2. Weekly Recurring Banner" is incorrectly numbered. It should be "2.3.3." to follow the sequence.
    • Line 153: The heading "3. Monthly Recurring Banner" is incorrectly numbered. It should be "2.3.4."
    • Line 162: The heading "4. Always-On Banner" is incorrectly numbered. It should be "2.3.5."
    • Line 168: The heading "5. Empty Banner" is incorrectly numbered. It should be "2.3.6."

components/konflux-info/base/rbac.yaml

  • Bug/Potential Issue:
    • Lines 7-9:
      -    resourceNames:
      -      - konflux-public-info
      -      - konflux-banner-configmap
      Removing resourceNames means the konflux-public-info-view-role now grants get access to all configmaps in the konflux-info namespace, rather than just the two specified ones. This significantly broadens the permissions. If the UI only needs access to specific ConfigMaps, it's best practice to explicitly list them or use label selectors for more granular control. Please confirm if this broader access is intentional and justified from a security perspective.

components/mintmaker/production/base/manager_patch.yaml

  • Potential Issue:
    • Lines 13-14:
      -        env:
      -        - name: GOMEMLIMIT
      -          value: "7GiB"
      Removing the GOMEMLIMIT environment variable can impact how the Go application manages its memory. This variable is often used to prevent Go applications from consuming excessive memory, especially in containerized environments. Please confirm whether this removal is intentional and whether the application has been re-evaluated for memory stability without the limit. Without it, the Go runtime might use more memory than desired, potentially leading to OOM kills or increased resource consumption. If the ceiling is still wanted, it can be restored in the patch; see the sketch below.
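
A minimal sketch of restoring the ceiling via a typical kustomize strategic-merge patch (the deployment and container names are placeholders for illustration, not taken from the actual file):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller-manager   # placeholder; use the deployment the patch targets
spec:
  template:
    spec:
      containers:
        - name: manager      # placeholder container name
          env:
            - name: GOMEMLIMIT   # soft ceiling for the Go runtime's heap
              value: "7GiB"      # the value removed by this PR; keep it below the container memory limit
```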

components/monitoring/prometheus/base/monitoringstack/monitoringstack.yaml

  • Improvement:
    • Lines 36-37:
            writeRelabelConfigs:
            - action: LabelKeep
              regex: "__name__|source_environment|source_cluster|namespace|app|pod|container|\
                label_pipelines_appstudio_openshift_io_type|health_status|dest_namespace|\
                controller|service|reason|phase|type|resource|resourcequota|le|app|image|\
                commit_hash|job|operation|tokenName|rateLimited|state|persistentvolumeclaim|\
                storageclass|volumename|release_reason|instance|result|deployment_reason|\
                validation_reason|strategy|succeeded|target|name|method|code|sp|le|\
                unexpected_status|failure|hostname|label_app_kubernetes_io_managed_by|status|\
                pipeline|pipelinename|pipelinerun|schedule|check|grpc_service|grpc_code|\
                grpc_method|lease|lease_holder|deployment|platform|mode|cpu|role|node|kind|\
                verb|request_kind|tested_cluster|resource_type|exported_job|http_method|\
                http_route|http_status_code|gin_errors|rule_result|rule_execution_cause|\
                policy_name|policy_background_mode|rule_type|policy_type|policy_validation_mode|\
                resource_request_operation|resource_kind|policy_change_type|event_type"
      The regex for LabelKeep is extremely long and difficult to read and maintain on a single line. While it's functional, consider whether it can be broken down, whether multiple LabelKeep actions are supported by the Prometheus operator, or whether a LabelDrop action with a shorter list of unwanted labels would be clearer (a sketch follows below). This is a readability and maintainability suggestion, not a bug.
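
For illustration, a minimal LabelDrop sketch; the label names in the regex are hypothetical examples of unwanted labels, not taken from this configuration:

```yaml
writeRelabelConfigs:
  - action: LabelDrop   # drops labels whose *names* match the regex, keeps everything else
    regex: "pod_template_hash|controller_revision_hash|exported_instance"
```

Note the trade-off: an allowlist (LabelKeep) fails closed when a new label appears, while a denylist (LabelDrop) fails open, so the switch only pays off if the incoming label set is well understood.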

components/monitoring/prometheus/development/monitoringstack/cluster-type-patch.yaml

  • Improvement:
    • Line 2:
      - op: replace
      + op: add
      Changing op: replace to op: add for a patch at /spec/endpoints/0/relabelings/0 means that if there was an existing relabeling rule at index 0, this new rule will be inserted before it, shifting the existing rule to index 1. If this order of operations is critical, ensure it's the desired behavior. If the intention is to always have this specific rule as the first one, add is correct. The sketch below contrasts the relevant behaviors.
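
A sketch contrasting the three relevant JSON-patch operations on the relabelings list, reusing the value from this patch (standard RFC 6902 semantics, not repo-specific behavior):

```yaml
# Insert at index 0: existing entries shift right (what the patch now does).
- op: add
  path: /spec/endpoints/0/relabelings/0
  value:
    targetLabel: source_environment
    replacement: development-cluster

# Append at the end of the list instead: use the "-" index.
- op: add
  path: /spec/endpoints/0/relabelings/-
  value:
    targetLabel: source_environment
    replacement: development-cluster

# Overwrite index 0: errors if no element exists there (what the patch did before).
- op: replace
  path: /spec/endpoints/0/relabelings/0
  value:
    targetLabel: source_environment
    replacement: development-cluster
```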

components/monitoring/prometheus/staging/stone-stg-rh01/kustomization.yaml

  • Potential Issue:
    • Line 5:
      - - prometheusrule-uwm.yaml
      The prometheusrule-uwm.yaml file is being removed from the kustomization.yaml and subsequently deleted. This file defined a PrometheusRule for container_memory_working_set_bytes in the openshift-user-workload-monitoring namespace. Please confirm if this metric is no longer needed for monitoring or if its collection has been moved elsewhere. Removing it could impact existing dashboards or alerts.

components/tracing/otel-collector/staging/otel-collector-helm-values.yaml

  • Potential Issue:
    • Lines 10-12:
      -    prometheus:
      -      endpoint: 0.0.0.0:8889
    • Lines 29-35:
      -      metrics:
      -        receivers:
      -        - otlp
      -        processors:
      -        - memory_limiter
      -        - batch
      -        exporters:
      -        - prometheus
    • Line 37:
      -      metrics:
      -        level: basic
      -        readers:
      -        - pull:
      -            exporter: prometheus
      -            host: 0.0.0.0
      -            port: 8888
      +      metrics: null
      These changes remove the Prometheus exporter and the entire metrics pipeline from the OpenTelemetry collector configuration. This means the collector will no longer expose metrics via a Prometheus endpoint. If any other monitoring systems or dashboards rely on scraping metrics from this OTel collector, they will cease to function. Please confirm if this is an intentional change and if downstream consumers of these metrics have been updated or are no longer needed. For reference, the removed pipeline is reassembled in the sketch below.
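
A sketch reassembled from the deleted lines quoted above (a fragment of the Helm values for orientation, not a verified configuration):

```yaml
# Exporter block (top level of the values, alongside the sapm/signalfx exporters):
prometheus:
  endpoint: 0.0.0.0:8889        # scrape endpoint for pipeline metrics
# Pipeline and self-telemetry, under the collector's service section:
config:
  service:
    pipelines:
      metrics:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [prometheus]
    telemetry:
      metrics:
        level: basic
        readers:
          - pull:
              exporter: prometheus   # the collector's own metrics on :8888
              host: 0.0.0.0
              port: 8888
```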

components/vector-kubearchive-log-collector/OWNERS

  • Potential Issue:
    • Line 1:
      - # See the OWNERS docs: https://go.k8s.io/owners
      - 
      - approvers:
      - - rh-hemartin
      - - skoved
      - - ggallen
      - - maruiz93
      - - mafh314
      - - olegbet
      - 
      - reviewers:
      - - rh-hemartin
      - - skoved
      - - ggallen
      - - maruiz93
      - - mafh314
      - - olegbet
      Deleting the OWNERS file for vector-kubearchive-log-collector removes the explicit definition of approvers and reviewers for this component. Please confirm if ownership is now managed elsewhere or if this component is being deprecated/removed.

components/vector-kubearchive-log-collector/base/rbac.yaml

  • Bug/Critical Issue:
    • Line 1:
      - ---
      - kind: RoleBinding
      - apiVersion: rbac.authorization.k8s.io/v1
      - metadata:
      -   name: kubearchive-logging-component-maintainers
      -   namespace: product-kubearchive-logging
      - subjects:
      -   - kind: Group
      -     apiGroup: rbac.authorization.k8s.io
      -     name: konflux-kubearchive  # rover group
      - roleRef:
      -   apiGroup: rbac.authorization.k8s.io
      -   kind: ClusterRole
      -   name: component-maintainer
      - ---
      - kind: RoleBinding
      - apiVersion: rbac.authorization.k8s.io/v1
      - metadata:
      -   name: kubearchive-logging-admin
      -   namespace: product-kubearchive-logging
      - subjects:
      -   - kind: Group
      -     apiGroup: rbac.authorization.k8s.io
      -     name: konflux-kubearchive  # rover group
      - roleRef:
      -   apiGroup: rbac.authorization.k8s.io
      -   kind: ClusterRole
      -   name: admin
      The deletion of rbac.yaml removes two RoleBindings (kubearchive-logging-component-maintainers and kubearchive-logging-admin) in the product-kubearchive-logging namespace. This will revoke permissions for the konflux-kubearchive group to manage or administer resources in this namespace. This is a critical change that will likely break existing access and operations. Please ensure these RoleBindings are no longer needed or have been replaced by an alternative mechanism.

@testcara testcara force-pushed the konflux-system-alerts branch from 88d81d1 to 4c02063 Compare August 12, 2025 04:49

Code Review by Gemini

## Code Review

### 1. `.github/workflows/validate-notification.yaml`

*   **Improvement**: The glob pattern `**/**/system-notifications/*.yaml` can be simplified.
    ```diff
    --- a/.github/workflows/validate-notification.yaml
    +++ b/.github/workflows/validate-notification.yaml
    @@ -7,7 +7,7 @@
     jobs:
       validate:
         runs-on: ubuntu-latest

     steps:
       - uses: actions/checkout@v3

       - name: Install yq (YAML processor)
         run: |
           sudo wget https://github.com/mikefarah/yq/releases/download/v4.34.1/yq_linux_amd64 -O /usr/bin/yq
           sudo chmod +x /usr/bin/yq

           - name: Check konflux-system-notification label present
             run: |
               mkdir -p ${{ env.TMP_DIR }}
    -          for file in  ${{ env.NOTIFICATION_DIR }}/**/**/system-notifications/*.yaml; do
    +          for file in  ${{ env.NOTIFICATION_DIR }}/**/system-notifications/*.yaml; do
                 if [[ "$(basename "$file")" == "kustomization.yaml" ]]; then
                   continue
                 fi
                 label=$(yq e '.metadata.labels."konflux.system.notification"' "$file")
                 if [[ "$label" != "\"true\"" && "$label" != "true" ]]; then
    @@ -24,7 +24,7 @@
           - name: Validate notification-content.json files
             run: |
               mkdir -p ${{ env.TMP_DIR }}
    -          for file in  ${{ env.NOTIFICATION_DIR }}/**/**/system-notifications/*.yaml; do
    +          for file in  ${{ env.NOTIFICATION_DIR }}/**/system-notifications/*.yaml; do
                 file_name=$(basename "$file")
                 if [[ "$file_name" == "kustomization.yaml" ]]; then
                   continue
    ```

### 2. `components/konflux-info/README.md`

*   **Bug**: The numbering for "Usage Scenarios & Examples" is incorrect, with duplicate "2." and subsequent sections misnumbered.
    ```diff
    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -143,28 +143,28 @@
     -  # No timezone is needed when you expect it's UTC.
     ```
     
    -#### ✅ **2. One-Time Banner**
    +#### 2.3.2. <a name='2.One-TimeBanner'></a>✅ **2. One-Time Banner**
     
     For a single event on a specific date:
     
     ```yaml
      - summary: "Scheduled downtime on July 25"
        type: "warning"
    @@ -172,28 +172,28 @@
      ```
     
     For a single event in today
     
     ```yaml
      - summary: "Scheduled downtime on July 25"
        type: "warning"
    @@ -179,28 +179,28 @@
        endTime: "14:00"
      ```
     
    -#### ✅ **2. Weekly Recurring Banner**
    +#### 2.3.3. <a name='3.WeeklyRecurringBanner'></a>✅ **3. Weekly Recurring Banner**
     
     For an event that repeats every week:
     
     ```yaml
      - summary: "Maintenance every Sunday"
        type: "info"
    @@ -189,28 +189,28 @@
        endTime: "04:00"
      ```
     
    -#### ✅ **3. Monthly Recurring Banner**
    +#### 2.3.4. <a name='4.MonthlyRecurringBanner'></a>✅ **4. Monthly Recurring Banner**
     
     For an event that happens on the same day each month:
     
     ```yaml
      - summary: "Patch release on 1st of every month"
        type: "info"
    @@ -209,28 +209,28 @@
        endTime: "03:00"
        timeZone: "Asia/Shanghai"
      ```
     
    -#### ✅ **4. Always-On Banner**
    +#### 2.3.5. <a name='5.Always-OnBanner'></a>✅ **5. Always-On Banner**
     
     For an event that requires immediate notification:
     
     ```yaml
      - summary: "New feature: Pipeline Insights is live!"
        type: "info"
    @@ -238,28 +238,28 @@
      ```
     
    -#### ✅ **5. Empty Banner**
    +#### 2.3.6. <a name='6.EmptyBanner'></a>✅ **6. Empty Banner**
     
     When there are no events to announce:
     
     ```
      []
     ```
    ```

### 3. `components/konflux-info/base/rbac.yaml`

*   **Note**: The change removes specific `resourceNames` for `configmaps`, allowing the `konflux-public-info-view-role` to `get` any `configmap` within the `konflux-info` namespace. While this might be intended for public information, ensure no sensitive `configmaps` are present or could be added to this namespace that should not be publicly viewable.
    ```diff
    --- a/components/konflux-info/base/rbac.yaml
    +++ b/components/konflux-info/base/rbac.yaml
    @@ -4,9 +4,6 @@
     rules:
       - verbs:
           - get
         apiGroups:
           - ''
    -    resourceNames:
    -      - konflux-public-info
    -      - konflux-banner-configmap
         resources:
           - configmaps
     ```

### 4. `components/monitoring/prometheus/development/monitoringstack/cluster-type-patch.yaml`

*   **Note**: The operation for the patch on `relabelings/0` changed from `replace` to `add`. This means that instead of replacing the first element in the `relabelings` list, it will now insert a new element at the beginning, shifting existing elements. This is a change in behavior. Confirm this is the intended effect (i.e., to insert this specific relabeling at the start of the list).
    ```diff
    --- a/components/monitoring/prometheus/development/monitoringstack/cluster-type-patch.yaml
    +++ b/components/monitoring/prometheus/development/monitoringstack/cluster-type-patch.yaml
    @@ -1,6 +1,6 @@
     ---
    -- op: replace
    +- op: add
       path: /spec/endpoints/0/relabelings/0
       value:
         targetLabel: source_environment
    ```

@testcara testcara force-pushed the konflux-system-alerts branch from 4c02063 to ade86c7 Compare August 12, 2025 04:52

Code Review by Gemini

## Code Review

### `components/konflux-info/README.md`

*   **Improvement:** In section `2.3.3. ✅ 3. Weekly Recurring Banner`, the example title `✅ 2. Weekly Recurring Banner` is incorrect. It should be `✅ 3. Weekly Recurring Banner`.
    ```diff
    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -144,7 +144,7 @@ For a single event in today
     - endTime: "14:00"
     ```
     
    -#### ✅ **2. Weekly Recurring Banner**
    +#### 2.3.3. <a name='3.WeeklyRecurringBanner'></a>✅ **3. Weekly Recurring Banner**
     
     For an event that repeats every week:
     
    ```

*   **Improvement:** In section `2.3.4. ✅ 4. Monthly Recurring Banner`, the example title `✅ 3. Monthly Recurring Banner` is incorrect. It should be `✅ 4. Monthly Recurring Banner`.
    ```diff
    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -153,7 +153,7 @@ For an event that repeats every week:
     - endTime: "04:00"
     ```
     
    -#### ✅ **3. Monthly Recurring Banner**
    +#### 2.3.4. <a name='4.MonthlyRecurringBanner'></a>✅ **4. Monthly Recurring Banner**
     
     For an event that happens on the same day each month:
     
    ```

*   **Improvement:** In section `2.3.5. ✅ 5. Always-On Banner`, the example title `✅ 4. Always-On Banner` is incorrect. It should be `✅ 5. Always-On Banner`.
    ```diff
    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -162,7 +162,7 @@ For an event that happens on the same day each month:
     - timeZone: "Asia/Shanghai"
     ```
     
    -#### ✅ **4. Always-On Banner**
    +#### 2.3.5. <a name='5.Always-OnBanner'></a>✅ **5. Always-On Banner**
     
     For an event that requires immediate notification:
     
    ```

*   **Improvement:** In section `2.3.6. ✅ 6. Empty Banner`, the example title `✅ 5. Empty Banner` is incorrect. It should be `✅ 6. Empty Banner`.
    ```diff
    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -168,7 +168,7 @@ For an event that requires immediate notification:
     - summary: "New feature: Pipeline Insights is live!"
       type: "info"
     ```
    -#### ✅ **5. Empty Banner**
    +#### 2.3.6. <a name='6.EmptyBanner'></a>✅ **6. Empty Banner**
     
     When there are no events to announce:
     
    ```

### `.github/workflows/validate-notification.yaml`

*   **Improvement:** The glob pattern `**/**/system-notifications/*.yaml` can be simplified to `**/system-notifications/*.yaml`.
    ```diff
    --- a/.github/workflows/validate-notification.yaml
    +++ b/.github/workflows/validate-notification.yaml
    @@ -19,7 +19,7 @@ jobs:
       - name: Check konflux-system-notification label present
         run: |
           mkdir -p ${{ env.TMP_DIR }}
    -      for file in  ${{ env.NOTIFICATION_DIR }}/**/**/system-notifications/*.yaml; do
    +      for file in  ${{ env.NOTIFICATION_DIR }}/**/system-notifications/*.yaml; do
             if [[ "$(basename "$file")" == "kustomization.yaml" ]]; then
               continue
             fi
    @@ -37,7 +37,7 @@ jobs:
       - name: Validate notification-content.json files
         run: |
           mkdir -p ${{ env.TMP_DIR }}
    -      for file in  ${{ env.NOTIFICATION_DIR }}/**/**/system-notifications/*.yaml; do
    +      for file in  ${{ env.NOTIFICATION_DIR }}/**/system-notifications/*.yaml; do
             file_name=$(basename "$file")
             if [[ "$file_name" == "kustomization.yaml" ]]; then
               continue
    ```

### `components/konflux-info/base/rbac.yaml`

*   **Note:** The change removes specific `resourceNames` (`konflux-public-info`, `konflux-banner-configmap`) and broadens the `konflux-public-info-view-role` to allow `get` access on *all* `configmaps` within the `konflux-info` namespace. This is likely necessary for the new system notification feature to dynamically discover ConfigMaps by label, but it's a permission increase that should be acknowledged.
    ```diff
    --- a/components/konflux-info/base/rbac.yaml
    +++ b/components/konflux-info/base/rbac.yaml
    @@ -5,9 +5,6 @@
     rules:
       - verbs:
           - get
         apiGroups:
           - ''
    -    resourceNames:
    -      - konflux-public-info
    -      - konflux-banner-configmap
         resources:
           - configmaps
     ---
    ```

### `components/konflux-info/notification-schema.json`

*   **Bug:** The `type` field in the `notification` definition is missing the `enum` constraint, which is specified in the `README.md` as `info`, `warning`, or `danger`. A hypothetical ConfigMap that would pass the resulting validation is sketched after the diff.
    ```diff
    --- /dev/null
    +++ b/components/konflux-info/notification-schema.json
    @@ -16,7 +16,8 @@
         },
         "type": {
           "type": "string",
           "description": "Required. The type of notification: 'info', 'warning', or 'danger'."
    +      "enum": ["info", "warning", "danger"]
         },
     
         "activeTimestamp": {
    ```
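
For context, a hypothetical notification ConfigMap that would satisfy the schema and the CI label check (the name and field values are invented and may not cover every required property; the label key is the one the workflow queries, and the data key matches the validated notification-content.json):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: notification-1                     # hypothetical name
  labels:
    konflux.system.notification: "true"    # label key as read by the CI check
data:
  notification-content.json: |
    {
      "title": "Planned maintenance",
      "type": "warning",
      "activeTimestamp": "2025-08-15T10:00:00Z"
    }
```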

### `components/monitoring/prometheus/development/monitoringstack/cluster-type-patch.yaml`

*   **Bug Fix:** Changing `op: replace` to `op: add` for the `relabelings` array is a good fix. If the `relabelings` array was empty or didn't exist, `replace` would fail. `add` ensures the element is added correctly.
    ```diff
    --- a/components/monitoring/prometheus/development/monitoringstack/cluster-type-patch.yaml
    +++ b/components/monitoring/prometheus/development/monitoringstack/cluster-type-patch.yaml
    @@ -1,6 +1,6 @@
    ---
    -- op: replace
    +- op: add
       path: /spec/endpoints/0/relabelings/0
       value:
         targetLabel: source_environment
         replacement: development-cluster
    ```

### `components/monitoring/prometheus/base/monitoringstack/monitoringstack.yaml`

*   **Note:** The `writeRelabelConfigs` section has been moved from an overlay to the base `MonitoringStack` definition, and a very extensive list of `match[]` parameters has been added to the `ServiceMonitor`. This is a significant change to the Prometheus metric collection and remote write configuration. While not a bug in the code itself, it requires careful review by a monitoring expert to ensure all necessary metrics are still collected and no unintended metrics are dropped.
    ```diff
    --- a/components/monitoring/prometheus/base/monitoringstack/monitoringstack.yaml
    +++ b/components/monitoring/prometheus/base/monitoringstack/monitoringstack.yaml
    @@ -35,21 +35,37 @@ spec:
           secret:
             key: client-id
             name: rhobs
         clientSecret:
           key: client-secret
           name: rhobs
         endpointParams:
           audience: # added by overlays
         tokenUrl: https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token
       url: # added by overlays
    -      # writeRelabelConfigs:  # added by overlays
    +      writeRelabelConfigs:
    +      - action: LabelKeep
    +        regex: "__name__|source_environment|source_cluster|namespace|app|pod|container|\
    +          label_pipelines_appstudio_openshift_io_type|health_status|dest_namespace|\
    +          controller|service|reason|phase|type|resource|resourcequota|le|app|image|\
    +          commit_hash|job|operation|tokenName|rateLimited|state|persistentvolumeclaim|\
    +          storageclass|volumename|release_reason|instance|result|deployment_reason|\
    +          validation_reason|strategy|succeeded|target|name|method|code|sp|le|\
    +          unexpected_status|failure|hostname|label_app_kubernetes_io_managed_by|status|\
    +          pipeline|pipelinename|pipelinerun|schedule|check|grpc_service|grpc_code|\
    +          grpc_method|lease|lease_holder|deployment|platform|mode|cpu|role|node|kind|\
    +          verb|request_kind|tested_cluster|resource_type|exported_job|http_method|\
    +          http_route|http_status_code|gin_errors|rule_result|rule_execution_cause|\
    +          policy_name|policy_background_mode|rule_type|policy_type|policy_validation_mode|\
    +          resource_request_operation|resource_kind|policy_change_type|event_type"
    +          
     
     ---
     # Grant permission to Federate In-Cluster Prometheus
     apiVersion: rbac.authorization.k8s.io/v1
     kind: ClusterRoleBinding
     metadata:
       name: appstudio-federate-ms-view
       labels:
         kubernetes.io/part-of: appstudio-federate-ms
         monitoring.rhobs/stack: appstudio-federate-ms
    @@ -71,7 +87,191 @@ metadata:
     spec:
       selector: # use the prometheus service to create a "dummy" target.
         matchLabels:
           app.kubernetes.io/managed-by: observability-operator
           app.kubernetes.io/name: appstudio-federate-ms-prometheus
       endpoints:
    -  - params: {}  # added by overlays
    +  - params:
    +      'match[]': # scrape only required metrics from in-cluster prometheus
    +        - '{__name__="pipeline_service_schedule_overhead_percentage_sum"}'
    +        - '{__name__="pipeline_service_schedule_overhead_percentage_count"}'
    +        - '{__name__="pipeline_service_execution_overhead_percentage_sum"}'
    +        - '{__name__="pipeline_service_execution_overhead_percentage_count"}'
    +        - '{__name__="pipelinerun_duration_scheduled_seconds_sum"}'
    +        - '{__name__="pipelinerun_duration_scheduled_seconds_count"}'
    +        - '{__name__="pipelinerun_gap_between_taskruns_milliseconds_sum"}'
    +        - '{__name__="pipelinerun_gap_between_taskruns_milliseconds_count"}'
    +        - '{__name__="pipelinerun_kickoff_not_attempted_count"}'
    +        - '{__name__="pending_resolutionrequest_count"}'
    +        - '{__name__="taskrun_pod_create_not_attempted_or_pending_count"}'
    +        - '{__name__="tekton_pipelines_controller_pipelinerun_count"}'
    +        - '{__name__="tekton_pipelines_controller_running_pipelineruns_count"}'
    +        - '{__name__="tekton_pipelines_controller_running_taskruns_throttled_by_quota_count"}'
    +        - '{__name__="tekton_pipelines_controller_running_taskruns_throttled_by_node_count"}'
    +        - '{__name__="tekton_pipelines_controller_running_taskruns_throttled_by_quota"}'
    +        - '{__name__="tekton_pipelines_controller_running_taskruns_throttled_by_node"}'
    +        - '{__name__="tekton_pipelines_controller_pipelinerun_duration_seconds_sum"}'
    +        - '{__name__="tekton_pipelines_controller_pipelinerun_duration_seconds_count"}'
    +        - '{__name__="watcher_workqueue_depth"}'
    +        - '{__name__="watcher_client_latency_bucket"}'
    +        - '{__name__="pac_watcher_work_queue_depth"}'
    +        - '{__name__="pac_watcher_client_latency_bucket"}'
    +        - '{__name__="grpc_server_handled_total", namespace=~"tekton-results|openshift-pipelines"}'
    +        - '{__name__="grpc_server_handled_total", namespace=~"openshift-etcd"}'
    +        - '{__name__="grpc_server_handling_seconds_bucket", namespace=~"tekton-results|openshift-pipelines"}'
    +        - '{__name__="grpc_server_handling_seconds_bucket", namespace="openshift-etcd"}'
    +        - '{__name__="grpc_server_msg_received_total", namespace="openshift-etcd"}'
    +        - '{__name__="controller_runtime_reconcile_errors_total", namespace!~".*-tenant|openshift-.*|kube-.*"}'
    +        - '{__name__="controller_runtime_reconcile_total", namespace!~".*-tenant|openshift-.*|kube-.*"}'
    +        - '{__name__="kube_lease_owner", namespace="openshift-pipelines", lease=~"controller.tektonresolverframework.bundleresolver..*"}'
    +        - '{__name__="kube_lease_owner", namespace="openshift-pipelines", lease=~"tekton-pipelines-controller.github.com.tektoncd.pipeline.pkg.reconciler..*"}'
    +        - '{__name__="kube_pod_status_unschedulable", namespace!~".*-tenant|openshift-.*|kube-.*"}'
    +        - '{__name__="kube_pod_container_status_restarts_total", namespace=~"openshift-pipelines|release-service"}'
    +        - '{__name__="kube_pod_container_status_waiting_reason", namespace!~".*-tenant|openshift-.*|kube-.*"}'
    +        - '{__name__="kube_pod_status_phase", namespace!~".*-tenant|openshift-.*|kube-.*"}'
    +        - '{__name__="kube_pod_container_resource_limits", namespace="release-service"}'
    +        - '{__name__="kube_pod_container_status_terminated_reason", namespace="release-service"}'
    +        - '{__name__="kube_pod_container_status_last_terminated_reason", namespace="release-service"}'
    +        - '{__name__="kube_pod_container_status_ready", namespace="release-service"}'
    +        - '{__name__="kube_persistentvolume_status_phase", namespace!~".*-tenant|openshift-.*|kube-.*"}'
    +        - '{__name__="kube_resourcequota", namespace!~".*-tenant|openshift-.*|kube-.*"}'
    +        - '{__name__="kube_statefulset_status_replicas_ready", namespace="gitops-service-argocd"}'
    +        - '{__name__="kube_statefulset_replicas", namespace="gitops-service-argocd"}'
    +        - '{__name__="openshift_route_status", namespace="gitops-service-argocd"}'
    +
    +        - '{__name__="kube_deployment_status_replicas_ready", namespace="gitops-service-argocd"}'
    +        - '{__name__="kube_deployment_spec_replicas", namespace=~"gitops-service-argocd"}'
    +
    +        # Namespace (expression):  "build-service"
    +        - '{__name__="kube_deployment_status_replicas_ready", namespace="build-service"}'
    +        - '{__name__="kube_deployment_status_replicas_available", namespace="build-service"}'
    +        - '{__name__="kube_deployment_spec_replicas", namespace="build-service"}'
    +
    +        # Namespace (expression):  "integration-service"
    +        - '{__name__="kube_deployment_status_replicas_ready", namespace="integration-service"}'
    +        - '{__name__="kube_deployment_status_replicas_available", namespace="integration-service"}'
    +        - '{__name__="kube_deployment_spec_replicas", namespace="integration-service"}'
    +
    +        # Namespace (expression):  "konflux-ui"
    +        - '{__name__="kube_deployment_status_replicas_ready", namespace="konflux-ui"}'
    +        - '{__name__="kube_deployment_status_replicas_available", namespace="konflux-ui"}'
    +        - '{__name__="kube_deployment_spec_replicas", namespace="konflux-ui"}'
    +        - '{__name__="kube_running_pods_ready", namespace="konflux-ui"}'
    +        - '{__name__="kube_endpoint_address", namespace="konflux-ui"}'
    +        - '{__name__="kube_pod_container_status_restarts_total", namespace="konflux-ui"}'
    +
    +        # Namespace (expression):  "mintmaker"
    +        - '{__name__="kube_deployment_status_replicas_ready", namespace="mintmaker"}'
    +        - '{__name__="kube_deployment_status_replicas_available", namespace="mintmaker"}'
    +        - '{__name__="kube_deployment_spec_replicas", namespace="mintmaker"}'
    +        - '{__name__="cluster_ram_requested_perc"}'
    +        - '{__name__="node_memory_pressured_perc"}'
    +        - '{__name__="redis_node_memory_usage_perc"}'
    +
    +        # Namespace (expression):  ~".*monitoring.*"
    +        - '{__name__="kube_deployment_status_replicas_ready", namespace=~".*monitoring.*"}'
    +        - '{__name__="kube_deployment_status_replicas_available", namespace=~".*monitoring.*"}'
    +        - '{__name__="kube_deployment_spec_replicas", namespace=~".*monitoring.*"}'
    +
    +        # Namespace (expression):  "multi-platform-controller"
    +        - '{__name__="kube_deployment_status_replicas_ready", namespace="multi-platform-controller"}'
    +        - '{__name__="kube_deployment_status_replicas_available", namespace="multi-platform-controller"}'
    +        - '{__name__="kube_deployment_spec_replicas", namespace="multi-platform-controller"}'
    +
    +        # Namespace (expression):  "namespace-lister"
    +        - '{__name__="kube_deployment_status_replicas_ready", namespace="namespace-lister"}'
    +        - '{__name__="kube_deployment_status_replicas_available", namespace="namespace-lister"}'
    +        - '{__name__="kube_deployment_spec_replicas", namespace="namespace-lister"}'
    +
    +        # Namespace (expression):  "openshift-pipelines"
    +        - '{__name__="kube_deployment_status_replicas_ready", namespace="openshift-pipelines"}'
    +        - '{__name__="kube_deployment_status_replicas_available", namespace="openshift-pipelines"}'
    +        - '{__name__="kube_deployment_spec_replicas", namespace="openshift-pipelines"}'
    +
    +        # Namespace (expression):  "product-kubearchive"
    +        - '{__name__="kube_deployment_status_replicas_ready", namespace="product-kubearchive"}'
    +        - '{__name__="kube_deployment_status_replicas_available", namespace="product-kubearchive"}'
    +        - '{__name__="kube_deployment_spec_replicas", namespace="product-kubearchive"}'
    +
    +        # Namespace (expression):  "project-controller"
    +        - '{__name__="kube_deployment_status_replicas_ready", namespace="project-controller"}'
    +        - '{__name__="kube_deployment_status_replicas_available", namespace="project-controller"}'
    +        - '{__name__="kube_deployment_spec_replicas", namespace="project-controller"}'
    +
    +        # Namespace (expression):  "release-service"
    +        - '{__name__="kube_deployment_status_replicas_ready", namespace="release-service"}'
    +        - '{__name__="kube_deployment_status_replicas_available", namespace="release-service"}'
    +        - '{__name__="kube_deployment_spec_replicas", namespace="release-service"}'
    +
    +        # Namespace (expression):  ~"smee.*"
    +        - '{__name__="kube_deployment_status_replicas_ready", namespace=~"smee.*"}'
    +        - '{__name__="kube_deployment_status_replicas_available", namespace=~"smee.*"}'
    +        - '{__name__="kube_deployment_spec_replicas", namespace=~"smee.*"}'
    +
    +        # Namespace (expression):  "openshift-apiserver"
    +        - '{__name__="kube_deployment_status_replicas_ready", namespace="openshift-apiserver"}'
    +        - '{__name__="kube_deployment_status_replicas_available", namespace="openshift-apiserver"}'
    +        - '{__name__="kube_deployment_spec_replicas", namespace="openshift-apiserver"}'
    +
    +        # Namespace (expression):  "openshift-oauth-apiserver"
    +        - '{__name__="kube_deployment_status_replicas_ready", namespace="openshift-oauth-apiserver"}'
    +        - '{__name__="kube_deployment_status_replicas_available", namespace="openshift-oauth-apiserver"}'
    +        - '{__name__="kube_deployment_spec_replicas", namespace="openshift-oauth-apiserver"}'
    +
    +        # Namespace (expression):  "konflux-kyverno"
    +        - '{__name__="kube_deployment_status_replicas_ready", namespace="konflux-kyverno"}'
    +        - '{__name__="kube_deployment_status_replicas_available", namespace="konflux-kyverno"}'
    +        - '{__name__="kube_deployment_spec_replicas", namespace="konflux-kyverno"}'
    +
    +        # Namespace (expression):  "openshift-kube-apiserver"
    +        - '{__name__="kube_deployment_status_replicas_ready", namespace="openshift-kube-apiserver"}'
    +        - '{__name__="kube_deployment_status_replicas_available", namespace="openshift-kube-apiserver"}'
    +        - '{__name__="kube_deployment_spec_replicas", namespace="openshift-kube-apiserver"}'
    +
    +        # Namespace (expression):  "konflux-user-support"
    +        - '{__name__="kube_deployment_status_replicas_available", namespace="konflux-user-support"}'
    +        - '{__name__="kube_deployment_spec_replicas", namespace="konflux-user-support"}'
    +
    +        - '{__name__="argocd_app_reconcile_bucket", namespace="gitops-service-argocd"}'
    +        - '{__name__="argocd_app_info", namespace="gitops-service-argocd"}'
    +        - '{__name__="container_cpu_usage_seconds_total", namespace="release-service"}'
    +        - '{__name__="container_cpu_usage_seconds_total", namespace="openshift-etcd"}'
    +        - '{__name__="container_memory_usage_bytes", namespace="release-service"}'
    +        - '{__name__="container_memory_usage_bytes", namespace="openshift-etcd"}'
    +        - '{__name__="etcd_disk_wal_fsync_duration_seconds_bucket"}'
    +        - '{__name__="etcd_disk_backend_commit_duration_seconds_bucket"}'
    +        - '{__name__="etcd_server_proposals_failed_total"}'
    +        - '{__name__="etcd_server_leader_changes_seen_total", namespace="openshift-etcd"}'
    +        - '{__name__="etcd_server_has_leader", namespace="openshift-etcd"}'
    +        - '{__name__="etcd_server_is_leader", namespace="openshift-etcd"}'
    +        - '{__name__="etcd_server_id", namespace="openshift-etcd"}'
    +        - '{__name__="etcd_server_quota_backend_bytes", namespace="openshift-etcd"}'
    +        - '{__name__="etcd_mvcc_db_total_size_in_bytes", namespace="openshift-etcd"}'
    +        - '{__name__="etcd_server_received_total", namespace="openshift-etcd"}'
    +        - '{__name__="etcd_network_active_peers", namespace="openshift-etcd"}'
    +        - '{__name__="etcd_network_peer_round_trip_time_seconds_bucket"}'
    +        - '{__name__="etcd_disk_defrag_inflight"}'
    +        - '{__name__="kube_job_spec_completions"}'
    +        - '{__name__="kube_job_status_succeeded"}'
    +        - '{__name__="kube_job_status_failed"}'
    +        - '{__name__="node_cpu_seconds_total", mode="idle"}'
    +        - '{__name__="node_memory_MemTotal_bytes"}'
    +        - '{__name__="node_memory_MemAvailable_bytes"}'
    +        - '{__name__="platform:hypershift_hostedclusters:max"}'
    +        - '{__name__="kube_node_role"}'
    +        - '{__name__="etcd_shield_trigger"}'
    +        - '{__name__="etcd_shield_alert_triggered"}'
    +        - '{__name__="apiserver_admission_webhook_rejection_count", name="vpipelineruns.konflux-ci.dev"}'
    +        - '{__name__="apiserver_watch_events_total"}'
    +        - '{__name__="apiserver_storage_objects"}'
    +        - '{__name__="apiserver_current_inflight_requests"}'
    +        - '{__name__="resource_verb:apiserver_request_total:rate5m"}'
    +        - '{__name__="code:apiserver_request_total:rate5m"}'
    +        - '{__name__="instance:apiserver_request_total:rate5m"}'
    +        - '{__name__="prometheus_ready"}'
    +        - '{__name__="process_cpu_seconds_total", job="apiserver"}'
    +        - '{__name__="namespace:container_memory_usage_bytes:sum", namespace=~"openshift-etcd|openshift-kube-apiserver|build-service|image-controller|integration-service|konflux-ui|product-kubearchive|openshift-kueue-operator|tekton-kueue|kueue-external-admission|mintmaker|multi-platform-controller|namespace-lister|openshift-pipelines|tekton-results|project-controller|smee|smee-client"}'
    +        - '{__name__="namespace:container_cpu_usage:sum", namespace=~"openshift-etcd|openshift-kube-apiserver|build-service|image-controller|integration-service|konflux-ui|product-kubearchive|openshift-kueue-operator|tekton-kueue|kueue-external-admission|mintmaker|multi-platform-controller|namespace-lister|openshift-pipelines|tekton-results|project-controller|smee|smee-client"}'
    +        - '{__name__="node_namespace_pod:kube_pod_info:", namespace=~"openshift-etcd|openshift-kube-apiserver|build-service|image-controller|integration-service|konflux-ui|product-kubearchive|openshift-kueue-operator|tekton-kueue|kueue-external-admission|mintmaker|multi-platform-controller|namespace-lister|openshift-pipelines|tekton-results|project-controller|smee|smee-client"}'
    +        - '{__name__="kube_node_status_allocatable", resource=~"cpu|memory"}'
    +        - '{__name__="kube_node_status_condition", condition="MemoryPressure", status="true"}'
    +        - '{__name__="namespace_memory:kube_pod_container_resource_requests:sum", namespace=~"openshift-etcd|openshift-kube-apiserver|build-service|image-controller|integration-service|konflux-ui|product-kubearchive|openshift-kueue-operator|tekton-kueue|kueue-external-admission|mintmaker|multi-platform-controller|namespace-lister|openshift-pipelines|tekton-results|project-controller|smee|smee-client"}'
    +        - '{__name__="namespace_cpu:kube_pod_container_resource_requests:sum", namespace=~"openshift-etcd|openshift-kube-apiserver|build-service|image-controller|integration-service|konflux-ui|product-kubearchive|openshift-kueue-operator|tekton-kueue|kueue-external-admission|mintmaker|multi-platform-controller|namespace-lister|openshift-pipelines|tekton-results|project-controller|smee|smee-client"}'
    +        - '{__name__="namespace_memory:kube_pod_container_resource_limits:sum", namespace=~"openshift-etcd|openshift-kube-apiserver|build-service|image-controller|integration-service|konflux-ui|product-kubearchive|openshift-kueue-operator|tekton-kueue|kueue-external-admission|mintmaker|multi-platform-controller|namespace-lister|openshift-pipelines|tekton-results|project-controller|smee|smee-client"}'
    +        - '{__name__="namespace_cpu:kube_pod_container_resource_limits:sum", namespace=~"openshift-etcd|openshift-kube-apiserver|build-service|image-controller|integration-service|konflux-ui|product-kubearchive|openshift-kueue-operator|tekton-kueue|kueue-external-admission|mintmaker|multi-platform-controller|namespace-lister|openshift-pipelines|tekton-results|project-controller|smee|smee-client"}'
     
         relabelings:
         # override the target's address by the prometheus-k8s service name.
         - action: replace
           targetLabel: __address__
           replacement: prometheus-k8s.openshift-monitoring.svc:9091
    ```

### `components/tracing/otel-collector/staging/otel-collector-helm-values.yaml`

*   **Note:** The `prometheus` exporter and the `metrics` pipeline within the `telemetry` section have been removed. This indicates a change in how metrics are exported or collected from the OpenTelemetry Collector. This is a functional change that should be verified by the tracing/monitoring team.
    ```diff
    --- a/components/tracing/otel-collector/staging/otel-collector-helm-values.yaml
    +++ b/components/tracing/otel-collector/staging/otel-collector-helm-values.yaml
    @@ -3,9 +3,6 @@
     sapm:
       endpoint: "https://ingest.us1.signalfx.com/v2/trace"
       access_token: ${env:SIGNALFX_API_TOKEN}
     signalfx:
       endpoint: "https://api.signalfx.com/v2/traces"
       realm: "us1"
       api_url: "https://api.us1.signalfx.com"
       access_token: ${env:SIGNALFX_API_TOKEN} 
    -prometheus:
    -  endpoint: 0.0.0.0:8889
     extensions:
       # The health_check extension is mandatory for this chart.
       # Without the health_check extension the collector will fail the readiness and liveliness probes.
       # The health_check extension can be modified, but should never be removed.
    @@ -56,18 +53,7 @@ config:
             - debug
             - sapm
             - signalfx
           processors:
             - memory_limiter
             - attributes/collector-info
             - batch
             - attributes/stage
           receivers:
             - otlp
    -      metrics:
    -        receivers:
    -        - otlp
    -        processors:
    -        - memory_limiter
    -        - batch
    -        exporters:
    -        - prometheus
    -    telemetry:
    -      metrics:
    -        level: basic
    -        readers:
    -        - pull:
    -            exporter: prometheus
    -            host: 0.0.0.0
    -            port: 8888
    -
    +      metrics: null
     # Configuration for ports
     ports:
       otlp:
    ```

@testcara testcara force-pushed the konflux-system-alerts branch from ade86c7 to 87b260e Compare August 12, 2025 04:57

Code Review by Gemini

Here's a review of the provided code changes, focusing on bugs and suggested improvements:

.github/workflows/validate-notification.yaml

Improvements:

  • Line 14: The glob pattern **/**/system-notifications/*.yaml can be simplified.
    --- a/.github/workflows/validate-notification.yaml
    +++ b/.github/workflows/validate-notification.yaml
    @@ -11,7 +11,7 @@
       - name: Check konflux-system-notification label present
         run: |
           mkdir -p ${{ env.TMP_DIR }}
    -      for file in  ${{ env.NOTIFICATION_DIR }}/**/**/system-notifications/*.yaml; do
    +      for file in  ${{ env.NOTIFICATION_DIR }}/**/system-notifications/*.yaml; do
             if [[ "$(basename "$file")" == "kustomization.yaml" ]]; then
               continue
             fi
    @@ -29,7 +29,7 @@
       - name: Validate notification-content.json files
         run: |
           mkdir -p ${{ env.TMP_DIR }}
    -      for file in  ${{ env.NOTIFICATION_DIR }}/**/**/system-notifications/*.yaml; do
    +      for file in  ${{ env.NOTIFICATION_DIR }}/**/system-notifications/*.yaml; do
             file_name=$(basename "$file")
             if [[ "$file_name" == "kustomization.yaml" ]]; then
               continue

components/konflux-info/base/rbac.yaml

Potential Bug / Security Concern:

  • Lines 9-11: The removal of resourceNames means this role now grants get access to all ConfigMaps in the konflux-info namespace, rather than just the explicitly listed ones (konflux-public-info and konflux-banner-configmap). If other ConfigMaps in this namespace are not intended to be publicly viewable, this change would expose them. It is generally safer to list every ConfigMap that should be accessible explicitly; standard Kubernetes Role rules do not support selecting resources by label, so resourceNames is the only built-in way to scope this role.

    --- a/components/konflux-info/base/rbac.yaml
    +++ b/components/konflux-info/base/rbac.yaml
    @@ -5,9 +5,6 @@
     rules:
       - verbs:
           - get
         apiGroups:
           - ''
    -    resourceNames:
    -      - konflux-public-info
    -      - konflux-banner-configmap
         resources:
           - configmaps
     ---

    Recommendation: Revert this change and explicitly add the new system notification ConfigMap names (e.g., konflux-system-notification-configmap-1) to the resourceNames list, or provide a clear justification that all ConfigMaps in konflux-info are intended to be public. A minimal sketch of the scoped rule follows.
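
A minimal sketch of the scoped rule, mirroring the structure of the quoted rbac.yaml (the last resourceNames entry is the hypothetical example from the recommendation):

```yaml
rules:
  - verbs:
      - get
    apiGroups:
      - ''
    resourceNames:
      - konflux-public-info
      - konflux-banner-configmap
      # hypothetical entry; add one name per notification ConfigMap
      - konflux-system-notification-configmap-1
    resources:
      - configmaps
```

The trade-off is that every new notification ConfigMap must also be added here, which the notification CI validation could plausibly be extended to enforce.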

components/konflux-info/notification-schema.json

Bug:

  • Line 18: The type field's schema is missing the enum constraint to enforce the allowed values (info, warning, danger) as described in the README.md.

    --- a/components/konflux-info/notification-schema.json
    +++ b/components/konflux-info/notification-schema.json
    @@ -15,7 +15,8 @@
         },
         "type": {
           "type": "string",
  •      "description": "Required. The type of notification: 'info', 'warning', or 'danger'."
    
  •      "description": "Required. The type of notification: 'info', 'warning', or 'danger'.",
    
  •      "enum": ["info", "warning", "danger"]
       },
    
       "activeTimestamp": {
    
    
    

components/monitoring/prometheus/development/monitoringstack/cluster-type-patch.yaml

Bug:

  • Line 2: Changing op: replace to op: add for a patch operation can lead to issues if the target element already exists. If /spec/endpoints/0/relabelings/0 is expected to always exist and be modified, replace is the correct and safer operation. add is typically used for inserting new elements into a list or adding a new field.

    --- a/components/monitoring/prometheus/development/monitoringstack/cluster-type-patch.yaml
    +++ b/components/monitoring/prometheus/development/monitoringstack/cluster-type-patch.yaml
    @@ -1,6 +1,6 @@
     ---
    -- op: add
    +- op: replace
       path: /spec/endpoints/0/relabelings/0
       value:
         targetLabel: source_environment
         replacement: development-cluster

Unrelated Changes

The following changes are unrelated to "support the system notifications and improve the README.md" and should be moved to separate pull requests to maintain clear commit history and simplify reviews:

  • components/mintmaker/production/base/manager_patch.yaml
  • components/mintmaker/production/kflux-ocp-p01/kustomization.yaml (and similar mintmaker kustomizations)
  • components/monitoring/grafana/base/dashboards/release/kustomization.yaml
  • components/monitoring/prometheus/base/monitoringstack/monitoringstack.yaml
  • components/monitoring/prometheus/development/monitoringstack/kustomization.yaml
  • components/monitoring/prometheus/production/base/monitoringstack/endpoints-params.yaml (deleted)
  • components/monitoring/prometheus/production/base/monitoringstack/kustomization.yaml
  • components/monitoring/prometheus/production/base/monitoringstack/writeRelabelConfigs.yaml (deleted)
  • components/monitoring/prometheus/staging/base/monitoringstack/endpoints-params.yaml (deleted)
  • components/monitoring/prometheus/staging/base/monitoringstack/kustomization.yaml
  • components/monitoring/prometheus/staging/base/monitoringstack/writeRelabelConfigs.yaml (deleted)
  • components/monitoring/prometheus/staging/stone-stg-rh01/kustomization.yaml
  • components/monitoring/prometheus/staging/stone-stg-rh01/prometheusrule-uwm.yaml (deleted)
  • components/multi-platform-controller/base/kustomization.yaml
  • components/release/development/kustomization.yaml
  • components/tracing/otel-collector/staging/otel-collector-helm-values.yaml
  • components/vector-kubearchive-log-collector/OWNERS (deleted)
  • components/vector-kubearchive-log-collector/base/kustomization.yaml
  • components/vector-kubearchive-log-collector/base/rbac.yaml (deleted)
  • components/vector-kubearchive-log-collector/staging/stone-stg-rh01/loki-helm-values.yaml

@testcara testcara force-pushed the konflux-system-alerts branch from 87b260e to 4465a6e Compare August 12, 2025 05:02

Code Review by Gemini

Here's a review of the provided code changes:

Overall, the core changes related to supporting system notifications and updating the README are well-implemented. However, this pull request includes a significant number of unrelated changes that should ideally be separated into their own pull requests to maintain clarity, simplify reviews, and ensure proper change management.


Unrelated Changes (High Priority)

Several changes in this PR are unrelated to "system notifications" or "README improvement" and should be reverted and submitted in separate PRs. These changes affect different components and could have unintended side effects or require separate review by relevant teams.

  1. File: components/mintmaker/production/base/manager_patch.yaml

    • Line 13-14:
      -        env:
      -        - name: GOMEMLIMIT
      -          value: "7GiB"
    • Issue: Removal of GOMEMLIMIT environment variable for mintmaker. This is a functional change to the mintmaker component's memory limits and is unrelated to system notifications.
  2. Files: components/mintmaker/production/*/kustomization.yaml (e.g., kflux-ocp-p01, kflux-osp-p01, etc.)

    • Lines 6-9 (example from kflux-ocp-p01/kustomization.yaml):
      +images:
      +  - name: quay.io/konflux-ci/mintmaker
      +    newName: quay.io/konflux-ci/mintmaker
      +    newTag: a3816e76f5e06ba3114731fc34fa14f76d442b6c
    • Issue: Image pinning for mintmaker. This is an image update/pinning change and should be a separate PR.
  3. File: components/monitoring/grafana/base/dashboards/release/kustomization.yaml

    • Line 4:
      - https://github.com/konflux-ci/release-service/config/grafana/?ref=f6eb52a7403263fdf0109decadb9924e355b32fe
      + https://github.com/konflux-ci/release-service/config/grafana/?ref=26f6f47a8e09862c907f5ae1e820fc319e0c2866
    • Issue: Update to release-service Grafana dashboard reference. This is an update to monitoring dashboards and is unrelated.
  4. File: components/monitoring/prometheus/base/monitoringstack/monitoringstack.yaml

    • Lines 37-53:
      +      writeRelabelConfigs:
      +      - action: LabelKeep
      +        regex: "__name__|source_environment|source_cluster|namespace|app|pod|container|\
      +          label_pipelines_appstudio_openshift_io_type|health_status|dest_namespace|\
      +          controller|service|reason|phase|type|resource|resourcequota|le|app|image|\
      +          commit_hash|job|operation|tokenName|rateLimited|state|persistentvolumeclaim|\
      +          storageclass|volumename|release_reason|instance|result|deployment_reason|\
      +          validation_reason|strategy|succeeded|target|name|method|code|sp|le|\
      +          unexpected_status|failure|hostname|label_app_kubernetes_io_managed_by|status|\
      +          pipeline|pipelinename|pipelinerun|schedule|check|grpc_service|grpc_code|\
      +          grpc_method|lease|lease_holder|deployment|platform|mode|cpu|role|node|kind|\
      +          verb|request_kind|tested_cluster|resource_type|exported_job|http_method|\
      +          http_route|http_status_code|gin_errors|rule_result|rule_execution_cause|\
      +          policy_name|policy_background_mode|rule_type|policy_type|policy_validation_mode|\
      +          resource_request_operation|resource_kind|policy_change_type|event_type"
    • Lines 74-209:
      +  - params:
      +      'match[]': # scrape only required metrics from in-cluster prometheus
      +        - '{__name__="pipeline_service_schedule_overhead_percentage_sum"}'
      +        - '{__name__="pipeline_service_schedule_overhead_percentage_count"}'
      +        - '{__name__="pipeline_service_execution_overhead_percentage_sum"}'
      +        - '{__name__="pipeline_service_execution_overhead_percentage_count"}'
      +        ... (and many more lines)
    • Issue: These are very extensive and critical changes to the Prometheus monitoring stack, defining which metrics are scraped and which labels are kept. This is a major functional change to observability and should be a dedicated PR.
  5. File: components/monitoring/prometheus/development/monitoringstack/cluster-type-patch.yaml

    • Line 3:
      - op: replace
      + op: add
    • Issue: Changing op: replace to op: add for a Kustomize patch. If the target path /spec/endpoints/0/relabelings/0 already exists, op: add will attempt to insert a new element at index 0, potentially causing unexpected behavior or errors if the array index is already occupied. It should likely be op: replace if modifying an existing entry, or op: add at the end of the array (e.g., relabelings/-) if adding a new one. Given the context, it seems like it's adding a new relabeling rule, so relabelings/- would be safer.
  6. Files: components/monitoring/prometheus/production/base/monitoringstack/endpoints-params.yaml and writeRelabelConfigs.yaml (Deleted)

    • Issue: While these deletions are consistent with the changes made directly in monitoringstack.yaml, the reason for their deletion (i.e., moving their content) is part of the large, unrelated monitoring change.
  7. Files: components/monitoring/prometheus/staging/base/monitoringstack/endpoints-params.yaml and writeRelabelConfigs.yaml (Deleted)

    • Issue: Same as above, part of the large, unrelated monitoring change.
  8. File: components/monitoring/prometheus/staging/stone-stg-rh01/prometheusrule-uwm.yaml (Deleted)

    • Issue: Deletion of a PrometheusRule. This is a functional change to monitoring rules and is unrelated.
  9. File: components/multi-platform-controller/base/kustomization.yaml

    • Lines 8-9, 13-14, 16-17:
      - https://github.com/konflux-ci/multi-platform-controller/deploy/operator?ref=835e6b85c4ba77cc90e22ed3dcb116f915b61558
      - https://github.com/konflux-ci/multi-platform-controller/deploy/otp?ref=835e6b85c4ba77cc90e22ed3dcb116f915b61558
      + https://github.com/konflux-ci/multi-platform-controller/deploy/operator?ref=944fdc5bfb70a9e0ea84ea1232fc3ed2b8d924ba
      + https://github.com/konflux-ci/multi-platform-controller/deploy/otp?ref=944fdc5bfb70a9e0ea84ea1232fc3ed2b8d924ba
      ...
      -  newTag: 835e6b85c4ba77cc90e22ed3dcb116f915b61558
      +  newTag: 944fdc5bfb70a9e0ea84ea1232fc3ed2b8d924ba
      -  newTag: 835e6b85c4ba77cc90e22ed3dcb116f915b61558
      +  newTag: 944fdc5bfb70a9e0ea84ea1232fc3ed2b8d924ba
    • Issue: Updates to multi-platform-controller image tags and manifest references. This is an image update and is unrelated.
  10. File: components/release/development/kustomization.yaml

    • Lines 5, 10:
      -  - https://github.com/konflux-ci/release-service/config/default?ref=f6eb52a7403263fdf0109decadb9924e355b32fe
      +  - https://github.com/konflux-ci/release-service/config/default?ref=26f6f47a8e09862c907f5ae1e820fc319e0c2866
      ...
      -    newTag: f6eb52a7403263fdf0109decadb9924e355b32fe
      +    newTag: 26f6f47a8e09862c907f5ae1e820fc319e0c2866
    • Issue: Updates to release-service image tag and manifest reference. This is an image update and is unrelated.
  11. File: components/tracing/otel-collector/staging/otel-collector-helm-values.yaml

    • Lines 10-12, 34-39, 41:
      -    prometheus:
      -      endpoint: 0.0.0.0:8889
      ...
      -      metrics:
      -        receivers:
      -        - otlp
      -        processors:
      -        - memory_limiter
      -        - batch
      -        exporters:
      -        - prometheus
      -    telemetry:
      -      metrics:
      -        level: basic
      -        readers:
      -        - pull:
      -            exporter: prometheus
      -            host: 0.0.0.0
      -            port: 8888
      +      metrics: null
    • Issue: Removal of Prometheus exporter and metrics pipeline from OpenTelemetry Collector. This is a significant functional change to how metrics are collected and exposed, and it is unrelated.
  12. File: components/vector-kubearchive-log-collector/OWNERS (Deleted)

    • Issue: Deletion of the OWNERS file. This changes the ownership and review process for this component and is unrelated.
  13. File: components/vector-kubearchive-log-collector/base/rbac.yaml (Deleted)

    • Issue: Deletion of RBAC roles for kubearchive-logging-component-maintainers and kubearchive-logging-admin. This is a functional change to permissions and is unrelated.
  14. File: components/vector-kubearchive-log-collector/staging/stone-stg-rh01/loki-helm-values.yaml

    • Lines 25-27:
      +    - name: tmp-storage
      +      mountPath: /tmp
      ...
      +    - name: tmp-storage
      +      emptyDir: {}
    • Issue: Addition of tmp-storage volume to Loki. This is a functional change to Loki deployment and is unrelated.

Issues and Improvements (Related to System Notifications)

  1. File: components/konflux-info/base/rbac.yaml

    • Lines 8-11:
      -    resourceNames:
      -      - konflux-public-info
      -      - konflux-banner-configmap
           resources:
             - configmaps
    • Issue: The removal of resourceNames means this role now grants get access to all ConfigMaps in the konflux-info namespace, not just the specific ones previously listed. This is an over-permissioning and a potential security concern if other sensitive ConfigMaps exist or are added to this namespace in the future.
    • Suggestion: Revert this change and instead add the new konflux-system-notification-configmap (or each notification ConfigMap by name, if several are expected) to the resourceNames list. Note that Kubernetes RBAC cannot filter resources by label, so label-based discovery cannot be expressed in the Role itself; either enumerate the ConfigMap names explicitly, or accept the broader permission only if the namespace is strictly controlled. A sketch of the tightened rule follows.
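    • Sketch: One possible shape for the tightened rule (a sketch only; the konflux-system-notification-configmap name is hypothetical and must match whatever ConfigMap(s) the UI actually reads):

      kind: Role
      apiVersion: rbac.authorization.k8s.io/v1
      metadata:
        name: konflux-public-info-view-role
      rules:
        - verbs:
            - get
          apiGroups:
            - ''
          resources:
            - configmaps
          resourceNames:
            - konflux-public-info
            - konflux-banner-configmap
            # hypothetical; enumerate each notification ConfigMap explicitly
            - konflux-system-notification-configmap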
  2. File: .github/workflows/validate-notification.yaml

    • Lines 20-21:

        label=$(yq e '.metadata.labels."konflux.system.notification"' "$file")
        if [[ "$label" != "\"true\"" && "$label" != "true" ]]; then
    • Improvement: The yq command returns null when the label is missing, and its output differs between a YAML boolean (true) and a string ("true"), so plain string comparison is brittle. A more robust approach is to evaluate the comparison inside yq and use its -e (--exit-status) flag, which makes yq exit non-zero when the expression evaluates to false or null.

    • Suggested Change:

      --- a/.github/workflows/validate-notification.yaml
      +++ b/.github/workflows/validate-notification.yaml
      @@ -17,8 +17,7 @@
           if [[ "$(basename "$file")" == "kustomization.yaml" ]]; then
             continue
           fi
      -    label=$(yq e '.metadata.labels."konflux.system.notification"' "$file")
      -    if [[ "$label" != "\"true\"" && "$label" != "true" ]]; then
      -      echo "ERROR: $file is missing konflux-system-notification: \"true\" label"
      +    if ! yq e -e '.metadata.labels."konflux.system.notification" == "true"' "$file" > /dev/null; then
      +      echo "ERROR: $file is missing the konflux.system.notification: \"true\" label or has an incorrect value"
             exit 1
           fi

      Explanation: yq e 'expression' prints true or false but exits 0 either way, so the exit code alone says nothing without -e. With -e (--exit-status), yq exits non-zero when the expression evaluates to false or null, and the if branches correctly whether the label is missing, mis-typed, or set to another value. Kubernetes label values are always strings, so comparing against the string "true" is sufficient; a bare YAML boolean would be rejected by the API server anyway.
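    • Illustration: how the -e flag behaves in practice (a sketch, assuming yq v4 and a hypothetical notification.yaml):

      - name: Demo yq --exit-status semantics (illustrative)
        run: |
          # Exits 0 only when the label is present and equals the string "true";
          # exits non-zero when it is missing, null, or any other value.
          if yq e -e '.metadata.labels."konflux.system.notification" == "true"' notification.yaml > /dev/null; then
            echo "label OK"
          else
            echo "label missing or incorrect"
          fi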

    • Lines 35-36:

        # Extract JSON content from ConfigMap
        yq e '.data."notification-content.json"' "$file" > "${TMP_DIR}/${file_name}.json"
    • Improvement: This command assumes notification-content.json always exists in the data field. If a ConfigMap carries the label but lacks this key, yq prints null, producing a temp file that fails ajv validation with a confusing error. It is better to check for the key's existence before attempting to extract.

    • Suggested Change:

      --- a/.github/workflows/validate-notification.yaml
      +++ b/.github/workflows/validate-notification.yaml
      @@ -32,10 +32,14 @@
           if [[ "$file_name" == "kustomization.yaml" ]]; then
             continue
           fi
      -    # Extract JSON content from ConfigMap
      -    yq e '.data."notification-content.json"' "$file" > "${TMP_DIR}/${file_name}.json"
      -    ajv validate -s "${SCHEMA_FILE}" -d "${TMP_DIR}/${file_name}.json" --errors=text
      -    if [ $? -ne 0 ]; then
      -      echo "ERROR: notification-content.json in ${file} does not validate against schema"
      -      exit 1
      +    if yq e -e '.data | has("notification-content.json")' "$file" > /dev/null; then
      +      # Extract JSON content from ConfigMap
      +      yq e '.data."notification-content.json"' "$file" > "${TMP_DIR}/${file_name}.json"
      +      ajv validate -s "${SCHEMA_FILE}" -d "${TMP_DIR}/${file_name}.json" --errors=text
      +      if [ $? -ne 0 ]; then
      +        echo "ERROR: notification-content.json in ${file} does not validate against schema"
      +        exit 1
      +      fi
      +    else
      +      echo "ERROR: $file is missing 'data.notification-content.json' field."
      +      exit 1
           fi
         done
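    • Combined sketch: Putting the existence check and schema validation together, the step could read roughly as follows (a sketch; TMP_DIR, SCHEMA_FILE, and the file glob are assumed from the surrounding workflow):

      - name: Validate notification content
        run: |
          set -euo pipefail
          for file in components/konflux-info/*/system-notifications/*.yaml; do
            file_name="$(basename "$file")"
            [[ "$file_name" == "kustomization.yaml" ]] && continue
            # -e turns a false/null result into a non-zero exit code
            if ! yq e -e '.data | has("notification-content.json")' "$file" > /dev/null; then
              echo "ERROR: $file is missing 'data.notification-content.json' field."
              exit 1
            fi
            yq e '.data."notification-content.json"' "$file" > "${TMP_DIR}/${file_name}.json"
            # set -e makes a failing ajv call abort the job with the schema error
            ajv validate -s "${SCHEMA_FILE}" -d "${TMP_DIR}/${file_name}.json" --errors=text
          done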

Good Changes

  • components/konflux-info/README.md: The updates to the README are comprehensive, well-structured, and clearly explain the new system notifications feature, its format, validation, and UI behavior. The migration from "Auto Alerts" to "System Notifications" is well-documented.
  • components/konflux-info/auto-alert-schema.json (Deleted) and components/konflux-info/notification-schema.json (New): The replacement of the old schema with a new, more appropriate one for system notifications is good. The new schema looks well-defined.
  • File Renames and Deletions: The consistent renaming of auto-alerts directories and files to system-notifications across the cluster configurations is well-executed.

Recommendation:

Please revert all the "Unrelated Changes" identified above. Once those are removed, the PR can be re-reviewed focusing solely on the system notification feature and the suggested improvements for the rbac.yaml and validate-notification.yaml workflow.

@testcara testcara force-pushed the konflux-system-alerts branch from 4465a6e to 83f583c Compare August 12, 2025 05:06
Contributor

Code Review by Gemini

Here's a review of the provided code changes, focusing on identifying bugs and suggesting improvements.


components/konflux-info/base/rbac.yaml

Concern: Security Vulnerability - Overly Broad Permissions

The change removes specific resourceNames for ConfigMaps and instead grants get access to all ConfigMaps within the konflux-info namespace (as determined by the kustomization.yaml which applies this Role and RoleBinding to that namespace). While this might be intended for dynamic discovery of system notifications, granting get access to all ConfigMaps in a namespace to system:authenticated users (via the RoleBinding) is a significant broadening of permissions. If any sensitive information is stored in other ConfigMaps within konflux-info without the specific notification label, it could be exposed.

Exact lines to change (or consider):

--- a/components/konflux-info/base/rbac.yaml
+++ b/components/konflux-info/base/rbac.yaml
@@ -1,23 +1,20 @@
 ---
 kind: Role
 apiVersion: rbac.authorization.k8s.io/v1
 metadata:
   name: konflux-public-info-view-role
+  # Add namespace here to explicitly scope the Role to 'konflux-info'
+  # If omitted, it applies to the namespace where the Role is deployed by Kustomize.
+  # namespace: konflux-info
 rules:
   - verbs:
       - get
     apiGroups:
       - ''
-    resourceNames:
-      - konflux-public-info
-      - konflux-banner-configmap
     resources:
       - configmaps
+    # Consider adding a labelSelector if Kubernetes RBAC supported it for ConfigMaps.
+    # Since it doesn't, the UI must filter by label.
 ---
 kind: RoleBinding
 apiVersion: rbac.authorization.k8s.io/v1
 metadata:
   name: konflux-public-info-view-rb
+  # Add namespace here to explicitly scope the RoleBinding to 'konflux-info'
+  # namespace: konflux-info
 subjects:
   - kind: Group
     apiGroup: rbac.authorization.k8s.io
     name: system:authenticated
 roleRef:
   apiGroup: rbac.authorization.k8s.io
-  kind: ClusterRole
+  kind: Role # Change to Role if the Role is namespace-scoped
   name: konflux-public-info-view-role

Recommendation:

  1. Acknowledge and Accept Risk: If the broader access to all ConfigMaps in konflux-info is accepted due to the dynamic nature of system notifications and the lack of labelSelector support in Kubernetes RBAC for resources, ensure that no sensitive ConfigMaps without the konflux.system.notification: "true" label are ever stored in the konflux-info namespace.
  2. Explicit Namespace Scoping: While Kustomize will apply these resources to konflux-info, explicitly adding namespace: konflux-info to both the Role and RoleBinding metadata makes the intent clearer and prevents accidental cluster-wide application if these resources were ever deployed outside of the Kustomize context. Also, ensure kind: Role is used in roleRef if the Role is namespace-scoped.
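
A minimal sketch of recommendation 2, with explicit namespace scoping (assuming the konflux-info namespace; the rule body is unchanged from the diff above):

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: konflux-public-info-view-role
  namespace: konflux-info  # explicit, independent of the Kustomize context
rules:
  - verbs: ["get"]
    apiGroups: [""]
    resources: ["configmaps"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: konflux-public-info-view-rb
  namespace: konflux-info
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: system:authenticated
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role  # Role, not ClusterRole, when binding a namespaced Role
  name: konflux-public-info-view-role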

components/monitoring/prometheus/base/monitoringstack/monitoringstack.yaml

Improvement: Readability and Maintainability of Long Regex/Lists

The writeRelabelConfigs regex and the match[] list within the ServiceMonitor are extremely long and difficult to read and maintain. While this is a functional change for metric filtering and cost optimization, it introduces a significant maintenance burden.

Exact lines to consider:

--- a/components/monitoring/prometheus/base/monitoringstack/monitoringstack.yaml
+++ b/components/monitoring/prometheus/base/monitoringstack/monitoringstack.yaml
@@ -35,21 +35,37 @@ spec:
           secret:
             key: client-id
             name: rhobs
         clientSecret:
           key: client-secret
           name: rhobs
         endpointParams:
           audience: # added by overlays
         tokenUrl: https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token
       url: # added by overlays
-      # writeRelabelConfigs:  # added by overlays
+      writeRelabelConfigs:
+      - action: LabelKeep
+        regex: "__name__|source_environment|source_cluster|namespace|app|pod|container|\
+          label_pipelines_appstudio_openshift_io_type|health_status|dest_namespace|\
+          controller|service|reason|phase|type|resource|resourcequota|le|app|image|\
+          commit_hash|job|operation|tokenName|rateLimited|state|persistentvolumeclaim|\
+          storageclass|volumename|release_reason|instance|result|deployment_reason|\
+          validation_reason|strategy|succeeded|target|name|method|code|sp|le|\
+          unexpected_status|failure|hostname|label_app_kubernetes_io_managed_by|status|\
+          pipeline|pipelinename|pipelinerun|schedule|check|grpc_service|grpc_code|\
+          grpc_method|lease|lease_holder|deployment|platform|mode|cpu|role|node|kind|\
+          verb|request_kind|tested_cluster|resource_type|exported_job|http_method|\
+          http_route|http_status_code|gin_errors|rule_result|rule_execution_cause|\
+          policy_name|policy_background_mode|rule_type|policy_type|policy_validation_mode|\
+          resource_request_operation|resource_kind|policy_change_type|event_type"
+          
+
 ---
 # Grant permission to Federate In-Cluster Prometheus
 apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRoleBinding
 metadata:
   name: appstudio-federate-ms-view
   labels:
     kubernetes.io/part-of: appstudio-federate-ms
     monitoring.rhobs/stack: appstudio-federate-ms
 roleRef:
@@ -71,21 +167,209 @@ metadata:
   namespace: appstudio-monitoring
   labels:
     kubernetes.io/part-of: appstudio-federate-ms
     monitoring.rhobs/stack: appstudio-federate-ms
 spec:
   selector: # use the prometheus service to create a "dummy" target.
     matchLabels:
       app.kubernetes.io/managed-by: observability-operator
       app.kubernetes.io/name: appstudio-federate-ms-prometheus
   endpoints:
-  - params: {}  # added by overlays
+  - params:
+      'match[]': # scrape only required metrics from in-cluster prometheus
+        - '{__name__="pipeline_service_schedule_overhead_percentage_sum"}'
+        - '{__name__="pipeline_service_schedule_overhead_percentage_count"}'
+        - '{__name__="pipeline_service_execution_overhead_percentage_sum"}'
+        - '{__name__="pipeline_service_execution_overhead_percentage_count"}'
+        - '{__name__="pipelinerun_duration_scheduled_seconds_sum"}'
+        - '{__name__="pipelinerun_duration_scheduled_seconds_count"}'
+        - '{__name__="pipelinerun_gap_between_taskruns_milliseconds_sum"}'
+        - '{__name__="pipelinerun_gap_between_taskruns_milliseconds_count"}'
+        - '{__name__="pipelinerun_kickoff_not_attempted_count"}'
+        - '{__name__="pending_resolutionrequest_count"}'
+        - '{__name__="taskrun_pod_create_not_attempted_or_pending_count"}'
+        - '{__name__="tekton_pipelines_controller_pipelinerun_count"}'
+        - '{__name__="tekton_pipelines_controller_running_pipelineruns_count"}'
+        - '{__name__="tekton_pipelines_controller_running_taskruns_throttled_by_quota_count"}'
+        - '{__name__="tekton_pipelines_controller_running_taskruns_throttled_by_node_count"}'
+        - '{__name__="tekton_pipelines_controller_running_taskruns_throttled_by_quota"}'
+        - '{__name__="tekton_pipelines_controller_running_taskruns_throttled_by_node"}'
+        - '{__name__="tekton_pipelines_controller_pipelinerun_duration_seconds_sum"}'
+        - '{__name__="tekton_pipelines_controller_pipelinerun_duration_seconds_count"}'
+        - '{__name__="watcher_workqueue_depth"}'
+        - '{__name__="watcher_client_latency_bucket"}'
+        - '{__name__="pac_watcher_work_queue_depth"}'
+        - '{__name__="pac_watcher_client_latency_bucket"}'
+        - '{__name__="grpc_server_handled_total", namespace=~"tekton-results|openshift-pipelines"}'
+        - '{__name__="grpc_server_handled_total", namespace=~"openshift-etcd"}'
+        - '{__name__="grpc_server_handling_seconds_bucket", namespace=~"tekton-results|openshift-pipelines"}'
+        - '{__name__="grpc_server_handling_seconds_bucket", namespace="openshift-etcd"}'
+        - '{__name__="grpc_server_msg_received_total", namespace="openshift-etcd"}'
+        - '{__name__="controller_runtime_reconcile_errors_total", namespace!~".*-tenant|openshift-.*|kube-.*"}'
+        - '{__name__="controller_runtime_reconcile_total", namespace!~".*-tenant|openshift-.*|kube-.*"}'
+        - '{__name__="kube_lease_owner", namespace="openshift-pipelines", lease=~"controller.tektonresolverframework.bundleresolver..*"}'
+        - '{__name__="kube_lease_owner", namespace="openshift-pipelines", lease=~"tekton-pipelines-controller.github.com.tektoncd.pipeline.pkg.reconciler..*"}'
+        - '{__name__="kube_pod_status_unschedulable", namespace!~".*-tenant|openshift-.*|kube-.*"}'
+        - '{__name__="kube_pod_container_status_restarts_total", namespace=~"openshift-pipelines|release-service"}'
+        - '{__name__="kube_pod_container_status_waiting_reason", namespace!~".*-tenant|openshift-.*|kube-.*"}'
+        - '{__name__="kube_pod_status_phase", namespace!~".*-tenant|openshift-.*|kube-.*"}'
+        - '{__name__="kube_pod_container_resource_limits", namespace="release-service"}'
+        - '{__name__="kube_pod_container_status_terminated_reason", namespace="release-service"}'
+        - '{__name__="kube_pod_container_status_last_terminated_reason", namespace="release-service"}'
+        - '{__name__="kube_pod_container_status_ready", namespace="release-service"}'
+        - '{__name__="kube_persistentvolume_status_phase", namespace!~".*-tenant|openshift-.*|kube-.*"}'
+        - '{__name__="kube_resourcequota", namespace!~".*-tenant|openshift-.*|kube-.*"}'
+        - '{__name__="kube_statefulset_status_replicas_ready", namespace="gitops-service-argocd"}'
+        - '{__name__="kube_statefulset_replicas", namespace="gitops-service-argocd"}'
+        - '{__name__="openshift_route_status", namespace="gitops-service-argocd"}'
+
+        - '{__name__="kube_deployment_status_replicas_ready", namespace="gitops-service-argocd"}'
+        - '{__name__="kube_deployment_spec_replicas", namespace=~"gitops-service-argocd"}'
+
+        # Namespace (expression):  "build-service"
+        - '{__name__="kube_deployment_status_replicas_ready", namespace="build-service"}'
+        - '{__name__="kube_deployment_status_replicas_available", namespace="build-service"}'
+        - '{__name__="kube_deployment_spec_replicas", namespace="build-service"}'
+
+        # Namespace (expression):  "integration-service"
+        - '{__name__="kube_deployment_status_replicas_ready", namespace="integration-service"}'
+        - '{__name__="kube_deployment_status_replicas_available", namespace="integration-service"}'
+        - '{__name__="kube_deployment_spec_replicas", namespace="integration-service"}'
+
+        # Namespace (expression):  "konflux-ui"
+        - '{__name__="kube_deployment_status_replicas_ready", namespace="konflux-ui"}'
+        - '{__name__="kube_deployment_status_replicas_available", namespace="konflux-ui"}'
+        - '{__name__="kube_deployment_spec_replicas", namespace="konflux-ui"}'
+        - '{__name__="kube_running_pods_ready", namespace="konflux-ui"}'
+        - '{__name__="kube_endpoint_address", namespace="konflux-ui"}'
+        - '{__name__="kube_pod_container_status_restarts_total", namespace="konflux-ui"}'
+
+        # Namespace (expression):  "mintmaker"
+        - '{__name__="kube_deployment_status_replicas_ready", namespace="mintmaker"}'
+        - '{__name__="kube_deployment_status_replicas_available", namespace="mintmaker"}'
+        - '{__name__="kube_deployment_spec_replicas", namespace="mintmaker"}'
+        - '{__name__="cluster_ram_requested_perc"}'
+        - '{__name__="node_memory_pressured_perc"}'
+        - '{__name__="redis_node_memory_usage_perc"}'
+
+        # Namespace (expression):  ~".*monitoring.*"
+        - '{__name__="kube_deployment_status_replicas_ready", namespace=~".*monitoring.*"}'
+        - '{__name__="kube_deployment_status_replicas_available", namespace=~".*monitoring.*"}'
+        - '{__name__="kube_deployment_spec_replicas", namespace=~".*monitoring.*"}'
+
+        # Namespace (expression):  "multi-platform-controller"
+        - '{__name__="kube_deployment_status_replicas_ready", namespace="multi-platform-controller"}'
+        - '{__name__="kube_deployment_status_replicas_available", namespace="multi-platform-controller"}'
+        - '{__name__="kube_deployment_spec_replicas", namespace="multi-platform-controller"}'
+
+        # Namespace (expression):  "namespace-lister"
+        - '{__name__="kube_deployment_status_replicas_ready", namespace="namespace-lister"}'
+        - '{__name__="kube_deployment_status_replicas_available", namespace="namespace-lister"}'
+        - '{__name__="kube_deployment_spec_replicas", namespace="namespace-lister"}'
+
+        # Namespace (expression):  "openshift-pipelines"
+        - '{__name__="kube_deployment_status_replicas_ready", namespace="openshift-pipelines"}'
+        - '{__name__="kube_deployment_status_replicas_available", namespace="openshift-pipelines"}'
+        - '{__name__="kube_deployment_spec_replicas", namespace="openshift-pipelines"}'
+
+        # Namespace (expression):  "product-kubearchive"
+        - '{__name__="kube_deployment_status_replicas_ready", namespace="product-kubearchive"}'
+        - '{__name__="kube_deployment_status_replicas_available", namespace="product-kubearchive"}'
+        - '{__name__="kube_deployment_spec_replicas", namespace="product-kubearchive"}'
+
+        # Namespace (expression):  "project-controller"
+        - '{__name__="kube_deployment_status_replicas_ready", namespace="project-controller"}'
+        - '{__name__="kube_deployment_status_replicas_available", namespace="project-controller"}'
+        - '{__name__="kube_deployment_spec_replicas", namespace="project-controller"}'
+
+        # Namespace (expression):  "release-service"
+        - '{__name__="kube_deployment_status_replicas_ready", namespace="release-service"}'
+        - '{__name__="kube_deployment_status_replicas_available", namespace="release-service"}'
+        - '{__name__="kube_deployment_spec_replicas", namespace="release-service"}'
+
+        # Namespace (expression):  ~"smee.*"
+        - '{__name__="kube_deployment_status_replicas_ready", namespace=~"smee.*"}'
+        - '{__name__="kube_deployment_status_replicas_available", namespace=~"smee.*"}'
+        - '{__name__="kube_deployment_spec_replicas", namespace=~"smee.*"}'
+
+        # Namespace (expression):  "openshift-apiserver"
+        - '{__name__="kube_deployment_status_replicas_ready", namespace="openshift-apiserver"}'
+        - '{__name__="kube_deployment_status_replicas_available", namespace="openshift-apiserver"}'
+        - '{__name__="kube_deployment_spec_replicas", namespace="openshift-apiserver"}'
+
+        # Namespace (expression):  "openshift-oauth-apiserver"
+        - '{__name__="kube_deployment_status_replicas_ready", namespace="openshift-oauth-apiserver"}'
+        - '{__name__="kube_deployment_status_replicas_available", namespace="openshift-oauth-apiserver"}'
+        - '{__name__="kube_deployment_spec_replicas", namespace="openshift-oauth-apiserver"}'
+
+        # Namespace (expression):  "konflux-kyverno"
+        - '{__name__="kube_deployment_status_replicas_ready", namespace="konflux-kyverno"}'
+        - '{__name__="kube_deployment_status_replicas_available", namespace="konflux-kyverno"}'
+        - '{__name__="kube_deployment_spec_replicas", namespace="konflux-kyverno"}'
+
+        # Namespace (expression):  "openshift-kube-apiserver"
+        - '{__name__="kube_deployment_status_replicas_ready", namespace="openshift-kube-apiserver"}'
+        - '{__name__="kube_deployment_status_replicas_available", namespace="openshift-kube-apiserver"}'
+        - '{__name__="kube_deployment_spec_replicas", namespace="openshift-kube-apiserver"}'
+
+        # Namespace (expression):  "konflux-user-support"
+        - '{__name__="kube_deployment_status_replicas_available", namespace="konflux-user-support"}'
+        - '{__name__="kube_deployment_spec_replicas", namespace="konflux-user-support"}'
+
+        - '{__name__="argocd_app_reconcile_bucket", namespace="gitops-service-argocd"}'
+        - '{__name__="argocd_app_info", namespace="gitops-service-argocd"}'
+        - '{__name__="container_cpu_usage_seconds_total", namespace="release-service"}'
+        - '{__name__="container_cpu_usage_seconds_total", namespace="openshift-etcd"}'
+        - '{__name__="container_memory_usage_bytes", namespace="release-service"}'
+        - '{__name__="container_memory_usage_bytes", namespace="openshift-etcd"}'
+        - '{__name__="etcd_disk_wal_fsync_duration_seconds_bucket"}'
+        - '{__name__="etcd_disk_backend_commit_duration_seconds_bucket"}'
+        - '{__name__="etcd_server_proposals_failed_total"}'
+        - '{__name__="etcd_server_leader_changes_seen_total", namespace="openshift-etcd"}'
+        - '{__name__="etcd_server_has_leader", namespace="openshift-etcd"}'
+        - '{__name__="etcd_server_is_leader", namespace="openshift-etcd"}'
+        - '{__name__="etcd_server_id", namespace="openshift-etcd"}'
+        - '{__name__="etcd_server_quota_backend_bytes", namespace="openshift-etcd"}'
+        - '{__name__="etcd_mvcc_db_total_size_in_bytes", namespace="openshift-etcd"}'
+        - '{__name__="etcd_server_received_total", namespace="openshift-etcd"}'
+        - '{__name__="etcd_network_active_peers", namespace="openshift-etcd"}'
+        - '{__name__="etcd_network_peer_round_trip_time_seconds_bucket"}'
+        - '{__name__="etcd_disk_defrag_inflight"}'
+        - '{__name__="kube_job_spec_completions"}'
+        - '{__name__="kube_job_status_succeeded"}'
+        - '{__name__="kube_job_status_failed"}'
+        - '{__name__="node_cpu_seconds_total", mode="idle"}'
+        - '{__name__="node_memory_MemTotal_bytes"}'
+        - '{__name__="node_memory_MemAvailable_bytes"}'
+        - '{__name__="platform:hypershift_hostedclusters:max"}'
+        - '{__name__="kube_node_role"}'
+        - '{__name__="etcd_shield_trigger"}'
+        - '{__name__="etcd_shield_alert_triggered"}'
+        - '{__name__="apiserver_admission_webhook_rejection_count", name="vpipelineruns.konflux-ci.dev"}'
+        - '{__name__="apiserver_watch_events_total"}'
+        - '{__name__="apiserver_storage_objects"}'
+        - '{__name__="apiserver_current_inflight_requests"}'
+        - '{__name__="resource_verb:apiserver_request_total:rate5m"}'
+        - '{__name__="code:apiserver_request_total:rate5m"}'
+        - '{__name__="instance:apiserver_request_total:rate5m"}'
+        - '{__name__="prometheus_ready"}'
+        - '{__name__="process_cpu_seconds_total", job="apiserver"}'
+        - '{__name__="namespace:container_memory_usage_bytes:sum", namespace=~"openshift-etcd|openshift-kube-apiserver|build-service|image-controller|integration-service|konflux-ui|product-kubearchive|openshift-kueue-operator|tekton-kueue|kueue-external-admission|mintmaker|multi-platform-controller|namespace-lister|openshift-pipelines|tekton-results|project-controller|smee|smee-client"}'
+        - '{__name__="namespace:container_cpu_usage:sum", namespace=~"openshift-etcd|openshift-kube-apiserver|build-service|image-controller|integration-service|konflux-ui|product-kubearchive|openshift-kueue-operator|tekton-kueue|kueue-external-admission|mintmaker|multi-platform-controller|namespace-lister|openshift-pipelines|tekton-results|project-controller|smee|smee-client"}'
+        - '{__name__="node_namespace_pod:kube_pod_info:", namespace=~"openshift-etcd|openshift-kube-apiserver|build-service|image-controller|integration-service|konflux-ui|product-kubearchive|openshift-kueue-operator|tekton-kueue|kueue-external-admission|mintmaker|multi-platform-controller|namespace-lister|openshift-pipelines|tekton-results|project-controller|smee|smee-client"}'
+        - '{__name__="kube_node_status_allocatable", resource=~"cpu|memory"}'
+        - '{__name__="kube_node_status_condition", condition="MemoryPressure", status="true"}'
+        - '{__name__="namespace_memory:kube_pod_container_resource_requests:sum", namespace=~"openshift-etcd|openshift-kube-apiserver|build-service|image-controller|integration-service|konflux-ui|product-kubearchive|openshift-kueue-operator|tekton-kueue|kueue-external-admission|mintmaker|multi-platform-controller|namespace-lister|openshift-pipelines|tekton-results|project-controller|smee|smee-client"}'
+        - '{__name__="namespace_cpu:kube_pod_container_resource_requests:sum", namespace=~"openshift-etcd|openshift-kube-apiserver|build-service|image-controller|integration-service|konflux-ui|product-kubearchive|openshift-kueue-operator|tekton-kueue|kueue-external-admission|mintmaker|multi-platform-controller|namespace-lister|openshift-pipelines|tekton-results|project-controller|smee|smee-client"}'
+        - '{__name__="namespace_memory:kube_pod_container_resource_limits:sum", namespace=~"openshift-etcd|openshift-kube-apiserver|build-service|image-controller|integration-service|konflux-ui|product-kubearchive|openshift-kueue-operator|tekton-kueue|kueue-external-admission|mintmaker|multi-platform-controller|namespace-lister|openshift-pipelines|tekton-results|project-controller|smee|smee-client"}'
+        - '{__name__="namespace_cpu:kube_pod_container_resource_limits:sum", namespace=~"openshift-etcd|openshift-kube-apiserver|build-service|image-controller|integration-service|konflux-ui|product-kubearchive|openshift-kueue-operator|tekton-kueue|kueue-external-admission|mintmaker|multi-platform-controller|namespace-lister|openshift-pipelines|tekton-results|project-controller|smee|smee-client"}'
+
     relabelings:
     # override the target's address by the prometheus-k8s service name.
     - action: replace
       targetLabel: __address__
       replacement: prometheus-k8s.openshift-monitoring.svc:9091
     # remove the default target labels as they aren't relevant in case of federation.
     - action: labeldrop
       regex: pod|namespace|service|endpoint|container
     # 30s interval creates 4 scrapes per minute
     # prometheus-k8s.svc x 2 ms-prometheus x (60s/ 30s) = 4

Recommendation:
While functionally correct, consider if there's a way to manage these extensive lists more modularly or generate them programmatically to improve readability and reduce the chance of errors during future updates. This is a design/maintenance improvement rather than a bug.
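
One lightweight option (a sketch, not existing repo tooling; the kept-labels.yaml file and the CI step are hypothetical): keep the label names as a plain YAML list and render the LabelKeep regex from it, so reviews diff one label per line instead of a wrapped regex.

# kept-labels.yaml (hypothetical)
labels:
  - __name__
  - source_environment
  - source_cluster
  - namespace
  # ...one entry per label to keep

# Illustrative CI step that renders the regex for the MonitoringStack
- name: Render LabelKeep regex
  run: yq e '.labels | join("|")' kept-labels.yaml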


components/vector-kubearchive-log-collector/base/rbac.yaml (deleted) and components/vector-kubearchive-log-collector/base/kustomization.yaml (removed reference)

Concern: Missing RBAC Definitions

The deletion of rbac.yaml and its removal from kustomization.yaml means that the konflux-kubearchive group no longer has explicitly defined component-maintainer or admin roles for the product-kubearchive-logging namespace within this repository. This could be a bug if these permissions are still required for the vector-kubearchive-log-collector component to function correctly or for its maintainers to manage it.

Exact lines to consider:
The entire file components/vector-kubearchive-log-collector/base/rbac.yaml was deleted.
The line - rbac.yaml was removed from components/vector-kubearchive-log-collector/base/kustomization.yaml.

Recommendation:
Verify that the necessary RBAC permissions for the vector-kubearchive-log-collector component and its maintainers are now defined elsewhere, or that they are genuinely no longer required. If they are still needed and not defined elsewhere, this change introduces a functional or security regression.


components/tracing/otel-collector/staging/otel-collector-helm-values.yaml

Improvement: Clarify Observability Strategy Change

The removal of the prometheus exporter and the metrics pipeline from the OpenTelemetry collector configuration is a significant change to the observability setup. It implies that metrics previously exposed via Prometheus are no longer collected or are now handled by other means (e.g., directly by SignalFx).

Exact lines to consider:

--- a/components/tracing/otel-collector/staging/otel-collector-helm-values.yaml
+++ b/components/tracing/otel-collector/staging/otel-collector-helm-values.yaml
@@ -1,22 +1,20 @@
 config:
   exporters:
     sapm:
       endpoint: "https://ingest.us1.signalfx.com/v2/trace"
       access_token: ${env:SIGNALFX_API_TOKEN}
     signalfx:
       endpoint: "https://api.signalfx.com/v2/traces"
       realm: "us1"
       api_url: "https://api.us1.signalfx.com"
       access_token: ${env:SIGNALFX_API_TOKEN} 
-    prometheus:
-      endpoint: 0.0.0.0:8889
   extensions:
     # The health_check extension is mandatory for this chart.
     # Without the health_check extension the collector will fail the readiness and liveliness probes.
     # The health_check extension can be modified, but should never be removed.
     health_check: {}
   processors:
     attributes/stage:
       actions:
         - key: deployment.environment
           value: stage
@@ -56,37 +38,21 @@ config:
           - debug
           - sapm
           - signalfx
         processors:
           - memory_limiter
           - attributes/collector-info
           - batch
           - attributes/stage
         receivers:
           - otlp
-      metrics:
-        receivers:
-        - otlp
-        processors:
-        - memory_limiter
-        - batch
-        exporters:
-        - prometheus
-    telemetry:
-      metrics:
-        level: basic
-        readers:
-        - pull:
-            exporter: prometheus
-            host: 0.0.0.0
-            port: 8888
-
+      metrics: null
 # Configuration for ports
 ports:
   otlp:
     enabled: true
     containerPort: 4317
     servicePort: 4317
     hostPort: 4317
     protocol: TCP
   otlp-http:
     enabled: true

Recommendation:
Confirm with the monitoring/observability team that this change is intentional and that any metrics previously collected via the Prometheus exporter are no longer needed or are being collected through other, confirmed channels.


Other Changes (No immediate issues found)

  • .github/workflows/validate-notification.yaml (New File): The CI workflow for validating system notifications appears well-structured and robust. The yq and ajv-cli usage is appropriate for the task.
  • components/konflux-info/README.md: The updates to the README are comprehensive and clearly explain the new "System Notifications" feature and re-organize the "Konflux Banner" section. The use of numbered lists and internal links improves readability.
  • components/konflux-info/auto-alert-schema.json (Deleted): Consistent with the removal of "Auto Alerts".
  • components/konflux-info/notification-schema.json (New File): The new JSON schema for system notifications is well-defined, supporting both single objects and arrays, and using additionalProperties: false for strict validation.
  • Deletion of auto-alerts directories and files: Consistent with the replacement of "Auto Alerts" with "System Notifications".
  • Updates to kustomization.yaml files: The changes to include system-notifications and remove auto-alerts are logical for the rollout strategy.
  • Renamed and new system-notifications files: The renaming and creation of example notification files are consistent with the new feature.
  • components/mintmaker/production/base/manager_patch.yaml: Removal of GOMEMLIMIT is likely an intentional optimization for newer Go versions or specific memory management.
  • components/mintmaker/production/*/kustomization.yaml: Image updates are standard.
  • components/monitoring/grafana/base/dashboards/release/kustomization.yaml: Reference update is standard.
  • components/monitoring/prometheus/development/monitoringstack/cluster-type-patch.yaml: Changing op: replace to op: add for the relabeling rule is likely intended to prepend a new rule, which is functionally acceptable.
  • Deletion of endpoints-params.yaml and writeRelabelConfigs.yaml from production/base/monitoringstack and staging/base/monitoringstack: This is a good refactoring, moving the configurations directly into the base monitoringstack.yaml to simplify overlays.
  • components/monitoring/prometheus/development/monitoringstack/kustomization.yaml: Changing resource reference from staging to base is a simplification and ensures consistency.
  • components/monitoring/prometheus/staging/base/kustomization.yaml: The change in how monitoringstack is referenced in resources is a minor refactoring and appears functionally equivalent.
  • components/monitoring/prometheus/staging/stone-stg-rh01/kustomization.yaml and prometheusrule-uwm.yaml (deleted): This is an intentional removal of a specific PrometheusRule.
  • components/multi-platform-controller/base/kustomization.yaml: Image updates are standard.
  • components/release/development/kustomization.yaml: Image updates are standard.
  • components/vector-kubearchive-log-collector/staging/stone-stg-rh01/loki-helm-values.yaml: Adding a /tmp emptyDir volume for Loki is a minor operational change.

@testcara testcara force-pushed the konflux-system-alerts branch from 83f583c to ffbfbd5 Compare August 12, 2025 05:13
Contributor

Code Review by Gemini

Here's a review of the provided code changes:

General Observations

  • The pull request title and body indicate a focus on "system notifications" and README improvements. However, the diff includes a significant number of changes unrelated to this core feature, such as updates to mintmaker, release-service, multi-platform-controller images, extensive Prometheus monitoring configuration changes, and removal of RBAC/OWNERS files for vector-kubearchive-log-collector.
  • Recommendation: For better reviewability and to minimize risk, it's generally recommended to separate unrelated changes into distinct pull requests. This allows each PR to be reviewed for its specific purpose without being diluted by other concerns.

Detailed Review

1. new file .github/workflows/validate-notification.yaml

  • Suggestion: The yq installation uses sudo wget directly. While common in GitHub Actions, for enhanced security and maintainability, consider using a dedicated action if available (e.g., mikefarah/yq-action) or a more robust installation method if this becomes a pattern for multiple tools. For this specific case, it's acceptable.
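
    For example, pinning the version so upgrades are deliberate (a sketch; the version number is illustrative):

      - name: Install yq (pinned)
        run: |
          YQ_VERSION=v4.44.3  # illustrative pin; bump deliberately
          sudo wget -qO /usr/local/bin/yq \
            "https://github.com/mikefarah/yq/releases/download/${YQ_VERSION}/yq_linux_amd64"
          sudo chmod +x /usr/local/bin/yq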

2. components/konflux-info/README.md

  • Bug: An internal link is incorrect; the anchor does not resolve. Suggested change:
    --- a/components/konflux-info/README.md
    +++ b/components/konflux-info/README.md
    @@ -39,7 +39,7 @@
     **How to fix validation errors?**
     
     - Review the error message in the PR checks.
    - - Compare your changes with the [schema](./banner-schema.json) and [examples in README](#usage-scenarios--examples).
    + - Compare your changes with the [schema](./banner-schema.json) and [examples in README](#UsageScenariosExamples).
     
     ## ✅ Banner Content Specification
     

3. components/konflux-info/base/rbac.yaml

  • Bug (Security Issue): The konflux-public-info-view-role is broadened to allow get access to all configmaps in the namespace, instead of only specific ones. This is a significant security regression. Roles should adhere to the principle of least privilege.
    --- a/components/konflux-info/base/rbac.yaml
    +++ b/components/konflux-info/base/rbac.yaml
    @@ -5,9 +5,6 @@
     rules:
       - verbs:
           - get
         apiGroups:
           - ''
    -    resourceNames:
    -      - konflux-public-info
    -      - konflux-banner-configmap
         resources:
           - configmaps
     ---
    Suggested change: Revert this change and explicitly list the resourceNames for the configmaps that the UI needs to read, or use a label selector if multiple configmaps with a specific label need to be read.
    --- a/components/konflux-info/base/rbac.yaml
    +++ b/components/konflux-info/base/rbac.yaml
    @@ -5,9 +5,9 @@
     rules:
       - verbs:
           - get
         apiGroups:
           - ''
    -    resourceNames:
    -      - konflux-public-info
    -      - konflux-banner-configmap
         resources:
           - configmaps
    +    resourceNames:
    +      - konflux-public-info
    +      - konflux-banner-configmap
    +      - konflux-system-notification-configmap-1 # Add any new specific configmap names here
     ---
    Alternatively, if the intent is to allow reading any configmap with the konflux.system.notification: "true" label, you would need a different approach, possibly using a ClusterRole with a resourceNames or labelSelector if supported by your Kubernetes version and RBAC setup for ConfigMaps. Given the current setup, explicitly listing is safer.

4. components/konflux-info/notification-schema.json

  • Suggestion: The activeTimestamp pattern ^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z$ is very strict and only allows YYYY-MM-DDTHH:mm:ssZ. If timestamps with milliseconds (e.g., 2025-08-12T14:30:00.000Z) or other valid ISO 8601 timezone offsets (e.g., +01:00) are ever expected, this regex will cause validation failures. Consider if a more flexible ISO 8601 regex is needed, or confirm that only this exact format will be used.
    --- /dev/null
    +++ b/components/konflux-info/notification-schema.json
    @@ -0,0 +1,37 @@
    +{
    ...
    +        "activeTimestamp": {
    +          "type": "string",
    +          "pattern": "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z$",
    +          "description": "ISO 8601 timestamp(e.g., 2025-08-11T11:08:17Z) when the notification becomes active. Omit for the resource’s creationTimestamp."
    +        }
    ...
    Suggested change (if more flexibility is desired):
    "pattern": "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(?:\\.\\d+)?(?:Z|[+-]\\d{2}:\\d{2})$",
    This regex allows optional fractional seconds (\.\d+) and accepts either Z or an explicit ±HH:MM offset, while still requiring a timezone designator so the timestamp stays unambiguous.
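
    For context, a hypothetical notification ConfigMap that satisfies the strict pattern (the summary and type fields are illustrative, not taken from the schema):

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: konflux-system-notification-example  # hypothetical name
        labels:
          konflux.system.notification: "true"
      data:
        # "summary" and "type" are illustrative field names; only
        # "activeTimestamp" is discussed here and matches the pattern above.
        notification-content.json: |
          {
            "summary": "Planned maintenance window",
            "type": "info",
            "activeTimestamp": "2025-08-12T14:30:00Z"
          }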

5. components/monitoring/prometheus/base/monitoringstack/monitoringstack.yaml

  • Suggestion: The writeRelabelConfigs and params sections are now directly embedded. This is a good consolidation. No specific issues, but it's a large block of changes.
    --- a/components/monitoring/prometheus/base/monitoringstack/monitoringstack.yaml
    +++ b/components/monitoring/prometheus/base/monitoringstack/monitoringstack.yaml
    @@ -35,21 +35,37 @@
           secret:
             key: client-id
             name: rhobs
         clientSecret:
           key: client-secret
           name: rhobs
         endpointParams:
           audience: # added by overlays
         tokenUrl: https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token
       url: # added by overlays
    -      # writeRelabelConfigs:  # added by overlays
    +      writeRelabelConfigs:
    +      - action: LabelKeep
    +        regex: "__name__|source_environment|source_cluster|namespace|app|pod|container|\
    +          label_pipelines_appstudio_openshift_io_type|health_status|dest_namespace|\
    +          controller|service|reason|phase|type|resource|resourcequota|le|app|image|\
    +          commit_hash|job|operation|tokenName|rateLimited|state|persistentvolumeclaim|\
    +          storageclass|volumename|release_reason|instance|result|deployment_reason|\
    +          validation_reason|strategy|succeeded|target|name|method|code|sp|le|\
    +          unexpected_status|failure|hostname|label_app_kubernetes_io_managed_by|status|\
    +          pipeline|pipelinename|pipelinerun|schedule|check|grpc_service|grpc_code|\
    +          grpc_method|lease|lease_holder|deployment|platform|mode|cpu|role|node|kind|\
    +          verb|request_kind|tested_cluster|resource_type|exported_job|http_method|\
    +          http_route|http_status_code|gin_errors|rule_result|rule_execution_cause|\
    +          policy_name|policy_background_mode|rule_type|policy_type|policy_validation_mode|\
    +          resource_request_operation|resource_kind|policy_change_type|event_type"
    +          
    +
    ---
    ...
    +  - params:
    +      'match[]': # scrape only required metrics from in-cluster prometheus
    +        - '{__name__="pipeline_service_schedule_overhead_percentage_sum"}'
    ...
    No specific changes needed here, but it's a large block of new configuration.

6. components/monitoring/prometheus/development/monitoringstack/cluster-type-patch.yaml

  • Suggestion: Changing op: replace to op: add for a patch at index 0 (/spec/endpoints/0/relabelings/0) implies that there was no existing relabeling at that index, or that you intend to add a new one at the beginning of the list. If there was an existing relabeling at index 0, this would result in two entries instead of replacing one. Given the base monitoringstack.yaml doesn't define relabelings at that specific path, add is likely the correct operation to insert it.
    --- a/components/monitoring/prometheus/development/monitoringstack/cluster-type-patch.yaml
    +++ b/components/monitoring/prometheus/development/monitoringstack/cluster-type-patch.yaml
    @@ -1,6 +1,6 @@
     ---
    -- op: replace
    +- op: add
       path: /spec/endpoints/0/relabelings/0
       value:
         targetLabel: source_environment
         replacement: development-cluster
    No specific change needed if this behavior is intended.
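
    For reference, the JSON Patch semantics at a list index (a generic illustration, unrelated to the repo's files):

      # document before: {"items": ["a", "b"]}
      - op: add        # result: {"items": ["new", "a", "b"]}; inserts and shifts
        path: /items/0
        value: new
      # 'replace' at /items/0 would instead overwrite "a", and it fails if no
      # element exists at that index, which is why 'add' is right for prepending.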

7. components/tracing/otel-collector/staging/otel-collector-helm-values.yaml

  • Suggestion: The removal of the prometheus exporter and setting metrics: null means the OpenTelemetry collector will no longer expose its own metrics via a Prometheus endpoint. This is a significant change to the observability setup. Confirm that these metrics are no longer needed or are being collected by an alternative mechanism.
    --- a/components/tracing/otel-collector/staging/otel-collector-helm-values.yaml
    +++ b/components/tracing/otel-collector/staging/otel-collector-helm-values.yaml
    @@ -1,22 +1,20 @@
     config:
       exporters:
         sapm:
           endpoint: "https://ingest.us1.signalfx.com/v2/trace"
           access_token: ${env:SIGNALFX_API_TOKEN}
         signalfx:
           endpoint: "https://api.signalfx.com/v2/traces"
           realm: "us1"
           api_url: "https://api.us1.signalfx.com"
           access_token: ${env:SIGNALFX_API_TOKEN} 
    -    prometheus:
    -      endpoint: 0.0.0.0:8889
       extensions:
         # The health_check extension is mandatory for this chart.
         # Without the health_check extension the collector will fail the readiness and liveliness probes.
         # The health_check extension can be modified, but should never be removed.
         health_check: {}
       processors:
         attributes/stage:
           actions:
             - key: deployment.environment
               value: stage
    @@ -56,14 +54,7 @@
           - attributes/collector-info
           - batch
           - attributes/stage
         receivers:
           - otlp
    -      metrics:
    -        receivers:
    -        - otlp
    -        processors:
    -        - memory_limiter
    -        - batch
    -        exporters:
    -        - prometheus
    -    telemetry:
    -      metrics:
    -        level: basic
    -        readers:
    -        - pull:
    -            exporter: prometheus
    -            host: 0.0.0.0
    -            port: 8888
    -
    +      metrics: null
     # Configuration for ports
     ports:
       otlp:
         enabled: true
         containerPort: 4317
    No specific change needed if this is an intentional architectural decision.

8. components/vector-kubearchive-log-collector/OWNERS (deleted) and components/vector-kubearchive-log-collector/base/rbac.yaml (deleted)

  • Suggestion: The deletion of the OWNERS file and the rbac.yaml for kubearchive-logging implies that ownership and RBAC for this component are now managed elsewhere or are no longer needed. Confirm that these changes are intentional and that the necessary permissions for the vector-kubearchive-log-collector are still in place through other means (e.g., cluster-wide roles, or if the component is no longer deployed).
    --- a/components/vector-kubearchive-log-collector/OWNERS
    +++ /dev/null
    @@ -1,17 +0,0 @@
    ...
    --- a/components/vector-kubearchive-log-collector/base/rbac.yaml
    +++ /dev/null
    @@ -1,28 +0,0 @@
    ...
    No specific change needed if this is an intentional refactoring of ownership/RBAC.

Overall, the core "system notifications" feature seems well-implemented with a new workflow, schema, and README updates. The primary concern is the security regression in rbac.yaml and the bundling of many unrelated changes.

@testcara
Contributor Author

/retest

1 similar comment
@testcara
Contributor Author

/retest


openshift-ci bot commented Aug 12, 2025

@testcara: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                          Commit   Details  Required  Rerun command
ci/prow/konflux-e2e-v416-optional  ffbfbd5  link     false     /test konflux-e2e-v416-optional
ci/prow/appstudio-e2e-tests        ffbfbd5  link     true      /test appstudio-e2e-tests

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
