tjungblu
Contributor

@tjungblu tjungblu commented Oct 9, 2025

No description provided.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 9, 2025
@openshift-ci-robot

@tjungblu: This pull request explicitly references no jira issue.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from dusk125 and ironcladlou October 9, 2025 12:39

coderabbitai bot commented Oct 9, 2025

Walkthrough

Adjusts etcd alert thresholds and definitions, adds a new etcdHighFsyncDurations alert, removes older fsync alerts from the PrometheusRule, updates the exclusion list to skip the new alert in jsonnet processing, and bumps two dependency versions in jsonnetfile.lock.json.

Changes

| Cohort / File(s) | Summary of Changes |
| --- | --- |
| Prometheus alerts: etcd thresholds and rules (jsonnet/custom.libsonnet, manifests/0000_90_etcd-operator_03_prometheusrule.yaml) | Lowered etcdHighCommitDurations threshold from >0.5 to >0.025. Added new etcdHighFsyncDurations alert (>0.010, 10m, critical, with runbook/annotations). Removed older fsync alert variants from the PrometheusRule. |
| Jsonnet wiring: alert exclusions (jsonnet/main.jsonnet) | Added 'etcdHighFsyncDurations' to excludedAlerts list; no other logic changes. |
| Dependencies lockfile updates (jsonnet/jsonnetfile.lock.json) | Updated two dependency version SHAs; no other fields or sums changed. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description Check | ⚠️ Warning | No description was provided, leaving reviewers without context or rationale and failing to describe any part of the changeset. | Please add a brief summary of the changes and their intent, such as why threshold values were adjusted and how the new alert will improve monitoring. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The pull request title concisely describes adding recommended etcd alert thresholds, which aligns with the changes to commit and fsync duration alerts, but it includes a non-descriptive “NO-JIRA:” prefix that does not contribute meaningful context. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 0357803 and f2f7037.

📒 Files selected for processing (4)
  • jsonnet/custom.libsonnet (2 hunks)
  • jsonnet/jsonnetfile.lock.json (2 hunks)
  • jsonnet/main.jsonnet (1 hunks)
  • manifests/0000_90_etcd-operator_03_prometheusrule.yaml (1 hunks)
🔇 Additional comments (6)
jsonnet/main.jsonnet (1)

9-9: LGTM! Correctly excludes the new custom alert.

The addition of etcdHighFsyncDurations to the excludedAlerts list is correct and necessary, as this alert is now being defined locally in custom.libsonnet rather than imported from the upstream etcd mixin.
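
The exclusion described here can be pictured as a name-based filter applied to the upstream rules before the local definitions are appended. The Python sketch below is only an analogy for that jsonnet wiring: `merge_rules`, the rule dictionaries, their `source` field, and the `etcdNoLeader` placeholder entry are hypothetical and are not taken from main.jsonnet.

```python
def merge_rules(upstream, custom, excluded):
    """Drop upstream rules whose alert name is excluded, then append custom rules."""
    kept = [rule for rule in upstream if rule["alert"] not in excluded]
    return kept + custom


# Hypothetical rule lists. Only the name etcdHighFsyncDurations comes from the
# PR discussion; etcdNoLeader is a placeholder for "some other upstream alert".
upstream_rules = [
    {"alert": "etcdHighFsyncDurations", "source": "upstream"},
    {"alert": "etcdNoLeader", "source": "upstream"},
]
custom_rules = [{"alert": "etcdHighFsyncDurations", "source": "custom"}]
excluded_alerts = ["etcdHighFsyncDurations"]

merged = merge_rules(upstream_rules, custom_rules, excluded_alerts)
for rule in merged:
    print(rule["alert"], rule["source"])
```

Without the exclusion, both the upstream and the local rule with the same alert name would end up in the merged list, which is the duplication the review comment says the change avoids.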

manifests/0000_90_etcd-operator_03_prometheusrule.yaml (2)

107-107: Consistent with jsonnet definition.

The threshold change matches the definition in jsonnet/custom.libsonnet line 64. See the review comments on that file regarding verification of this aggressive threshold value.


111-119: Consistent with jsonnet definition.

This alert definition matches jsonnet/custom.libsonnet lines 74-86. The runbook URL is correctly included for this critical alert. See the review comments on that file regarding verification of the 10ms threshold.

jsonnet/custom.libsonnet (2)

74-86: Threshold aligns with etcd docs; verify testing and severity

  • 0.010 s p99 threshold matches etcd’s recommendation.
  • Confirm tests cover target storage types (local SSD, EBS, Azure Disk, etc.).
  • Validate that critical severity is justified (or consider warning).

64-64: Threshold matches etcd recommendation; validate in your environment

Matches etcd’s recommended p99 backend commit duration (< 25 ms). Confirm this threshold has been tested in representative environments and monitor for potential alert fatigue.
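
As a rough illustration of what a p99 threshold like the ones above measures, the sketch below estimates a quantile from cumulative histogram buckets by linear interpolation, which is approximately how Prometheus's `histogram_quantile` behaves. The bucket counts are made-up sample data, the function is a simplified stand-in rather than the alert expression from this PR, and the two constants simply restate the values quoted in the review (0.010 s for fsync, 0.025 s for commit).

```python
import math

FSYNC_THRESHOLD_S = 0.010   # value quoted in the review for etcdHighFsyncDurations
COMMIT_THRESHOLD_S = 0.025  # value quoted in the review for etcdHighCommitDurations


def estimate_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) from cumulative (upper_bound, count) buckets.

    Simplified: assumes the lowest bucket starts at 0 and that the last
    bucket has an upper bound of +Inf.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Made-up cumulative observation counts per latency bucket (seconds).
sample_buckets = [
    (0.001, 0), (0.002, 500), (0.004, 900), (0.008, 980),
    (0.016, 995), (0.032, 1000), (math.inf, 1000),
]

p99 = estimate_quantile(0.99, sample_buckets)
print(f"estimated p99: {p99 * 1000:.2f} ms")
print("above fsync threshold:", p99 > FSYNC_THRESHOLD_S)
print("above commit threshold:", p99 > COMMIT_THRESHOLD_S)
```

With these sample counts the estimated p99 is about 13 ms, which exceeds the 10 ms fsync threshold but stays under the 25 ms commit threshold. Per the walkthrough, the fsync alert would additionally need the condition to hold for 10 minutes before firing.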

jsonnet/jsonnetfile.lock.json (1)

11-11: Confirm etcd & grafonnet bump includes etcd v3.5 thresholds

Verify that the updated commits introduce the recommended 99th-percentile thresholds (commit > 0.25 s; fsync warning > 0.5 s, critical > 1 s) and document any changes.


Contributor

openshift-ci bot commented Oct 9, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tjungblu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 9, 2025
Contributor

openshift-ci bot commented Oct 9, 2025

@tjungblu: The following tests failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-single-node | f2f7037 | link | true | /test e2e-aws-ovn-single-node |
| ci/prow/unit | f2f7037 | link | true | /test unit |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 10, 2025

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • needs-rebase: Indicates a PR cannot be merged because it has merge conflicts with HEAD.

3 participants