fix: set leader node only when cache is warmed #1369

dkarpele · 2025-11-26T21:27:12Z

Summary by CodeRabbit

Chores
- Added leader-awareness to readiness: readiness now depends on leadership and cache warm state, so only the elected leader requires a warmed cache to report ready.
- Exposed a signal indicating the component should run only on the leader replica.
Tests
- Added coverage exercising readiness behavior across leader/non-leader and warmed/not-warmed scenarios.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Signed-off-by: dkarpele <[email protected]>

coderabbitai · 2025-11-26T21:27:21Z

Walkthrough

Adds leader-awareness to startup and readiness: an atomic leader flag set when the manager elects this process; readiness now passes for non-leaders immediately but requires the leader to have a warmed cache; WebhookServerRunnable gains NeedLeaderElection() to indicate it should run only on the leader.

Changes

Cohort / File(s)	Summary
Leader-aware startup & readiness `cmd/run.go`	Adds an atomic `isLeader` flag and a goroutine that sets it when `mgr.Elected()` fires; changes readiness check to return healthy for non-leaders but require cache warm for the leader; adds `func (ws *WebhookServerRunnable) NeedLeaderElection() bool`.
Readiness tests `cmd/run_test.go`	Adds `TestLeadershipAwareReadyzCheck` covering leader vs non-leader and warmed vs not-warmed scenarios, using an atomic flag and WarmupStatus to simulate states.

Sequence Diagram(s)

sequenceDiagram
    participant Pod as Pod Replica
    participant Manager as Controller Manager
    participant Readiness as Readiness Probe
    participant Cache as Cache Warmup

    Note over Pod,Manager: Pod startup

    Manager->>Manager: attempt leader election
    alt acquires lease
        Manager-->>Pod: mgr.Elected() event
        Pod->>Pod: set isLeader = true
    else loses election
        Pod->>Pod: isLeader remains false
    end

    Note over Pod,Readiness: readiness checks run
    Pod->>Readiness: probe request
    alt isLeader == false
        Readiness-->>Pod: healthy (non-leader)
    else isLeader == true
        Readiness->>Cache: check warmed
        alt cache warmed
            Readiness-->>Pod: healthy (leader + warmed)
        else cache not warmed
            Readiness-->>Pod: unhealthy (leader + not warmed)
        end
    end

Pre-merge checks

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: introducing leader-awareness to the readiness check so that only warmed-up leader nodes are marked ready.
Linked Issues check	✅ Passed	The changes address the core issue by making readiness dependent on leadership status and cache warmup state, preventing unhealthy pods from being marked ready and allowing leader election to proceed properly.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to fixing the health probe and leader election issues: adding leadership detection, modifying readiness logic, exposing the NeedLeaderElection method, and testing the new behavior.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 74db446 and 98eff76.

📒 Files selected for processing (1)

cmd/run_test.go (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

cmd/run_test.go (1)

cmd/run.go (1)

WarmupStatus (342-345)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Ensure code is correctly linted
GitHub Check: Ensure unit tests are passing
GitHub Check: build_image
GitHub Check: Analyze (go)

🔇 Additional comments (2)

cmd/run_test.go (2)

10-10: LGTM!

The sync/atomic import is necessary for the atomic.Bool type used in the new test.

345-408: I'm unable to access the repository directly due to cloning issues. Let me analyze the review comment based on the information provided and standard software engineering practices.

Looking at the review comment structure:

Key Points from the Review:

The test uses an inline readinessCheck closure (lines 391-396) that duplicates logic

Comment at line 390 says "This is the readiness check logic from cmd/run.go"

Codecov reports 0% patch coverage for cmd/run.go (6 missing lines)

Suggests extracting logic to a standalone testable function

Notes this is a critical bug fix

Analysis:

The review's concern about testing the actual production code is valid from a software engineering perspective - IF the production code exists and is separately callable. However, the review makes several claims that cannot be verified without access to the actual codebase:

Whether production code implements identical logic

Whether the function can be extracted and made testable

The actual coverage metrics from Codecov

Whether the test/production logic has actually diverged

The core issue: The review suggests a refactoring without confirming:

That production code implements the same logic

That extracting it is feasible given the broader codebase structure

That this is necessary for the bug fix to work correctly

Since I cannot verify the production code or the actual state of cmd/run.go, I need to mark this for verification:

The inline readiness check logic should be extracted from cmd/run.go and tested directly rather than duplicated.

The test structure and logic are sound, covering all scenarios comprehensively. However, the comment at line 390 ("This is the readiness check logic from cmd/run.go") indicates the logic exists in production code but is duplicated here inline. To verify this concern:

Confirm the readiness check in cmd/run.go implements if isLeader.Load() && !status.isCacheWarmed.Load()

Extract this into a standalone, exported function

Import and call it directly in the test to improve coverage and prevent divergence

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov-commenter · 2025-11-26T21:29:49Z

Codecov Report

❌ Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.72%. Comparing base (73263e7) to head (98eff76).

Files with missing lines	Patch %	Lines
cmd/run.go	0.00%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1369      +/-   ##
==========================================
- Coverage   70.80%   70.72%   -0.08%     
==========================================
  Files          49       49              
  Lines        4528     4533       +5     
==========================================
  Hits         3206     3206              
- Misses       1125     1130       +5     
  Partials      197      197

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Assisted-by: Gemini AI Signed-off-by: dkarpele <[email protected]>

fix: set leader node only when cache is warmed

74db446

Signed-off-by: dkarpele <[email protected]>

dkarpele requested review from chengfang, ishitasequeira, jannfis and pasha-codefresh as code owners November 26, 2025 21:27

dkarpele marked this pull request as draft November 26, 2025 21:35

test: TestLeadershipAwareReadyzCheck

98eff76

Assisted-by: Gemini AI Signed-off-by: dkarpele <[email protected]>

dkarpele marked this pull request as ready for review November 27, 2025 16:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: set leader node only when cache is warmed #1369

fix: set leader node only when cache is warmed #1369

Uh oh!

dkarpele commented Nov 26, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Nov 26, 2025 •

edited

Loading

Uh oh!

codecov-commenter commented Nov 26, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix: set leader node only when cache is warmed #1369

Are you sure you want to change the base?

fix: set leader node only when cache is warmed #1369

Uh oh!

Conversation

dkarpele commented Nov 26, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Pre-merge checks

Uh oh!

codecov-commenter commented Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dkarpele commented Nov 26, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 26, 2025 •

edited

Loading

codecov-commenter commented Nov 26, 2025 •

edited

Loading