Skip to content

Conversation

@dkarpele
Copy link
Collaborator

@dkarpele dkarpele commented Nov 26, 2025

Closes #1368

Summary by CodeRabbit

  • Chores
    • Added leader-awareness to readiness: readiness now depends on leadership and cache warm state, so only the elected leader requires a warmed cache to report ready.
    • Exposed a signal indicating the component should run only on the leader replica.
  • Tests
    • Added coverage exercising readiness behavior across leader/non-leader and warmed/not-warmed scenarios.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Nov 26, 2025

Walkthrough

Adds leader-awareness to startup and readiness: an atomic leader flag set when the manager elects this process; readiness now passes for non-leaders immediately but requires the leader to have a warmed cache; WebhookServerRunnable gains NeedLeaderElection() to indicate it should run only on the leader.

Changes

Cohort / File(s) Summary
Leader-aware startup & readiness
cmd/run.go
Adds an atomic isLeader flag and a goroutine that sets it when mgr.Elected() fires; changes readiness check to return healthy for non-leaders but require cache warm for the leader; adds func (ws *WebhookServerRunnable) NeedLeaderElection() bool.
Readiness tests
cmd/run_test.go
Adds TestLeadershipAwareReadyzCheck covering leader vs non-leader and warmed vs not-warmed scenarios, using an atomic flag and WarmupStatus to simulate states.

Sequence Diagram(s)

sequenceDiagram
    participant Pod as Pod Replica
    participant Manager as Controller Manager
    participant Readiness as Readiness Probe
    participant Cache as Cache Warmup

    Note over Pod,Manager: Pod startup

    Manager->>Manager: attempt leader election
    alt acquires lease
        Manager-->>Pod: mgr.Elected() event
        Pod->>Pod: set isLeader = true
    else loses election
        Pod->>Pod: isLeader remains false
    end

    Note over Pod,Readiness: readiness checks run
    Pod->>Readiness: probe request
    alt isLeader == false
        Readiness-->>Pod: healthy (non-leader)
    else isLeader == true
        Readiness->>Cache: check warmed
        alt cache warmed
            Readiness-->>Pod: healthy (leader + warmed)
        else cache not warmed
            Readiness-->>Pod: unhealthy (leader + not warmed)
        end
    end
Loading

Pre-merge checks

✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: introducing leader-awareness to the readiness check so that only warmed-up leader nodes are marked ready.
Linked Issues check ✅ Passed The changes address the core issue by making readiness dependent on leadership status and cache warmup state, preventing unhealthy pods from being marked ready and allowing leader election to proceed properly.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing the health probe and leader election issues: adding leadership detection, modifying readiness logic, exposing the NeedLeaderElection method, and testing the new behavior.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 74db446 and 98eff76.

📒 Files selected for processing (1)
  • cmd/run_test.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
cmd/run_test.go (1)
cmd/run.go (1)
  • WarmupStatus (342-345)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Ensure code is correctly linted
  • GitHub Check: Ensure unit tests are passing
  • GitHub Check: build_image
  • GitHub Check: Analyze (go)
🔇 Additional comments (2)
cmd/run_test.go (2)

10-10: LGTM!

The sync/atomic import is necessary for the atomic.Bool type used in the new test.


345-408: I'm unable to access the repository directly due to cloning issues. Let me analyze the review comment based on the information provided and standard software engineering practices.

Looking at the review comment structure:

Key Points from the Review:

  1. The test uses an inline readinessCheck closure (lines 391-396) that duplicates logic
  2. Comment at line 390 says "This is the readiness check logic from cmd/run.go"
  3. Codecov reports 0% patch coverage for cmd/run.go (6 missing lines)
  4. Suggests extracting logic to a standalone testable function
  5. Notes this is a critical bug fix

Analysis:

The review's concern about testing the actual production code is valid from a software engineering perspective - IF the production code exists and is separately callable. However, the review makes several claims that cannot be verified without access to the actual codebase:

  • Whether production code implements identical logic
  • Whether the function can be extracted and made testable
  • The actual coverage metrics from Codecov
  • Whether the test/production logic has actually diverged

The core issue: The review suggests a refactoring without confirming:

  1. That production code implements the same logic
  2. That extracting it is feasible given the broader codebase structure
  3. That this is necessary for the bug fix to work correctly

Since I cannot verify the production code or the actual state of cmd/run.go, I need to mark this for verification:


The inline readiness check logic should be extracted from cmd/run.go and tested directly rather than duplicated.

The test structure and logic are sound, covering all scenarios comprehensively. However, the comment at line 390 ("This is the readiness check logic from cmd/run.go") indicates the logic exists in production code but is duplicated here inline. To verify this concern:

  1. Confirm the readiness check in cmd/run.go implements if isLeader.Load() && !status.isCacheWarmed.Load()
  2. Extract this into a standalone, exported function
  3. Import and call it directly in the test to improve coverage and prevent divergence

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov-commenter
Copy link

codecov-commenter commented Nov 26, 2025

Codecov Report

❌ Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.72%. Comparing base (73263e7) to head (98eff76).

Files with missing lines Patch % Lines
cmd/run.go 0.00% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1369      +/-   ##
==========================================
- Coverage   70.80%   70.72%   -0.08%     
==========================================
  Files          49       49              
  Lines        4528     4533       +5     
==========================================
  Hits         3206     3206              
- Misses       1125     1130       +5     
  Partials      197      197              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dkarpele dkarpele marked this pull request as draft November 26, 2025 21:35
Assisted-by: Gemini AI
Signed-off-by: dkarpele <[email protected]>
@dkarpele dkarpele marked this pull request as ready for review November 27, 2025 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Image Updater v1.0.1 stucks in "Running" with failing health checks / leader election never succeeds

2 participants