
Failure when acquiring leadership #2032

@fanat12k


We have encountered the same problem described in this issue.

Problem description

We are seeing frequent 409 Conflict errors during leader election when multiple pods try to update the same ConfigMap. However, the more critical issue is the following:
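A 409 on an update is normally Kubernetes' optimistic concurrency check at work. Each update carries the `resourceVersion` the client last read, and the API server rejects the write with 409 Conflict if the object has changed since then. If both replicas read the ConfigMap and then write it within the same window, one of the two writes will be rejected, so even two replicas can conflict regularly. The following is a minimal, simplified model of that check, assuming a single leader-id field and a numeric version. `VersionedConfigMap` and `ConflictDemo` are hypothetical illustration classes, not Fabric8 or spring-cloud-kubernetes APIs:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of a ConfigMap guarded by resourceVersion.
// It mimics the API server's optimistic concurrency check; it is not
// Fabric8 or spring-cloud-kubernetes code.
final class VersionedConfigMap {
    // Immutable snapshot: the leader id stored in the ConfigMap and its version.
    record Snapshot(String leaderId, long resourceVersion) {}

    private final AtomicReference<Snapshot> state =
            new AtomicReference<>(new Snapshot(null, 1));

    Snapshot get() {
        return state.get();
    }

    // Returns 200 on success, or 409 if the caller's resourceVersion is stale.
    int update(String newLeaderId, long expectedVersion) {
        Snapshot current = state.get();
        if (current.resourceVersion() != expectedVersion) {
            return 409; // another writer changed the object since it was read
        }
        Snapshot next = new Snapshot(newLeaderId, expectedVersion + 1);
        // The CAS also fails (409) if someone wrote between get() and here.
        return state.compareAndSet(current, next) ? 200 : 409;
    }
}

public class ConflictDemo {
    public static void main(String[] args) {
        VersionedConfigMap cm = new VersionedConfigMap();
        // Both replicas read the same snapshot before either one writes.
        VersionedConfigMap.Snapshot seenByA = cm.get();
        VersionedConfigMap.Snapshot seenByB = cm.get();
        int a = cm.update("pod-a", seenByA.resourceVersion());
        int b = cm.update("pod-b", seenByB.resourceVersion());
        System.out.println("pod-a: " + a + ", pod-b: " + b); // pod-a: 200, pod-b: 409
    }
}
```

In this model the conflict itself is harmless: the losing replica simply did not get its write in. The problem reported here is what that replica believes afterwards.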

One pod receives a 409 Conflict when attempting to update the ConfigMap, but it still believes it is the leader. It does not receive an OnRevokedEvent and continues operating as if it still holds leadership.

As a result, two pods can believe they are leaders at the same time.
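One defensive way to avoid this split-brain state is for a pod to treat any failed write to the leader record (including a 409) as a loss of leadership and revoke locally, instead of keeping its cached "I am leader" flag. The sketch below assumes that policy; it is not how spring-cloud-kubernetes behaves today. `LocalLeadership`, `onUpdateResult`, and `StepDownDemo` are hypothetical names, and the `Runnable` callback only stands in for publishing an OnRevokedEvent:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical local leadership flag that steps down on any failed write.
// Assumes a successful (200) write means this pod's id is now the leader record.
final class LocalLeadership {
    private final AtomicBoolean leader = new AtomicBoolean(false);
    private final Runnable onRevoked; // stands in for publishing an OnRevokedEvent

    LocalLeadership(Runnable onRevoked) {
        this.onRevoked = onRevoked;
    }

    boolean isLeader() {
        return leader.get();
    }

    // Called with the HTTP status of each attempt to write the leader record.
    void onUpdateResult(int status) {
        if (status == 200) {
            leader.set(true);
        } else if (leader.compareAndSet(true, false)) {
            // Fire only on the true -> false transition, so a non-leader that
            // keeps conflicting is not "revoked" over and over.
            onRevoked.run();
        }
    }
}

public class StepDownDemo {
    public static void main(String[] args) {
        int[] revocations = {0};
        LocalLeadership pod = new LocalLeadership(() -> revocations[0]++);
        pod.onUpdateResult(200); // acquired leadership
        pod.onUpdateResult(409); // renewal lost to the other replica
        pod.onUpdateResult(409); // still not leader; no second revoke
        System.out.println(pod.isLeader() + " " + revocations[0]); // false 1
    }
}
```

This trades a short leadership gap for the guarantee that a pod never keeps acting as leader after a rejected write; a pod that merely lost a race can re-acquire leadership on its next cycle.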

This seems to occur during a race condition or a watcher resynchronization (possibly caused by the Fabric8 client reconnecting). The pod whose update failed appears to miss the leadership-change notification.
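Because watch events can be lost while a watch is being re-established, a common pattern is to re-read the object after reconnecting and derive leadership from its current contents rather than from the last event received. A minimal sketch, assuming a hypothetical `reconcile` step that is given the leader id currently stored in the ConfigMap; `LeaderReconciler`, `Transition`, and `ReconcileDemo` are illustrative names only:

```java
import java.util.Objects;

// Hypothetical reconciler: derives local leadership from the leader id
// currently stored in the ConfigMap, e.g. right after a watch reconnect.
final class LeaderReconciler {
    // GRANTED/REVOKED stand in for OnGrantedEvent/OnRevokedEvent.
    enum Transition { GRANTED, REVOKED, UNCHANGED }

    private final String myId;
    private boolean leader; // last state this pod acted on

    LeaderReconciler(String myId) {
        this.myId = myId;
    }

    Transition reconcile(String leaderIdInConfigMap) {
        boolean shouldLead = Objects.equals(myId, leaderIdInConfigMap);
        if (shouldLead == leader) {
            return Transition.UNCHANGED;
        }
        leader = shouldLead;
        return shouldLead ? Transition.GRANTED : Transition.REVOKED;
    }

    boolean isLeader() {
        return leader;
    }
}

public class ReconcileDemo {
    public static void main(String[] args) {
        LeaderReconciler podA = new LeaderReconciler("pod-a");
        System.out.println(podA.reconcile("pod-a")); // GRANTED
        // The watch dropped and the change to pod-b was never delivered as an
        // event; re-reading the ConfigMap after reconnecting still yields REVOKED.
        System.out.println(podA.reconcile("pod-b")); // REVOKED
        System.out.println(podA.reconcile("pod-b")); // UNCHANGED
    }
}
```

Since the reconciler compares against stored state rather than counting events, calling it again (on reconnect, or on a periodic timer) is safe and never produces a duplicate revoke.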

Questions

  1. Why do 409 Conflict errors happen so frequently, even with just two replicas?
  2. Is it expected that a pod might miss the OnRevokedEvent after a watcher reconnect or conflict?
  3. What is the recommended approach to ensure pods always receive correct leadership state updates?

We’re happy to provide logs or a reproducible test case if needed.

Environment

  • Spring Boot: 3.5.5
  • spring-cloud-kubernetes-fabric8-leader: 3.3.0
  • Kubernetes: v1.33.4-gke.1172000
  • Number of replicas: 2


Thanks in advance! Let us know if we should open a new issue for this.
