
Leader Election is a distributed systems concept.

👉 In a cluster of distributed servers (like database replicas, job schedulers, or coordination services), you often need one node to act as the “leader” to coordinate work, while others act as followers.

  • The leader decides things (e.g., assigns tasks, handles writes, maintains consistency).
  • The followers replicate the state and can take over if the leader fails.

Why is leader election needed?

  • To avoid conflicts (e.g., two nodes writing at the same time).
  • To ensure consistency and avoid "split-brain" problems.
  • To automatically recover from failures (if the leader crashes, a new one is elected).

Tools commonly used

  • Zookeeper: Provides primitives like ephemeral znodes to implement leader election.
  • etcd (used by Kubernetes): Provides distributed consensus using Raft protocol for leader election.

Example Scenarios

  • Database clusters (e.g., MongoDB, Elasticsearch) → one primary (leader), many replicas.
  • Kubernetes control plane → one leader kube-controller-manager, others standby.
  • Distributed job schedulers → only leader assigns jobs to avoid duplication.

➡️ Trade-off: Leader election introduces some latency and complexity but is critical for high availability and fault tolerance in distributed systems.


Leader election can be done in different ways depending on the system. Two of the most common approaches are:


🔹 1. Zookeeper-based Leader Election

Zookeeper is a distributed coordination service. It doesn’t elect a leader by itself, but it provides the primitives (ephemeral sequential znodes and watches) you can use to build one.

How it works (using Ephemeral Sequential Znodes):

  1. All candidate servers (say S1, S2, S3, S4) connect to Zookeeper.

  2. Each one creates an ephemeral sequential znode under /election.

    • Example:

      /election/node_0001
      /election/node_0002
      /election/node_0003
      
  3. Zookeeper assigns numbers automatically (like tickets).

  4. The server with the lowest sequence number becomes the leader.

  5. If the leader crashes, its ephemeral znode is deleted automatically.

  6. The next lowest node becomes the new leader.

👉 This ensures only one leader at a time and automatic failover.


🔹 2. Consensus Protocol (Raft, Paxos, etc.)

This is used by etcd, Consul, Kubernetes, and modern databases.

How Raft Leader Election Works:

  1. Nodes start in Follower state.
  2. If a follower doesn’t hear from a leader within a timeout, it becomes a Candidate.
  3. The candidate requests votes from other nodes.
  4. If it gets majority votes, it becomes the Leader.
  5. The leader periodically sends heartbeats to followers to prevent new elections.
  6. If the leader fails, the process repeats.
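The election round above boils down to a majority vote, which can be sketched as a toy vote count. This is a simplified illustration, not real Raft — it omits terms, log comparison, and randomized timeouts — and `run_election` / `request_vote` are hypothetical names:

```python
# Toy sketch of one Raft election round: the candidate votes for itself,
# requests votes from peers, and wins only with a strict majority.

def request_vote(voter_alive):
    """A reachable follower grants its vote (no term/log checks here)."""
    return voter_alive

def run_election(candidate, peers):
    """Return True if `candidate` wins a strict majority of the cluster."""
    votes = 1                                   # candidate's self-vote
    votes += sum(request_vote(alive) for alive in peers.values())
    cluster_size = len(peers) + 1
    return votes > cluster_size // 2            # strict majority required

# 5-node cluster: candidate S1 plus 4 peers, two of them unreachable
peers = {"S2": True, "S3": True, "S4": False, "S5": False}
print(run_election("S1", peers))   # 3 of 5 votes -> True, S1 leads
```

The strict-majority rule is what prevents split-brain: two partitions can never both contain a majority of the same cluster, so at most one leader can be elected per term.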

👉 Advantage:

  • Ensures strong consistency (a leader needs a majority vote).
  • Tolerates network partitions (as long as a majority of nodes can still communicate).
  • Fully automatic and robust (no hand-rolled election logic on top of Zookeeper primitives).

✅ Comparison

| Aspect | Zookeeper (Ephemeral Znodes) | Raft/etcd |
| --- | --- | --- |
| Simplicity | Simpler; the app handles leader logic | Built-in consensus protocol |
| Consistency | Needs careful design | Strong consistency (via majority votes) |
| Recovery | Ephemeral znode deletion triggers new leader | Automatic re-election |
| Use Cases | Custom job schedulers, Hadoop, Kafka | Kubernetes, databases, key-value stores |

💡 Example in real systems

  • Kafka: Uses Zookeeper to elect a controller broker (leader); newer versions replace this with the Raft-based KRaft mode.
  • Kubernetes: Uses etcd (Raft-based) for leader election.
  • MongoDB: Replica sets elect a primary using Raft-like consensus.

👉 In system design interviews, if you’re asked about leader election, you can mention:

  • Zookeeper (ephemeral znodes) → simpler, external service.
  • Raft/etcd → more robust, built-in consensus.