This project implements a production-style Raft consensus protocol in Go, including leader election, log replication, crash-safe persistence, and snapshot-based log compaction, with Porcupine used to check linearizability. On top of Raft it builds a replicated KV store, along with an observability layer that tracks metrics for both the Raft core and the KV service. I have summarized my learnings in the blog post Agreeing under Chaos.
- Leader Election: Randomized timeouts
- Log Replication: Optimized replication with fast conflict resolution
- Persistence: Crash-safe state and snapshot persistence with atomic writes
- Snapshots: Local log compaction with InstallSnapshot RPC for lagging followers
- Fault Tolerance: Handles network partitions, node crashes, and message loss/reordering
- Strong Consistency: Linearizable operations via state machine replication
- Observability: Metrics hooks for election, replication, persistence, and apply latency
The core Raft implementation provides:
- Leader Election: Heartbeats, RequestVote RPC, randomized election timeouts
- Log Replication: AppendEntries RPC with fast conflict resolution (XTerm/XIndex/XLen)
- Replication Engine: Per-peer replicator loops with trigger channels for immediate log pushes
- Persistence: Disk persistence with both synchronous and asynchronous write paths
- Snapshots: `Snapshot()` and `InstallSnapshot` RPC for log compaction
- State Machine Interface: Commits log entries via the `ApplyMsg` channel
- Observability: Metrics sink integration for tracing Raft activity
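The state machine interface can be sketched as below. The `ApplyMsg` field names follow the 6.5840 convention; treat this as an illustration of the shape, not the exact struct from `raft1/raft.go`:

```go
package main

// ApplyMsg mirrors the shape of the message Raft sends on its apply
// channel: either a committed command or an installed snapshot.
type ApplyMsg struct {
	CommandValid bool
	Command      interface{}
	CommandIndex int

	SnapshotValid bool
	Snapshot      []byte
	SnapshotIndex int
	SnapshotTerm  int
}

// applyLoop shows how a service drains the channel: commands are applied
// in index order, and a snapshot message replaces the whole state.
func applyLoop(applyCh <-chan ApplyMsg,
	applyCmd func(index int, cmd interface{}),
	installSnap func(snap []byte)) {
	for msg := range applyCh {
		switch {
		case msg.CommandValid:
			applyCmd(msg.CommandIndex, msg.Command)
		case msg.SnapshotValid:
			installSnap(msg.Snapshot)
		}
	}
}
```

Keeping the apply path a single ordered channel is what makes the service's state deterministic across replicas.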
The KV service is built on top of Raft using state machine replication:
- Client/Server: `src/kvraft1/client.go` and `src/kvraft1/server.go`
- Apply path: Raft delivers committed commands and installed snapshots to the service via the Raft `ApplyMsg` channel; the KV layer applies them deterministically.
- Snapshots: The KV layer can snapshot its state. Raft persists the snapshot and uses it to compact the log and to bring lagging peers up to date via `InstallSnapshot`.
Fast conflict resolution implements the optimization from the extended Raft paper, allowing leaders to back nextIndex off by entire terms instead of one entry at a time.
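The back-off computation can be sketched as a pure function over the conflict hints a follower returns when `AppendEntries` fails. The `XTerm`/`XIndex`/`XLen` names follow the common 6.5840 convention; this is an illustration, not the repo's code:

```go
package main

// nextIndexAfterConflict computes the leader's new nextIndex for a peer
// from the follower's conflict hints:
//
//	xLen   — follower's log length (used when its log is too short)
//	xTerm  — term of the conflicting entry, or -1 if the slot is empty
//	xIndex — first index the follower holds for xTerm
//
// lastIndexOfTerm reports the leader's last entry with the given term,
// or -1 if the leader has none.
func nextIndexAfterConflict(xTerm, xIndex, xLen int, lastIndexOfTerm func(term int) int) int {
	if xTerm == -1 {
		// Follower's log is shorter than prevLogIndex: jump to its end.
		return xLen
	}
	if last := lastIndexOfTerm(xTerm); last != -1 {
		// Leader also has xTerm: retry just past its last entry of that term.
		return last + 1
	}
	// Leader lacks xTerm entirely: skip the follower's whole run of xTerm.
	return xIndex
}
```

Each failed round therefore skips a whole term's worth of entries, rather than decrementing nextIndex one slot at a time.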
- Per-peer replication loops wake on heartbeats or explicit triggers
- RPC calls are bounded with a timeout to avoid stalled replication
- Pluggable metrics sink tracks elections, replication, persistence, and apply path latency
- Prometheus backed sink is available for dashboards and alerting
- Log entries are periodically compacted into snapshots
- Followers that fall behind receive snapshots via the `InstallSnapshot` RPC
- The log stays bounded, preventing unbounded growth
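Local compaction can be sketched as trimming the in-memory log while remembering the last included index and term, which later `AppendEntries` consistency checks still need. This is an assumed layout (entries[0] at lastIncludedIndex+1), not the repo's exact representation:

```go
package main

type logEntry struct {
	Term int
	Cmd  interface{}
}

// raftLog keeps the tail of the log plus metadata for the compacted prefix.
type raftLog struct {
	lastIncludedIndex int
	lastIncludedTerm  int
	entries           []logEntry // entries[0] is index lastIncludedIndex+1
}

// compactTo discards entries up to and including snapIndex. Copying the
// retained tail into a fresh slice lets the old backing array be freed,
// so memory use actually shrinks.
func (l *raftLog) compactTo(snapIndex, snapTerm int) {
	if snapIndex <= l.lastIncludedIndex {
		return // stale snapshot; nothing to do
	}
	drop := snapIndex - l.lastIncludedIndex
	if drop > len(l.entries) {
		drop = len(l.entries)
	}
	l.entries = append([]logEntry(nil), l.entries[drop:]...)
	l.lastIncludedIndex = snapIndex
	l.lastIncludedTerm = snapTerm
}
```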
Persistent state (term, vote, log) and service snapshots are written through a Persister. Snapshot payloads include (index,term) metadata to support safe recovery and restart reconciliation.
- There are two persistence modes: memory (the default) and disk. Disk mode is enabled by setting the `PERSISTER_DISK` environment variable to `1`.
- Decoupled I/O from the Raft: Persistence requests are queued and handled by a background loop so Raft can continue processing RPCs without blocking on disk.
- Synchronous persistence: Some transitions (e.g., term/vote changes and durable snapshot installation) use synchronous persistence to ensure correctness across crashes.
- Snapshot writes: Local snapshots are persisted asynchronously. `InstallSnapshot` from the leader persists the received snapshot synchronously before queuing it to the state machine.
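The decoupled asynchronous path can be sketched as a queue drained by one background goroutine. The coalescing step is an assumption about how such a loop is typically optimized, not a claim about this repo: only the newest queued state must reach disk, so intermediate blobs can be skipped:

```go
package main

// persistLoop drains a queue of encoded Raft state. Raft enqueues a blob
// and keeps processing RPCs; this goroutine does the blocking writes.
func persistLoop(queue <-chan []byte, write func(blob []byte)) {
	for blob := range queue {
	drain:
		for {
			select {
			case newer, ok := <-queue:
				if !ok {
					break drain // queue closed: flush what we have
				}
				blob = newer // coalesce: skip to the newest state
			default:
				break drain // queue momentarily empty: write now
			}
		}
		write(blob)
	}
}
```

Coalescing is safe here because each blob is a complete encoding of the persistent state, not a delta.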
- Versioned writes: `Put(key, value, expectedVersion)` provides optimistic concurrency control per key.
- Client uncertainty: The client may return `ErrMaybe` when it cannot safely determine whether a `Put` applied during retries/leader churn.
- Snapshot-driven compaction: The KV layer snapshots state, and Raft uses it to compact the log and catch up lagging replicas via `InstallSnapshot`.
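The server side of versioned writes can be sketched as below. The error names follow the 6.5840 convention, but this is illustrative code, not the repo's implementation; the `ErrMaybe` case arises on the client when a retried `Put` fails the version check after leader churn and the client cannot tell whether its own earlier attempt was the one that applied:

```go
package main

import "errors"

var (
	ErrVersion = errors.New("version mismatch")
	ErrNoKey   = errors.New("no such key")
)

type entry struct {
	value   string
	version int
}

// KV is a minimal in-memory store with per-key versions.
type KV struct{ m map[string]entry }

func NewKV() *KV { return &KV{m: make(map[string]entry)} }

// Put applies only when expectedVersion matches the key's current version
// (0 means "create a new key"); on success the version is bumped, giving
// per-key optimistic concurrency control.
func (kv *KV) Put(key, value string, expectedVersion int) error {
	cur, ok := kv.m[key]
	if !ok {
		if expectedVersion != 0 {
			return ErrNoKey
		}
		kv.m[key] = entry{value, 1}
		return nil
	}
	if cur.version != expectedVersion {
		return ErrVersion
	}
	kv.m[key] = entry{value, cur.version + 1}
	return nil
}
```

Because the version check runs inside the replicated state machine, every replica makes the same accept/reject decision for a given log index.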
Note: This implementation is based on MIT 6.5840 (formerly 6.824) Distributed Systems course materials.
My Work: The entire core Raft implementation (`raft1/raft.go`) and the KV store implementation (`kvraft1/server.go`, `kvraft1/client.go`, `kvraft1/rsm.go`).
Course-Provided: Memory-mode tester (`tester1/`), RPC framework (`labrpc/`), serialization library (`labgob/`), and test files. These are included for completeness but are not my original work.