A multi-threaded, Key-Value store in Go. Features a custom RESP parser, fine-grained concurrency (RWMutex), and sustains 440k+ req/s in benchmarks

faisal-990/redis-go


High-Performance Redis-Compatible Server

A partitioned, multi-threaded in-memory storage engine engineered in Go. This project demonstrates a transition from monolithic locking to a Shared-Nothing Sharded Architecture, capable of processing over 15 Million requests in a single 30-second stress test.

🛠️ Supported Commands

The following commands are implemented using the RESP (Redis Serialization Protocol) and are routed via the sharding layer to ensure thread safety and high concurrency.

Strings

  • SET key value [PX ms]: Stores a string value with an optional millisecond TTL (Time-To-Live). Complexity: O(1).
  • GET key: Retrieves the string value of a key, applying lazy expiration if a TTL was set. Complexity: O(1).

Lists

  • LPUSH key val [val ...]: Prepends one or more values to the head of the list. Complexity: O(1) per element.
  • RPUSH key val [val ...]: Appends one or more values to the tail of the list. Complexity: O(1) per element.
  • LPOP key [count]: Removes and returns the first element of the list, or the specified count of elements. Complexity: O(1) for a single element, O(count) for count.
  • LLEN key: Returns the current length of the list. Complexity: O(1).
  • LRANGE key start stop: Returns a range of elements. Uses a bidirectional walk that starts from the head or tail, whichever is closer to the index. Complexity: O(S+N), where S is the distance from the nearer end and N is the number of elements returned.

System

  • PING: Returns PONG; used for connection heartbeats and latency testing.
  • ECHO message: Returns the provided message back to the client.

🚀 Architectural Evolution

This server was optimized through four distinct phases, shifting the bottleneck from algorithmic complexity to hardware limits:

  1. Phase 1 (Slices): Initial implementation using Go slices. Proved inefficient for LPUSH (O(n) prepending).
  2. Phase 2 (DLL): Migrated to Doubly Linked Lists for O(1) writes. Hit a throughput ceiling due to Global Mutex contention.
  3. Phase 3 (32 Shards): First partitioned design, splitting state into 32 silos. Relieved write contention but regressed on reads (see the comparative table below).
  4. Phase 4 (64 Shards): Partitioned state into 64 independent silos using FNV-1a hashing. This unlocked multi-core scaling and hit the current performance peaks.

📊 Performance Benchmarks (The Raw Data)

Environment: 11th Gen Intel Core i5-11300H (8 threads) @ 3.10GHz | 50 concurrent clients. Stress test: 30s -benchtime per command.

Current Phase: 64-Shard Results

| Operation | Raw Throughput (No Race) | Throughput with Race Detector | Latency (ns/op) |
| --- | --- | --- | --- |
| SET (Write) | 386,191 req/s | 324,998 req/s | 2,589 ns |
| GET (Read) | 294,673 req/s | 255,331 req/s | 3,394 ns |
| PING (Atomic) | 305,666 req/s | 259,119 req/s | 3,272 ns |
| LPUSH (List Write) | 275,975 req/s | 249,143 req/s | 3,623 ns |
| LRANGE (List Read) | 60,833 req/s | 100,826 req/s* | 16,438 ns |

Total Requests Processed: Hit a peak of 15,794,269 requests for SET operations in a single session.


Comparative Progress (Evolutionary Data)

This table tracks the performance delta as the architecture evolved from a single-lock to a 64-shard model.

| Strategy | SET Peak (req/s) | % Change (SET) | LPUSH Peak (req/s) | % Change (LPUSH) | LRANGE Peak (req/s) | % Change (LRANGE) |
| --- | --- | --- | --- | --- | --- | --- |
| Phase 1: Slices | ~429,523 | - | ~4,100 | - | ~167,000 | - |
| Phase 2: DLL | ~431,000 | +0.3% | ~260,000 | +6,241% | ~172,000 | +3.0% |
| Phase 3: 32 Shards | ~349,013 | -19.0% | ~275,226 | +5.8% | ~60,424 | -64.8% |
| Phase 4: 64 Shards | 386,191 | +10.6% | 289,499 | +5.1% | 60,833 | +0.7% |

🔍 Engineering Insights

1. The Sharding Breakthrough

By replacing the global mutex with 64 localized mutexes, we successfully moved the bottleneck from Software (Lock Contention) to Hardware (Memory Bandwidth). In Phase 4, the CPU finally operates on all 8 logical threads without thread-suspension stalls.

2. The "Memory Wall" in LRANGE

The data shows a significant throughput regression for LRANGE compared to Phase 2.

  • The Cause: Sharding fragments memory locality. In a Doubly Linked List, the CPU must "chase pointers"; when those pointers are scattered across 64 separate shard maps, the L3 cache suffers a high miss rate.
  • The Metric: A latency of 16,438 ns/op is the physical signature of the CPU waiting for RAM (approx. 100ns per hop) because the data is not in the cache.

3. Bitwise Routing Optimization

To keep the "Router" fast, we avoided the expensive mod (%) operator. By utilizing a power-of-two shard count, we calculate indices using a Bitwise AND mask:

```go
index := hash & (shardCount - 1) // hash & 63
```

This reduces routing costs from ~30 CPU cycles to 1 cycle.


🛠️ Usage

1. Build and Run

```shell
go build -o redis-server cmd/main.go
./redis-server
```

2. Standard Redis CLI

```shell
# In a separate terminal
redis-cli -p 6379 LPUSH mylist "item1" "item2"
redis-cli -p 6379 LRANGE mylist 0 -1
```

🧪 Testing

Run thread-safety validation:

```shell
go test -race -v ./db ./cmd/
```

Run high-volume benchmarks:

```shell
go test -v cmd/benchmark_test.go -bench=. -benchtime=30s -benchmem
```

