A partitioned, multi-threaded in-memory storage engine engineered in Go. This project demonstrates a transition from monolithic locking to a Shared-Nothing Sharded Architecture, capable of processing over 15 Million requests in a single 30-second stress test.
The following commands are implemented using the RESP (Redis Serialization Protocol) and are routed via the sharding layer to ensure thread safety and high concurrency.
- `SET key value [PX ms]`: Stores a string value with an optional millisecond TTL (Time-To-Live). Complexity: O(1).
- `GET key`: Retrieves the string value of a key, handling lazy expiration if a TTL was set. Complexity: O(1).
- `LPUSH key val [val ...]`: Prepends one or more values to the head of the list. Complexity: O(1) per value.
- `RPUSH key val [val ...]`: Appends one or more values to the tail of the list. Complexity: O(1) per value.
- `LPOP key [count]`: Removes and returns the first element of the list, or the specified count of elements. Complexity: O(1) for a single element, O(count) otherwise.
- `LLEN key`: Returns the current length of the list. Complexity: O(1).
- `LRANGE key start stop`: Returns a range of elements. Uses a bidirectional walk, starting from the head or the tail depending on which is closer to the index. Complexity: O(N).
- `PING`: Returns `PONG`; used for connection heartbeats and latency testing.
- `ECHO message`: Returns the provided message back to the client.
This server was optimized through four distinct phases, shifting the bottleneck from algorithmic complexity to hardware limits:
- Phase 1 (Slices): Initial implementation using Go slices. Proved inefficient for `LPUSH` (prepending).
- Phase 2 (DLL): Migrated to Doubly Linked Lists for list writes. Hit a throughput ceiling due to Global Mutex contention.
- Phase 3 (32 Shards): First partitioned design, trading some single-key throughput (see the evolution table below) for multi-core scaling.
- Phase 4 (Sharding): Partitioned state into 64 independent silos using FNV-1a hashing. This unlocked multi-core scaling and hit the current performance peaks.
Environment: 11th Gen Intel Core i5-11300H (8 Threads) @ 3.10GHz | 50 Concurrent Clients
Stress Test: 30 seconds (`-benchtime=30s`) per command.
| Operation | Raw Throughput (No Race) | Throughput with Race Detector | Latency (ns/op) |
|---|---|---|---|
| SET (Write) | 386,191 req/s | 324,998 req/s | 2,589 ns |
| GET (Read) | 294,673 req/s | 255,331 req/s | 3,394 ns |
| PING (Atomic) | 305,666 req/s | 259,119 req/s | 3,272 ns |
| LPUSH (List Write) | 275,975 req/s | 249,143 req/s | 3,623 ns |
| LRANGE (List Read) | 60,833 req/s | 100,826 req/s* | 16,438 ns |
Total Requests Processed: Hit a peak of 15,794,269 requests for `SET` operations in a single session.
This table tracks the performance delta as the architecture evolved from a single-lock to a 64-shard model.
| Strategy | SET Peak (req/s) | % Change (SET) | LPUSH Peak (req/s) | % Change (LPUSH) | LRANGE Peak (req/s) | % Change (LRANGE) |
|---|---|---|---|---|---|---|
| Phase 1: Slices | ~429,523 | - | ~4,100 | - | ~167,000 | - |
| Phase 2: DLL | ~431,000 | +0.3% | ~260,000 | +6,241% | ~172,000 | +3.0% |
| Phase 3: 32 Shards | ~349,013 | -19.0% | ~275,226 | +5.8% | ~60,424 | -64.8% |
| Phase 4: 64 Shards | 386,191 | +10.6% | 289,499 | +5.1% | 60,833 | +0.7% |
By replacing the global mutex with 64 localized mutexes, we successfully moved the bottleneck from Software (Lock Contention) to Hardware (Memory Bandwidth). In Phase 4, the CPU finally operates on all 8 logical threads without thread-suspension stalls.
The data shows a significant throughput regression for LRANGE compared to Phase 2.
- The Cause: Sharding fragments the memory layout. In a Doubly Linked List, the CPU must "chase pointers"; when those pointers are scattered across 64 separate shard maps, the L3 cache suffers a high miss rate.
- The Metric: A latency of 16,438 ns/op is the physical signature of the CPU waiting on RAM (roughly 100 ns per pointer hop) because the data is not in cache.
To keep the "Router" fast, we avoided the expensive mod (`%`) operator. Because the shard count is a power of two, indices are computed with a bitwise AND mask:

```go
index := hash & (shardCount - 1) // hash & 63
```

This reduces routing costs from ~30 CPU cycles to 1 cycle.
```sh
go build -o redis-server cmd/main.go
./redis-server
```

```sh
# In a separate terminal
redis-cli -p 6379 LPUSH mylist "item1" "item2"
redis-cli -p 6379 LRANGE mylist 0 -1
```
Run Thread-Safety Validation:

```sh
go test -race -v ./db ./cmd/
```

Run High-Volume Benchmarks:

```sh
go test -v cmd/benchmark_test.go -bench=. -benchtime=30s -benchmem
```