A partitioned, multi-threaded in-memory storage engine engineered in Go. This project demonstrates a transition from monolithic locking to a Shared-Nothing Sharded Architecture, capable of processing over 15 Million requests in a single 30-second stress test.
The following commands are implemented using the RESP (Redis Serialization Protocol) and are routed via the sharding layer to ensure thread safety and high concurrency.
- `SET key value [PX ms]`: Stores a string value with an optional millisecond TTL (Time-To-Live). Complexity: O(1).
- `GET key`: Retrieves the string value of a key, handling lazy expiration if a TTL was set. Complexity: O(1).
- `LPUSH key val [val ...]`: Prepends one or more values to the head of the list. Complexity: O(1) per value.
- `RPUSH key val [val ...]`: Appends one or more values to the tail of the list. Complexity: O(1) per value.
- `LPOP key [count]`: Removes and returns the first element of the list, or the specified count of elements. Complexity: O(1) for a single element, O(count) otherwise.
- `LLEN key`: Returns the current length of the list. Complexity: O(1).
- `LRANGE key start stop`: Returns a range of elements. Uses a bidirectional walk, starting from the head or the tail depending on which is closer to the index. Complexity: O(N).
- `PING`: Returns `PONG`; used for connection heartbeats and latency testing.
- `ECHO message`: Returns the provided message back to the client.
This server was optimized through four distinct phases, shifting the bottleneck from algorithmic complexity to hardware limits:
- Phase 1 (Slices): Initial implementation using Go slices. Proved inefficient for `LPUSH` (prepending).
- Phase 2 (DLL): Migrated to Doubly Linked Lists for list writes. Hit a throughput ceiling due to Global Mutex contention.
- Phase 3 (32 Shards): First partitioned design, trading some single-key throughput (see the evolution table below) for multi-core scaling.
- Phase 4 (Sharding): Partitioned state into 64 independent silos using FNV-1a hashing. This unlocked multi-core scaling and hit the current performance peaks.
Environment: 11th Gen Intel Core i5-11300H (8 Threads) @ 3.10GHz | 50 Concurrent Clients
Stress Test: 30 seconds (`-benchtime=30s`) per command.
| Operation | Raw Throughput (No Race) | Throughput with Race Detector | Latency (ns/op) |
|---|---|---|---|
| SET (Write) | 386,191 req/s | 324,998 req/s | 2,589 ns |
| GET (Read) | 294,673 req/s | 255,331 req/s | 3,394 ns |
| PING (Atomic) | 305,666 req/s | 259,119 req/s | 3,272 ns |
| LPUSH (List Write) | 275,975 req/s | 249,143 req/s | 3,623 ns |
| LRANGE (List Read) | 60,833 req/s | 100,826 req/s* | 16,438 ns |
Total Requests Processed: Hit a peak of 15,794,269 requests for `SET` operations in a single session.
This table tracks the performance delta as the architecture evolved from a single-lock to a 64-shard model.
| Strategy | SET Peak (req/s) | % Change (SET) | LPUSH Peak (req/s) | % Change (LPUSH) | LRANGE Peak (req/s) | % Change (LRANGE) |
|---|---|---|---|---|---|---|
| Phase 1: Slices | ~429,523 | - | ~4,100 | - | ~167,000 | - |
| Phase 2: DLL | ~431,000 | +0.3% | ~260,000 | +6,241% | ~172,000 | +3.0% |
| Phase 3: 32 Shards | ~349,013 | -19.0% | ~275,226 | +5.8% | ~60,424 | -64.8% |
| Phase 4: 64 Shards | 386,191 | +10.6% | 289,499 | +5.1% | 60,833 | +0.7% |
By replacing the global mutex with 64 localized mutexes, we successfully moved the bottleneck from Software (Lock Contention) to Hardware (Memory Bandwidth). In Phase 4, the CPU finally operates on all 8 logical threads without thread-suspension stalls.
The data shows a significant throughput regression for LRANGE compared to Phase 2.
- The Cause: Sharding fragments the memory layout. In a Doubly Linked List, the CPU must "chase pointers"; when those pointers are scattered across 64 separate shard maps, the L3 cache suffers a high miss rate.
- The Metric: A latency of 16,438 ns/op is the physical signature of the CPU waiting on RAM (roughly 100 ns per pointer hop) because the data is not in cache.
To keep the "Router" fast, we avoided the expensive mod (`%`) operator. Because the shard count is a power of two, indices are computed with a bitwise AND mask:

```go
index := hash & (shardCount - 1) // hash & 63
```

This reduces routing costs from ~30 CPU cycles to 1 cycle.
```sh
go build -o redis-server cmd/main.go
./redis-server
```

```sh
# In a separate terminal
redis-cli -p 6379 LPUSH mylist "item1" "item2"
redis-cli -p 6379 LRANGE mylist 0 -1
```
Run Thread-Safety Validation:

```sh
go test -race -v ./db ./cmd/
```

Run High-Volume Benchmarks:

```sh
go test -v cmd/benchmark_test.go -bench=. -benchtime=30s -benchmem
```