LineraDB


A Globally Distributed, Linearizable SQL Database Built From First Principles

One engineer. Zero shortcuts. Built from first principles.

Architecture • Roadmap • Contributing • Design Doc


🎯 What is LineraDB?

LineraDB is an educational distributed SQL database built to understand how planet-scale systems work at the deepest level. Think of it as the distributed systems equivalent of writing your own compiler or operating system. It is a complete implementation that demonstrates mastery of:

  • Distributed consensus (Raft with leader leases)
  • Multi-region replication (active-active across cloud providers)
  • Linearizable transactions (strong consistency guarantees)
  • Custom storage engines (LSM trees in Rust)
  • Fault tolerance (chaos engineering, automatic failover)
  • Distributed SQL execution (parsing, planning, optimization)

⚠️ Important: LineraDB is not production-ready and is not intended to replace CockroachDB, Spanner, or PostgreSQL. It's a learning project that proves one person can build distributed infrastructure from scratch.


🚀 Status

| Component | Status | Description |
| --- | --- | --- |
| Project Structure | ✅ Complete | Modular architecture, CI/CD pipeline |
| Hybrid Logical Clock | 🔄 In Progress | Causal ordering & timestamping |
| Raft Consensus | 📋 Planned | Leader election, log replication, safety |
| Storage Engine | 📋 Planned | LSM tree with WAL in Rust |
| SQL Parser | 📋 Planned | SELECT, INSERT, UPDATE, DELETE, JOINs |
| Distributed Transactions | 📋 Planned | 2PC with MVCC and snapshot isolation |
| Sharding | 📋 Planned | Automatic partitioning and rebalancing |
| Multi-Region | 📋 Planned | Cross-region linearizable reads/writes |
| Observability | 📋 Planned | Prometheus, Grafana, OpenTelemetry |
| Chaos Engineering | 📋 Planned | Fault injection, partition testing |

Current Milestone: Building foundational distributed systems primitives (HLC, Raft)


πŸ—οΈ Architecture

LineraDB follows a modular monolith architecture with clear domain boundaries:

┌──────────────────────────────────────────────────────────┐
│                     SQL Query Layer (Go)                 │
│              Parser → Planner → Optimizer → Executor     │
└──────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────┐
│              Transaction Coordinator (Go)                │
│           2PC • MVCC • Snapshot Isolation                │
└──────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────┬──────────────────┬──────────────────┐
│  Raft Consensus  │   Sharding       │   Replication    │
│  (Go)            │   (Go)           │   (Go)           │
│  Leader Election │   Consistent     │   Cross-Region   │
│  Log Replication │   Hashing        │   Sync           │
└──────────────────┴──────────────────┴──────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────┐
│              Storage Engine (Rust + Go FFI)              │
│           LSM Tree • WAL • Compaction • Indexing         │
└──────────────────────────────────────────────────────────┘

Design Principles:

  • Hexagonal Architecture - Domain logic isolated from infrastructure
  • Domain-Driven Design - Clear bounded contexts per module
  • Contract-First - Protobuf definitions for inter-module communication
  • Physics-Aware - Explicit constraints documented (see docs/CONSTRAINTS.md)

For detailed architecture, see docs/ARCHITECTURE.md.


🎓 What You'll Learn

Building LineraDB teaches you the same concepts used at Google (Spanner), Cockroach Labs (CockroachDB), and Amazon (DynamoDB):

Distributed Consensus
  • Raft protocol implementation (leader election, log replication, safety)
  • Leader leases for linearizable reads
  • Handling network partitions and split-brain scenarios
  • Quorum-based decision making
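The quorum rule at the heart of Raft can be shown in a few lines of Go. This is a toy sketch of the arithmetic only (the function names are hypothetical, not from LineraDB): a cluster of n voters needs more than half of them to agree, so any two quorums overlap in at least one node.

```go
package main

import "fmt"

// Majority returns the quorum size for a Raft cluster of n voters:
// more than half the nodes, so two quorums always intersect.
func Majority(n int) int { return n/2 + 1 }

// Committed reports whether a log entry replicated to `acks` nodes
// (including the leader) can be safely committed.
func Committed(acks, clusterSize int) bool {
	return acks >= Majority(clusterSize)
}

func main() {
	// A 5-node cluster needs 3 acks; 3 acks is enough to commit.
	fmt.Println(Majority(5), Committed(3, 5)) // 3 true
}
```

This is also why clusters use odd sizes: a 4-node cluster needs 3 acks, the same as a 5-node cluster, but tolerates one fewer failure.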
Storage Systems
  • LSM tree implementation (memtable, SSTables, compaction)
  • Write-ahead logging (WAL) for durability
  • Crash recovery and consistency
  • Bloom filters and indexing strategies
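To illustrate the Bloom filter item above: an LSM tree typically keeps one filter per SSTable, and a read consults it before touching disk. A negative answer is definitive; a positive one may be a false positive. Here is a minimal Go sketch (hypothetical types, not LineraDB's storage code), deriving the k probe positions from one FNV hash:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Bloom is a tiny Bloom filter of the kind an LSM tree keeps per SSTable.
type Bloom struct {
	bits []bool
	k    int // number of hash probes per key
}

func NewBloom(m, k int) *Bloom { return &Bloom{bits: make([]bool, m), k: k} }

// probes derives k bit positions from a single 64-bit hash
// (the Kirsch-Mitzenmacher double-hashing trick).
func (b *Bloom) probes(key string) []int {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	h1, h2 := uint32(sum), uint32(sum>>32)
	out := make([]int, b.k)
	for i := 0; i < b.k; i++ {
		out[i] = int((h1 + uint32(i)*h2) % uint32(len(b.bits)))
	}
	return out
}

func (b *Bloom) Add(key string) {
	for _, p := range b.probes(key) {
		b.bits[p] = true
	}
}

// MayContain is true if the key might be in the SSTable, false if it
// definitely is not, letting a read skip this file entirely.
func (b *Bloom) MayContain(key string) bool {
	for _, p := range b.probes(key) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

func main() {
	f := NewBloom(1024, 3)
	f.Add("user:42")
	fmt.Println(f.MayContain("user:42")) // true: no false negatives
}
```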
Distributed Transactions
  • Two-phase commit (2PC) protocol
  • Multi-version concurrency control (MVCC)
  • Snapshot isolation and serializability
  • Deadlock detection and resolution
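The MVCC and snapshot isolation items above boil down to one read rule: a transaction with snapshot timestamp T sees only the newest version of each key committed at or before T. A minimal Go sketch of that lookup (hypothetical types, not LineraDB's transaction code):

```go
package main

import "fmt"

// Version is one MVCC version of a key: the value plus the commit
// timestamp of the transaction that wrote it.
type Version struct {
	CommitTS int64
	Value    string
}

// Read returns the newest version visible to a snapshot taken at
// readTS. Versions are assumed sorted by ascending CommitTS, so we
// scan from the newest backwards.
func Read(versions []Version, readTS int64) (string, bool) {
	for i := len(versions) - 1; i >= 0; i-- {
		if versions[i].CommitTS <= readTS {
			return versions[i].Value, true
		}
	}
	return "", false // the key did not exist at this snapshot
}

func main() {
	history := []Version{{10, "v1"}, {20, "v2"}, {30, "v3"}}
	v, _ := Read(history, 25)
	fmt.Println(v) // v2: a snapshot at ts=25 does not see the later v3
}
```

Because readers pick a version instead of taking locks, reads never block writes, which is the practical payoff of MVCC.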
Network & Geographic Distribution
  • Cross-region latency optimization (speed of light limits)
  • WAN replication strategies
  • Clock synchronization (Hybrid Logical Clocks)
  • Failure detection in distributed systems
Query Execution
  • SQL parsing and AST construction
  • Query planning and optimization
  • Distributed query execution
  • Cost-based optimization
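The first step of the query pipeline, SQL parsing, starts with a lexer that turns a statement into tokens before the parser builds an AST. A toy Go sketch (hypothetical, far simpler than a real SQL lexer; it uppercases word tokens so keyword matching is trivial and treats every other character as a one-character symbol):

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Tokenize splits a SQL statement into word, number, and symbol tokens.
func Tokenize(sql string) []string {
	var toks []string
	rs := []rune(sql)
	i := 0
	for i < len(rs) {
		r := rs[i]
		switch {
		case unicode.IsSpace(r):
			i++
		case unicode.IsLetter(r) || r == '_': // keyword or identifier
			j := i
			for j < len(rs) && (unicode.IsLetter(rs[j]) || unicode.IsDigit(rs[j]) || rs[j] == '_') {
				j++
			}
			toks = append(toks, strings.ToUpper(string(rs[i:j])))
			i = j
		case unicode.IsDigit(r): // integer literal
			j := i
			for j < len(rs) && unicode.IsDigit(rs[j]) {
				j++
			}
			toks = append(toks, string(rs[i:j]))
			i = j
		default: // single-character symbol: , * = ( ) etc.
			toks = append(toks, string(r))
			i++
		}
	}
	return toks
}

func main() {
	fmt.Println(Tokenize("SELECT id FROM users WHERE age = 42"))
}
```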
Operational Excellence
  • Chaos engineering and fault injection
  • Observability (metrics, logs, traces)
  • Zero-downtime deployments
  • Capacity planning and autoscaling

πŸ“ Roadmap

Phase 1: Foundation (Current)

  • Project structure and CI/CD
  • Hybrid Logical Clock (HLC) implementation
  • Basic Raft consensus (leader election)
  • In-memory storage engine
  • Simple key-value operations

Goal: Single-node database with time synchronization

Phase 2: Consensus

  • Full Raft implementation (log replication, safety)
  • Leader leases for linearizable reads
  • Multi-node cluster (3-5 nodes)
  • Persistent storage (LSM tree in Rust)
  • Write-ahead log (WAL)

Goal: 3-node replicated database with strong consistency

Phase 3: SQL & Transactions

  • SQL parser (SELECT, INSERT, UPDATE, DELETE)
  • Query planner and executor
  • Two-phase commit (2PC)
  • MVCC and snapshot isolation
  • Basic indexing

Goal: Single-region SQL database with ACID transactions
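The 2PC step in this phase can be sketched as a toy coordinator in Go. All names here are hypothetical, and a real implementation also needs timeouts, a durable coordinator log, and crash recovery; this shows only the core rule, commit everywhere or nowhere:

```go
package main

import "fmt"

// Participant is what the coordinator needs from each shard: a vote in
// phase one, then a commit or abort in phase two.
type Participant interface {
	Prepare() bool // vote: can this shard commit its part?
	Commit()
	Abort()
}

// RunTwoPhaseCommit commits only if every participant votes yes;
// otherwise it aborts everywhere.
func RunTwoPhaseCommit(parts []Participant) bool {
	for _, p := range parts { // phase 1: collect votes
		if !p.Prepare() {
			for _, q := range parts { // any "no" aborts the whole txn
				q.Abort()
			}
			return false
		}
	}
	for _, p := range parts { // phase 2: all voted yes, commit
		p.Commit()
	}
	return true
}

// shard is a trivial in-memory participant for demonstration.
type shard struct {
	ok        bool
	committed bool
}

func (s *shard) Prepare() bool { return s.ok }
func (s *shard) Commit()       { s.committed = true }
func (s *shard) Abort()        { s.committed = false }

func main() {
	a, b := &shard{ok: true}, &shard{ok: true}
	fmt.Println(RunTwoPhaseCommit([]Participant{a, b})) // true: both voted yes
}
```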

Phase 4: Distribution

  • Automatic sharding (consistent hashing)
  • Shard rebalancing
  • Distributed query execution
  • Cross-shard transactions
  • Metadata service

Goal: Horizontally scalable SQL database
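The consistent hashing planned for this phase can be sketched with a small hash ring in Go (hypothetical names, FNV instead of a production hash): each node owns several virtual points on the ring, a key maps to the first point at or after its hash, and adding or removing a node only moves the keys in that node's arcs, which is what makes rebalancing cheap.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is a minimal consistent-hash ring.
type Ring struct {
	points []uint32          // sorted virtual-node hash points
	owner  map[uint32]string // point -> node name
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing places vnodes virtual points per node on the ring; more
// virtual points means a more even key distribution.
func NewRing(nodes []string, vnodes int) *Ring {
	r := &Ring{owner: map[uint32]string{}}
	for _, n := range nodes {
		for i := 0; i < vnodes; i++ {
			p := hash32(fmt.Sprintf("%s#%d", n, i))
			r.points = append(r.points, p)
			r.owner[p] = n
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Lookup returns the node responsible for a key: the first point on
// the ring at or after the key's hash, wrapping around at the end.
func (r *Ring) Lookup(key string) string {
	h := hash32(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.owner[r.points[i]]
}

func main() {
	ring := NewRing([]string{"node-a", "node-b", "node-c"}, 16)
	fmt.Println(ring.Lookup("user:42"))
}
```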

Phase 5: Multi-Region

  • Cross-region Raft clusters
  • Geographic routing
  • Multi-region transactions
  • Conflict resolution
  • Region evacuation

Goal: Globally distributed database

Phase 6: Production Readiness

  • Full observability stack (Prometheus, Grafana, OpenTelemetry)
  • Chaos engineering suite
  • End-to-end encryption (TLS 1.3)
  • Authentication and authorization
  • Backup and point-in-time recovery
  • Jepsen testing for correctness

Goal: Production-grade distributed database

For detailed milestones, see docs/ROADMAP.md.


🛠️ Tech Stack

| Layer | Technology | Why |
| --- | --- | --- |
| Storage Engine | Rust | Memory safety, performance, FFI to Go |
| Consensus & SQL | Go | Excellent concurrency, network libraries |
| RPC | gRPC + Protobuf | Type-safe, efficient, language-agnostic |
| Cloud | AWS/GCP Multi-Region | Real-world deployment constraints |
| Observability | Prometheus, Grafana, OpenTelemetry | Industry-standard monitoring |
| Testing | Jepsen, Chaos Engineering | Correctness validation |

🚦 Quick Start

Prerequisites

  • Go 1.25
  • Rust 1.92 (for storage engine, coming soon)
  • Docker (optional, for multi-node testing)

Run Locally

```sh
# Clone the repository
git clone https://github.com/nickemma/lineradb.git
cd lineradb

# Build the server
make build

# Run the server
make run

# Run tests
make test

# Run with race detector (recommended)
make test-race
```

Run with Docker

```sh
# Coming soon: Multi-node cluster
docker-compose up
```

📖 Documentation


🤝 Contributing

LineraDB is primarily a learning project, but contributions are welcome! See CONTRIBUTING.md for guidelines.

Areas where help is appreciated:

  • πŸ› Bug reports and fixes
  • πŸ“ Documentation improvements
  • πŸ§ͺ Test coverage
  • πŸ’‘ Design feedback (especially from distributed systems experts!)
  • 🎨 Performance optimizations

🔒 Security

If you discover a security vulnerability, please see SECURITY.md for responsible disclosure.


📜 License

Licensed:


🌟 Why This Exists

"I'm fascinated by how planet-scale systems work, but most engineers never get to build them from scratch. LineraDB is my answer: a complete implementation that proves one person can still understand and build the kind of infrastructure that powers Google Spanner, Snowflake or CockroachDB Labs."

If this project demonstrates anything, it's that:

  • Deep technical work still matters in an age of abstractions
  • Understanding systems from first principles beats black-box thinking
  • One engineer with focus can build something that matters

This project is my golden ticket: a demonstration of deep, hands-on expertise in distributed systems, not just theoretical knowledge. It's the résumé artifact that screams:

"I don't just use distributed databases. I build them from scratch."


🎯 Who This Is For

  • Engineers learning distributed systems - Follow along, ask questions, contribute
  • Hiring managers at infrastructure companies - This is what mastery looks like
  • Students - See a real-world implementation of concepts from papers
  • Open source enthusiasts - Help make this better

👤 Author

@nickemma β€’ Building distributed systems from first principles

💼 Open to opportunities at Google, Cockroach Labs, Snowflake, Databricks, AWS, Meta, or any company building serious infrastructure.

📧 Contact: nicholasemmanuel321@gmail.com
🐦 Twitter: @techieemma
💼 LinkedIn: Nicholas Emmanuel


⭐ Support

If you believe one engineer can still build production-grade distributed infrastructure, star this repo and follow along.

Let's prove that deep technical work still matters.


💖 Sponsors

LineraDB is built in public with love and a lot of coffee. If you'd like to support the journey:

Sponsor LineraDB

Thank you to all future sponsors β€” your support keeps the lights on and the commits flowing! πŸš€


Building Systems, Building Faith - One Day at a Time
