A Globally Distributed, Linearizable SQL Database Built From First Principles
One engineer. Zero shortcuts. Built from first principles.
Architecture • Roadmap • Contributing • Design Doc
LineraDB is an educational distributed SQL database built to understand how planet-scale systems work at the deepest level. Think of it as the distributed systems equivalent of writing your own compiler or operating system: a complete implementation that demonstrates mastery of:
- Distributed consensus (Raft with leader leases)
- Multi-region replication (active-active across cloud providers)
- Linearizable transactions (strong consistency guarantees)
- Custom storage engines (LSM trees in Rust)
- Fault tolerance (chaos engineering, automatic failover)
- Distributed SQL execution (parsing, planning, optimization)
| Component | Status | Description |
|---|---|---|
| Project Structure | ✅ Complete | Modular architecture, CI/CD pipeline |
| Hybrid Logical Clock | 🔄 In Progress | Causal ordering & timestamping |
| Raft Consensus | 📋 Planned | Leader election, log replication, safety |
| Storage Engine | 📋 Planned | LSM tree with WAL in Rust |
| SQL Parser | 📋 Planned | SELECT, INSERT, UPDATE, DELETE, JOINs |
| Distributed Transactions | 📋 Planned | 2PC with MVCC and snapshot isolation |
| Sharding | 📋 Planned | Automatic partitioning and rebalancing |
| Multi-Region | 📋 Planned | Cross-region linearizable reads/writes |
| Observability | 📋 Planned | Prometheus, Grafana, OpenTelemetry |
| Chaos Engineering | 📋 Planned | Fault injection, partition testing |
Current Milestone: Building foundational distributed systems primitives (HLC, Raft)
LineraDB follows a modular monolith architecture with clear domain boundaries:
┌───────────────────────────────────────────────────────────┐
│                   SQL Query Layer (Go)                    │
│         Parser → Planner → Optimizer → Executor           │
└───────────────────────────────────────────────────────────┘
                             ↓
┌───────────────────────────────────────────────────────────┐
│               Transaction Coordinator (Go)                │
│              2PC • MVCC • Snapshot Isolation              │
└───────────────────────────────────────────────────────────┘
                             ↓
┌───────────────────┬───────────────────┬───────────────────┐
│  Raft Consensus   │     Sharding      │    Replication    │
│       (Go)        │       (Go)        │       (Go)        │
│  Leader Election  │    Consistent     │   Cross-Region    │
│  Log Replication  │     Hashing       │       Sync        │
└───────────────────┴───────────────────┴───────────────────┘
                             ↓
┌───────────────────────────────────────────────────────────┐
│              Storage Engine (Rust + Go FFI)               │
│          LSM Tree • WAL • Compaction • Indexing           │
└───────────────────────────────────────────────────────────┘
Design Principles:
- Hexagonal Architecture - Domain logic isolated from infrastructure
- Domain-Driven Design - Clear bounded contexts per module
- Contract-First - Protobuf definitions for inter-module communication
- Physics-Aware - Explicit constraints documented (see docs/CONSTRAINTS.md)
For detailed architecture, see docs/ARCHITECTURE.md.
Building LineraDB teaches you the same concepts used at Google (Spanner), Cockroach Labs (CockroachDB), and Amazon (DynamoDB):
Distributed Consensus
- Raft protocol implementation (leader election, log replication, safety)
- Leader leases for linearizable reads
- Handling network partitions and split-brain scenarios
- Quorum-based decision making
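To make the leader-election and quorum ideas above concrete, here is a minimal sketch of Raft's RequestVote decision rule. The types and method names are illustrative assumptions, not LineraDB's actual API, and a real implementation must also check that the candidate's log is at least as up-to-date as the voter's:

```go
package main

import "fmt"

// nodeState is a simplified slice of Raft per-node state.
type nodeState struct {
	currentTerm int
	votedFor    string // "" means no vote cast in the current term
}

// handleRequestVote grants a vote only if the candidate's term is current
// and this node has not already voted for a different candidate this term.
// A candidate that collects votes from a majority (quorum) becomes leader.
func (s *nodeState) handleRequestVote(candidate string, term int) bool {
	if term < s.currentTerm {
		return false // stale candidate: reject
	}
	if term > s.currentTerm {
		s.currentTerm = term
		s.votedFor = "" // entering a new term resets our vote
	}
	if s.votedFor == "" || s.votedFor == candidate {
		s.votedFor = candidate
		return true
	}
	return false
}

func main() {
	n := &nodeState{currentTerm: 1}
	fmt.Println(n.handleRequestVote("A", 2)) // granted: newer term
	fmt.Println(n.handleRequestVote("B", 2)) // denied: already voted for A
}
```

The "one vote per term" rule is what prevents two leaders from being elected in the same term: two different candidates cannot both collect majorities, because the majorities would overlap in at least one node.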
Storage Systems
- LSM tree implementation (memtable, SSTables, compaction)
- Write-ahead logging (WAL) for durability
- Crash recovery and consistency
- Bloom filters and indexing strategies
Distributed Transactions
- Two-phase commit (2PC) protocol
- Multi-version concurrency control (MVCC)
- Snapshot isolation and serializability
- Deadlock detection and resolution
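The 2PC protocol listed above boils down to a simple invariant: a transaction commits only if every participant votes yes in the prepare phase. A minimal sketch, with participant types that are assumptions for this example rather than LineraDB's real interfaces:

```go
package main

import "fmt"

// participant is one resource manager in the transaction.
type participant interface {
	prepare() bool // phase 1: vote yes/no
	commit()       // phase 2, on unanimous yes
	abort()        // phase 2, on any no
}

type shard struct {
	name    string
	healthy bool
	state   string
}

func (s *shard) prepare() bool { return s.healthy }
func (s *shard) commit()       { s.state = "committed" }
func (s *shard) abort()        { s.state = "aborted" }

// twoPhaseCommit aborts the whole transaction on a single "no" vote;
// otherwise it tells every participant to commit.
func twoPhaseCommit(parts []participant) bool {
	for _, p := range parts {
		if !p.prepare() {
			for _, q := range parts {
				q.abort()
			}
			return false
		}
	}
	for _, p := range parts {
		p.commit()
	}
	return true
}

func main() {
	a := &shard{name: "a", healthy: true}
	b := &shard{name: "b", healthy: false}
	fmt.Println(twoPhaseCommit([]participant{a, b})) // false: b votes no
	fmt.Println(a.state)                             // aborted
}
```

The hard parts a real coordinator must add are durability (logging the commit decision before phase 2) and recovery: a participant that voted yes is blocked until it learns the coordinator's decision, which is 2PC's well-known weakness.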
Network & Geographic Distribution
- Cross-region latency optimization (speed of light limits)
- WAN replication strategies
- Clock synchronization (Hybrid Logical Clocks)
- Failure detection in distributed systems
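Since the Hybrid Logical Clock is the current milestone, here is a sketch of its two core operations: `tick` for local/send events and `update` for merging a received timestamp. An HLC pairs physical time with a logical counter so timestamps stay close to wall time yet never go backwards; the struct and method names below are assumptions, and the physical clock is stubbed for determinism:

```go
package main

import "fmt"

type hlc struct {
	wall    int64 // highest physical time observed (e.g. unix millis)
	logical int64 // tie-breaking counter within one wall value
	nowFn   func() int64
}

// tick stamps a local or send event.
func (c *hlc) tick() (int64, int64) {
	now := c.nowFn()
	if now > c.wall {
		c.wall, c.logical = now, 0
	} else {
		c.logical++ // physical clock hasn't advanced: bump the counter
	}
	return c.wall, c.logical
}

// update merges a timestamp carried on an incoming message, guaranteeing
// the result is greater than both our clock and the sender's.
func (c *hlc) update(msgWall, msgLogical int64) (int64, int64) {
	now := c.nowFn()
	switch {
	case now > c.wall && now > msgWall:
		c.wall, c.logical = now, 0
	case msgWall > c.wall:
		c.wall, c.logical = msgWall, msgLogical+1
	case c.wall > msgWall:
		c.logical++
	default: // equal wall values: take the larger counter, then bump
		if msgLogical > c.logical {
			c.logical = msgLogical
		}
		c.logical++
	}
	return c.wall, c.logical
}

func main() {
	c := &hlc{nowFn: func() int64 { return 100 }} // frozen clock for the demo
	c.tick()                                      // -> (100, 0)
	w, l := c.update(105, 3)                      // remote clock is ahead
	fmt.Println(w, l)                             // 105 4
}
```

The payoff: if event A causally precedes event B, then A's HLC timestamp is strictly less than B's, without requiring perfectly synchronized clocks across regions.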
Query Execution
- SQL parsing and AST construction
- Query planning and optimization
- Distributed query execution
- Cost-based optimization
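The parsing step above can be illustrated with a toy parser for `SELECT <cols> FROM <table>` that tokenizes the input and builds a tiny AST node. This is a deliberately minimal sketch with made-up type names; a real parser handles expressions, joins, quoting, and error recovery:

```go
package main

import (
	"fmt"
	"strings"
)

// selectStmt is a minimal AST node for a projection over one table.
type selectStmt struct {
	Columns []string
	Table   string
}

func parseSelect(sql string) (*selectStmt, error) {
	tokens := strings.Fields(sql)
	if len(tokens) < 4 || !strings.EqualFold(tokens[0], "SELECT") {
		return nil, fmt.Errorf("expected SELECT statement")
	}
	// Everything between SELECT and FROM is the column list.
	fromIdx := -1
	for i, t := range tokens {
		if strings.EqualFold(t, "FROM") {
			fromIdx = i
			break
		}
	}
	if fromIdx < 1 || fromIdx+1 >= len(tokens) {
		return nil, fmt.Errorf("missing FROM clause")
	}
	cols := strings.Split(strings.Join(tokens[1:fromIdx], ""), ",")
	return &selectStmt{Columns: cols, Table: tokens[fromIdx+1]}, nil
}

func main() {
	stmt, _ := parseSelect("SELECT id, name FROM users")
	fmt.Println(stmt.Table, stmt.Columns) // users [id name]
}
```

Once the AST exists, the planner turns it into a tree of operators (scan, filter, project), and the optimizer rewrites that tree using statistics, which is where cost-based optimization enters.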
Operational Excellence
- Chaos engineering and fault injection
- Observability (metrics, logs, traces)
- Zero-downtime deployments
- Capacity planning and autoscaling
- Project structure and CI/CD
- Hybrid Logical Clock (HLC) implementation
- Basic Raft consensus (leader election)
- In-memory storage engine
- Simple key-value operations
Goal: Single-node database with time synchronization
- Full Raft implementation (log replication, safety)
- Leader leases for linearizable reads
- Multi-node cluster (3-5 nodes)
- Persistent storage (LSM tree in Rust)
- Write-ahead log (WAL)
Goal: 3-node replicated database with strong consistency
- SQL parser (SELECT, INSERT, UPDATE, DELETE)
- Query planner and executor
- Two-phase commit (2PC)
- MVCC and snapshot isolation
- Basic indexing
Goal: Single-region SQL database with ACID transactions
- Automatic sharding (consistent hashing)
- Shard rebalancing
- Distributed query execution
- Cross-shard transactions
- Metadata service
Goal: Horizontally scalable SQL database
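The automatic-sharding item above relies on consistent hashing: keys map to the first node clockwise from their hash on a ring, so adding or removing a node only relocates a fraction of the keys. A sketch under assumed names, using virtual nodes to smooth the distribution:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

type ring struct {
	points []uint32          // sorted hash points on the ring
	owner  map[uint32]string // hash point -> owning node
}

func hash(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// newRing places vnodes virtual points per node so load spreads evenly.
func newRing(nodes []string, vnodes int) *ring {
	r := &ring{owner: map[uint32]string{}}
	for _, n := range nodes {
		for v := 0; v < vnodes; v++ {
			p := hash(fmt.Sprintf("%s#%d", n, v))
			r.points = append(r.points, p)
			r.owner[p] = n
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// lookup finds the first point at or after the key's hash, wrapping
// around to the start of the ring if needed.
func (r *ring) lookup(key string) string {
	h := hash(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

func main() {
	r := newRing([]string{"node-a", "node-b", "node-c"}, 64)
	fmt.Println(r.lookup("user:42"))
}
```

Rebalancing falls out naturally: when a node joins, only the keys between its new points and their predecessors move, roughly 1/N of the data.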
- Cross-region Raft clusters
- Geographic routing
- Multi-region transactions
- Conflict resolution
- Region evacuation
Goal: Globally distributed database
- Full observability stack (Prometheus, Grafana, OpenTelemetry)
- Chaos engineering suite
- End-to-end encryption (TLS 1.3)
- Authentication and authorization
- Backup and point-in-time recovery
- Jepsen testing for correctness
Goal: Production-grade distributed database
For detailed milestones, see docs/ROADMAP.md.
| Layer | Technology | Why |
|---|---|---|
| Storage Engine | Rust | Memory safety, performance, FFI to Go |
| Consensus & SQL | Go | Excellent concurrency, network libraries |
| RPC | gRPC + Protobuf | Type-safe, efficient, language-agnostic |
| Cloud | AWS/GCP Multi-Region | Real-world deployment constraints |
| Observability | Prometheus, Grafana, OpenTelemetry | Industry-standard monitoring |
| Testing | Jepsen, Chaos Engineering | Correctness validation |
- Go 1.25
- Rust 1.92 (for storage engine, coming soon)
- Docker (optional, for multi-node testing)
# Clone the repository
git clone https://github.com/nickemma/lineradb.git
cd lineradb
# Build the server
make build
# Run the server
make run
# Run tests
make test
# Run with race detector (recommended)
make test-race

# Coming soon: Multi-node cluster
docker-compose up

- Architecture Overview - System design and module boundaries
- Constraints & Physics - Network latency, CAP theorem, failure modes
- Trade-offs - Design decisions and alternatives considered
- Roadmap - Detailed milestones and timeline
- Runbook - Operations guide (coming soon)
LineraDB is primarily a learning project, but contributions are welcome! See CONTRIBUTING.md for guidelines.
Areas where help is appreciated:
- 🐛 Bug reports and fixes
- 📖 Documentation improvements
- 🧪 Test coverage
- 💡 Design feedback (especially from distributed systems experts!)
- 🎨 Performance optimizations
If you discover a security vulnerability, please see SECURITY.md for responsible disclosure.
Licensed under:
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
"I'm fascinated by how planet-scale systems work, but most engineers never get to build them from scratch. LineraDB is my answer: a complete implementation that proves one person can still understand and build the kind of infrastructure that powers Google Spanner, Snowflake, or CockroachDB."
If this project demonstrates anything, it's that:
- Deep technical work still matters in an age of abstractions
- Understanding systems from first principles beats black-box thinking
- One engineer with focus can build something that matters
This project is my golden ticket: a demonstration of deep, hands-on expertise in distributed systems, not just theoretical knowledge. It's the résumé artifact that screams:
"I don't just use distributed databases. I build them from scratch."
- Engineers learning distributed systems - Follow along, ask questions, contribute
- Hiring managers at infrastructure companies - This is what mastery looks like
- Students - See a real-world implementation of concepts from papers
- Open source enthusiasts - Help make this better
@nickemma • Building distributed systems from first principles
💼 Open to opportunities at Google, Cockroach Labs, Snowflake, Databricks, AWS, Meta, or any company building serious infrastructure.
📧 Contact: nicholasemmanuel321@gmail.com
🐦 Twitter: @techieemma
💼 LinkedIn: Nicholas Emmanuel
If you believe one engineer can still build production-grade distributed infrastructure, star this repo and follow along.
Let's prove that deep technical work still matters.
LineraDB is built in public with love and a lot of coffee. If you'd like to support the journey:
Thank you to all future sponsors: your support keeps the lights on and the commits flowing! 🙏
Building Systems, Building Faith - One Day at A Time