
🏗️ Architecture

This document explains the core design decisions behind the decentralized database.

Why append-only logs?

They provide:

  • Immutability
  • Deterministic replay
  • Simple conflict resolution
  • Easy sync

Why local-first?

  • No loading spinners
  • Works offline
  • User owns data

Why WASM + Go?

  • Deterministic execution
  • Shared logic between backend and browser
  • Strong concurrency model

Why stateless servers?

  • Easier scaling
  • No trust required
  • Replaceable relays


🎯 Core Principles

  1. Offline-first: Operations work without network connectivity
  2. Conflict-free: CRDT merge logic ensures eventual consistency
  3. Privacy-first: End-to-end encryption, zero server trust
  4. Simple sync: Dumb relay server keeps infrastructure minimal

📝 Why Append-Only Logs?

The Problem

Traditional databases use mutable state:

  • UPDATE statements modify rows in-place
  • DELETE removes data permanently
  • Concurrent writes require locks
  • Conflicts need central coordination

Our Solution: Immutable Event Log

Every operation is appended, never modified:

[op1: set user:1 = "Alice"]
[op2: set user:1 = "Bob"]      ← Doesn't override op1
[op3: delete user:1]            ← Doesn't erase previous ops
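Replaying such a log deterministically reconstructs the current state. A minimal sketch (the `Op` struct here is illustrative, not the project's actual type):

```go
package main

import "fmt"

// Op is an illustrative log entry: set or delete a key.
type Op struct {
    Kind  string // "set" or "delete"
    Key   string
    Value string
}

// replay folds the full log into current state. Deletes are log
// entries too ("tombstones"), so no history is ever erased.
func replay(log []Op) map[string]string {
    state := map[string]string{}
    for _, op := range log {
        switch op.Kind {
        case "set":
            state[op.Key] = op.Value
        case "delete":
            delete(state, op.Key)
        }
    }
    return state
}

func main() {
    log := []Op{
        {"set", "user:1", "Alice"},
        {"set", "user:1", "Bob"},
        {"delete", "user:1", ""},
    }
    state := replay(log)
    _, exists := state["user:1"]
    fmt.Println(exists) // false: the delete wins on replay
}
```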

Benefits

1️⃣ Crash Recovery

  • System crashes mid-write? No corruption.
  • Restart and replay the log from disk.
  • Atomic writes: operation fully written or not at all.

2️⃣ Distributed Sync

  • Can merge logs from different devices
  • No "last modified" conflicts
  • Deterministic replay on all peers

3️⃣ Audit Trail

  • Complete history of all changes
  • Can reconstruct state at any point in time
  • Useful for debugging and compliance

4️⃣ Simple Implementation

func Append(op Operation) error {
    data, err := json.Marshal(op)
    if err != nil {
        return err
    }
    _, err = file.Write(append(data, '\n'))  // Just append!
    return err
}

No in-place updates, no B-trees, no WAL complexity.

Trade-offs

  • ❌ Log grows unbounded (needs compaction strategy)
  • ❌ Slower queries (must replay entire log)
  • ✅ Rock-solid consistency guarantees
  • ✅ Trivial to implement correctly
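One common compaction strategy (an assumption here, not necessarily what the project implements) is to keep only the newest operation per key:

```go
package main

import "fmt"

// Op is an illustrative log entry with a per-log sequence number.
type Op struct {
    Seq   int
    Key   string
    Value string
}

// compact keeps only the newest op for each key, preserving log
// order. Replayed state is unchanged, but the log stops growing
// without bound.
func compact(log []Op) []Op {
    latest := map[string]Op{}
    for _, op := range log {
        latest[op.Key] = op // later entries overwrite earlier ones
    }
    out := make([]Op, 0, len(latest))
    for _, op := range log {
        if latest[op.Key].Seq == op.Seq {
            out = append(out, op)
        }
    }
    return out
}

func main() {
    log := []Op{{1, "a", "x"}, {2, "b", "y"}, {3, "a", "z"}}
    fmt.Println(compact(log)) // [{2 b y} {3 a z}]
}
```

Note that for CRDT sync, compaction has to be coordinated carefully: a peer that hasn't seen the dropped ops may still need them, so real systems typically compact only below a watermark all peers have acknowledged.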

🔐 Why Single-Writer Model?

The Problem with Multiple Writers

Race condition example:

// Thread 1                      // Thread 2
ops := store.ops                 ops := store.ops
ops = append(ops, op1)           ops = append(ops, op2)
store.ops = ops                  store.ops = ops  // op1 lost!

Traditional solutions:

  • Mutexes (slow, blocking)
  • Lock-free algorithms (complex, error-prone)
  • Transactions (heavyweight overhead)

Our Solution: Single Goroutine Owner

func (s *Store) run() {
    for op := range s.input {
        s.ops = append(s.ops, op)  // Only this goroutine writes
    }
}

func (s *Store) Apply(op Operation) {
    s.input <- op  // Everyone else just sends
}

Why This Works

1️⃣ No Race Conditions

  • Only one goroutine modifies s.ops
  • Go channels provide synchronization
  • No mutex overhead

2️⃣ Natural Backpressure

  • Slow disk I/O? Channel fills up
  • Fast producers automatically wait
  • No manual flow control needed

3️⃣ Sequential Consistency

  • Operations processed in channel order
  • Easier to reason about
  • Deterministic behavior

4️⃣ Simple Testing

go test -race  # Zero data races!

Performance Implications

  • ✅ Single-threaded writes scale to ~100K ops/sec
  • ✅ Reads can be concurrent (RWMutex for Get)
  • ❌ Can't parallelize writes across cores
  • ❌ Bounded by single disk throughput

For most CRDT use cases (collaborative apps, local-first), this is plenty fast.
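The single-writer loop plus RWMutex-guarded concurrent reads can be sketched as follows. The field and method names here are assumptions for illustration:

```go
package main

import (
    "fmt"
    "sync"
)

type Store struct {
    mu    sync.RWMutex
    state map[string]string
    input chan [2]string // key/value pairs; a stand-in for real operations
}

func NewStore() *Store {
    s := &Store{state: map[string]string{}, input: make(chan [2]string, 64)}
    go s.run()
    return s
}

// run is the single writer: only this goroutine mutates state.
// The mutex exists solely so concurrent readers see consistent data.
func (s *Store) run() {
    for kv := range s.input {
        s.mu.Lock()
        s.state[kv[0]] = kv[1]
        s.mu.Unlock()
    }
}

// Apply sends to the owner goroutine; a full channel applies backpressure.
func (s *Store) Apply(key, value string) { s.input <- [2]string{key, value} }

// Get takes only a read lock, so reads run concurrently.
func (s *Store) Get(key string) (string, bool) {
    s.mu.RLock()
    defer s.mu.RUnlock()
    v, ok := s.state[key]
    return v, ok
}

func main() {
    s := NewStore()
    s.Apply("user:1", "Alice")
    // Poll until the owner goroutine has applied the op (demo only;
    // real code would use a reply channel or other acknowledgement).
    for {
        if v, ok := s.Get("user:1"); ok {
            fmt.Println(v) // Alice
            break
        }
    }
}
```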


🌐 Why IndexedDB (Browser Storage)?

Options Comparison

| Storage        | Capacity | Async   | Structured     | Verdict                   |
|----------------|----------|---------|----------------|---------------------------|
| localStorage   | 5-10 MB  | ❌ Sync | ❌ String-only | Too limited               |
| sessionStorage | 5-10 MB  | ❌ Sync | ❌ String-only | Too limited               |
| IndexedDB      | 50+ MB   | ✅ Async | ✅ Objects    | Winner                    |
| WebSQL         | —        | —       | —              | Deprecated, dead standard |

Why IndexedDB Wins

1️⃣ Capacity

// localStorage: Only ~5MB
localStorage.setItem('ops', JSON.stringify(ops))  // Quota exceeded!

// IndexedDB: 50MB+ (browser-dependent)
db.put('operations', ops)  // Plenty of room

2️⃣ Asynchronous API

// localStorage blocks main thread
const data = localStorage.getItem('big-data')  // UI freezes 😰

// IndexedDB doesn't block
db.getAll('operations').then(ops => {
    // UI stays responsive ✨
})

3️⃣ Structured Data

// localStorage: Everything is a string
const user = JSON.parse(localStorage.getItem('user'))  // Manual parse

// IndexedDB: Store objects directly
db.put('users', { id: 1, name: 'Alice' })  // Native objects

4️⃣ Indexing and Queries

// Find all ops after timestamp 100 (promise-style wrapper API assumed)
const tx = db.transaction('operations')
const index = tx.objectStore('operations').index('timestamp')
const ops = await index.getAll(IDBKeyRange.lowerBound(100))

5️⃣ Transactions

const tx = db.transaction(['operations', 'seen'], 'readwrite')
tx.objectStore('operations').add(op)
tx.objectStore('seen').put(true, op.id)  // put(value, key)
// Atomic: both succeed or both fail

WASM Integration

//go:build js && wasm

import "syscall/js"

func (s *IndexedDBStorage) Append(op Operation) error {
    db := js.Global().Get("indexedDB")
    // Call browser IndexedDB APIs from Go!
}

🔒 Why Client-Side Encryption?

Zero-Trust Architecture

┌─────────────────┐
│  User's Device  │
│  ┌───────────┐  │
│  │ Plain Ops │  │     Encryption happens HERE
│  └─────┬─────┘  │            ↓
│        │ AES-GCM│     ┌──────────────┐
│  ┌─────▼─────┐  │     │ Encrypted Op │
│  │ Cipher Op │──┼────>│ (Base64)     │
│  └───────────┘  │     └──────────────┘
└─────────────────┘              │
                                 │
                          ┌──────▼──────┐
                          │    Server   │
                          │  (can't read)│
                          └─────────────┘

Why Encrypt on Client?

1️⃣ Server Can't Read Your Data

// Server receives this:
{
  "id": "op123",
  "value": "nfj38f9h2f9h2f9..."  // Encrypted blob
}

// Server has zero idea what this means

Even if server is compromised:

  • ✅ Attacker gets encrypted gibberish
  • ✅ No encryption keys on server
  • ✅ No plaintext data

2️⃣ Compliance-Friendly

  • GDPR: "Right to be forgotten" → Just delete your key
  • HIPAA: PHI never leaves device unencrypted
  • SOC2: Server doesn't process sensitive data

3️⃣ Multi-Tenant Security

User A's key encrypts User A's data
User B's key encrypts User B's data

Server stores both, can't distinguish them

No user can decrypt another user's data, even if they hack the server.

4️⃣ Offline Encryption

// Works without network
encrypted := Encrypt(data, identity.Key)
storage.Append(encrypted)

// Sync later when online

Implementation

func Encrypt(data, key []byte) ([]byte, error) {
    block, err := aes.NewCipher(key)
    if err != nil {
        return nil, err
    }
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, err
    }

    nonce := make([]byte, gcm.NonceSize())
    if _, err := rand.Read(nonce); err != nil {
        return nil, err
    }

    // Authenticated encryption; nonce is prepended to the ciphertext
    return gcm.Seal(nonce, nonce, data, nil), nil
}

AES-GCM provides:

  • Confidentiality (encryption)
  • Authenticity (can't tamper without detection)
  • Associated data (can include metadata)

🤝 Why Dumb Relay Server?

Smart Server (Traditional)

┌──────────────────────┐
│    Server            │
│  ┌────────────────┐  │
│  │ User Auth      │  │  
│  │ Permissions    │  │  ← Complex logic
│  │ Conflict Res   │  │  ← State machine  
│  │ Data Transform │  │  ← Business rules
│  └────────────────┘  │
└──────────────────────┘

Problems:

  • ❌ Server must understand application logic
  • ❌ Tight coupling between client and server
  • ❌ Complex deployment and scaling
  • ❌ Single point of failure

Dumb Relay (Our Approach)

func relay() {
    for msg := range broadcast {
        for client := range clients {
            client.Send(msg)  // Just forward!
        }
    }
}

That's it. 42 lines of code.
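Fleshing that loop out into something self-contained, with buffered channels standing in for WebSocket connections (the actual relay's types will differ):

```go
package main

import "fmt"

// relay forwards every message to every registered client with no
// inspection; client channels stand in for WebSocket connections.
func relay(broadcast <-chan string, clients []chan string) {
    for msg := range broadcast {
        for _, c := range clients {
            c <- msg // just forward; the relay never decrypts
        }
    }
    for _, c := range clients {
        close(c)
    }
}

func main() {
    broadcast := make(chan string)
    clients := []chan string{
        make(chan string, 8),
        make(chan string, 8),
    }
    go relay(broadcast, clients)

    broadcast <- "encrypted-op-1"
    close(broadcast)

    for i, c := range clients {
        for msg := range c {
            fmt.Printf("client %d got %s\n", i, msg)
        }
    }
}
```

Everything the relay touches is an opaque blob; adding or removing clients is the only state it has to manage.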

Why This Is Powerful

1️⃣ Zero Application Logic

Server doesn't know:

  • What the data means
  • Who can access what
  • How to resolve conflicts

All intelligence in clients.

2️⃣ Trivial to Scale

# Run 10 relays behind a load balancer
for i in {1..10}; do
    ./relay --port=$((8080 + i)) &
done

No shared state, no coordination.

3️⃣ Cheap Infrastructure

Client A ──┐
           ├──> Relay ($5/month VPS) ──┐
Client B ──┘                           ├──> Client C
                                       └──> Client D

Compare to Firebase ($25-$500/month) with vendor lock-in.

4️⃣ Self-Hosting Friendly

# Users can run their own relay
docker run -p 8080:8080 decentralized-db/relay

No need to trust a third party.

5️⃣ Swappable Backends

┌─────────┐
│ WebRTC  │  ← Peer-to-peer, no server
└─────────┘

┌─────────┐
│ Relay   │  ← Simple broadcast
└─────────┘

┌─────────┐
│ libp2p  │  ← Decentralized network
└─────────┘

Same CRDT core, different transport layers.

Security Model

Client sends: ENCRYPTED operation
Relay sees:   Random bytes
Relay does:   Broadcast to peers
Peers decrypt using their keys

No server-side auth needed! AES-GCM's authentication tag makes encrypted operations self-authenticating: peers reject anything not sealed with the right key.


🎯 Design Wins

What We Get Right

Simplicity: Core CRDT logic is ~300 lines
Correctness: Deterministic merge, no race conditions
Privacy: E2E encryption, server can't read data
Resilience: Offline-first, crash recovery
Cost: Self-host on $5/month VPS

Known Trade-offs

Storage overhead: Log needs compaction
Query performance: Must replay log
No indexes: Linear scan for queries
No transactions: Eventually consistent only

When to Use This

✅ Collaborative apps (Notion, Figma)
✅ Local-first software (offline editing)
✅ Privacy-critical apps (healthcare, finance)
✅ Multi-device sync (phone + laptop + tablet)

❌ Real-time analytics (need aggregations)
❌ Complex joins (need relational DB)
❌ Strong consistency (need distributed consensus)


📚 Further Reading


Architecture designed for simplicity, privacy, and resilience.