This document covers WILD's replication system for read scaling and high availability.
WILD implements a primary-replica replication model designed for read scaling while maintaining the extreme performance characteristics of the single-node database.
- Primary Node: Handles all writes, replicates to replicas
- Replica Nodes: Handle reads, receive updates from primary
- WAL-Based Replication: Real-time streaming of write-ahead log batches
- Authenticated Communication: All replication traffic is authenticated
- Client-Side Load Balancing: Automatic replica selection and failover
./zig-out/bin/wild \
--mode primary \
--auth-secret "mysecret" \
--enable-wal \
--wal-path primary.wal \
--port 7878 \
--replication-port 9001

./zig-out/bin/wild \
--mode replica \
--auth-secret "mysecret" \
--primary-address 127.0.0.1 \
--primary-port 9001 \
--port 7879

# Client automatically uses primary for writes, replicas for reads
./zig-out/bin/wild-client-example load-test

Role: Write node, replication source
- Accepts all write operations (SET, DELETE)
- Streams WAL batches to connected replicas
- Maintains replica health and connection state
- Handles replica catch-up for lagging nodes
Role: Read node, replication target
- Accepts read operations (GET)
- Receives and applies WAL batches from primary
- Maintains connection to primary with health monitoring
- Supports automatic reconnection and catch-up
Role: Intelligent load balancer
- Routes writes to primary node
- Routes reads to healthy replica nodes (round-robin)
- Monitors replica health with automatic failover
- Falls back to primary if all replicas fail
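The routing rules above can be illustrated with a small shell sketch. It is not WILD's client router: the endpoint list, the `UNHEALTHY` variable, and the function names are assumptions made up for this example, and health is set by hand rather than probed.

```shell
#!/bin/sh
# Sketch of round-robin read routing with primary fallback.
# Endpoints, UNHEALTHY and all function names are illustrative assumptions.

PRIMARY="primary.example.com:7878"
REPLICAS="replica1.example.com:7879 replica2.example.com:7880 replica3.example.com:7881"
UNHEALTHY=""   # space-separated replicas currently marked unhealthy
NEXT=0         # zero-based index of the replica to try first on the next read

# Succeeds unless the endpoint appears in UNHEALTHY.
is_healthy() {
    case " $UNHEALTHY " in
        *" $1 "*) return 1 ;;
        *) return 0 ;;
    esac
}

# Prints the replica at zero-based index $1.
nth_replica() {
    i=0
    for r in $REPLICAS; do
        [ "$i" -eq "$1" ] && { echo "$r"; return 0; }
        i=$((i + 1))
    done
    return 1
}

# Sets TARGET to the endpoint that should serve the next read.
pick_read_endpoint() {
    count=0
    for r in $REPLICAS; do count=$((count + 1)); done
    tried=0
    while [ "$tried" -lt "$count" ]; do
        candidate=$(nth_replica $(( (NEXT + tried) % count )))
        tried=$((tried + 1))
        if is_healthy "$candidate"; then
            NEXT=$(( (NEXT + tried) % count ))
            TARGET=$candidate
            return 0
        fi
    done
    TARGET=$PRIMARY   # every replica is unhealthy: fall back to the primary
}

# Demo: three consecutive reads with all replicas healthy.
for n in 1 2 3; do
    pick_read_endpoint
    echo "read $n -> $TARGET"
done
```

Writes are not modelled here because they always go to the primary. Marking a replica unhealthy (for example `UNHEALTHY="replica2.example.com:7880"`) makes the picker skip it without disturbing the rotation of the others.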
# Flag reference for the command below:
#   --mode primary        Enable primary mode
#   --auth-secret         Authentication secret
#   --capacity            Database capacity
#   --enable-wal          Required for replication
#   --wal-path            WAL file location
#   --port                Client connection port
#   --bind-address        0.0.0.0 binds to all interfaces
#   --replication-port    Replica connection port
./zig-out/bin/wild \
--mode primary \
--auth-secret "production-secret" \
--capacity 2000000 \
--enable-wal \
--wal-path /data/primary.wal \
--port 7878 \
--bind-address 0.0.0.0 \
--replication-port 9001

Primary-Specific Options:
- --replication-port: Port for replica connections (required)
- --enable-wal: WAL required for replication (automatic check)
# Flag reference for the command below:
#   --mode replica        Enable replica mode
#   --auth-secret         Same secret as primary
#   --primary-address     Primary server address
#   --primary-port        Primary replication port
#   --port                Client connection port (unique per replica)
#   --bind-address        0.0.0.0 binds to all interfaces
./zig-out/bin/wild \
--mode replica \
--auth-secret "production-secret" \
--primary-address primary.example.com \
--primary-port 9001 \
--port 7879 \
--bind-address 0.0.0.0

Replica-Specific Options:
- --primary-address: Hostname/IP of primary node (required)
- --primary-port: Primary's replication port (required)
- --port: Unique port for this replica's client connections
Clients use a configuration that includes all database nodes:
const config = client_router.ClientRouter.Config{
.primary_address = "primary.example.com",
.primary_port = 7878,
.replica_addresses = &[_][]const u8{
"replica1.example.com",
"replica2.example.com",
"replica3.example.com"
},
.replica_ports = &[_]u16{ 7879, 7880, 7881 },
.auth_secret = "production-secret",
.max_connections_per_endpoint = 10,
.health_check_interval_ms = 30000,
};

./zig-out/bin/wild \
--mode primary \
--auth-secret "$(cat /etc/wild/secret)" \
--capacity 4000000 \
--enable-wal \
--wal-path /data/wild/primary.wal \
--port 7878 \
--bind-address 0.0.0.0 \
--replication-port 9001

./zig-out/bin/wild \
--mode replica \
--auth-secret "$(cat /etc/wild/secret)" \
--primary-address primary.example.com \
--primary-port 9001 \
--port 7879 \
--bind-address 0.0.0.0

./zig-out/bin/wild \
--mode replica \
--auth-secret "$(cat /etc/wild/secret)" \
--primary-address primary.example.com \
--primary-port 9001 \
--port 7880 \
--bind-address 0.0.0.0

              ┌─────────────────┐
              │  Load Balancer  │
              │   (Optional)    │
              └─────────────────┘
                       │
         ┌─────────────┼────────────┐
         │             │            │
 ┌───────▼──────┐ ┌────▼────┐ ┌─────▼─────┐
 │   Primary    │ │ Replica │ │  Replica  │
 │  :7878 (RW)  │ │ :7879(R)│ │ :7880(R)  │
 └──────────────┘ └─────────┘ └───────────┘
         │             ▲            ▲
         │             │            │
         └──── Replication ─────────┘
                     :9001
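The production commands above read the shared secret from `/etc/wild/secret`, but the document does not say how that file is created. One possible approach is sketched below. The `generate_secret` helper is hypothetical, and the 32-byte hex format is an assumption rather than a WILD requirement.

```shell
#!/bin/sh
# Sketch: write a random hex secret to a file readable only by its owner.
# generate_secret is a hypothetical helper, not part of WILD.

generate_secret() {
    (
        umask 077   # a newly created file starts out as mode 600
        # 32 random bytes, hex-encoded into 64 characters with no newline
        head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n' > "$1"
        chmod 600 "$1"   # also tighten a file that already existed
    )
}

# Example (as root on the primary), then copy the same file to every replica:
#   generate_secret /etc/wild/secret
```

Every node must end up with the same secret, so generate it once and distribute the file rather than running the helper on each host.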
[Unit]
Description=WILD Database Primary Node
After=network.target
[Service]
Type=simple
User=wild
Group=wild
ExecStart=/usr/local/bin/wild \
--mode primary \
--auth-secret-file /etc/wild/secret \
--capacity 4000000 \
--enable-wal \
--wal-path /data/wild/primary.wal \
--port 7878 \
--replication-port 9001
Restart=always
RestartSec=5
WorkingDirectory=/data/wild
[Install]
WantedBy=multi-user.target

[Unit]
Description=WILD Database Replica Node
After=network.target
[Service]
Type=simple
User=wild
Group=wild
ExecStart=/usr/local/bin/wild \
--mode replica \
--auth-secret-file /etc/wild/secret \
--primary-address primary.example.com \
--primary-port 9001 \
--port 7879
Restart=always
RestartSec=5
Environment=WILD_REPLICA_ID=replica1
[Install]
WantedBy=multi-user.target

- WAL Streaming: Primary streams write-ahead log batches to replicas
- Batched Updates: Multiple operations bundled for efficiency
- Minimal Lag: Typically microsecond-level replication delay
- Ordered Delivery: Operations applied in same order as primary
- Health Monitoring: Continuous replica health checks
- Automatic Reconnection: Replicas reconnect on connection loss
- Catch-Up Protocol: Lagging replicas automatically sync missing data
- Bootstrap Support: New replicas can join existing clusters
- Read Load Balancing: Round-robin distribution across healthy replicas
- Automatic Failover: Seamless failover when replicas become unhealthy
- Primary Fallback: Reads fall back to primary if all replicas fail
- Connection Pooling: Efficient connection reuse for performance
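The ordered-delivery and catch-up behaviour listed above can be pictured as a decision a replica makes for each incoming batch. The sketch below assumes batch IDs are consecutive integers. That numbering, along with the function names, is an illustration and not WILD's wire protocol.

```shell
#!/bin/sh
# Sketch: how a replica might classify an incoming WAL batch.
# Assumes consecutive integer batch IDs; all names are illustrative.

LAST_APPLIED=0   # ID of the most recent batch applied locally

# Prints "apply", "duplicate" or "catch-up" for the batch ID in $1.
classify_batch() {
    expected=$((LAST_APPLIED + 1))
    if [ "$1" -eq "$expected" ]; then
        echo apply        # next in sequence: safe to apply
    elif [ "$1" -le "$LAST_APPLIED" ]; then
        echo duplicate    # already applied: ignore
    else
        echo catch-up     # gap detected: missing batches must be fetched first
    fi
}

# Applies the batch when it is next in order and reports the decision.
handle_batch() {
    action=$(classify_batch "$1")
    case $action in
        apply) LAST_APPLIED=$1 ;;
    esac
    echo "batch $1: $action (last applied: $LAST_APPLIED)"
}

# Demo: in-order batches, a repeat, a gap, then the next expected batch.
for id in 1 2 2 5 3; do
    handle_batch "$id"
done
```

Applying only the next expected ID is what keeps the replica's operation order identical to the primary's. A gap is the signal that would trigger catch-up.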
# Check primary replication status
curl -s http://primary.example.com:7878/stats
# Monitor replica connections
netstat -tn | grep :9001
# Check WAL file growth
ls -lah /data/wild/primary.wal

# Test replica health
./zig-out/bin/wild-client-example health
# Load test with replication
./zig-out/bin/wild-client-example replication-test

- Measurement: Difference between primary and replica batch IDs
- Target: < 1ms under normal load
- Monitoring: Built into client router statistics
- Primary overhead: ~5% performance impact for replication
- Replica performance: 95% of primary read performance
- Network bandwidth: Proportional to write volume
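Since lag is measured as the difference between primary and replica batch IDs, the arithmetic can be sketched as follows. The IDs are passed in by hand, because this document does not show how to read them from a running node. The batch-count threshold is an assumption for illustration only: the stated target is time-based (under 1 ms).

```shell
#!/bin/sh
# Sketch: replication lag as a difference of batch IDs.
# Function names and the batch-count threshold are illustrative assumptions.

# Prints how many batches the replica is behind (never negative).
replication_lag() {
    lag=$(( $1 - $2 ))
    [ "$lag" -lt 0 ] && lag=0
    echo "$lag"
}

# Prints "ok" when lag $1 is within threshold $2, otherwise "lagging".
lag_status() {
    if [ "$1" -le "$2" ]; then echo ok; else echo lagging; fi
}

# Demo: primary at batch 1042, replica at batch 1040, threshold of 10 batches.
lag=$(replication_lag 1042 1040)
echo "lag: $lag batches, status: $(lag_status "$lag" 10)"
```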
Replica Connection Failures
# Check network connectivity
telnet primary.example.com 9001
# Verify authentication
grep "auth" /var/log/wild/replica.log
# Check primary replication port binding
netstat -ln | grep :9001

Replication Lag
# Check network latency
ping primary.example.com
# Monitor WAL batch rates
tail -f /var/log/wild/primary.log | grep "batch"
# Check replica catch-up status
grep "catch-up" /var/log/wild/replica.log

Client Failover Issues
# Test individual endpoints
./zig-out/bin/wild-client-example --endpoint replica1.example.com:7879 health
./zig-out/bin/wild-client-example --endpoint replica2.example.com:7880 health
# Check client router configuration
./zig-out/bin/wild-client-example config-test

- Prepare replica server
  # Install WILD binary
  # Configure authentication secret
  # Set up systemd service
- Start replica with primary connection
  ./zig-out/bin/wild \
  --mode replica \
  --auth-secret "$(cat /etc/wild/secret)" \
  --primary-address primary.example.com \
  --primary-port 9001 \
  --port 7881
- Update client configurations
  # Add new replica to client router config
  # Deploy updated client configurations

- Remove from client configurations
  # Update client router config
  # Deploy updated configurations
- Gracefully stop replica
  systemctl stop wild-replica
- Clean up replica data (if desired)
  rm -rf /data/wild/replica/

WILD currently requires manual failover procedures:
- Stop primary node
  systemctl stop wild-primary
- Promote replica to primary
  # Stop replica
  systemctl stop wild-replica
  # Reconfigure as primary
  ./zig-out/bin/wild \
  --mode primary \
  --auth-secret "$(cat /etc/wild/secret)" \
  --enable-wal \
  --wal-path /data/wild/new-primary.wal \
  --port 7878 \
  --replication-port 9001
- Update client configurations
  # Point clients to new primary
  # Update replica configurations
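The manual failover steps can be collected into a script with a dry-run mode, so the sequence can be reviewed before anything is stopped. This is only a sketch. The service names and flags come from the examples in this document, while `DRY_RUN`, `SECRET_FILE`, `WAL_PATH`, and the `run` and `failover` functions are hypothetical names introduced here. In a real cluster the first two commands run on different hosts.

```shell
#!/bin/sh
# Sketch: the manual failover sequence with a dry-run mode.
# DRY_RUN, SECRET_FILE, WAL_PATH, run and failover are illustrative names.

DRY_RUN=${DRY_RUN:-1}                               # 1 = print only, 0 = execute
SECRET_FILE=${SECRET_FILE:-/etc/wild/secret}
WAL_PATH=${WAL_PATH:-/data/wild/new-primary.wal}

# Prints the command in dry-run mode, otherwise executes it.
run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

failover() {
    # Never print the real secret during a dry run.
    if [ "$DRY_RUN" -eq 1 ]; then
        secret='<redacted>'
    else
        secret=$(cat "$SECRET_FILE") || return 1
    fi
    # 1. Stop the old primary (on the primary host).
    run systemctl stop wild-primary
    # 2. Stop the replica being promoted (on the replica host).
    run systemctl stop wild-replica
    # 3. Start that node in primary mode with a fresh WAL path.
    run ./zig-out/bin/wild --mode primary --auth-secret "$secret" \
        --enable-wal --wal-path "$WAL_PATH" --port 7878 --replication-port 9001
}

failover
```

Clients and the remaining replicas still have to be repointed at the new primary by hand afterwards, as described in the last step.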
- Primary overhead: ~5% performance reduction with replication enabled
- Network bandwidth: Proportional to write rate
- Memory usage: Additional connection buffers for each replica
- Read scaling: Add replicas to distribute read load
- Write scaling: Single primary limits write throughput
- Network capacity: Ensure adequate bandwidth for replication traffic
- Same datacenter: Minimize replication latency
- Private network: Use dedicated network for replication traffic
- SSD storage: Fast storage for WAL files improves replication performance
- Single primary: No multi-primary support
- Manual failover: No automatic primary promotion
- No conflict resolution: Split-brain scenarios require manual intervention
- Memory-bound: All nodes must fit data in L3 cache
- Automatic failover: Leader election and automatic promotion
- Multi-datacenter replication: Cross-datacenter deployment support
- Read preferences: Client-side read preference configuration
- Replica priorities: Weighted replica selection
This replication system enables WILD to scale read workloads while maintaining its extreme performance characteristics. For production deployments, carefully plan your network topology and monitoring strategy.