
feat: WAL-based RocksDB replication with HTTP streaming and failover#366

Open
JackGuslerGit wants to merge 7 commits into matrix-construct:main from JackGuslerGit:failover

@JackGuslerGit

This relates to #35.

Summary:

  • Adds a primary/secondary replication system using RocksDB's WAL (Write-Ahead Log) streamed over HTTP
  • Secondary bootstraps from a full checkpoint on startup, then streams incremental WAL frames
  • Failover is triggered via POST /_tuwunel/replication/promote — no process restart needed
  • All replication endpoints are protected by a shared secret token
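The bootstrap-then-stream flow in the summary can be sketched as a replica that starts from the checkpoint's sequence number and applies only frames that continue it. This is a minimal illustration, not the PR's code. `WalFrame`, `Replica` and `ApplyError` are hypothetical names, an in-memory map stands in for the secondary's RocksDB instance, and each put is assumed to consume one sequence number (as RocksDB write batches do by default). Empty frames are treated as heartbeats.

```rust
use std::collections::BTreeMap;

/// One incremental WAL frame as a secondary might receive it over HTTP.
/// Hypothetical shape: `seq` is the first sequence number in the batch,
/// and each put is assumed to consume one sequence number.
#[derive(Debug, Clone)]
struct WalFrame {
    seq: u64,
    puts: Vec<(String, String)>,
}

#[derive(Debug, PartialEq)]
enum ApplyError {
    /// The frame does not start right after the last applied sequence.
    Discontinuity { expected: u64, got: u64 },
}

/// Stand-in for the secondary's local database.
struct Replica {
    data: BTreeMap<String, String>,
    applied_seq: u64, // last sequence number reflected in `data`
}

impl Replica {
    /// Bootstrap from a full checkpoint taken at `checkpoint_seq`.
    fn from_checkpoint(snapshot: BTreeMap<String, String>, checkpoint_seq: u64) -> Self {
        Replica { data: snapshot, applied_seq: checkpoint_seq }
    }

    /// Apply one frame. Returns Ok(false) for empty frames (heartbeats) and
    /// for frames already covered by the checkpoint; refuses to skip ahead.
    fn apply(&mut self, frame: &WalFrame) -> Result<bool, ApplyError> {
        if frame.puts.is_empty() {
            return Ok(false);
        }
        let last = frame.seq + frame.puts.len() as u64 - 1;
        if last <= self.applied_seq {
            return Ok(false); // already contained in the checkpoint
        }
        let expected = self.applied_seq + 1;
        if frame.seq != expected {
            return Err(ApplyError::Discontinuity { expected, got: frame.seq });
        }
        for (k, v) in &frame.puts {
            self.data.insert(k.clone(), v.clone());
        }
        self.applied_seq = last;
        Ok(true)
    }
}
```

Rejecting a discontinuous frame, rather than applying it, keeps the replica from silently diverging; a real secondary would presumably re-request frames from `applied_seq + 1` or fall back to a fresh checkpoint.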

Test plan:

  • Ran two Docker containers (primary on :8008, secondary on :8009)
  • Secondary bootstrapped from primary checkpoint at seq 281 and began streaming
  • Stopped primary with docker stop (graceful SIGTERM)
  • Promoted secondary via curl — responded {"status":"promoted"}
  • All messages from before the failover were present on the promoted instance
  • Measured RPO of ~0 on the planned failover and an RTO on the order of seconds
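The promote step in the test plan amounts to flipping the node's role inside the running process. Here is a minimal sketch with hypothetical `Role` and `Node` types. Only the `{"status":"promoted"}` body comes from the test plan. The `already_primary` response is an invented placeholder for the repeat case.

```rust
/// Hypothetical role flag; the PR's actual types are not shown.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Primary,
    Secondary,
}

struct Node {
    role: Role,
}

impl Node {
    /// Client writes are accepted only while this node is the primary.
    fn accept_write(&self) -> Result<(), &'static str> {
        match self.role {
            Role::Primary => Ok(()),
            Role::Secondary => Err("read-only replica"),
        }
    }

    /// Body of a promote handler: flip the role in place, without a
    /// restart. A real server would also stop its WAL-streaming task here.
    fn promote(&mut self) -> &'static str {
        match self.role {
            Role::Secondary => {
                self.role = Role::Primary;
                r#"{"status":"promoted"}"#
            }
            Role::Primary => r#"{"status":"already_primary"}"#,
        }
    }
}
```

Making a second promote a harmless no-op means a retried curl call cannot do damage. Fencing the old primary, so it cannot keep accepting writes if it comes back, is outside what this sketch covers.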

Relevant config options added:

  • rocksdb_primary_url — URL of primary for WAL streaming
  • rocksdb_replication_token — shared secret for endpoint auth
  • rocksdb_replication_interval_ms — heartbeat interval (default 250ms)
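The three options can be illustrated by grouping them into one settings value plus a token check for the replication endpoints. This is a hypothetical sketch. The field names mirror the config keys and the 250 ms default comes from the list above, but the struct, the rule that a node is a secondary whenever `rocksdb_primary_url` is set, and the `authorize`/`token_matches` helpers are assumptions. A production server would more likely use a vetted constant-time comparison such as the `subtle` crate's `ConstantTimeEq`.

```rust
use std::time::Duration;

/// Hypothetical grouping of the replication options.
#[derive(Debug, Clone)]
struct ReplicationConfig {
    rocksdb_primary_url: Option<String>, // assumed: set only on a secondary
    rocksdb_replication_token: Option<String>,
    rocksdb_replication_interval_ms: u64,
}

impl Default for ReplicationConfig {
    fn default() -> Self {
        Self {
            rocksdb_primary_url: None,
            rocksdb_replication_token: None,
            rocksdb_replication_interval_ms: 250, // documented default
        }
    }
}

impl ReplicationConfig {
    fn is_secondary(&self) -> bool {
        self.rocksdb_primary_url.is_some()
    }

    fn heartbeat(&self) -> Duration {
        Duration::from_millis(self.rocksdb_replication_interval_ms)
    }
}

/// Compare tokens without stopping at the first mismatching byte
/// (the length check itself still reveals whether the lengths differ).
fn token_matches(expected: &str, presented: &str) -> bool {
    let (a, b) = (expected.as_bytes(), presented.as_bytes());
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Deny unless a secret is configured and the request presents it.
fn authorize(cfg: &ReplicationConfig, presented: Option<&str>) -> bool {
    match (&cfg.rocksdb_replication_token, presented) {
        (Some(secret), Some(p)) => token_matches(secret, p),
        _ => false,
    }
}
```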

@JackGuslerGit marked this pull request as ready for review March 12, 2026 14:54
@JackGuslerGit marked this pull request as draft March 12, 2026 14:59
@JackGuslerGit marked this pull request as ready for review March 13, 2026 15:04
@pschichtel

This implements async replication, so some data loss is to be expected after an unexpected failover (node failure, disk failure, process crash, ...), right?

@JackGuslerGit
Author

@pschichtel yes, that is correct. Under normal write load, RPO is determined just by network RTT.
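The trade-off in this exchange can be put in numbers. The sketch makes several assumptions. Both the primary's latest sequence number and the secondary's last applied one are taken to be observable, the helper names are hypothetical, and each write is taken to consume roughly one sequence number. Under those assumptions, the gap between the two numbers is what an unplanned failover would lose. Its expected size is about the write rate times the replication lag, which, per the reply above, is close to one network RTT under normal load.

```rust
use std::time::Duration;

/// Writes acknowledged by the primary but not yet applied on the secondary;
/// with async replication these are lost if the primary dies unexpectedly.
fn writes_at_risk(primary_latest_seq: u64, secondary_applied_seq: u64) -> u64 {
    primary_latest_seq.saturating_sub(secondary_applied_seq)
}

/// Rough expected loss: write rate multiplied by replication lag.
fn expected_writes_lost(writes_per_sec: f64, lag: Duration) -> f64 {
    writes_per_sec * lag.as_secs_f64()
}
```

For example, at 100 writes per second and a 20 ms lag, roughly two writes would be in flight at any moment. A planned failover like the one in the test plan stops the primary first and lets the secondary catch up, which is why its RPO can reach ~0.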
