DoltSwarm is a peer-to-peer synchronization library for Dolt databases.
It is designed to make a group of peers converge to the same linear main history (no merge commits) using a total order based on Hybrid Logical Clocks (HLC), without requiring a central coordinator.
- Pull-only sync: a peer never causes remote writes; it only reads remote objects and updates its own local
main. - Local writes: mutations happen locally via SQL +
DOLT_COMMIT. - Deterministic ordering: commits carry an HLC timestamp and are ordered by
(wall, logical, peerID)everywhere. - Deterministic reconciliation: when concurrent commits “cross”, peers deterministically replay (cherry-pick) in HLC order so history converges.
- Finalized prefix (watermark): peers compute a conservative “finalized base” from digests/checkpoints and avoid rewriting history at or before that point (tail-only replay), reducing replay churn as history grows.
- First write wins (FWW) on conflict: later conflicting commits are dropped deterministically.
- Offline commit resubmission: commits created while offline that are “too old” relative to the network watermark are re-submitted as new commits with provenance instead of attempting to rewrite finalized history (enabled by default).
- Skew guard: adverts with HLC wall time too far in the future are rejected (
MaxClockSkew). - Anti-entropy repair: periodic digests/checkpoints trigger re-sync even if commit adverts are missed.
At a high level there are two planes:
- Control plane (gossip): small messages that advertise “there is a commit with HLC X” and periodically summarize history.
- Data plane (Dolt-native fetch): bulk object transfer is done through Dolt’s existing fetch pipeline, using a read-only
swarm://remote.
This repo is organized into small packages so the “what” (protocol), “how” (core engine), and “I/O boundary” (transport) are separated.
The top-level doltswarm package remains the public import path and re-exports the main API for convenience.
doltswarm.go: public facade (type aliases + wrapper functions) so consumers can keep importinggithub.com/nustiueudinastea/doltswarm.protocol/: wire- and identity-level types used across the system:- HLC (
protocol/hlc.go), repo identity (protocol/repo_id.go) - commit metadata format + signing (v2 metadata with optional
kind/origin) (protocol/metadata.go,protocol/signer.go) - gossip payload structs (
protocol/messages.go)
- HLC (
transport/: the pluggable networking boundary (no Dolt logic here):- control plane interfaces (
transport/transport.go) - data plane provider interfaces (
transport/provider.go) - best-effort provider hints + retry exclusions via context (
transport/provider_hint.go)
- control plane interfaces (
core/: the synchronization engine and Dolt integration:core/node.go:Node(gossip loop, debounce, fetch+reconcile orchestration)core/reconciler_core.go: deterministic linearizer (temp-branch replay + conflict drops)core/finalization.go: watermark computation + finalized base helpers (tail-only replay)core/db.go/core/sql.go: Dolt SQL driver wrapper + commit helpers (DOLT_COMMIT,DOLT_FETCH, merge-base helpers)core/swarm_dbfactory.go/core/swarm_registry.go:swarm://dbfactory registration + provider picker registrycore/remote_chunk_store.go/core/remote_table_file.go: read-only chunk store/table-file adapters used by Dolt’s fetch pipelinecore/index.go/core/index_mem.go: local-only commit index used for pending/applied/rejected HLC tracking, digest checkpoints, finalized base, and resubmission idempotencycore/bundles.go: (reference/experimental) commit-bundle builder/importer; current pipeline primarily uses Dolt-native fetch
integration/: demo + docker-based integration tests and transport implementations (libp2p gossip + gRPC chunkstore):integration/main.go:ddoltdemo CLI used by testsintegration/integration_test.go: container-per-peer test harnessintegration/transport/: sample transports (gossipsub,grpcswarm,overlay)
Node(core/node.go): the sync engine you embed; runs gossip loops, performs fetch/reconcile, and exposesCommit/ExecAndCommit.DB(core/db.go,core/sql.go): thin wrapper around the Dolt SQL driver; creates signed metadata commits; provides helpers likeFetchSwarm,MergeBase.Reconciler(core/reconciler_core.go): deterministic linearizer (no networking). Given local+imported commits after a merge base, it produces a linearmain.CommitMetadata(protocol/metadata.go): JSON stored in the Dolt commit message, including(HLC, author, date, content_hash)plus a signature.- Resubmission provenance (
protocol/metadata.go): resubmitted commits setkind:"resubmission"and include anoriginenvelope containing the original commit’s metadata/signature/HLC and a deterministicorigin_event_id. swarm://dbfactory (core/swarm_dbfactory.go,core/swarm_registry.go,core/remote_chunk_store.go): a process-global hook that lets Dolt treat “the swarm” as a remote by selecting any live provider and serving chunks/table files read-only.- Transport boundary (
transport/):Transport/Gossipcontrol plane and provider interfaces for the read-only data plane. - Local “anti-entropy” index (
core/index.go,core/index_mem.go): tracks which HLCs are pending/applied/rejected, publishes checkpoints in digests, and stores a local finalized base.
Local write:
Pull-first sync (best-effort)
Resubmit any late offline commits (best-effort, if enabled)
Exec SQL locally → Commit on local main (message = signed CommitMetadata JSON)
→ Publish CommitAd(repo, HLC, metadata_json, metadata_sig)
→ Publish Digest(repo, head, checkpoints) (best-effort, immediate)
Remote reception:
Receive CommitAd → validate basic invariants + skew guard → verify metadata signature (when IdentityResolver is configured) → mark HLC as pending
→ trigger a sync pass (debounced)
Sync pass:
DOLT_FETCH('swarm') (transport picks any provider; may honor a preferred-provider hint)
if hinted fetch returns no commits, retry other providers (bounded)
find merge base: MergeBase(main, remotes/swarm/main)
try fast-forward main to remote head (ff-only merge)
else replay deterministically on a temp branch in HLC order (tail-only when a finalized base is set)
attempt to advance finalized base from digests/checkpoints
resubmit any newly-detected late offline commits (best-effort, if enabled)
Provider hints are best-effort: the node remembers which peer mentioned a given HLC (from commit adverts, digests, and missing checkpoints) for a short TTL and tries that provider first.
If a hinted fetch returns stale data (no imported commits for the hint), the node retries the same sync pass against other providers (bounded), avoiding waiting for the next digest/RepairInterval.
When a node commits locally (Node.Commit / Node.ExecAndCommit):
- It generates an HLC timestamp (
HLC.Now()). - It builds
CommitMetadata(message, HLC, content hash, author/email/date) and signs it. - It creates a Dolt commit whose commit message is the metadata JSON and tags the commit hash with a signature (secondary).
- It gossips a
CommitAdV1containing the HLC and the metadata bytes. - It publishes a
DigestV1best-effort so peers can repair missed adverts without waiting forRepairInterval.
Peers do not push objects to one another. A received advert is only a hint to pull.
Before committing, Node does a short best-effort sync pass (PullFirstTimeout, PullFirstPasses) so that, when possible, new commits are made on top of already-known earlier HLC commits. This reduces the number of times history must be rewritten via replay.
After a fetch, if main cannot be fast-forwarded to the fetched remote head, the node deterministically linearizes:
- Compute the merge base between
mainandremotes/swarm/main. - Collect commits after the base from:
- local
main(already present) - remote
remotes/swarm/main(now present locally after fetch)
- local
- Deduplicate and sort by HLC (tie-break by
peerID). - Create a temp branch (
replay_tmp) at the merge base. - Cherry-pick commits in HLC order onto
replay_tmp, and rewrite commit metadata so all peers replay with the same author/date/message. - Update
mainto the temp result after verifyingmaindid not advance during replay (crash-safe / restartable).
As histories grow, replaying from the merge base can become expensive. To reduce replay churn, nodes maintain a local finalized base:
- Nodes publish digests containing recent checkpoints (HLC + commit hash).
- From recent digests, a node computes a conservative watermark based on checkpoints that appear in at least
MinPeersForWatermarkactive peers (withinPeerActivityTTL), minusWatermarkSlack. - The node advances its finalized base to the highest commonly-known checkpoint at or before the watermark.
When a finalized base is set, reconciliation becomes tail-only:
- Replay starts from the finalized base’s local commit hash (not the merge base), and commits at or before the finalized HLC are filtered out.
- This guarantees the finalized prefix is never rewritten, improving convergence time in long-running clusters.
If a cherry-pick encounters a conflict, the reconciler drops the later commit (the one being applied when the conflict occurs).
Dropped commits are recorded locally as rejected (handled) to avoid livelock via repeated adverts/checkpoints. An optional callback can be invoked (CommitRejectedCallback).
Periodically (and also immediately after local commits), each node publishes a compact DigestV1 that contains:
- its head HLC/hash, and
- recent “checkpoints” (HLC + commit hash).
If you missed an advert (packet loss, temporary disconnect), a peer’s digest will usually expose a checkpoint you don’t have, which triggers a sync pass to fetch and reconcile.
If a peer was offline and created commits with an HLC that is at or before the current watermark, those commits are “late” relative to the network. Because the finalized prefix is not rewritten, DoltSwarm treats these as resubmissions:
- The node cherry-picks the original commit onto the current
main, producing a new commit with a fresh HLC. - The new commit’s metadata sets
kind:"resubmission"and includes anoriginenvelope with the original metadata/signature/HLC plus a deterministicorigin_event_idfor idempotency. - Resubmissions are advertised like normal commits, and conflicts are handled with the usual FWW rule.
Most consumers should use Node and treat transport as a plug-in.
OpenNode(cfg NodeConfig) (*Node, error)and(*Node).Run(ctx)start the background gossip+sync loops.(*Node).Commit(msg)and(*Node).ExecAndCommit(exec, msg)perform local writes and automatically advertise.(*Node).Sync(ctx)/(*Node).SyncHint(ctx, hlc)allow manual/one-shot sync passes (useful for CLIs/tests).
NodeConfig requires:
Repo(RepoID{Org, RepoName}): logical repo identity.Signer: signs metadata for local commits and providesVerifyfor remote signature checks.Identity(optional): if set, incoming commit adverts are verified against peer public keys; unverifiable/invalid adverts are rejected.Transport: provides (1) gossip message delivery and (2) a provider picker for the read-only data plane.
Optional convergence-related knobs (all have conservative defaults):
MinPeersForWatermark: minimum active peers required to compute a finalized base watermark (default: 2).WatermarkSlack: slack subtracted from the watermark to be conservative about finalization (default: 30s).PeerActivityTTL: how long a peer remains “active” for watermark computation (default: 5m).EnableResubmission: enable/disable offline commit resubmission (default: enabled).
cfg := doltswarm.NodeConfig{
Dir: "/path/to/dolt/working/dir",
Repo: doltswarm.RepoID{RepoName: "mydb"},
Signer: signer, // implements doltswarm.Signer
Transport: transport, // implements doltswarm.Transport (details omitted)
}
n, _ := doltswarm.OpenNode(cfg)
go n.Run(ctx)
_, _ = n.ExecAndCommit(func(tx *sql.Tx) error {
_, err := tx.Exec("CREATE TABLE IF NOT EXISTS t (pk INT PRIMARY KEY, v TEXT)")
return err
}, "init schema")