
[Question] Internal flow of log replay and proposal serialization #411

@shcw

Description


  • Dragonboat version: v3.x

1. Mechanism of Initialization and Replay

My Understanding:
When a NodeHost starts, a node should first load the most recent snapshot and then replay the Raft log entries following the snapshot index to restore the state machine's state.

Question:
I have examined the replayLog method in node.go, which retrieves the EntryCount and Commit index via logdb.ReadRaftState. However, I am struggling to locate the exact "physical" trigger point:

  • Who is responsible, and at what exact moment, for actually pushing those pb.Update.CommittedEntries (read from disk) into the TaskChan?
  • Is this process triggered during the very first iteration of the node.run loop after StartCluster is called? I suspect it is related to the gap between applied and committed index during Raft initialization, but I would like to confirm the exact call stack.
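To make the expected recovery sequence concrete, here is a minimal conceptual sketch of "restore from snapshot, then replay entries up to the commit index". All type and method names here are hypothetical illustrations of the behavior I described above, not Dragonboat's actual API or internals:

```go
package main

import "fmt"

// Entry is a simplified stand-in for a committed Raft log entry.
type Entry struct {
	Index uint64
	Cmd   string
}

// StateMachine is a toy state machine tracking the last applied index.
type StateMachine struct {
	Applied uint64
	State   []string
}

// RestoreFromSnapshot resets the state machine to the snapshot's contents,
// so replay can start from snapIndex+1.
func (sm *StateMachine) RestoreFromSnapshot(snapIndex uint64, snapState []string) {
	sm.Applied = snapIndex
	sm.State = append([]string(nil), snapState...)
}

// Replay applies, in order, every entry after the snapshot index and at or
// below the commit index; everything else is skipped.
func (sm *StateMachine) Replay(log []Entry, commit uint64) {
	for _, e := range log {
		if e.Index <= sm.Applied || e.Index > commit {
			continue // covered by the snapshot, or not yet committed
		}
		sm.State = append(sm.State, e.Cmd)
		sm.Applied = e.Index
	}
}

func main() {
	sm := &StateMachine{}
	sm.RestoreFromSnapshot(2, []string{"a", "b"})
	log := []Entry{{1, "a"}, {2, "b"}, {3, "c"}, {4, "d"}, {5, "e"}}
	sm.Replay(log, 4) // entries 3..4 are committed; 5 is not
	fmt.Println(sm.Applied, sm.State) // 4 [a b c d]
}
```

My question is essentially where in Dragonboat this replay step is physically kicked off, i.e. which goroutine feeds the equivalent of `Replay` with the on-disk entries.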

2. Ordering and Concurrency Control

My Understanding:
Dragonboat supports high-concurrency proposals. I am interested in how the system guarantees ordering when a SyncPropose is attempted while the node is still replaying a large backlog of historical logs after startup.

Question:

  • Is this strict ordering enforced by a specific Request Queue with locking at the node.go layer, or does it rely on the internal serialization of the internal/raft protocol stack?
  • While the applyWorker is busy replaying a large backlog of logs, will a newly arrived Propose be blocked in a specific queue?
  • How does Dragonboat achieve high-efficiency entry into the queue? Is it implemented via a lock-free or low-lock mechanism (e.g., specific Go channels or internal task queues)?
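For context on what I mean by a low-lock enqueue path: the pattern I would expect is something like a buffered channel feeding a single apply worker, where FIFO channel order gives serialization for free and producers only block when the buffer is full. This is a hypothetical sketch of that pattern, not Dragonboat's actual queue implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// proposal is a toy stand-in for a client proposal.
type proposal struct{ cmd string }

// EnqueueAndDrain pushes proposals onto a buffered channel while a single
// apply worker drains them. The channel is the low-contention enqueue path:
// arrival order is preserved, and a full buffer is what applies backpressure
// when the worker is busy (e.g. replaying a backlog).
func EnqueueAndDrain(cmds []string) []string {
	queue := make(chan proposal, 8) // bounded buffer
	var applied []string
	var wg sync.WaitGroup

	wg.Add(1)
	go func() { // single worker => strict apply order
		defer wg.Done()
		for p := range queue {
			applied = append(applied, p.cmd)
		}
	}()

	for _, c := range cmds {
		queue <- proposal{cmd: c} // blocks only when the buffer is full
	}
	close(queue)
	wg.Wait()
	return applied
}

func main() {
	fmt.Println(EnqueueAndDrain([]string{"p1", "p2", "p3"})) // [p1 p2 p3]
}
```

What I would like to confirm is whether Dragonboat's proposal path follows this channel-style model or uses a different lock-free/low-lock structure internally.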
