
handle create repl_dev failure #807

@Besroy

Description

Creating a replication device (repl dev) can fail partway through and leave orphaned repl devs on some members. For example:

  • The leader sends a join request to add member F2, but the request times out.
  • The leader assumes F2 is not in the group, yet F2 receives the request late and joins successfully.
  • When the group is destroyed, the leader does not include F2, so an orphaned group remains on F2.
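
The sequence above can be reproduced with a small model. This is only an illustrative sketch, not HomeStore or NuRaft code: the `Member`, `Leader`, and `find_orphans` names are hypothetical, and each node is reduced to the set of group IDs it holds.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a group member: it only tracks which groups it holds.
struct Member {
    std::string name;
    std::set<std::string> groups;  // groups for which this member has a repl dev
};

// Hypothetical model of the leader's view of group membership.
struct Leader {
    std::set<std::string> known_members;  // members the leader believes have joined

    // Sends a join request. `acked_in_time` is false when the RPC deadline
    // expires first; the follower may still receive and apply the request.
    void add_member(Member& m, const std::string& group, bool acked_in_time) {
        assert(!group.empty());
        m.groups.insert(group);  // the follower joins either way
        if (acked_in_time) known_members.insert(m.name);
    }

    // Destroys the group only on members the leader knows about.
    void destroy_group(const std::vector<Member*>& all, const std::string& group) {
        for (Member* m : all) {
            if (known_members.count(m->name) != 0) m->groups.erase(group);
        }
    }
};

// Returns the names of members that still hold `group`.
std::vector<std::string> find_orphans(const std::vector<Member*>& all,
                                      const std::string& group) {
    std::vector<std::string> orphans;
    for (const Member* m : all) {
        if (m->groups.count(group) != 0) orphans.push_back(m->name);
    }
    return orphans;
}
```

In this model, adding F1 with `acked_in_time = true` and F2 with `acked_in_time = false`, then calling `destroy_group`, leaves F2 as the only entry returned by `find_orphans`. That is the orphaned repl dev described in this issue.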

Related Logs

Leader:

  • Sent join request to F2:
    [09/20/25 09:01:12.178923] [I] [76] [handle_join_leave.cxx:149:invite_srv_to_join_cluster] sent join request to peer 984278082, 98f2b032-095d-4d78-a069-d1e21f95603d [group=e3be8382-63f2-4edd-85ed-8851e8e65641]
    
  • Timeout occurred:
    [09/20/25 09:01:14.179209] [I] [61] [raft_server.cxx:1639:handle_ext_resp_err] receive an rpc error response from peer server, Deadline Exceeded 12 [group=e3be8382-63f2-4edd-85ed-8851e8e65641]
    

Follower (F2):

  • Received the join request after the leader timed out:
    [09/20/25 09:01:14.390810] [I] [65] [handle_join_leave.cxx:188:handle_join_cluster_req] got join cluster req from leader 2114978300 [group=e3be8382-63f2-4edd-85ed-8851e8e65641]
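
The timing can be checked directly from the timestamps in the log lines above. The helper below is a hypothetical sketch (`to_micros` is not part of any library) that converts the `HH:MM:SS.uuuuuu` part of a timestamp to microseconds since midnight. It assumes the leader and F2 clocks are roughly synchronized, which the logs alone do not confirm.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Converts "HH:MM:SS.uuuuuu" (six fractional digits) to microseconds since
// midnight. Returns -1 if the input does not match that shape.
std::int64_t to_micros(const std::string& ts) {
    int h = 0, m = 0, s = 0, us = 0;
    if (std::sscanf(ts.c_str(), "%2d:%2d:%2d.%6d", &h, &m, &s, &us) != 4) {
        return -1;
    }
    return ((h * 60LL + m) * 60 + s) * 1000000LL + us;
}
```

With the values from the logs, `to_micros("09:01:14.179209") - to_micros("09:01:12.178923")` gives 2000286 µs, so the leader gave up about 2 seconds after sending the request. `to_micros("09:01:14.390810") - to_micros("09:01:14.179209")` gives 211601 µs, so F2 logged the join request roughly 212 ms after the leader had already reported the timeout.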
    

For more context, see the related discussion in GitHub PR #136.
