Handoffs
engelsanchez edited this page Mar 1, 2013 · 4 revisions
- Handoff starts when riak core decides that a vnode is running on a node that should not be responsible for its data. There are two types of handoff: ownership transfer and hinted handoff. Ownership transfer occurs when a vnode is determined to no longer belong to the physical node it is running on (e.g. when a new node joins the cluster) and the vnode needs to be moved. Hinted handoff occurs when a "secondary" vnode took responsibility for a "primary" vnode but the primary vnode is reachable again. In this case, riak core determines that the secondary vnode should hand off its data to the primary. These two cases are opaque to a riak core application.
- There is a third type of transfer called "repair". This is only worth noting because it complicates the handoff implementation. The rest of these notes ignore repair. More about repair can be found in this commit: https://github.com/basho/riak_core/commit/036e409eb83903315dd43a37c7a93c9256863807
- Handoff can be triggered by:
  - A ring update event for a ring that all other nodes have already seen.
  - A secondary vnode being idle for a period of time while the primary, original owner of the partition is up again.
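The distinction between the two handoff types can be sketched as a toy classifier. This is only a reading of the prose above, not riak core's actual decision logic: the module `handoff_type_sketch`, the function `classify/4`, and its arguments are all hypothetical names introduced for illustration.

```erlang
-module(handoff_type_sketch).
-export([classify/4, demo/0]).

%% Hypothetical helper, NOT a riak_core function.
%% Role      :: primary | secondary (how this vnode came to hold the data)
%% RingOwner :: node the ring assigns the partition to
%% ThisNode  :: node the vnode is currently running on
%% OwnerUp   :: whether RingOwner is currently reachable
classify(primary, RingOwner, ThisNode, _OwnerUp) when RingOwner =/= ThisNode ->
    %% The vnode no longer belongs on this physical node.
    ownership_transfer;
classify(secondary, _RingOwner, _ThisNode, true) ->
    %% A secondary was covering for a primary that is reachable again.
    hinted_handoff;
classify(_Role, _RingOwner, _ThisNode, _OwnerUp) ->
    none.

%% Runs the classifier over four illustrative situations.
demo() ->
    [classify(primary,   'b@host', 'a@host', true),
     classify(secondary, 'b@host', 'a@host', true),
     classify(secondary, 'b@host', 'a@host', false),
     classify(primary,   'a@host', 'a@host', true)].
```

A real application never makes this choice itself; as noted above, the two cases are opaque to it.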
- When riak core decides a vnode should perform handoff it calls `Mod:handoff_starting` and `Mod:is_empty`. This can occur even if a handoff is already in progress on that vnode.
- If a handoff is already in progress and a call to start a new one would allow a second handoff to start, riak core prevents it: https://github.com/basho/riak_core/blob/master/src/riak_core_handoff_manager.erl#L415-427
- When handoff starts, if `is_empty` returns true the handoff will never be added to the handoff manager. Instead the handoff is finished immediately (calling `Mod:handoff_finished` and `Mod:delete`).
- Therefore, in order to have two handoffs start (and result in the scenario described above where riak core cancels the second), riak core must detect that a vnode must be moved at the same time that a handoff is in progress for that vnode, and the vnode must return true for `handoff_starting` and false for `is_empty` in response to the second handoff message.
- If a handoff starts, `handoff_starting` returns true, and `is_empty` returns false, the vnode "adds an outbound" to the `riak_core_handoff_manager`: https://github.com/basho/riak_core/blob/master/src/riak_core_vnode.erl#L598-606
- This in turn starts a `riak_core_handoff_sender` if it is determined that handoff can start (no other handoff running, max handoff concurrency not reached): https://github.com/basho/riak_core/blob/master/src/riak_core_handoff_manager.erl#L143-152 & https://github.com/basho/riak_core/blob/master/src/riak_core_handoff_manager.erl#L453-458
- When the sender starts, it sends the `?FOLD_REQ` (a request that represents how to fold over the vnode's data to transfer during handoff) synchronously to the vnode. It seems, in the case of riak_kv for example, that the vnode doesn't necessarily have to respond (so as not to block the vnode), but in our case we will respond without blocking the vnode, so this is somewhat inconsequential. https://github.com/basho/riak_core/blob/master/src/riak_core_handoff_manager.erl#L453-458
- The vnode receives `?FOLD_REQ`, which allows it to fold over the data it holds to transfer (see this for more info: https://github.com/rzezeski/try-try-try/tree/master/2011/riak-core-the-vnode#handle_handoff_commandrequest-sender-state---result)
- Each iteration of the fold calls `Mod:encode_handoff_item`, which allows riak to shuttle the binary representation of the data over to the target vnode: https://github.com/basho/riak_core/blob/master/src/riak_core_handoff_sender.erl#L264
- After the synchronous `?FOLD_REQ` command replies, the vnode is sent a `handoff_complete` event by the sender: https://github.com/basho/riak_core/blob/master/src/riak_core_handoff_sender.erl#L191
- When the vnode receives the `handoff_complete` event it sends the `handoff_complete` event to the `riak_core_vnode_manager`: https://github.com/basho/riak_core/blob/master/src/riak_core_vnode.erl#L295-297
- When the `riak_core_vnode_manager` receives the `handoff_complete` event it sends the `finish_handoff` event to the vnode: https://github.com/basho/riak_core/blob/master/src/riak_core_vnode_manager.erl#L410-413
- When the vnode receives `finish_handoff` it calls `Mod:handoff_finished`: https://github.com/basho/riak_core/blob/master/src/riak_core_vnode.erl#L415-424
- To complete the handoff after receiving `finish_handoff`, the vnode calls `Mod:delete` and then unregisters itself so that it can no longer receive commands: https://github.com/basho/riak_core/blob/master/src/riak_core_vnode.erl#L391
- If the vnode has already been deleted when it receives `finish_handoff`, it's basically a no-op: https://github.com/basho/riak_core/blob/master/src/riak_core_vnode.erl#L411-414
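The vnode-side callbacks named in the steps above can be sketched as a standalone module. This is a minimal sketch under assumptions, not riak core's code: the callback names and return shapes follow the `riak_core_vnode` behaviour as I understand it, while the module name, the `#state{}` record, the dict-backed store, and the local `fold_req` record (a stand-in for the fold request that riak core defines in its own header) are made up so the example compiles without riak core. The `demo/0` function plays the role of the sender.

```erlang
-module(handoff_sender_sketch).
-export([handoff_starting/2, is_empty/1, handle_handoff_command/3,
         encode_handoff_item/2, handoff_finished/2, delete/1, demo/0]).

%% Local stand-in (assumption) for riak_core's fold request record/macro,
%% declared here so the module compiles on its own.
-record(fold_req, {foldfun, acc0}).
-define(FOLD_REQ, #fold_req).

%% Hypothetical vnode state: a dict of Key -> Value.
-record(state, {store}).

%% Returning true lets the handoff proceed.
handoff_starting(_TargetNode, State) ->
    {true, State}.

%% An empty vnode skips the transfer entirely.
is_empty(State = #state{store = Store}) ->
    {dict:size(Store) =:= 0, State}.

%% Fold over everything this vnode holds, using the sender's fold fun.
handle_handoff_command(?FOLD_REQ{foldfun = Fun, acc0 = Acc0}, _Sender,
                       State = #state{store = Store}) ->
    Acc = dict:fold(Fun, Acc0, Store),
    {reply, Acc, State}.

%% Binary representation of one item, shipped to the target vnode.
encode_handoff_item(Key, Value) ->
    term_to_binary({Key, Value}).

handoff_finished(_TargetNode, State) ->
    {ok, State}.

%% Drop the local data once it has been handed off.
delete(State) ->
    {ok, State#state{store = dict:new()}}.

%% Walks the happy path in the same order as the steps above.
demo() ->
    S0 = #state{store = dict:from_list([{a, 1}, {b, 2}])},
    {true, S1} = handoff_starting('target@host', S0),
    {false, S2} = is_empty(S1),
    FoldFun = fun(K, V, Acc) -> [encode_handoff_item(K, V) | Acc] end,
    Req = ?FOLD_REQ{foldfun = FoldFun, acc0 = []},
    {reply, Bins, S3} = handle_handoff_command(Req, ignored_sender, S2),
    {ok, S4} = handoff_finished('target@host', S3),
    {ok, S5} = delete(S4),
    {true, _} = is_empty(S5),
    lists:sort([binary_to_term(B) || B <- Bins]).
```

In a real vnode the fold fun comes from the sender rather than from the module itself, and the `handoff_complete` / `finish_handoff` exchange with the vnode manager happens between the fold and `handoff_finished`.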
Handoff Error:
- the vnode handles the error in `handoff_cancelled`. it is not made aware of what the error was.
- how handoff error propagates:
  - the `riak_core_handoff_sender` catches an error https://github.com/basho/riak_core/blob/master/src/riak_core_handoff_sender.erl#L200-218
    - in some cases, it sends the vnode a `handoff_error` event, which will subsequently be forwarded on to the `riak_core_vnode_manager` https://github.com/basho/riak_core/blob/master/src/riak_core_vnode.erl#L299 & https://github.com/basho/riak_core/blob/master/src/riak_core_vnode.erl#L652
    - in the other cases, the sender dies with a non-`normal` reason. the sender is monitored by the `riak_core_handoff_manager`, which in turn will take the reason the sender died and pass it on to the vnode https://github.com/basho/riak_core/blob/master/src/riak_core_handoff_manager.erl#L255-282 the vnode then in turn takes that error and passes it on to the `riak_core_vnode_manager` like the other case
  - once the `riak_core_vnode_manager` receives the `handoff_error` event it sends the `cancel_handoff` event to the vnode: https://github.com/basho/riak_core/blob/master/src/riak_core_vnode_manager.erl#L414-417
  - when the vnode receives the `cancel_handoff` event it calls `Mod:handoff_cancelled` and then returns to a non-handoff state https://github.com/basho/riak_core/blob/master/src/riak_core_vnode.erl#L426-437
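From the application's point of view the error path reduces to a single callback that receives no error reason. The following is a minimal sketch of that, assuming the callback shapes `handoff_starting(TargetNode, State)` and `handoff_cancelled(State)`; the module name and the `handoff_target` field are hypothetical bookkeeping added only to make the state change visible.

```erlang
-module(handoff_cancel_sketch).
-export([handoff_starting/2, handoff_cancelled/1, demo/0]).

%% Hypothetical state: remembers where we are handing off to, if anywhere.
-record(state, {handoff_target = undefined}).

handoff_starting(TargetNode, State) ->
    {true, State#state{handoff_target = TargetNode}}.

%% Note the signature: only the state is passed in, so the vnode cannot
%% tell what went wrong. All it can do is undo its own handoff bookkeeping.
handoff_cancelled(State) ->
    {ok, State#state{handoff_target = undefined}}.

%% Starts a handoff, cancels it, and returns the target recorded afterwards.
demo() ->
    {true, S1} = handoff_starting('target@host', #state{}),
    'target@host' = S1#state.handoff_target,
    {ok, S2} = handoff_cancelled(S1),
    S2#state.handoff_target.
```

Because no reason is passed in, any cleanup done here has to be safe for every failure mode of the sender.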
- on the target vnode, basically everything operates normally, as if no handoff were going on
- the only difference is that periodically `Mod:handle_handoff_data` is called with data being handed off from the source vnode. it is expected that the target vnode will do something with this data
- there are a bunch of mechanics that make this go but they don't seem worth noting here right now. For more info start with `riak_core_handoff_receiver`.
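What "doing something with this data" might look like on the target vnode is sketched below. It assumes the callback shape `handle_handoff_data(Binary, State) -> {reply, ok, State}` and assumes the source encoded each item as `term_to_binary({Key, Value})`; the module name, the `#state{}` record, and the dict store are hypothetical.

```erlang
-module(handoff_receiver_sketch).
-export([handle_handoff_data/2, demo/0]).

%% Hypothetical vnode state: a dict of Key -> Value.
-record(state, {store}).

%% Decode one handed-off item and merge it into the local store.
%% Assumes the source side encoded items as term_to_binary({Key, Value}).
handle_handoff_data(Bin, State = #state{store = Store}) ->
    {Key, Value} = binary_to_term(Bin),
    {reply, ok, State#state{store = dict:store(Key, Value, Store)}}.

%% Feeds a single encoded item to the callback and reads it back.
demo() ->
    S0 = #state{store = dict:new()},
    Bin = term_to_binary({a, 1}),
    {reply, ok, S1} = handle_handoff_data(Bin, S0),
    dict:fetch(a, S1#state.store).
```

This sketch simply overwrites any existing value for the key; an application with concurrent writes on the target would need its own merge rule here.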