Mitigate potential overloads in Round Robin load balancing in the event of node failure #676

@havaker

With the default load balancing policy, round robin ordering can overload a node when another node fails. Under the usual round robin order, if node A fails, every request whose plan starts at A falls through to the next node in the sequence, B. B then serves all of A's requests on top of its own, which can overload it.
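
To illustrate the effect, here is a minimal sketch of a toy model, not the driver's actual code. It assumes a plan is simply the node list rotated by the request number, and that a request is served by the first live node in its plan. The names `round_robin_plan` and `served_counts` are hypothetical.

```rust
/// Hypothetical model of a round robin plan: node indices rotated by the
/// request number, so consecutive requests start at consecutive nodes.
fn round_robin_plan(node_count: usize, request: usize) -> Vec<usize> {
    (0..node_count).map(|i| (request + i) % node_count).collect()
}

/// Count how many requests each node serves when node `down` is unreachable
/// and every request goes to the first live node in its plan.
fn served_counts(node_count: usize, down: usize, requests: usize) -> Vec<usize> {
    let mut counts = vec![0; node_count];
    for request in 0..requests {
        let plan = round_robin_plan(node_count, request);
        // Skip the failed node; the next node in the rotation takes over.
        if let Some(target) = plan.into_iter().find(|&node| node != down) {
            counts[target] += 1;
        }
    }
    counts
}
```

In this model, with four nodes A to D (indices 0 to 3) and A down, `served_counts(4, 0, 400)` gives B 200 requests while C and D get 100 each: B absorbs A's entire share.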

A potential solution is to shuffle the chosen nodes within each group of the load balancing plan, which would spread the failed node's load more evenly across the remaining nodes. Currently, however, random shuffling is only implemented for replica selection in scylla::transport::load_balancing::DefaultPolicy. Shuffling all the nodes in the later stages of constructing a load balancing plan was considered but deemed too costly, which is why round robin is used there (#612 (comment)).
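
The shuffling idea can be sketched in the same kind of toy model. This is an illustration under stated assumptions, not the implementation in `DefaultPolicy`: the whole plan is shuffled per request with a Fisher-Yates shuffle, and a small xorshift generator stands in for a real random number source so the example needs no external crates. All names here (`XorShift64`, `shuffled_plan`, `served_counts_shuffled`) are hypothetical.

```rust
/// Tiny xorshift64 generator, used only to keep the sketch dependency-free.
/// The seed must be nonzero.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Pseudo-random index in `0..bound`, taken from the upper bits.
    fn below(&mut self, bound: usize) -> usize {
        ((self.next() >> 33) % bound as u64) as usize
    }
}

/// Hypothetical shuffled plan: all node indices in random order
/// (Fisher-Yates), which costs O(n) work per plan.
fn shuffled_plan(node_count: usize, rng: &mut XorShift64) -> Vec<usize> {
    let mut plan: Vec<usize> = (0..node_count).collect();
    for i in (1..plan.len()).rev() {
        let j = rng.below(i + 1);
        plan.swap(i, j);
    }
    plan
}

/// Count how many requests each node serves when node `down` is unreachable
/// and every request goes to the first live node of a freshly shuffled plan.
fn served_counts_shuffled(
    node_count: usize,
    down: usize,
    requests: usize,
    seed: u64,
) -> Vec<usize> {
    let mut rng = XorShift64(seed);
    let mut counts = vec![0; node_count];
    for _ in 0..requests {
        let plan = shuffled_plan(node_count, &mut rng);
        if let Some(target) = plan.into_iter().find(|&node| node != down) {
            counts[target] += 1;
        }
    }
    counts
}
```

Because the first live node of a shuffled plan is equally likely to be any live node, the failed node's share is split roughly evenly among the survivors instead of landing on a single neighbour. The per-plan shuffle is the extra cost referred to above.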
