Rack awareness in a 5-DC setup causes significant load imbalance across first-region nodes #1425

@vponomaryov

Argus: scylla-staging/valerii/vp-rolling-upgrade-custom-d2-w1-latency-regression#11
NOTE: Logs are not posted directly because they contain a sensitive schema that must not be disclosed.

Test scenario details:

  • Setup: 5 DCs with 3 i3en.xlarge DB nodes in each. Total: 15 DB nodes.
  • The first DC is special from a load point of view: 5 loaders in the first DC and 1 in each of the others. Total: 9 loaders.
  • Number of racks: 3

Latte benchmarking tool version: 0.32.0-scylladb
Scylla rust driver version: 1.3.1

In this scenario, rack awareness was enabled on the loaders.
With 3 nodes in each DC, each node in its own rack, and 5 loaders in the first region, I expected the following load proportions: 2 2 1
In fact, the actual proportions were close to 1 0 0.

Screenshot from monitoring:

Image

The nodes in the first region are 10.11.6.89 (node-1), 10.11.5.88 (node-2) and 10.11.7.67 (node-3).
As the screenshot shows, their load proportions are 12300 2 2

So, in fact, only the first of the 3 nodes in the first region was serving the load, handling about 3x the requests it would get under an even split.
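
To make the gap between the expectation and the observation concrete, here is a minimal Python sketch. It is not taken from latte or the driver. It assumes the 5 first-region loaders are pinned to the 3 racks round-robin (which is what gives 2 2 1), that node-1, node-2 and node-3 sit in the first, second and third rack respectively, and it uses hypothetical rack names. The observed values are the 12300 2 2 figures quoted above.

```python
from collections import Counter


def assign_loaders_round_robin(num_loaders, racks):
    """Hypothetical placement: loader i is pinned to racks[i % len(racks)]."""
    return Counter(racks[i % len(racks)] for i in range(num_loaders))


def shares(values):
    """Convert absolute values into fractions of their total."""
    total = sum(values)
    return [v / total for v in values]


# Hypothetical rack names; the real names are not given in this report.
racks = ["rack-1", "rack-2", "rack-3"]

# Expected: 5 loaders spread over 3 racks, one node per rack.
loaders_per_rack = assign_loaders_round_robin(5, racks)
expected = [loaders_per_rack[r] for r in racks]

# Observed: per-node figures quoted from the monitoring screenshot.
observed = [12300, 2, 2]

expected_shares = shares(expected)
observed_shares = shares(observed)

for rack, e, o in zip(racks, expected_shares, observed_shares):
    print(f"{rack}: expected {e:.1%} of load, observed {o:.2%}")
```

Under these assumptions the first node should take about 40% of the first-region load, while the observed figures put it above 99.9%.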

Later the test failed because network services on this first node were failing, but that is not the point of this bug report.

Latte code for handling DCs and racks is here: https://github.com/scylladb/latte/blob/a6701c7a8c5713d00cc29cfc9756a3c4afa01dda/src/scripting/connect.rs#L34-L77

Then I disabled rack awareness in other test runs, and the problem disappeared immediately.
Example test run: scylla-staging/valerii/vp-rolling-upgrade-custom-d2-w1-latency-regression#17
Screenshot:

Image
