Out of control request cascading in pushsync in-neighborhood phase #5067

@lat-murmeldjur

Description

Context & Summary

The recently merged PR #5037 introduced a serious issue in pushsync.

Previously, once the "is in AOR" condition was met, the node stored the chunk and returned early, which prevented in-neighborhood nodes from splitting. The new "pushToClosest" call, which replaces the immediate return, creates an exponential cascade: every in-neighborhood node creates 2x requests for each request it receives. The cascades never terminate and ultimately lead to temporary disconnects within the neighborhood (the continuously timing out, hanging requests also create accounting issues).
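To put rough numbers on the 2x fan-out described above, here is a toy model in Go. It is not bee code: `totalRequests`, `fanout`, and `hops` are hypothetical names, and the model assumes every request received triggers the same number of further requests at each forwarding step.

```go
package main

import "fmt"

// totalRequests is a toy model (not bee code). It counts all requests sent
// when one initial request is forwarded for `hops` further rounds and every
// request received spawns `fanout` new requests.
func totalRequests(fanout, hops int) int {
	total, level := 0, 1
	for i := 0; i <= hops; i++ {
		total += level  // requests in flight at this round
		level *= fanout // each of them spawns `fanout` more
	}
	return total
}

func main() {
	for _, hops := range []int{1, 5, 10} {
		fmt.Printf("hops=%d with splitting=%d without splitting=%d\n",
			hops, totalRequests(2, hops), totalRequests(1, hops))
	}
}
```

With a fan-out of 2 the total grows as 2^(hops+1) - 1, while a fan-out of 1 grows only linearly, which is why the split must not be repeated by nodes that are already in the neighborhood.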

Expected behavior & suggestions

The splitting behaviour should only be exercised by a non-AOR node that pushes towards an in-AOR node. When the downstream peer is in-AOR, the node should check that it is not itself in-AOR before splitting.
Alternatively, when calling pushToClosest after in-AOR is determined and store() is done, an extra argument could ensure that splitting does not happen in this case (so only 1x request is made, towards the actual closest node in the neighborhood). In this case it is also important that the actual closest node does not keep pushing to the 2nd-closest node and so on, which would create a non-exponential but still unbounded request cascade. This can be prevented by making sure errwantself is not suppressed at any point in the function call chain.

Actual behavior

Steps to reproduce

Possible solution

Labels

needs-triaging (new issues that need triaging)
