Reconnect teacher may experience high virtual pipeline backpressure #22553

@artemananiev

Description

This issue was found as a part of #22388, but it's a separate problem and should be tracked in a separate ticket.

During reconnect, shortly after synchronization starts, the teacher begins to experience high virtual pipeline backpressure. If reconnect takes a long time, the teacher eventually starts switching between ACTIVE and CHECKING and may even fall behind the network and need to reconnect itself (as a learner).

This happens because the state used for teaching is not released until the very end of reconnect. While it is held, virtual map copies cannot be merged or flushed to disk, so the virtual node cache keeps growing and backpressure kicks in. This is a regression from #21464. That fix was correct, but it revealed a problem that had existed for years: unreleased copies could be flushed to disk. #21464 addressed that problem, but in doing so it introduced a different problem with reconnects.
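To illustrate the mechanism, here is a minimal toy model in Java. It is not the real virtual pipeline code: the class `PipelineSketch`, its `Copy` type, the cache sizes, and the threshold are all hypothetical, and the rule that copies flush strictly oldest-first and only once released is a simplifying assumption of the sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model only; all names and numbers here are hypothetical. */
final class PipelineSketch {

    /** One map copy with an assumed in-memory cache footprint. */
    static final class Copy {
        final long cacheSize;
        boolean released; // false while something (e.g. the teacher) still holds it

        Copy(long cacheSize) {
            this.cacheSize = cacheSize;
        }
    }

    // Copies ordered from oldest (head) to newest (tail).
    private final Deque<Copy> copies = new ArrayDeque<>();

    Copy addCopy(long cacheSize) {
        Copy copy = new Copy(cacheSize);
        copies.addLast(copy);
        return copy;
    }

    /** Assumed rule: flush from the oldest end, stop at the first unreleased copy. */
    void flushReleased() {
        while (!copies.isEmpty() && copies.peekFirst().released) {
            copies.removeFirst();
        }
    }

    /** Sum of the cache footprints of all copies still in memory. */
    long totalCacheSize() {
        long sum = 0;
        for (Copy copy : copies) {
            sum += copy.cacheSize;
        }
        return sum;
    }

    boolean backpressure(long threshold) {
        return totalCacheSize() > threshold;
    }

    public static void main(String[] args) {
        PipelineSketch pipeline = new PipelineSketch();
        Copy teaching = pipeline.addCopy(100); // held for the whole reconnect

        // Newer copies are released right away, but cannot flush past the held one.
        for (int round = 0; round < 5; round++) {
            pipeline.addCopy(100).released = true;
            pipeline.flushReleased();
        }
        System.out.println("held: size=" + pipeline.totalCacheSize()
                + " backpressure=" + pipeline.backpressure(500));

        // Releasing the teaching copy lets everything behind it flush.
        teaching.released = true;
        pipeline.flushReleased();
        System.out.println("released: size=" + pipeline.totalCacheSize()
                + " backpressure=" + pipeline.backpressure(500));
    }
}
```

In this model the cache footprint grows with every round for as long as the teaching copy is held, and drops back only once that copy is released, which is the pattern described above.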

Metadata

Assignees

Labels

Bug: An error that causes the feature to behave differently than what was expected based on design.
P1: High priority issue, which must be completed in the milestone otherwise the release is at risk.
Platform: Tickets pertaining to the platform
Platform Reconnect
Regression: Behavior that used to work in a released product or service that no longer works with a new release.

Type

Projects

Status

👀 In Review

Relationships

None yet

Development

No branches or pull requests