YQ-5091 fixed PQ partitions balancer idle / reconnects#34486
Conversation
|
🟢 |
There was a problem hiding this comment.
Pull request overview
This PR fixes issues in the PQ (PersQueue) partitions balancer related to idle timeout handling, reconnection hanging, and counter naming. The changes include a significant refactoring of the composite read session implementation to better manage partition states and improve observability through enhanced metrics.
Changes:
- Refactored partition state management in composite read session with separate tracking of suspended, pending, ready, and idle partitions
- Added sequence number tracking to counter updates to prevent out-of-order message processing
- Introduced new signal utilities for coordinating asynchronous operations between actors and read sessions
- Fixed counter naming conventions and added more granular metrics for debugging
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| yql_pq_composite_read_session.h | Added Cluster field to settings and standardized NMonitoring namespace usage |
| yql_pq_composite_read_session.cpp | Major refactoring of balancer actor and read session with improved partition state management, metrics, and logging |
| ya.make | Added signals library dependency |
| dq_pq_read_actor.cpp | Added cluster-specific counters and wakeup scheduling for hanging detection |
| dq_events.proto | Added SeqNo field for ordering counter updates |
| dq_info_aggregation_actor.cpp | Implemented sequence number handling and sender cleanup on actor termination |
| signal_utils.h/cpp | New utility classes for counter management and future signaling |
| kqp_federated_query_helpers.cpp | Increased max queued requests to prevent request throttling |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
ydb/library/yql/providers/pq/gateway/clients/composite/yql_pq_composite_read_session.cpp
Outdated
Show resolved
Hide resolved
ydb/library/yql/providers/pq/gateway/clients/composite/yql_pq_composite_read_session.cpp
Outdated
Show resolved
Hide resolved
ydb/library/yql/providers/pq/gateway/clients/composite/yql_pq_composite_read_session.cpp
Show resolved
Hide resolved
ydb/library/yql/providers/pq/gateway/clients/composite/yql_pq_composite_read_session.cpp
Outdated
Show resolved
Hide resolved
ydb/library/yql/providers/pq/gateway/clients/composite/yql_pq_composite_read_session.cpp
Outdated
Show resolved
Hide resolved
ydb/library/yql/providers/pq/gateway/clients/composite/yql_pq_composite_read_session.cpp
Outdated
Show resolved
Hide resolved
ydb/library/yql/dq/actors/compute/dq_info_aggregation_actor.cpp
Outdated
Show resolved
Hide resolved
ydb/library/yql/providers/pq/gateway/clients/composite/yql_pq_composite_read_session.cpp
Show resolved
Hide resolved
ydb/library/yql/providers/pq/gateway/clients/composite/yql_pq_composite_read_session.cpp
Outdated
Show resolved
Hide resolved
…composite_read_session.cpp Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…composite_read_session.cpp Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…composite_read_session.cpp Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…composite_read_session.cpp Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…composite_read_session.cpp Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…composite_read_session.cpp Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
⚪
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
|
⚪ ⚪ Ya make output | Test bloat | Test bloat
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
| void DoUpdate(i64 newValue, std::optional<i64> oldValue, const TActorId& sender) final { | ||
| if (oldValue) { | ||
| OrderedValues.erase({*oldValue, sender}); | ||
| Y_VALIDATE(OrderedValues.erase({*oldValue, sender}), "Unexpected OrderedValues"); |
There was a problem hiding this comment.
(Точно не на-сейчас):
В таких местах можно избегать реаллокации с set::extract -> rec.value().= -> set::insert(move) (ну, и если это вызывается часто -- запланировать когда-нибудь heap вместо set; хип, впрочем, нужен самописный, стандартный не умеет update)
There was a problem hiding this comment.
Окей, пока оставлю как есть
ydb/library/yql/dq/actors/compute/dq_info_aggregation_actor.cpp
Outdated
Show resolved
Hide resolved
ydb/library/yql/providers/pq/gateway/clients/composite/yql_pq_composite_read_session.cpp
Outdated
Show resolved
Hide resolved
|
⚪ ⚪ Ya make output | Test bloat | Test bloat
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
|
⚪
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
Changelog entry
Fixed PQ partitions balancer idle / reconnects
Changelog category
Description for reviewers
Fixed Idle setting handling, fixed hanging on reconnect, fixed counters names
Bugfix ticket: https://st.yandex-team.ru/YQ-5091