LOGBROKER-7430 added unit test on reading with restarts#34487
Conversation
|
⚪ ⚪ Ya make output | Test bloat | Test bloat
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
|
🟢 |
|
⚪
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
There was a problem hiding this comment.
Pull request overview
This PR targets LOGBROKER-7430 by adding a restart-heavy read test scenario and adjusting topic read-session internals to better handle decompression/cancellation and improve hang diagnostics during reconnects.
Changes:
- Add/extend a unit test flow that repeatedly restarts PQRB tablet while reading to validate recovery behavior.
- Update read-session internals around “not-ready tail” cleanup, self-check timing, and diagnostic logging.
- Improve test crash diagnostics by enabling YDB backtraces and installing terminate/signal handlers in the test setup.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| ydb/public/sdk/cpp/tests/integration/topic/utils/managed_executor.cpp | Ensure moved tasks are cleared to prevent re-running them. |
| ydb/public/sdk/cpp/src/client/topic/ut/ut_utils/ya.make | Add peer dependency needed by new test utilities. |
| ydb/public/sdk/cpp/src/client/topic/ut/ut_utils/topic_sdk_test_setup.cpp | Add backtrace/terminate/signal handling for better test crash output. |
| ydb/public/sdk/cpp/src/client/topic/ut/basic_usage_ut.cpp | Enhance restart-based read test loop and tablet restart behavior. |
| ydb/public/sdk/cpp/src/client/topic/impl/read_session_impl.ipp | Adjust event-queue tail deletion, self-check cadence, and expand warning diagnostics. |
| ydb/public/sdk/cpp/src/client/topic/impl/read_session_impl.h | Add small APIs for diagnostics/counters and tweak decompression event behavior. |
Comments suppressed due to low confidence (5)
ydb/public/sdk/cpp/src/client/topic/ut/basic_usage_ut.cpp:188
IReadSession::WaitEvent()returns a future, but the returned value is ignored here, so this does not actually wait for an event and can cause the loop to spin until the timeout assertion trips. UseReadSession->WaitEvent().Wait(...)(with a remaining-time/deadline) instead of ignoring the future and sleeping.
auto WaitTasks = [&, timeout = TInstant::Now() + TDuration::Seconds(60)](auto f, size_t c) {
while (f() < c) {
UNIT_ASSERT(timeout > TInstant::Now());
ReadSession->WaitEvent();
std::this_thread::sleep_for(100ms);
ydb/public/sdk/cpp/src/client/topic/ut/ut_utils/topic_sdk_test_setup.cpp:47
- This constructor installs global
std::terminateand process-wide signal handlers but never restores previous handlers. That can leak side effects across unrelated tests and is hard to debug in large test suites. Consider guarding this with astd::once_flagor storing/restoring previous handlers in a RAII helper; also prefersigactionand avoid non-async-signal-safe operations inside a signal handler.
NKikimr::EnableYDBBacktraceFormat();
std::set_terminate(&TerminateHandler);
for (auto sig : {SIGFPE, SIGILL, SIGSEGV}) {
signal(sig, &BackTraceSignalHandler);
ydb/public/sdk/cpp/src/client/topic/impl/read_session_impl.ipp:2224
- PR description/title say this change adds a unit test, but the PR also modifies production read-session logic (
read_session_impl.*) and test harness behavior (terminate/signal handlers). Please update the PR description to reflect the functional/runtime changes so reviewers and release notes consumers aren’t surprised.
ui64 notStartedTasks = 0;
for (const auto& decompressionItem : DecompressionQueue) {
if (const auto& stream = *decompressionItem.PartitionStream; handledPartitionStreams.emplace(stream.GetAssignId()).second) {
readyEventsCount += stream.GetReadyEventsCount();
amountEvents += stream.GetEventsCount();
notStartedTasks += decompressionItem.BatchInfo->GetNotStartedTasksCunt();
ydb/public/sdk/cpp/src/client/topic/impl/read_session_impl.ipp:2195
- The comments in
SelfCheck()still reference a 1-minute threshold, but the code now usesTDuration::Seconds(30). Please update the comment(s) to match the new 30-second window so future debugging isn’t misled.
if (delta < TDuration::Seconds(30)) {
// Session ok, we got at least one event from server since last 1 minute
return;
ydb/public/sdk/cpp/src/client/topic/impl/read_session_impl.ipp:2242
amountEventsis computed viaGetEventsCount()(total queue size: ready+not-ready and data+non-data), but the log label says "not ready data events". This makes the warning misleading during incident/debugging; either compute the metric that matches the label or rename the label to reflect what’s actually being reported.
<< ", pending decompression batches: " << DecompressionQueue.size()
<< ", not ready data events: " << amountEvents
<< ", alive partition streams: " << PartitionStreams.size()
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
⚪
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
|
⚪ ⚪ Ya make output | Test bloat | Test bloat
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
Changelog entry
Added unit test on reading with restarts
Changelog category
Description for reviewers
Bugfix ticket: https://st.yandex-team.ru/LOGBROKER-7430#6992f432cf614c2f332fbc7a