*: support new columnar data source #10842
Conversation
Signed-off-by: yongman <yming0221@gmail.com>
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the
📝 Walkthrough
Adds an opt-in columnar disaggregated read path: a new CMake option and submodule; parses and propagates use_columnar through config and startup; adjusts proxy config/start logic; routes StorageDisaggregated reads to proxy-backed columnar code; declares RNProxy types; implements proxy read tasks/streams/operators with error mapping and retry; and provides disabled-build fallbacks.
Changes: Columnar Disaggregated Storage Feature
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 4 passed
@yongman I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details. ⏳ This process typically takes 10-30 minutes depending on the complexity of the changes. ℹ️ Learn more details on Pantheon AI.
Actionable comments posted: 1
🧹 Nitpick comments (4)
dbms/src/Storages/KVStore/ProxyStateMachine.h (1)
132-132: ⚖️ Poor tradeoff
Consider wrapping `init_only` logic in conditional compilation.
The `init_only` variable (line 135) and its assignment logic (lines 139-151) are only used within the `#if SERVERLESS_PROXY == 1` block (lines 205-206). When `SERVERLESS_PROXY` is not defined, this variable and the associated logic are unused. Consider wrapping the entire `init_only` variable declaration and assignment in `#if SERVERLESS_PROXY == 1` for cleaner conditional compilation.
♻️ Suggested refactor
 bool tryParseFromConfig(
     const Poco::Util::LayeredConfiguration & config,
     const DisaggregatedMode disaggregated_mode,
     const bool use_autoscaler,
     const bool use_columnar,
     const LoggerPtr & log)
 {
-    bool init_only = false;
     // tiflash_compute doesn't need proxy except when using columnar.
     if (disaggregated_mode == DisaggregatedMode::Compute && use_autoscaler)
     {
         if (use_columnar)
         {
+#if SERVERLESS_PROXY == 1
+            bool init_only = true;
             LOG_INFO(
                 log,
                 "TiFlash Proxy will start because columnar is enabled with AutoScale Disaggregated Compute Mode "
                 "specified.");
-            init_only = true;
+#else
+            LOG_INFO(log, "TiFlash Proxy will not start because SERVERLESS_PROXY is not enabled.");
+            return false;
+#endif
         }
Also applies to: 135-151
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/KVStore/ProxyStateMachine.h` at line 132, The variable init_only and its assignment logic are only used under the SERVERLESS_PROXY feature flag; to avoid unused-code when SERVERLESS_PROXY != 1, move the declaration and assignment of init_only inside a `#if` SERVERLESS_PROXY == 1 / `#endif` block. Locate the init_only declaration and the block that assigns it (inside the ProxyStateMachine constructor or where init_only is defined) and wrap both the declaration and the subsequent assignment logic in the conditional compilation so the symbol is only compiled when SERVERLESS_PROXY is enabled.
dbms/src/Storages/StorageDisaggregatedColumnar.h (1)
234-234: 💤 Low value
Remove or document the commented-out field.
The commented line `//double duration_wait_ready_task_sec = 0;` should either be removed if not needed or uncommented with documentation if planned for future use.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/StorageDisaggregatedColumnar.h` at line 234, Remove or properly document the commented-out field `duration_wait_ready_task_sec` in StorageDisaggregatedColumnar: either delete the line `//double duration_wait_ready_task_sec = 0;` if it's unused, or uncomment it and add a brief comment explaining its purpose and units (e.g., "time spent waiting for ready tasks in seconds") and why it's kept for future use so its intent is clear to readers and static analysis.
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (2)
221-224: 💤 Low value
Consider improving readability of the boolean parameter.
The `false` parameter in `convertTimeZoneByOffset` lacks context. If the API doesn't support named parameters, consider adding a brief comment explaining what the boolean controls.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 221 - 224, The anonymous boolean literal `false` passed to convertTimeZoneByOffset reduces readability; update the call site where timezone_info.timezone_offset is handled to either replace `false` with a clearly named constant/enum (e.g., use a constant like IGNORE_DST or DONT_APPLY_DST) or add an inline comment describing the flag's meaning (for example: /* apply_dst = false */) so future readers immediately understand what that parameter controls; locate the call to convertTimeZoneByOffset near convertTimeZone and timezone_info.timezone_offset in StorageDisaggregatedColumnar.cpp to make the change.
566-566: ⚡ Quick win
Use fmt-style Exception constructor.
These exceptions use the old-style constructor. As per coding guidelines, prefer the fmt-style constructor with error code first: `throw Exception(ErrorCodes::SOME_CODE, "Message")`.
♻️ Suggested fix
- throw Exception("lock error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+ throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "lock error");
- throw Exception("pd client error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+ throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "pd client error");
- throw Exception("unknown error type", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+ throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "unknown error type");
As per coding guidelines: "Prefer the fmt-style constructor for `DB::Exception` with error code first: `throw Exception(ErrorCodes::SOME_CODE, "Message with {}", arg);`"
Also applies to: 571-571, 576-576
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` at line 566, Replace old-style DB::Exception construction with the fmt-style (error code first) for the thrown exceptions in StorageDisaggregatedColumnar.cpp: change the calls that currently use throw Exception("lock error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR) (and the other two similar occurrences) to use throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "lock error") so the error code is the first argument; ensure you update each instance that uses the old-ordered arguments to the new fmt-style Exception(ErrorCodes::..., "message") form.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmake/find_tiflash_proxy.cmake`:
- Around line 63-64: Replace the invalid CMake message mode "FATAL" with the
correct "FATAL_ERROR" in the conditional that checks for the next-gen
tiflash-proxy; specifically update the message(...) call inside the if
(ENABLE_NEXT_GEN AND NOT ENABLE_COLUMNAR_DISAGG AND NOT EXISTS
".../contrib/tiflash-proxy-next-gen/.../cloud_helper.rs") branch so it uses
FATAL_ERROR (matching other occurrences like the one near line 58 and
find_tipb.cmake) to ensure configuration stops when the file is missing.
---
Nitpick comments:
In `@dbms/src/Storages/KVStore/ProxyStateMachine.h`:
- Line 132: The variable init_only and its assignment logic are only used under
the SERVERLESS_PROXY feature flag; to avoid unused-code when SERVERLESS_PROXY !=
1, move the declaration and assignment of init_only inside a `#if`
SERVERLESS_PROXY == 1 / `#endif` block. Locate the init_only declaration and the
block that assigns it (inside the ProxyStateMachine constructor or where
init_only is defined) and wrap both the declaration and the subsequent
assignment logic in the conditional compilation so the symbol is only compiled
when SERVERLESS_PROXY is enabled.
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 221-224: The anonymous boolean literal `false` passed to
convertTimeZoneByOffset reduces readability; update the call site where
timezone_info.timezone_offset is handled to either replace `false` with a
clearly named constant/enum (e.g., use a constant like IGNORE_DST or
DONT_APPLY_DST) or add an inline comment describing the flag's meaning (for
example: /* apply_dst = false */) so future readers immediately understand what
that parameter controls; locate the call to convertTimeZoneByOffset near
convertTimeZone and timezone_info.timezone_offset in
StorageDisaggregatedColumnar.cpp to make the change.
- Line 566: Replace old-style DB::Exception construction with the fmt-style
(error code first) for the thrown exceptions in
StorageDisaggregatedColumnar.cpp: change the calls that currently use throw
Exception("lock error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR) (and the other two
similar occurrences) to use throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR,
"lock error") so the error code is the first argument; ensure you update each
instance that uses the old-ordered arguments to the new fmt-style
Exception(ErrorCodes::..., "message") form.
In `@dbms/src/Storages/StorageDisaggregatedColumnar.h`:
- Line 234: Remove or properly document the commented-out field
`duration_wait_ready_task_sec` in StorageDisaggregatedColumnar: either delete
the line `//double duration_wait_ready_task_sec = 0;` if it's unused, or
uncomment it and add a brief comment explaining its purpose and units (e.g.,
"time spent waiting for ready tasks in seconds") and why it's kept for future
use so its intent is clear to readers and static analysis.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 0082d8d8-0032-40bb-9aac-04d5ea134b44
📒 Files selected for processing (25)
- .gitmodules
- CMakeLists.txt
- cmake/find_tiflash_proxy.cmake
- contrib/tiflash-proxy-cmake/CMakeLists.txt
- contrib/tiflash-proxy-columnar
- dbms/CMakeLists.txt
- dbms/src/Common/ErrorCodes.cpp
- dbms/src/Common/config.h.in
- dbms/src/Core/TiFlashDisaggregatedMode.cpp
- dbms/src/Core/TiFlashDisaggregatedMode.h
- dbms/src/DataStreams/AddExtraTableIDColumnTransformAction.cpp
- dbms/src/DataStreams/AddExtraTableIDColumnTransformAction.h
- dbms/src/DataTypes/DataTypeDecimal.cpp
- dbms/src/Interpreters/Context.cpp
- dbms/src/Interpreters/SharedContexts/Disagg.h
- dbms/src/Server/Server.cpp
- dbms/src/Storages/KVStore/KVStore.cpp
- dbms/src/Storages/KVStore/KVStore.h
- dbms/src/Storages/KVStore/ProxyStateMachine.h
- dbms/src/Storages/KVStore/TMTContext.cpp
- dbms/src/Storages/StorageDisaggregated.cpp
- dbms/src/Storages/StorageDisaggregated.h
- dbms/src/Storages/StorageDisaggregatedColumnar.cpp
- dbms/src/Storages/StorageDisaggregatedColumnar.h
- dbms/src/Storages/StorageDisaggregatedRemote.cpp
Actionable comments posted: 2
🧹 Nitpick comments (2)
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (2)
821-824: ⚡ Quick win
Log the swallowed destructor exception.
Keeping the destructor non-throwing is right, but silently discarding the failure makes accounting/logging issues invisible. `tryLogCurrentException` keeps the behavior and preserves the signal.
As per coding guidelines, "In broad `catch (...)` paths, prefer `tryLogCurrentException(log, "context")` to avoid duplicated exception-formatting code"
🪵 Suggested change
 catch (...)
 {
-    // Destructors must not throw.
+    tryLogCurrentException(log, "Failed to finalize proxy read accounting");
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 821 - 824, The destructor currently swallows all exceptions with a bare catch(...) in StorageDisaggregatedColumnar (or the related destructor shown) — preserve the non-throwing behavior but log the exception by calling tryLogCurrentException(log, "StorageDisaggregatedColumnar::~StorageDisaggregatedColumnar") (or an appropriate context string) inside the catch block; ensure you reference the existing logger variable used in this translation unit and do not rethrow so destructors remain noexcept.
555-577: ⚡ Quick win
Use the error-code-first `DB::Exception` constructor here.
These new throws use the legacy `(message, code)` overload. Please switch them to the repo-standard `(code, fmt, ...)` form for consistency.
As per coding guidelines, "Prefer the fmt-style constructor for `DB::Exception` with error code first: `throw Exception(ErrorCodes::SOME_CODE, "Message with {}", arg);`"
🔧 Suggested cleanup
- throw Exception("lock error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+ throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "lock error");
@@
- throw Exception("pd client error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+ throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "pd client error");
@@
- throw Exception("unknown error type", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+ throw Exception(
+     ErrorCodes::COLUMNAR_SNAPSHOT_ERROR,
+     "unknown error type");
@@
- throw Exception("read_block failed in tiflash-proxy", ErrorCodes::LOGICAL_ERROR);
+ throw Exception(ErrorCodes::LOGICAL_ERROR, "read_block failed in tiflash-proxy");
Also applies to: 849-851
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 555 - 577, Replace the legacy DB::Exception constructor calls that pass (message, code) with the fmt-style, error-code-first form; specifically, in the branches handling ColumnarReaderErrorType::LockedError, ::PdClientError and the default non-OK branch (where current throws are throw Exception("lock error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR), throw Exception("pd client error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR) and throw Exception("unknown error type", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR)), change them to throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "lock error") / throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "pd client error") / throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "unknown error type, error_type {}", uint8_t(columnar_reader.error_type)) respectively; apply the same pattern to the analogous throws near the other occurrence that uses the old overload.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 241-277: In StorageDisaggregated::readThroughProxy, guard against
empty remote regions / no proxy tasks and empty pipeline before doing any sizing
or header work: after calling buildRemoteTableRanges(), if remote_table_ranges
is empty (or region_num==0) return an empty BlockInputStreams immediately; after
building read_proxy_tasks and populating pipeline.streams, if
pipeline.streams.empty() return early as well before accessing
pipeline.firstStream() or creating analyzer; ensure these early returns happen
before any divisions (regions_per_reader) or calls that assume a non-empty
header (analyzer, extraCast, filterConditionsWithPushedDownFilters) so
downstream code won’t dereference empty streams or divide by zero.
- Around line 936-1003: In RNProxySourceOp::readImpl, RNProxySourceOp::awaitImpl
and RNProxySourceOp::executeIOImpl replace all occurrences of the function-like
macros written without parentheses (e.g. "if likely (cond)" or "if unlikely
(cond)") with the canonical form that calls the macro around the condition (e.g.
"if (likely(cond))" / "if (unlikely(cond))"); ensure every check such as done,
t_block.has_value(), current_reader_idx < 0, and block && block.rows() uses the
corrected syntax so the macros expand to __builtin_expect properly.
---
Nitpick comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 821-824: The destructor currently swallows all exceptions with a
bare catch(...) in StorageDisaggregatedColumnar (or the related destructor
shown) — preserve the non-throwing behavior but log the exception by calling
tryLogCurrentException(log,
"StorageDisaggregatedColumnar::~StorageDisaggregatedColumnar") (or an
appropriate context string) inside the catch block; ensure you reference the
existing logger variable used in this translation unit and do not rethrow so
destructors remain noexcept.
- Around line 555-577: Replace the legacy DB::Exception constructor calls that
pass (message, code) with the fmt-style, error-code-first form; specifically, in
the branches handling ColumnarReaderErrorType::LockedError, ::PdClientError and
the default non-OK branch (where current throws are throw Exception("lock
error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR), throw Exception("pd client error",
ErrorCodes::COLUMNAR_SNAPSHOT_ERROR) and throw Exception("unknown error type",
ErrorCodes::COLUMNAR_SNAPSHOT_ERROR)), change them to throw
Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "lock error") / throw
Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "pd client error") / throw
Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "unknown error type, error_type
{}", uint8_t(columnar_reader.error_type)) respectively; apply the same pattern
to the analogous throws near the other occurrence that uses the old overload.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: db0c5d76-3eaf-4ef6-8e24-116bf2b71c3a
📒 Files selected for processing (2)
- dbms/src/Storages/KVStore/ProxyStateMachine.h
- dbms/src/Storages/StorageDisaggregatedColumnar.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
- dbms/src/Storages/KVStore/ProxyStateMachine.h
BlockInputStreams StorageDisaggregated::readThroughProxy(const Context & context, unsigned num_streams)
{
    DAGPipeline pipeline;
    const UInt64 start_ts = sender_target_mpp_task_id.gather_id.query_id.start_ts;
    auto [remote_table_ranges, region_num] = buildRemoteTableRanges();
    const auto generated_column_infos = genGeneratedColumnInfosForDisaggregatedRead(table_scan);
    auto read_proxy_tasks = RNProxyReadTask::buildProxyReadTaskWithBackoff(
        log,
        context,
        start_ts,
        table_scan,
        filter_conditions,
        remote_table_ranges,
        num_streams);
    for (auto & task : read_proxy_tasks)
    {
        auto streams = task->getInputStreams();
        pipeline.streams.insert(pipeline.streams.end(), streams.begin(), streams.end());
    }
    // Avoid reading generated columns from proxy, generate placeholders locally.
    executeGeneratedColumnPlaceholder(generated_column_infos, log, pipeline);
    NamesAndTypes source_columns;
    source_columns.reserve(table_scan.getColumnSize());
    const auto & stream_header = pipeline.firstStream()->getHeader();
    for (const auto & col : stream_header)
    {
        source_columns.emplace_back(col.name, col.type);
    }
    analyzer = std::make_unique<DAGExpressionAnalyzer>(std::move(source_columns), context);

    // Handle duration/timestamp cast for proxy path.
    // We still execute pushed-down filters on RN side, so timestamp columns in those filters
    // must also be converted from UTC to session timezone.
    extraCast(*analyzer, pipeline, /*include_pushed_down_filter_columns=*/true);
    // Handle filter
    filterConditionsWithPushedDownFilters(*analyzer, pipeline);
    return pipeline.streams;
}
Handle empty scans before sizing readers and building headers.
If region splitting returns no remote regions, Line 705 makes real_num_streams == 0 and Line 708 divides by zero. Even after guarding that, Line 264 and Line 314 still assume a non-empty task/header. Please return early for the empty-scan case before computing regions_per_reader and before creating the analyzer.
💡 Suggested guard rails
BlockInputStreams StorageDisaggregated::readThroughProxy(const Context & context, unsigned num_streams)
{
DAGPipeline pipeline;
@@
auto read_proxy_tasks = RNProxyReadTask::buildProxyReadTaskWithBackoff(
log,
context,
start_ts,
table_scan,
filter_conditions,
remote_table_ranges,
num_streams);
+ if (read_proxy_tasks.empty())
+ return {};
+
for (auto & task : read_proxy_tasks)
{
auto streams = task->getInputStreams();
pipeline.streams.insert(pipeline.streams.end(), streams.begin(), streams.end());
}
@@
void StorageDisaggregated::readThroughProxy(
PipelineExecutorContext & exec_context,
PipelineExecGroupBuilder & group_builder,
const Context & context,
unsigned num_streams)
{
@@
auto read_proxy_tasks = RNProxyReadTask::buildProxyReadTaskWithBackoff(
log,
context,
start_ts,
table_scan,
filter_conditions,
remote_table_ranges,
num_streams);
+ if (read_proxy_tasks.empty())
+ return;
+
const auto generated_column_infos = genGeneratedColumnInfosForDisaggregatedRead(table_scan);
@@
unsigned region_num = all_remote_regions_by_region.size();
unsigned physical_table_num = physical_table_ids.size();
+ if (region_num == 0 || num_streams == 0)
+ return tasks;
+
 unsigned real_num_streams = std::min(num_streams, region_num);
Also applies to: 281-324, 703-708
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 241 - 277,
In StorageDisaggregated::readThroughProxy, guard against empty remote regions /
no proxy tasks and empty pipeline before doing any sizing or header work: after
calling buildRemoteTableRanges(), if remote_table_ranges is empty (or
region_num==0) return an empty BlockInputStreams immediately; after building
read_proxy_tasks and populating pipeline.streams, if pipeline.streams.empty()
return early as well before accessing pipeline.firstStream() or creating
analyzer; ensure these early returns happen before any divisions
(regions_per_reader) or calls that assume a non-empty header (analyzer,
extraCast, filterConditionsWithPushedDownFilters) so downstream code won’t
dereference empty streams or divide by zero.
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Actionable comments posted: 1
🧹 Nitpick comments (2)
dbms/src/Storages/KVStore/ProxyStateMachine.h (2)
100-110: 💤 Low value
`args.reserve` is now under-sized. With the new key-only emission for empty values, each entry can push up to two elements (key + value). The reservation should account for that to avoid reallocations:
- args.reserve(val_map.size() + 1);
+ args.reserve(val_map.size() * 2 + 1);
Also worth noting: this changes the semantics of any existing `flash.proxy.*` entry whose value happens to be empty — previously such entries emitted an empty-string argument, now the value is dropped. If that's intended (likely yes), consider a short comment to that effect.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/KVStore/ProxyStateMachine.h` around lines 100 - 110, The args.reserve in the function building the argv vector (in ProxyStateMachine.h) is too small: because each map entry may push both key and value (or only key when value is empty), reserve should account for up to two elements per entry plus the initial program name—e.g. reserve(val_map.size() * 2 + 1). Update the reserve call accordingly and add a short comment near this loop explaining that empty values intentionally emit only the key (dropping empty-string values) to document the semantics change.
135-136: 💤 Low value
Prefer `[[maybe_unused]]` over `UNUSED(...)`. A `[[maybe_unused]]` attribute on the declaration is more idiomatic than a separate `UNUSED(...)` macro call, and it goes away naturally if the variable is also conditionally declared (see the related comment on guarding the `use_columnar` branch).
- bool init_only = false;
- UNUSED(init_only);
+ [[maybe_unused]] bool init_only = false;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/KVStore/ProxyStateMachine.h` around lines 135 - 136, Replace the UNUSED(init_only) macro usage by marking the local boolean declaration itself as possibly unused: change the variable declaration of init_only in ProxyStateMachine (the init_only local used in the surrounding init logic) to use the C++ attribute [[maybe_unused]] so the compiler will not warn when the variable is unused (remove the separate UNUSED(init_only) line and annotate the declaration instead).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dbms/src/Storages/KVStore/ProxyStateMachine.h`:
- Around line 135-155: The use_columnar branch must be guarded by the build
macro to avoid starting the proxy without --init-only on non-columnar builds:
wrap the inner if (use_columnar) branch in `#if` ENABLE_NEXT_GEN_COLUMNAR ...
`#else` emit a LOG_ERROR/LOG_INFO explaining that columnar was requested but the
binary was built without ENABLE_NEXT_GEN_COLUMNAR and return false (so
DisaggregatedMode::Compute + use_autoscaler + use_columnar cannot proceed), and
keep the `#endif` so that init_only and the later tryParseFromConfig behavior
remain consistent with columnar-enabled builds; reference symbols: use_columnar,
init_only, DisaggregatedMode::Compute, use_autoscaler, tryParseFromConfig,
ENABLE_NEXT_GEN_COLUMNAR.
---
Nitpick comments:
In `@dbms/src/Storages/KVStore/ProxyStateMachine.h`:
- Around line 100-110: The args.reserve in the function building the argv vector
(in ProxyStateMachine.h) is too small: because each map entry may push both key
and value (or only key when value is empty), reserve should account for up to
two elements per entry plus the initial program name—e.g. reserve(val_map.size()
* 2 + 1). Update the reserve call accordingly and add a short comment near this
loop explaining that empty values intentionally emit only the key (dropping
empty-string values) to document the semantics change.
- Around line 135-136: Replace the UNUSED(init_only) macro usage by marking the
local boolean declaration itself as possibly unused: change the variable
declaration of init_only in ProxyStateMachine (the init_only local used in the
surrounding init logic) to use the C++ attribute [[maybe_unused]] so the
compiler will not warn when the variable is unused (remove the separate
UNUSED(init_only) line and annotate the declaration instead).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: d4657d45-5eb5-4763-8001-724e350b8ed7
📒 Files selected for processing (7)
- CMakeLists.txt
- cmake/find_tiflash_proxy.cmake
- contrib/tiflash-proxy-cmake/CMakeLists.txt
- dbms/CMakeLists.txt
- dbms/src/Common/config.h.in
- dbms/src/Storages/KVStore/ProxyStateMachine.h
- dbms/src/Storages/StorageDisaggregatedRemote.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
- dbms/src/Storages/StorageDisaggregatedRemote.cpp
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.gitmodules:
- Line 41: Replace the SSH-style submodule URLs with HTTPS so CI and
contributors without SSH keys can clone them; locate the submodule entries that
currently use "git@github.com:tidbcloud/cloud-storage-engine.git" (and the
similar SSH URL referenced a second time) and update them to the HTTPS form
"https://github.com/tidbcloud/cloud-storage-engine.git" inside .gitmodules,
ensuring both occurrences are changed and then run git submodule sync/update to
apply the change.
- Around line 39-41: The .gitmodules entry for submodule
"contrib/tiflash-proxy-next-gen" lacks a branch specification, causing
inconsistent update behavior with the other submodule; add a line "branch =
tiflash-proxy-7.0" to the "contrib/tiflash-proxy-next-gen" submodule block
(matching the branch used by "tiflash-proxy-columnar") so both submodules point
to the same branch when running git submodule update --remote.
In `@dbms/src/Common/TiFlashBuildInfo.cpp`:
- Around line 150-152: In the vector initializer in TiFlashBuildInfo (within the
block guarded by ENABLE_NEXT_GEN_COLUMNAR), add a trailing comma after the
"columnar" string literal so that when ENABLE_NEXT_GEN_COLUMNAR and ENABLE_CLARA
are both defined the preprocessor output remains valid; update the entry guarded
by ENABLE_NEXT_GEN_COLUMNAR (the "columnar" literal) to include a trailing comma
to match the other feature entries.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: de829508-355d-4471-be4e-f87a177dfdc3
📒 Files selected for processing (5)
- .gitmodules
- contrib/tiflash-proxy-columnar
- dbms/src/Common/TiFlashBuildInfo.cpp
- dbms/src/Storages/DeltaMerge/DMContext.cpp
- dbms/src/Storages/KVStore/tests/gtest_kvstore_fast_add_peer.cpp
💤 Files with no reviewable changes (2)
- dbms/src/Storages/DeltaMerge/DMContext.cpp
- dbms/src/Storages/KVStore/tests/gtest_kvstore_fast_add_peer.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
- contrib/tiflash-proxy-columnar
Signed-off-by: JaySon-Huang <tshent@qq.com>
Actionable comments posted: 1
♻️ Duplicate comments (2)
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (2)
241-278: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Empty-scan / zero-region path still crashes or divides by zero — past comment not yet addressed.
Several callsites assume the proxy read produced at least one stream/concurrency and at least one region:
- Line 264: `pipeline.firstStream()->getHeader()` will deref a null `IBlockInputStreamPtr` if `read_proxy_tasks` came back empty (no remote regions, or all filtered out).
- Line 314: `group_builder.getCurrentHeader()` has the same issue when no `RNProxySourceOp` was added by the loop above.
- Lines 705 / 708: if `region_num == 0` (or `num_streams == 0`), `real_num_streams == 0` and `(region_num + real_num_streams - 1) / real_num_streams` divides by zero.
Please add early-returns / guards as proposed previously (return empty `BlockInputStreams` / `return;` before sizing and before building the analyzer; return `tasks` early before computing `regions_per_reader`).
Also applies to: 281-324, 703-708
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 241 - 278, The readThroughProxy path doesn't guard against empty proxy reads or zero regions/streams and dereferences pipeline.firstStream() and group_builder.getCurrentHeader() and computes regions_per_reader with possible division by zero; update StorageDisaggregated::readThroughProxy to early-return an empty BlockInputStreams when read_proxy_tasks is empty or pipeline.streams is empty (before accessing pipeline.firstStream() and before building DAGExpressionAnalyzer), and add guards to return early when region_num == 0 or num_streams == 0 (or compute real_num_streams == 0) before any division or sizing logic that uses (region_num + real_num_streams - 1) / real_num_streams; ensure the same defensive checks are applied around the places that call group_builder.getCurrentHeader() so no null stream is dereferenced.
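The guard the comment asks for can be sketched in isolation. The helper below is hypothetical (the real sizing logic lives inside `StorageDisaggregated::readThroughProxy`, and the clamping of `real_num_streams` is assumed here for illustration):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring the sizing logic under review: early-return
// for the empty-scan case instead of letting real_num_streams reach zero
// and dividing by it.
size_t regionsPerReader(size_t region_num, size_t num_streams)
{
    if (region_num == 0 || num_streams == 0)
        return 0; // empty scan: the caller should return empty streams/tasks early
    const size_t real_num_streams = std::min(num_streams, region_num);
    // Ceiling division is now safe because real_num_streams >= 1.
    return (region_num + real_num_streams - 1) / real_num_streams;
}
```

The same shape applies to the stream-building sites: check `read_proxy_tasks.empty()` (or `pipeline.streams.empty()`) and return before dereferencing the first stream's header.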
985-985: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
Line 985 still has the broken `if likely (...)` syntax; the past sweep missed this site.
`likely`/`unlikely` are function-like macros (`__builtin_expect(!!(x), N)`); the macro must wrap the condition. With the space, the expansion becomes `if __builtin_expect(!!(block && block.rows() > 0), 1)`, which is not a valid `if` statement. Cppcheck still reports a syntax error at this line, and this file would fail to compile under `ENABLE_NEXT_GEN_COLUMNAR`. The earlier fix cleaned up lines 938/956/961/971/976 but left this one.
🛠️ Proposed fix
- if likely (block && block.rows() > 0)
+ if (likely(block && block.rows() > 0))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` at line 985, The if statement uses a spaced macro invocation ("if likely (block && block.rows() > 0)") which prevents the function-like likely(...) macro from expanding and breaks compilation; change it to use the macro invocation form, e.g. if (likely(block && block.rows() > 0)) (or if likely(block && block.rows() > 0)) so the macro name is immediately followed by '('; update the occurrence that checks block && block.rows() > 0 in StorageDisaggregatedColumnar (the statement referencing variable block) to this corrected form.
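The macro pitfall is easy to reproduce outside the codebase. This standalone sketch assumes the common GCC/Clang-style definition of `likely`; the TiFlash definition may differ in details (for instance, extra outer parentheses would change which spellings happen to compile), which is why the fully parenthesized form is the safe one:

```cpp
#include <cassert>

// Common GCC/Clang-style definition, assumed here for illustration.
#define likely(x) __builtin_expect(!!(x), 1)

// Correct: the parentheses after `if` wrap the whole macro invocation, so it
// expands to `if (__builtin_expect(!!(v > 0), 1))`, a valid statement.
// Writing `if likely (v > 0)` instead expands to
// `if __builtin_expect(!!(v > 0), 1)`, which does not parse.
int clampPositive(int v)
{
    if (likely(v > 0))
        return v;
    return 0;
}
```

`__builtin_expect` only hints the branch predictor; it never changes the result, so the function behaves exactly like a plain `if (v > 0)`.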
🧹 Nitpick comments (2)
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (2)
856-906: ⚡ Quick win
Deduplicate the per-column read path and hoist `fn_physical_table_id` out of the loop.
The `extra_handle` and regular-column branches (lines 871–885 and 890–905) differ only in which Rust FFI fetches the column buffer. The 9-line deserialize+SCOPE_EXIT block is copy-pasted, and `fn_physical_table_id(reader)` is invoked once per column even though it only depends on the reader. Pull the physical-table-id call out of the loop and factor the deserialize into a small lambda/helper to keep the read path readable.
♻️ Sketch
  TableID physical_table_id = -1;
  Block header = getHeader();
  const ColumnsWithTypeAndName col_type_and_name = header.getColumnsWithTypeAndName();
  MutableColumns columns = header.cloneEmptyColumns();
+ physical_table_id = proxy_helper->cloud_storage_engine_interfaces.fn_physical_table_id(reader);
+ auto deserialize_from = [&](size_t i, const RustStrWithView & col_data) {
+     ReadBufferFromMemory buf(col_data.buff.data, static_cast<size_t>(col_data.buff.len));
+     col_type_and_name[i].type->deserializeBinaryBulkWithMultipleStreams(
+         *columns[i],
+         [&](const IDataType::SubstreamPath &) { return &buf; },
+         rows,
+         /*avg_value_size_hint=*/-1.0, // -1 => Decimal format from proxy
+         true,
+         {});
+ };
  for (UInt32 i = 0; i < col_type_and_name.size(); ++i)
  {
      Int64 col_id = col_type_and_name[i].column_id;
      if (col_id == MutSup::extra_table_id_col_id)
          continue;
-     if (col_id == MutSup::extra_handle_id) { ... } else { ... }
+     RustStrWithView col_data = (col_id == MutSup::extra_handle_id)
+         ? proxy_helper->cloud_storage_engine_interfaces.fn_read_handle(reader)
+         : proxy_helper->cloud_storage_engine_interfaces.fn_read_column(reader, col_id);
+     SCOPE_EXIT({ RustGcHelper::instance().gcRustPtr(col_data.inner.ptr, col_data.inner.type); });
+     deserialize_from(i, col_data);
  }
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 856 - 906, Call proxy_helper->cloud_storage_engine_interfaces.fn_physical_table_id(reader) once before the loop and store it in physical_table_id, and factor the repeated deserialize+SCOPE_EXIT logic into a small helper/lambda (e.g., a local lambda that accepts RustStrWithView or ReadBufferFromMemory& and the column reference) so both the fn_read_handle and fn_read_column branches only differ by which fetch function they call; ensure the lambda runs the RustGcHelper::instance().gcRustPtr cleanup and then invokes col_type_and_name[i].type->deserializeBinaryBulkWithMultipleStreams(...) on the target column (columns[i]) with the same parameters used now, leaving the special-case skip for MutSup::extra_table_id_col_id unchanged.
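The refactor pattern itself can be demonstrated with a toy version; everything below is hypothetical stand-in code (plain strings instead of the Rust FFI and column deserialization), showing only the shape of the change: two branches that differ in how they fetch data share one decode helper instead of duplicating it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical fetchers standing in for fn_read_handle / fn_read_column.
std::string fetchHandle() { return "handle"; }
std::string fetchColumn(int col_id) { return "col" + std::to_string(col_id); }

std::vector<std::string> readColumns(const std::vector<int> & col_ids, int handle_id)
{
    std::vector<std::string> out;
    // Shared decode step, factored out so it is written once.
    auto deserialize = [&](const std::string & raw) { out.push_back(raw); };
    for (int col_id : col_ids)
    {
        // Only the fetch differs between the two branches.
        deserialize(col_id == handle_id ? fetchHandle() : fetchColumn(col_id));
    }
    return out;
}
```

In the real code the shared helper would also own the buffer setup and the SCOPE_EXIT cleanup, which is exactly the copy-pasted part the review flags.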
567-577: ⚡ Quick win
Use the fmt-style `Exception` constructor with the error code first.
The coding guidelines require `throw Exception(ErrorCodes::SOME_CODE, "Message with {}", arg);`. Lines 567, 572, 577, and 851 still use the legacy `(message, code)` form.
🛠️ Proposed fix
- throw Exception("lock error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+ throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "lock error");
@@
- throw Exception("pd client error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+ throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "pd client error");
@@
- throw Exception("unknown error type", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+ throw Exception(
+     ErrorCodes::COLUMNAR_SNAPSHOT_ERROR,
+     "unknown error type, error_type={}",
+     static_cast<uint8_t>(columnar_reader.error_type));
@@
- throw Exception("read_block failed in tiflash-proxy", ErrorCodes::LOGICAL_ERROR);
+ throw Exception(ErrorCodes::LOGICAL_ERROR, "read_block failed in tiflash-proxy");
As per coding guidelines: "Prefer the fmt-style constructor for `DB::Exception` with error code first: `throw Exception(ErrorCodes::SOME_CODE, "Message with {}", arg);`".
Also applies to: 851-851
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 567 - 577, The throws in StorageDisaggregatedColumnar (where columnar_reader.error_type is checked) use the legacy Exception(message, code) signature; update each to the fmt-style Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "message"[, args...]) form—i.e. for the lock, pd client, and unknown error branches (references: columnar_reader.error_type, ColumnarReaderErrorType, LOG_WARNING) replace throw Exception("...", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR) with throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "create columnar reader failed, pd client error") or the corresponding descriptive message and include the uint8_t(columnar_reader.error_type) as a fmt argument for the unknown case; apply the same pattern to the other occurrence noted (error code usage at the later throw).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 494-553: Check and handle the return value of
region_error.ParseFromString(error_msg) (and similarly
lock_info.ParseFromString) and if parsing fails, log a parse error and treat it
as an internal/different error path instead of proceeding with a
default-constructed errorpb::Error; when iterating
region_error.epoch_not_match().current_regions() collect all region ids into
unavailable_regions and build a joined region_id_ver string (use FmtBuffer for
efficient concatenation) rather than overwriting region_id_ver each loop, and
either remove the unused retry_regions variables (region_error branch and the
other branch) or actually use them where intended; keep existing
dropRegion(region_ver_id), lock_guard(output_lock), and throw RegionException
with the aggregated region_id_ver (or a deterministic representative) and
correct RegionReadStatus after parsing succeeds.
---
Duplicate comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 241-278: The readThroughProxy path doesn't guard against empty
proxy reads or zero regions/streams and dereferences pipeline.firstStream() and
group_builder.getCurrentHeader() and computes regions_per_reader with possible
division by zero; update StorageDisaggregated::readThroughProxy to early-return
an empty BlockInputStreams when read_proxy_tasks is empty or pipeline.streams is
empty (before accessing pipeline.firstStream() and before building
DAGExpressionAnalyzer), and add guards to return early when region_num == 0 or
num_streams == 0 (or compute real_num_streams == 0) before any division or
sizing logic that uses (region_num + real_num_streams - 1) / real_num_streams;
ensure the same defensive checks are applied around the places that call
group_builder.getCurrentHeader() so no null stream is dereferenced.
- Line 985: The if statement uses a spaced macro invocation ("if likely (block
&& block.rows() > 0)") which prevents the function-like likely(...) macro from
expanding and breaks compilation; change it to use the macro invocation form,
e.g. if (likely(block && block.rows() > 0)) (or if likely(block && block.rows()
> 0)) so the macro name is immediately followed by '('; update the occurrence
that checks block && block.rows() > 0 in StorageDisaggregatedColumnar (the
statement referencing variable block) to this corrected form.
---
Nitpick comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 856-906: Call
proxy_helper->cloud_storage_engine_interfaces.fn_physical_table_id(reader) once
before the loop and store it in physical_table_id, and factor the repeated
deserialize+SCOPE_EXIT logic into a small helper/lambda (e.g., a local lambda
that accepts RustStrWithView or ReadBufferFromMemory& and the column reference)
so both the fn_read_handle and fn_read_column branches only differ by which
fetch function they call; ensure the lambda runs the
RustGcHelper::instance().gcRustPtr cleanup and then invokes
col_type_and_name[i].type->deserializeBinaryBulkWithMultipleStreams(...) on the
target column (columns[i]) with the same parameters used now, leaving the
special-case skip for MutSup::extra_table_id_col_id unchanged.
- Around line 567-577: The throws in StorageDisaggregatedColumnar (where
columnar_reader.error_type is checked) use the legacy Exception(message, code)
signature; update each to the fmt-style
Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "message"[, args...]) form—i.e.
for the lock, pd client, and unknown error branches (references:
columnar_reader.error_type, ColumnarReaderErrorType, LOG_WARNING) replace throw
Exception("...", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR) with throw
Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "create columnar reader failed,
pd client error") or the corresponding descriptive message and include the
uint8_t(columnar_reader.error_type) as a fmt argument for the unknown case;
apply the same pattern to the other occurrence noted (error code usage at the
later throw).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: d1f9ec7e-99ee-41cb-a6a3-bb1d4ef932cb
📒 Files selected for processing (8)
- .gitmodules
- contrib/tiflash-proxy-columnar
- dbms/src/Common/TiFlashBuildInfo.cpp
- dbms/src/Server/Server.cpp
- dbms/src/Storages/DeltaMerge/DMContext.cpp
- dbms/src/Storages/KVStore/ProxyStateMachine.h
- dbms/src/Storages/KVStore/tests/gtest_kvstore_fast_add_peer.cpp
- dbms/src/Storages/StorageDisaggregatedColumnar.cpp
💤 Files with no reviewable changes (2)
- dbms/src/Storages/DeltaMerge/DMContext.cpp
- dbms/src/Storages/KVStore/tests/gtest_kvstore_fast_add_peer.cpp
✅ Files skipped from review due to trivial changes (1)
- .gitmodules
🚧 Files skipped from review as they are similar to previous changes (4)
- dbms/src/Common/TiFlashBuildInfo.cpp
- contrib/tiflash-proxy-columnar
- dbms/src/Server/Server.cpp
- dbms/src/Storages/KVStore/ProxyStateMachine.h
auto error_msg = String(columnar_reader.error.buff.data, columnar_reader.error.buff.len);
errorpb::Error region_error;
region_error.ParseFromString(error_msg);
auto region_ver_id = pingcap::kv::RegionVerID(region_id, region_conf_ver, region_ver);
// Refresh region cache and throw an exception for retrying.
if (region_error.has_epoch_not_match())
{
    RegionException::UnavailableRegions unavailable_regions;
    String region_id_ver; // region_id:region_ver:conf_ver
    std::unordered_set<RegionID> retry_regions;
    for (const auto & region : region_error.epoch_not_match().current_regions())
    {
        unavailable_regions.insert(region.id());
        retry_regions.insert(region.id());
        region_id_ver = std::to_string(region.id()) + ":" + std::to_string(region_ver) + ":"
            + std::to_string(region.region_epoch().conf_ver());
    }
    auto _guard = std::lock_guard(output_lock);
    cluster->region_cache->dropRegion(region_ver_id);
    LOG_WARNING(
        log,
        "create columnar reader failed region_id={}, epoch not match {}",
        std::to_string(region_id),
        region_ver_id.toString());
    throw RegionException(
        std::move(unavailable_regions),
        RegionException::RegionReadStatus::EPOCH_NOT_MATCH,
        region_id_ver.c_str());
}
else
{
    RegionException::UnavailableRegions unavailable_regions;
    std::unordered_set<RegionID> retry_regions;
    auto err_region_id = 0;
    if (region_error.has_region_not_found())
    {
        err_region_id = region_error.region_not_found().region_id();
        unavailable_regions.insert(err_region_id);
        retry_regions.insert(err_region_id);
        LOG_WARNING(
            log,
            "create columnar reader failed region_id={}, region not found {}",
            std::to_string(region_id),
            std::to_string(err_region_id));
    }
    else
    {
        LOG_WARNING(
            log,
            "create columnar reader failed region_id={}, {}",
            std::to_string(region_id),
            region_error.ShortDebugString());
    }
    auto _guard = std::lock_guard(output_lock);
    cluster->region_cache->dropRegion(region_ver_id);
    throw RegionException(
        std::move(unavailable_regions),
        RegionException::RegionReadStatus::NOT_FOUND,
        std::to_string(region_id).c_str());
}
Unchecked ParseFromString, lossy region_id_ver, and dead retry_regions in the RegionError branch.
- Line 496: `region_error.ParseFromString(error_msg)` return value is ignored. If the proxy ever returns a malformed error payload, you proceed with a default-constructed `errorpb::Error`, mis-classify as the `else` branch, drop the region from cache, and throw `NOT_FOUND` with no real diagnostic. Same concern at line 559 for `lock_info.ParseFromString(error_msg)`.
- Lines 502–510: `region_id_ver` is overwritten on every iteration of `current_regions()`, so the `RegionException` message reflects only the last region even though `unavailable_regions` may contain many. Either build a joined string or move the `throw` inside the loop / pick a deterministic representative.
- Lines 503, 526: `retry_regions` is populated but never read; a dead variable in both sub-branches.
🧹 Sketch
- errorpb::Error region_error;
- region_error.ParseFromString(error_msg);
+ errorpb::Error region_error;
+ if (!region_error.ParseFromString(error_msg))
+ throw Exception(
+ ErrorCodes::COLUMNAR_SNAPSHOT_ERROR,
+ "failed to parse proxy RegionError payload, region_id={}",
+ region_id);
@@
- RegionException::UnavailableRegions unavailable_regions;
- String region_id_ver; // region_id:region_ver:conf_ver
- std::unordered_set<RegionID> retry_regions;
- for (const auto & region : region_error.epoch_not_match().current_regions())
- {
- unavailable_regions.insert(region.id());
- retry_regions.insert(region.id());
- region_id_ver = std::to_string(region.id()) + ":" + std::to_string(region_ver) + ":"
- + std::to_string(region.region_epoch().conf_ver());
- }
+ RegionException::UnavailableRegions unavailable_regions;
+ FmtBuffer regions_desc;
+ for (const auto & region : region_error.epoch_not_match().current_regions())
+ {
+ unavailable_regions.insert(region.id());
+ regions_desc.fmtAppend(
+ "{}{}:{}:{}",
+ regions_desc.size() == 0 ? "" : ",",
+ region.id(),
+ region_ver,
+ region.region_epoch().conf_ver());
+ }As per coding guidelines: "Use FmtBuffer for complex string building in performance-critical paths in C++".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 494 - 553,
Check and handle the return value of region_error.ParseFromString(error_msg)
(and similarly lock_info.ParseFromString) and if parsing fails, log a parse
error and treat it as an internal/different error path instead of proceeding
with a default-constructed errorpb::Error; when iterating
region_error.epoch_not_match().current_regions() collect all region ids into
unavailable_regions and build a joined region_id_ver string (use FmtBuffer for
efficient concatenation) rather than overwriting region_id_ver each loop, and
either remove the unused retry_regions variables (region_error branch and the
other branch) or actually use them where intended; keep existing
dropRegion(region_ver_id), lock_guard(output_lock), and throw RegionException
with the aggregated region_id_ver (or a deterministic representative) and
correct RegionReadStatus after parsing succeeds.
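The aggregation fix can be shown with a tiny standalone sketch. Plain `std::string` is used here to keep it self-contained; the codebase guideline prefers FmtBuffer, which is not reproduced in this illustration, and `RegionDesc` is a hypothetical stand-in for the protobuf region entries:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of epoch_not_match().current_regions().
struct RegionDesc
{
    long id;
    long ver;
    long conf_ver;
};

// Joins every region into one "id:ver:conf_ver" list instead of keeping only
// the last one (the overwrite bug flagged in the review).
std::string joinRegionIdVer(const std::vector<RegionDesc> & regions)
{
    std::string out;
    for (const auto & r : regions)
    {
        if (!out.empty())
            out += ",";
        out += std::to_string(r.id) + ":" + std::to_string(r.ver) + ":" + std::to_string(r.conf_ver);
    }
    return out;
}
```

With this shape, the thrown exception's message covers the same set of regions that was inserted into `unavailable_regions`, instead of an arbitrary last element.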
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
What problem does this PR solve?
Issue Number: close #10844
Problem Summary:
What is changed and how it works?
Add the new columnar storage as a data source for TiDB X.
Check List
Tests
Side effects
Documentation
Release note
Manual Test
Add `-DENABLE_NEXT_GEN_COLUMNAR=ON` to enable building with `tiflash-proxy-columnar`:
cmake -GNinja -DCMAKE_BUILD_TYPE=DEBUG -DENABLE_NEXT_GEN=ON -DENABLE_NEXT_GEN_COLUMNAR=ON ..
Then set `use_columnar=true` in config.
New Features
Chores
Bug Fixes