Conversation
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 38e89f4657
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/Storages/IStorageCluster.cpp
Outdated
| if (send_external_tables) | ||
| external_tables = query_info.planner_context->getMutableQueryContext()->getExternalTables(); |
There was a problem hiding this comment.
Reset external-table forwarding state per query
send_external_tables is process state on the storage object and is only ever set to true in the GLOBAL-join path, so once one query enables it, later reads in other modes still take this branch; in those later reads query_info.planner_context may be null (old-analyzer path), so query_info.planner_context->getMutableQueryContext() can crash, and even when non-null it forwards external tables for unrelated queries. This needs to be scoped/reset per read instead of persisting across requests.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
send_external_tables can be true only when object_storage_cluster_join_mode='global'. But if allow_experimental_analyzer=0 with object_storage_cluster_join_mode='global', exception n getQueryProcessingStage must be thrown.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cb5e474d83
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/Storages/IStorageCluster.h
Outdated
|
|
||
| LoggerPtr log; | ||
| String cluster_name; | ||
| bool send_external_tables = false; |
There was a problem hiding this comment.
Keep external-table send flag scoped to a single read
send_external_tables is stored as mutable state on IStorageCluster, but it is set from one query path and never reset before subsequent reads. After a GLOBAL-mode query sets it, later queries on the same storage instance can still forward external tables even when rewrite is not active; and because this field is non-atomic shared state, concurrent reads can race on it. This creates cross-query state leakage and undefined behavior risk in production read concurrency.
Useful? React with 👍 / 👎.
src/Storages/IStorageCluster.cpp
Outdated
| throw Exception(ErrorCodes::NOT_IMPLEMENTED, | ||
| "object_storage_cluster_join_mode!='allow' is not supported without allow_experimental_analyzer=true"); | ||
| "object_storage_cluster_join_mode!='allow' is not supported without allow_experimental_analyzer=false"); |
There was a problem hiding this comment.
Correct inverted analyzer requirement in error message
This branch is entered when allow_experimental_analyzer is disabled, but the message says the mode is unsupported without allow_experimental_analyzer=false. That inverts the required value and points users to the wrong configuration change when they hit this exception.
Useful? React with 👍 / 👎.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e3fe1a1c18
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/Storages/IStorageCluster.cpp
Outdated
| /*allow_global_join_for_right_table*/ true, | ||
| /*find_cross_join*/ true); | ||
| query_to_send = queryNodeToDistributedSelectQuery(modified_query_tree); | ||
| send_external_tables = true; |
There was a problem hiding this comment.
Keep external-table state per query, not on storage instance
Setting send_external_tables here stores query-specific state on the shared IStorageCluster object, but that flag is later read/cleared in read(). For storage engines that reuse one instance across requests (for example StorageObjectStorageCluster), concurrent reads can interleave so one query clears the flag while another still needs it, causing the second query to skip sending generated _data_* external tables to shards and fail at execution time. This state needs to be local to a single read path (or otherwise synchronized) rather than kept as mutable object state.
Useful? React with 👍 / 👎.
e3fe1a1 to
0cd90a8
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0cd90a87ab
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| bool prefer_local_join = getContext()->getSettingsRef()[Setting::parallel_replicas_prefer_local_join] | ||
| && getContext()->getSettingsRef()[Setting::object_storage_cluster_join_mode] != ObjectStorageClusterJoinMode::GLOBAL; |
There was a problem hiding this comment.
Keep local-join preference outside object-storage mode
rewriteJoinToGlobalJoin is also used by parallel-replica planning (src/Planner/findParallelReplicasQuery.cpp and src/Interpreters/ClusterProxy/executeQuery.cpp), so tying prefer_local_join to object_storage_cluster_join_mode here makes unrelated non-object-storage queries ignore parallel_replicas_prefer_local_join whenever the session sets object_storage_cluster_join_mode='global'. That forces extra GLOBAL JOIN rewrites in those flows and can significantly increase broadcast/memory costs; this override should be scoped to the object-storage cluster path instead of the shared visitor.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Yes, object_storage_cluster_join_mode=global has priority and oveerides parallel_replicas_prefer_local_join setting here.
Audit: PR #1527 — Cluster Joins part 2 (global mode)PR: Altinity/ClickHouse#1527 — Cluster Joins part 2 - global mode Scope: AI audit note: This review was generated by AI (Cursor agent), following Confirmed defectsLow — Stale and misleading documentation for
|
| Item | Detail |
|---|---|
| Scope reviewed | End-to-end path for global object-storage cluster joins: getQueryProcessingStage (analyzer gate), updateQueryWithJoinToSendIfNeeded (GLOBAL branch), read → ReadFromCluster → RemoteQueryExecutor with external_tables; visitors in buildQueryTreeForShard.cpp; integration tests in tests/integration/test_s3_cluster/test.py (join_mode local / global). |
| Categories failed | Documentation / user-visible consistency (see above). |
| Categories passed | Per-query external table forwarding; analyzer requirement for non-allow modes; no persistent join/external-table flag on IStorageCluster in reviewed code. |
| Assumptions / limits | Static review of the workspace; CI and full integration runs were not executed here. |
arthurpassos
left a comment
There was a problem hiding this comment.
I am not familiar with this piece of code, I can't add any value within a short period of time. Given the release deadline, I'll trust the tests, AI review and approve it
|
Issue with FULL OUTER JOIN |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Cluster Joins part 2 - global mode
Documentation entry for user-facing changes
Setting
object_storage_cluster_join_modewiith valueglobal.In queries like
when left table is executed on cluster (
s3Cluster,Icebergwithobject_storage_clustersetting, etc.) data from right table is extracted and sent to swarm nodes as temorary tables. JOIN is executed on swarm nodes.CI/CD Options
Exclude tests:
Regression jobs to run: