Cluster Joins part 2 - global mode #1527

Open
ianton-ru wants to merge 10 commits into antalya-26.1 from feature/antalya-26.1/json_part2

ianton-ru commented Mar 13, 2026

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Cluster Joins part 2 - global mode

Documentation entry for user-facing changes

Setting object_storage_cluster_join_mode with value global.
In queries like

SELECT * FROM iceberg_table(...) JOIN local_table(...) ON ...

when the left table is executed on a cluster (s3Cluster, Iceberg with the object_storage_cluster setting, etc.), data from the right table is extracted and sent to the swarm nodes as temporary tables. The JOIN is executed on the swarm nodes.

@ianton-ru
Author

@codex review

github-actions bot commented Mar 13, 2026

Workflow [PR], commit [d0de2cf]

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 38e89f4657

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +384 to +385
if (send_external_tables)
    external_tables = query_info.planner_context->getMutableQueryContext()->getExternalTables();

P1: Reset external-table forwarding state per query

send_external_tables is persistent state on the storage object and is only ever set to true in the GLOBAL-join path, so once one query enables it, later reads in other modes still take this branch. In those later reads query_info.planner_context may be null (old-analyzer path), so query_info.planner_context->getMutableQueryContext() can crash; even when it is non-null, external tables are forwarded for unrelated queries. The flag needs to be scoped to, or reset for, each read instead of persisting across requests.

Author

send_external_tables can be true only when object_storage_cluster_join_mode='global'. And with allow_experimental_analyzer=0 and object_storage_cluster_join_mode='global', getQueryProcessingStage must throw an exception.

@ianton-ru
Author

@codex review

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cb5e474d83

LoggerPtr log;
String cluster_name;
bool send_external_tables = false;

P1: Keep external-table send flag scoped to a single read

send_external_tables is stored as mutable state on IStorageCluster, but it is set from one query path and never reset before subsequent reads. After a GLOBAL-mode query sets it, later queries on the same storage instance can still forward external tables even when rewrite is not active; and because this field is non-atomic shared state, concurrent reads can race on it. This creates cross-query state leakage and undefined behavior risk in production read concurrency.

Collaborator

Is this true?

Comment on lines +617 to +618
  throw Exception(ErrorCodes::NOT_IMPLEMENTED,
-     "object_storage_cluster_join_mode!='allow' is not supported without allow_experimental_analyzer=true");
+     "object_storage_cluster_join_mode!='allow' is not supported without allow_experimental_analyzer=false");

P3: Correct inverted analyzer requirement in error message

This branch is entered when allow_experimental_analyzer is disabled, but the message says the mode is unsupported without allow_experimental_analyzer=false. That inverts the required value and points users to the wrong configuration change when they hit this exception.
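
To make the inversion concrete, here is a minimal C++ sketch of the guard with the corrected wording. JoinMode and analyzerRequirementError are hypothetical stand-ins, not ClickHouse's ObjectStorageClusterJoinMode or its Exception machinery; only the message text is taken from the diff in this thread.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for ObjectStorageClusterJoinMode; not the ClickHouse enum.
enum class JoinMode { Allow, Local, Global };

// Returns an error message when the mode needs the analyzer but the analyzer
// is disabled, or std::nullopt when the combination is acceptable.
std::optional<std::string> analyzerRequirementError(JoinMode mode, bool analyzer_enabled)
{
    if (mode == JoinMode::Allow || analyzer_enabled)
        return std::nullopt;

    // The check fires when the analyzer is off, so the message names the value
    // the user has to set to make the query work.
    return std::string(
        "object_storage_cluster_join_mode!='allow' is not supported "
        "without allow_experimental_analyzer=true");
}
```

With the analyzer disabled and a non-allow mode, the returned text tells the user to enable the analyzer, which matches the condition under which the branch is entered.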

@ianton-ru
Author

@codex review

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e3fe1a1c18

/*allow_global_join_for_right_table*/ true,
/*find_cross_join*/ true);
query_to_send = queryNodeToDistributedSelectQuery(modified_query_tree);
send_external_tables = true;

P1: Keep external-table state per query, not on storage instance

Setting send_external_tables here stores query-specific state on the shared IStorageCluster object, but that flag is later read/cleared in read(). For storage engines that reuse one instance across requests (for example StorageObjectStorageCluster), concurrent reads can interleave so one query clears the flag while another still needs it, causing the second query to skip sending generated _data_* external tables to shards and fail at execution time. This state needs to be local to a single read path (or otherwise synchronized) rather than kept as mutable object state.
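
One possible shape of "state local to a single read path", as a minimal C++ sketch. ClusterStorageSketch, PreparedQuery and ExternalTables are hypothetical names for illustration and do not mirror the real IStorageCluster interface. The point is only that the flag travels with the prepared query, while the storage methods stay const and keep no per-query state.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in: table name -> opaque payload.
using ExternalTables = std::map<std::string, std::string>;

// Everything a single read needs travels in this value, not in the storage object.
struct PreparedQuery
{
    std::string text;
    bool send_external_tables = false;
};

class ClusterStorageSketch
{
public:
    // const: preparing a query cannot leave a flag behind for the next query.
    PreparedQuery prepare(const std::string & query, bool global_join_mode) const
    {
        return PreparedQuery{query, global_join_mode};
    }

    // const: the decision depends only on the arguments of this call.
    ExternalTables tablesToForward(const PreparedQuery & prepared, const ExternalTables & query_tables) const
    {
        if (!prepared.send_external_tables)
            return {};
        return query_tables;
    }
};
```

Because neither method mutates the storage object, two reads that interleave on the same instance cannot clear or inherit each other's flag.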

ianton-ru force-pushed the feature/antalya-26.1/json_part2 branch from e3fe1a1 to 0cd90a8 on March 23, 2026 12:28

@ianton-ru
Author

@codex review

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0cd90a87ab

Comment on lines +694 to +695
bool prefer_local_join = getContext()->getSettingsRef()[Setting::parallel_replicas_prefer_local_join]
&& getContext()->getSettingsRef()[Setting::object_storage_cluster_join_mode] != ObjectStorageClusterJoinMode::GLOBAL;

P2: Keep local-join preference outside object-storage mode

rewriteJoinToGlobalJoin is also used by parallel-replica planning (src/Planner/findParallelReplicasQuery.cpp and src/Interpreters/ClusterProxy/executeQuery.cpp), so tying prefer_local_join to object_storage_cluster_join_mode here makes unrelated non-object-storage queries ignore parallel_replicas_prefer_local_join whenever the session sets object_storage_cluster_join_mode='global'. That forces extra GLOBAL JOIN rewrites in those flows and can significantly increase broadcast/memory costs; this override should be scoped to the object-storage cluster path instead of the shared visitor.
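
A rough C++ sketch of what scoping the override to the caller could look like. JoinRewriteOptions, force_global_join and preferLocalJoin are hypothetical names, not part of the PR; the assumption is that only the object-storage cluster path would set force_global_join, while the parallel-replica callers would leave it at its default.

```cpp
// Hypothetical options passed by each caller of the shared rewrite;
// in the PR the decision is read from session settings inside the visitor.
struct JoinRewriteOptions
{
    bool parallel_replicas_prefer_local_join = false; // value of the existing setting
    bool force_global_join = false;                   // set only by the object-storage cluster caller
};

// The local-join preference applies unless the caller explicitly forces GLOBAL.
bool preferLocalJoin(const JoinRewriteOptions & options)
{
    return options.parallel_replicas_prefer_local_join && !options.force_global_join;
}
```

Under this assumption a session-level object_storage_cluster_join_mode='global' would not change the outcome for callers that never set force_global_join.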

Author

Yes, object_storage_cluster_join_mode=global has priority and overrides the parallel_replicas_prefer_local_join setting here.

@ianton-ru
Author

Audit: PR #1527 — Cluster Joins part 2 (global mode)

PR: Altinity/ClickHouse#1527 — Cluster Joins part 2 - global mode

Scope: object_storage_cluster_join_mode='global', object storage cluster reads (IStorageCluster), query rewrite (rewriteJoinToGlobalJoin, buildQueryTreeForShard), ReadFromCluster / RemoteQueryExecutor external tables, settings and tests.


AI audit note: This review was generated by AI (Cursor agent), following .cursor/skills/audit-review/SKILL.md.

Confirmed defects

Low — Stale and misleading documentation for object_storage_cluster_join_mode

  • Impact: Users and operators can believe global is still unsupported or that the setting only applies to JOIN subqueries, while the implementation enables global and applies the rewrite when has_join, has_cross_join, or has_local_columns_in_where is true (see IStorageCluster::updateQueryWithJoinToSendIfNeeded / getQueryTreeInfo).
  • Anchor: src/Core/Settings.cpp, DECLARE(..., object_storage_cluster_join_mode, ...) help string.
  • Trigger: Reading built-in setting documentation (e.g. system.settings, docs generated from the same string).
  • Why defect: The shipped description contradicts behavior and PR scope (global implemented; not “JOIN-only” in code).
  • Fix direction: Update the R"(...)" text for object_storage_cluster_join_mode to describe global as supported and align “Restrictions” with getQueryTreeInfo / actual rewrite conditions.
  • Regression test direction: Optional: lightweight check that the help string does not contain “Unsupported for now” for global (if the project tests setting descriptions).
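
The optional regression check could be as small as a substring test on the help text. The sketch below is a standalone C++ illustration; describesGlobalAsUnsupported is a hypothetical helper, and how the description string would be obtained in the real test suite is left open.

```cpp
#include <string>

// Returns true when the help text still carries the stale wording for `global`.
bool describesGlobalAsUnsupported(const std::string & help_text)
{
    return help_text.find("Unsupported for now") != std::string::npos;
}
```

A test would feed it the description of object_storage_cluster_join_mode and expect false once the help string has been updated.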

Evidence:

    DECLARE(ObjectStorageClusterJoinMode, object_storage_cluster_join_mode, ObjectStorageClusterJoinMode::ALLOW, R"(
Changes the behaviour of object storage cluster function or table.
...
Restrictions:

- Only applied for JOIN subqueries.
- Only if the FROM section uses a object storage cluster function or table.

Possible values:

- `local` — Replaces the database and table in the subquery with local ones for the destination server (shard), leaving the normal `IN`/`JOIN.`
- `global` — Unsupported for now. Replaces the `IN`/`JOIN` query with `GLOBAL IN`/`GLOBAL JOIN.`
- `allow` — Default value. Allows the use of these types of subqueries.
)", 0) \

Notes (not classified as confirmed code defects)

  • Earlier Codex concern (mutable send_external_tables on IStorageCluster): In the current tree this is not present; external tables are taken from query_info.planner_context and passed into ReadFromCluster, avoiding cross-query leakage on the storage object.
  • parallel_replicas_prefer_local_join vs object_storage_cluster_join_mode=global: RewriteJoinToGlobalJoinVisitor in buildQueryTreeForShard.cpp intentionally gives global priority over local join preference; no incorrect-result path was verified in static review.

Coverage summary

Item | Detail
--- | ---
Scope reviewed | End-to-end path for global object-storage cluster joins: getQueryProcessingStage (analyzer gate), updateQueryWithJoinToSendIfNeeded (GLOBAL branch), read → ReadFromCluster → RemoteQueryExecutor with external_tables; visitors in buildQueryTreeForShard.cpp; integration tests in tests/integration/test_s3_cluster/test.py (join_mode local / global).
Categories failed | Documentation / user-visible consistency (see above).
Categories passed | Per-query external table forwarding; analyzer requirement for non-allow modes; no persistent join/external-table flag on IStorageCluster in reviewed code.
Assumptions / limits | Static review of the workspace; CI and full integration runs were not executed here.

Collaborator

arthurpassos left a comment

I am not familiar with this piece of code, so I can't add much value within a short period of time. Given the release deadline, I'll trust the tests and the AI review and approve it.

@alsugiliazova
Member

alsugiliazova commented Mar 27, 2026

Issue with FULL OUTER JOIN

Received exception from server (version 26.1.4):
Code: 264. DB::Exception: Received from localhost:9000. DB::Exception: Unexpected global join kind: FULL. (INCOMPATIBLE_TYPE_OF_JOIN)
(query: SELECT * FROM s3Cluster(replicated_cluster, 'http://minio:9000/warehouse/data1/data/**.parquet', '[masked]:Secret(name='minio_root_user')', '[masked]:Secret(name='minio_root_password')') AS t1
FULL OUTER JOIN merge_tree_table_86bbcd2c_29de_11f1_ae5b_de7b9eea348f AS t2 ON t1.float_col = t2.float_col ORDER BY tuple(*) SETTINGS object_storage_cluster_join_mode='global', object_storage_cluster='replicated_cluster_two_nodes' FORMAT Values)

Development

Successfully merging this pull request may close these issues.

Join Concurrency with object_storage_cluster_join_mode='global'

4 participants