Commit 78c5eb8

Merge pull request ClickHouse#79833 from ClickHouse/Blargian-patch-15
Docs: fix typos
2 parents ff0abcb + 0e5a573 commit 78c5eb8

2 files changed: +65 -52 lines changed

ci/jobs/scripts/check_style/aspell-ignore/en/aspell-dict.txt

Lines changed: 2 additions & 0 deletions
@@ -1251,6 +1251,7 @@ aggthrow
 aiochclient
 alloc
 allocator
+allowlist
 alphaTokens
 amplab
 analysisOfVariance
@@ -1687,6 +1688,7 @@ denormalize
 denormalized
 denormalizing
 denormals
+denylist
 dequeued
 dequeues
 dereference

docs/en/sql-reference/statements/select/join.md

Lines changed: 63 additions & 52 deletions
@@ -1,13 +1,13 @@
 ---
 description: 'Documentation for JOIN Clause'
-sidebar_label: 'Joining Tables'
+sidebar_label: 'JOIN'
 slug: /sql-reference/statements/select/join
 title: 'JOIN Clause'
 ---

-# JOIN Clause
+# JOIN clause

-Join produces a new table by combining columns from one or multiple tables by using values common to each. It is a common operation in databases with SQL support, which corresponds to [relational algebra](https://en.wikipedia.org/wiki/Relational_algebra#Joins_and_join-like_operators) join. The special case of one table join is often referred to as "self-join".
+The `JOIN` clause produces a new table by combining columns from one or multiple tables by using values common to each. It is a common operation in databases with SQL support, which corresponds to [relational algebra](https://en.wikipedia.org/wiki/Relational_algebra#Joins_and_join-like_operators) join. The special case of one table join is often referred to as a "self-join".

 **Syntax**

@@ -18,67 +18,69 @@ FROM <left_table>
 (ON <expr_list>)|(USING <column_list>) ...
 ```

-Expressions from `ON` clause and columns from `USING` clause are called "join keys". Unless otherwise stated, join produces a [Cartesian product](https://en.wikipedia.org/wiki/Cartesian_product) from rows with matching "join keys", which might produce results with much more rows than the source tables.
+Expressions from the `ON` clause and columns from the `USING` clause are called "join keys". Unless otherwise stated, a `JOIN` produces a [Cartesian product](https://en.wikipedia.org/wiki/Cartesian_product) from rows with matching "join keys", which might produce results with many more rows than the source tables.

-## Related Content {#related-content}
-
-- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Part 1](https://clickhouse.com/blog/clickhouse-fully-supports-joins)
-- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Under the Hood - Part 2](https://clickhouse.com/blog/clickhouse-fully-supports-joins-hash-joins-part2)
-- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Under the Hood - Part 3](https://clickhouse.com/blog/clickhouse-fully-supports-joins-full-sort-partial-merge-part3)
-- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Under the Hood - Part 4](https://clickhouse.com/blog/clickhouse-fully-supports-joins-direct-join-part4)
-
-## Supported Types of JOIN {#supported-types-of-join}
+## Supported types of JOIN {#supported-types-of-join}

 All standard [SQL JOIN](https://en.wikipedia.org/wiki/Join_(SQL)) types are supported:

-- `INNER JOIN`, only matching rows are returned.
-- `LEFT OUTER JOIN`, non-matching rows from left table are returned in addition to matching rows.
-- `RIGHT OUTER JOIN`, non-matching rows from right table are returned in addition to matching rows.
-- `FULL OUTER JOIN`, non-matching rows from both tables are returned in addition to matching rows.
-- `CROSS JOIN`, produces cartesian product of whole tables, "join keys" are **not** specified.
+| Type | Description |
+|-------------------|-------------------------------------------------------------------------------|
+| `INNER JOIN` | only matching rows are returned. |
+| `LEFT OUTER JOIN` | non-matching rows from left table are returned in addition to matching rows. |
+| `RIGHT OUTER JOIN`| non-matching rows from right table are returned in addition to matching rows. |
+| `FULL OUTER JOIN` | non-matching rows from both tables are returned in addition to matching rows. |
+| `CROSS JOIN` | produces cartesian product of whole tables, "join keys" are **not** specified.|

-`JOIN` without specified type implies `INNER`. Keyword `OUTER` can be safely omitted. Alternative syntax for `CROSS JOIN` is specifying multiple tables in [FROM clause](../../../sql-reference/statements/select/from.md) separated by commas.
+- `JOIN` without a type specified implies `INNER`.
+- The keyword `OUTER` can be safely omitted.
+- An alternative syntax for `CROSS JOIN` is specifying multiple tables in the [`FROM` clause](../../../sql-reference/statements/select/from.md) separated by commas.

-Additional join types available in ClickHouse:
+Additional join types available in ClickHouse are:

-- `LEFT SEMI JOIN` and `RIGHT SEMI JOIN`, a whitelist on "join keys", without producing a cartesian product.
-- `LEFT ANTI JOIN` and `RIGHT ANTI JOIN`, a blacklist on "join keys", without producing a cartesian product.
-- `LEFT ANY JOIN`, `RIGHT ANY JOIN` and `INNER ANY JOIN`, partially (for opposite side of `LEFT` and `RIGHT`) or completely (for `INNER` and `FULL`) disables the cartesian product for standard `JOIN` types.
-- `ASOF JOIN` and `LEFT ASOF JOIN`, joining sequences with a non-exact match. `ASOF JOIN` usage is described below.
-- `PASTE JOIN`, performs a horizontal concatenation of two tables.
+| Type | Description |
+|---------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
+| `LEFT SEMI JOIN`, `RIGHT SEMI JOIN` | An allowlist on "join keys", without producing a cartesian product. |
+| `LEFT ANTI JOIN`, `RIGHT ANTI JOIN` | A denylist on "join keys", without producing a cartesian product. |
+| `LEFT ANY JOIN`, `RIGHT ANY JOIN`, `INNER ANY JOIN` | Partially (for opposite side of `LEFT` and `RIGHT`) or completely (for `INNER` and `FULL`) disables the cartesian product for standard `JOIN` types. |
+| `ASOF JOIN`, `LEFT ASOF JOIN` | Joining sequences with a non-exact match. `ASOF JOIN` usage is described below. |
+| `PASTE JOIN` | Performs a horizontal concatenation of two tables. |

 :::note
 When [join_algorithm](../../../operations/settings/settings.md#join_algorithm) is set to `partial_merge`, `RIGHT JOIN` and `FULL JOIN` are supported only with `ALL` strictness (`SEMI`, `ANTI`, `ANY`, and `ASOF` are not supported).
 :::

 ## Settings {#settings}

-The default join type can be overridden using [join_default_strictness](../../../operations/settings/settings.md#join_default_strictness) setting.
-
-The behavior of ClickHouse server for `ANY JOIN` operations depends on the [any_join_distinct_right_table_keys](../../../operations/settings/settings.md#any_join_distinct_right_table_keys) setting.
+The default join type can be overridden using [`join_default_strictness`](../../../operations/settings/settings.md#join_default_strictness) setting.

+The behavior of the ClickHouse server for `ANY JOIN` operations depends on the [`any_join_distinct_right_table_keys`](../../../operations/settings/settings.md#any_join_distinct_right_table_keys) setting.

 **See also**

-- [join_algorithm](../../../operations/settings/settings.md#join_algorithm)
-- [join_any_take_last_row](../../../operations/settings/settings.md#join_any_take_last_row)
-- [join_use_nulls](../../../operations/settings/settings.md#join_use_nulls)
-- [partial_merge_join_rows_in_right_blocks](../../../operations/settings/settings.md#partial_merge_join_rows_in_right_blocks)
-- [join_on_disk_max_files_to_merge](../../../operations/settings/settings.md#join_on_disk_max_files_to_merge)
-- [any_join_distinct_right_table_keys](../../../operations/settings/settings.md#any_join_distinct_right_table_keys)
+- [`join_algorithm`](../../../operations/settings/settings.md#join_algorithm)
+- [`join_any_take_last_row`](../../../operations/settings/settings.md#join_any_take_last_row)
+- [`join_use_nulls`](../../../operations/settings/settings.md#join_use_nulls)
+- [`partial_merge_join_rows_in_right_blocks`](../../../operations/settings/settings.md#partial_merge_join_rows_in_right_blocks)
+- [`join_on_disk_max_files_to_merge`](../../../operations/settings/settings.md#join_on_disk_max_files_to_merge)
+- [`any_join_distinct_right_table_keys`](../../../operations/settings/settings.md#any_join_distinct_right_table_keys)

 Use the `cross_to_inner_join_rewrite` setting to define the behavior when ClickHouse fails to rewrite a `CROSS JOIN` as an `INNER JOIN`. The default value is `1`, which allows the join to continue but it will be slower. Set `cross_to_inner_join_rewrite` to `0` if you want an error to be thrown, and set it to `2` to not run the cross joins but instead force a rewrite of all comma/cross joins. If the rewriting fails when the value is `2`, you will receive an error message stating "Please, try to simplify `WHERE` section".

-## ON Section Conditions {#on-section-conditions}
+## ON section conditions {#on-section-conditions}
+
+An `ON` section can contain several conditions combined using the `AND` and `OR` operators. Conditions specifying join keys must:
+- reference both left and right tables
+- use the equality operator

-An `ON` section can contain several conditions combined using the `AND` and `OR` operators. Conditions specifying join keys must refer both left and right tables and must use the equality operator. Other conditions may use other logical operators but they must refer either the left or the right table of a query.
+Other conditions may use other logical operators but they must reference either the left or the right table of a query.

-Rows are joined if the whole complex condition is met. If the conditions are not met, still rows may be included in the result depending on the `JOIN` type. Note that if the same conditions are placed in a `WHERE` section and they are not met, then rows are always filtered out from the result.
+Rows are joined if the whole complex condition is met. If the conditions are not met, rows may still be included in the result depending on the `JOIN` type. Note that if the same conditions are placed in a `WHERE` section and they are not met, then rows are always filtered out from the result.

 The `OR` operator inside the `ON` clause works using the hash join algorithm — for each `OR` argument with join keys for `JOIN`, a separate hash table is created, so memory consumption and query execution time grow linearly with an increase in the number of expressions `OR` of the `ON` clause.

 :::note
-If a condition refers columns from different tables, then only the equality operator (`=`) is supported so far.
+If a condition references columns from different tables, then only the equality operator (`=`) is supported so far.
 :::

 **Example**
@@ -156,7 +158,7 @@ Query with `INNER` type of a join and conditions with `OR` and `AND`:

 By default, non-equal conditions are supported as long as they use columns from the same table.
 For example, `t1.a = t2.key AND t1.b > 0 AND t2.b > t2.c`, because `t1.b > 0` uses columns only from `t1` and `t2.b > t2.c` uses columns only from `t2`.
-However, you can try experimental support for conditions like `t1.a = t2.key AND t1.b > t2.key`, check out section below for more details.
+However, you can try experimental support for conditions like `t1.a = t2.key AND t1.b > t2.key`, check out the section below for more details.

 :::

@@ -174,7 +176,7 @@ Result:
 └───┴────┴─────┘
 ```

-## Join with inequality conditions for columns from different tables {#join-with-inequality-conditions-for-columns-from-different-tables}
+## JOIN with inequality conditions for columns from different tables {#join-with-inequality-conditions-for-columns-from-different-tables}

 Clickhouse currently supports `ALL/ANY/SEMI/ANTI INNER/LEFT/RIGHT/FULL JOIN` with inequality conditions in addition to equality conditions. The inequality conditions are supported only for `hash` and `grace_hash` join algorithms. The inequality conditions are not supported with `join_use_nulls`.

@@ -227,7 +229,7 @@ key4 f 2 3 4 0 0 \N

 ## NULL values in JOIN keys {#null-values-in-join-keys}

-The NULL is not equal to any value, including itself. It means that if a JOIN key has a NULL value in one table, it won't match a NULL value in the other table.
+`NULL` is not equal to any value, including itself. This means that if a `JOIN` key has a `NULL` value in one table, it won't match a `NULL` value in the other table.

 **Example**

@@ -263,9 +265,9 @@ SELECT A.name, B.score FROM A LEFT JOIN B ON A.id = B.id
 └─────────┴───────┘
 ```

-Notice that the row with `Charlie` from table `A` and the row with score 88 from table `B` are not in the result because of the NULL value in the JOIN key.
+Notice that the row with `Charlie` from table `A` and the row with score 88 from table `B` are not in the result because of the `NULL` value in the `JOIN` key.

-In case you want to match NULL values, use the `isNotDistinctFrom` function to compare the JOIN keys.
+In case you want to match `NULL` values, use the `isNotDistinctFrom` function to compare the `JOIN` keys.

 ```sql
 SELECT A.name, B.score FROM A LEFT JOIN B ON isNotDistinctFrom(A.id, B.id)
@@ -279,15 +281,15 @@ SELECT A.name, B.score FROM A LEFT JOIN B ON isNotDistinctFrom(A.id, B.id)
 └─────────┴───────┘
 ```

-## ASOF JOIN Usage {#asof-join-usage}
+## ASOF JOIN usage {#asof-join-usage}

 `ASOF JOIN` is useful when you need to join records that have no exact match.

-Algorithm requires the special column in tables. This column:
+This JOIN algorithm requires a special column in tables. This column:

 - Must contain an ordered sequence.
 - Can be one of the following types: [Int, UInt](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md), [Date](../../../sql-reference/data-types/date.md), [DateTime](../../../sql-reference/data-types/datetime.md), [Decimal](../../../sql-reference/data-types/decimal.md).
-- For `hash` join algorithm it can't be the only column in the `JOIN` clause.
+- For the `hash` join algorithm it can't be the only column in the `JOIN` clause.

 Syntax `ASOF JOIN ... ON`:

@@ -331,7 +333,7 @@ For example, consider the following tables:
 It's **not** supported in the [Join](../../../engines/table-engines/special/join.md) table engine.
 :::

-## PASTE JOIN Usage {#paste-join-usage}
+## PASTE JOIN usage {#paste-join-usage}

 The result of `PASTE JOIN` is a table that contains all columns from left subquery followed by all columns from the right subquery.
 The rows are matched based on their positions in the original tables (the order of rows should be defined).
@@ -357,7 +359,9 @@ PASTE JOIN
 10
 └───┴──────┘
 ```
-Note: In this case result can be nondeterministic if the reading is parallel. Example:
+
+Note: in this case result can be nondeterministic if the reading is parallel. For example:
+
 ```sql
 SELECT *
 FROM
@@ -388,14 +392,14 @@ SETTINGS max_block_size = 2;

 ## Distributed JOIN {#distributed-join}

-There are two ways to execute join involving distributed tables:
+There are two ways to execute a JOIN involving distributed tables:

 - When using a normal `JOIN`, the query is sent to remote servers. Subqueries are run on each of them in order to make the right table, and the join is performed with this table. In other words, the right table is formed on each server separately.
 - When using `GLOBAL ... JOIN`, first the requestor server runs a subquery to calculate the right table. This temporary table is passed to each remote server, and queries are run on them using the temporary data that was transmitted.

 Be careful when using `GLOBAL`. For more information, see the [Distributed subqueries](/sql-reference/operators/in#distributed-subqueries) section.

-## Implicit Type Conversion {#implicit-type-conversion}
+## Implicit type conversion {#implicit-type-conversion}

 `INNER JOIN`, `LEFT JOIN`, `RIGHT JOIN`, and `FULL JOIN` queries support the implicit type conversion for "join keys". However the query can not be executed, if join keys from the left and the right tables cannot be converted to a single type (for example, there is no data type that can hold all values from both `UInt64` and `Int64`, or `String` and `Int32`).

@@ -431,9 +435,9 @@ returns the set:
 └────┴──────┴───────────────┴─────────────────┘
 ```

-## Usage Recommendations {#usage-recommendations}
+## Usage recommendations {#usage-recommendations}

-### Processing of Empty or NULL Cells {#processing-of-empty-or-null-cells}
+### Processing of empty or NULL cells {#processing-of-empty-or-null-cells}

 While joining tables, the empty cells may appear. The setting [join_use_nulls](../../../operations/settings/settings.md#join_use_nulls) define how ClickHouse fills these cells.

@@ -467,7 +471,7 @@ In some cases, it is more efficient to use [IN](../../../sql-reference/operators

 If you need a `JOIN` for joining with dimension tables (these are relatively small tables that contain dimension properties, such as names for advertising campaigns), a `JOIN` might not be very convenient due to the fact that the right table is re-accessed for every query. For such cases, there is a "dictionaries" feature that you should use instead of `JOIN`. For more information, see the [Dictionaries](../../../sql-reference/dictionaries/index.md) section.

-### Memory Limitations {#memory-limitations}
+### Memory limitations {#memory-limitations}

 By default, ClickHouse uses the [hash join](https://en.wikipedia.org/wiki/Hash_join) algorithm. ClickHouse takes the right_table and creates a hash table for it in RAM. If `join_algorithm = 'auto'` is enabled, then after some threshold of memory consumption, ClickHouse falls back to [merge](https://en.wikipedia.org/wiki/Sort-merge_join) join algorithm. For `JOIN` algorithms description see the [join_algorithm](../../../operations/settings/settings.md#join_algorithm) setting.

@@ -521,3 +525,10 @@ LIMIT 10
 │ 722884 │ 77492 │ 11056 │
 └───────────┴────────┴────────┘
 ```
+
+## Related content {#related-content}
+
+- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Part 1](https://clickhouse.com/blog/clickhouse-fully-supports-joins)
+- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Under the Hood - Part 2](https://clickhouse.com/blog/clickhouse-fully-supports-joins-hash-joins-part2)
+- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Under the Hood - Part 3](https://clickhouse.com/blog/clickhouse-fully-supports-joins-full-sort-partial-merge-part3)
+- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Under the Hood - Part 4](https://clickhouse.com/blog/clickhouse-fully-supports-joins-direct-join-part4)
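
Reviewer note on the reworded `SEMI`/`ANTI` rows: the "allowlist" and "denylist" wording in this change can be illustrated with a minimal ClickHouse SQL sketch. The subqueries over `numbers()` and the aliases `l`, `r`, and `id` are hypothetical, chosen only for illustration, and are not part of the commit or of `join.md`.

```sql
-- Hypothetical data: the left side holds ids 0..4, the right side ids 0..2.

-- LEFT SEMI JOIN acts as an allowlist: a left row is kept (once)
-- only if its join key also appears on the right side.
SELECT l.id
FROM (SELECT number AS id FROM numbers(5)) AS l
LEFT SEMI JOIN (SELECT number AS id FROM numbers(3)) AS r ON l.id = r.id
ORDER BY l.id;

-- LEFT ANTI JOIN acts as a denylist: a left row is kept
-- only if its join key does not appear on the right side.
SELECT l.id
FROM (SELECT number AS id FROM numbers(5)) AS l
LEFT ANTI JOIN (SELECT number AS id FROM numbers(3)) AS r ON l.id = r.id
ORDER BY l.id;
```

Under these assumptions the first query should keep ids 0, 1 and 2, and the second ids 3 and 4. Neither multiplies rows the way a standard `JOIN` with repeated keys can.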
