docs/en/sql-reference/statements/select/join.md
Lines changed: 63 additions & 52 deletions
@@ -1,13 +1,13 @@
 ---
 description: 'Documentation for JOIN Clause'
-sidebar_label: 'Joining Tables'
+sidebar_label: 'JOIN'
 slug: /sql-reference/statements/select/join
 title: 'JOIN Clause'
 ---

-# JOIN Clause
+# JOIN clause

-Join produces a new table by combining columns from one or multiple tables by using values common to each. It is a common operation in databases with SQL support, which corresponds to [relational algebra](https://en.wikipedia.org/wiki/Relational_algebra#Joins_and_join-like_operators) join. The special case of one table join is often referred to as "self-join".
+The `JOIN` clause produces a new table by combining columns from one or multiple tables by using values common to each. It is a common operation in databases with SQL support, which corresponds to [relational algebra](https://en.wikipedia.org/wiki/Relational_algebra#Joins_and_join-like_operators) join. The special case of one table join is often referred to as a "self-join".

 **Syntax**

@@ -18,67 +18,69 @@ FROM <left_table>
 (ON <expr_list>)|(USING <column_list>) ...
 ```

-Expressions from `ON` clause and columns from `USING` clause are called "join keys". Unless otherwise stated, join produces a [Cartesian product](https://en.wikipedia.org/wiki/Cartesian_product) from rows with matching "join keys", which might produce results with much more rows than the source tables.
+Expressions from the `ON` clause and columns from the `USING` clause are called "join keys". Unless otherwise stated, a `JOIN` produces a [Cartesian product](https://en.wikipedia.org/wiki/Cartesian_product) from rows with matching "join keys", which might produce results with many more rows than the source tables.

-## Related Content {#related-content}
-
-- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Part 1](https://clickhouse.com/blog/clickhouse-fully-supports-joins)
-- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Under the Hood - Part 2](https://clickhouse.com/blog/clickhouse-fully-supports-joins-hash-joins-part2)
-- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Under the Hood - Part 3](https://clickhouse.com/blog/clickhouse-fully-supports-joins-full-sort-partial-merge-part3)
-- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Under the Hood - Part 4](https://clickhouse.com/blog/clickhouse-fully-supports-joins-direct-join-part4)
-
-## Supported Types of JOIN {#supported-types-of-join}
+## Supported types of JOIN {#supported-types-of-join}

 All standard [SQL JOIN](https://en.wikipedia.org/wiki/Join_(SQL)) types are supported:

-- `INNER JOIN`, only matching rows are returned.
-- `LEFT OUTER JOIN`, non-matching rows from left table are returned in addition to matching rows.
-- `RIGHT OUTER JOIN`, non-matching rows from right table are returned in addition to matching rows.
-- `FULL OUTER JOIN`, non-matching rows from both tables are returned in addition to matching rows.
-- `CROSS JOIN`, produces cartesian product of whole tables, "join keys" are **not** specified.
+|`LEFT OUTER JOIN`| non-matching rows from left table are returned in addition to matching rows. |
+|`RIGHT OUTER JOIN`| non-matching rows from right table are returned in addition to matching rows. |
+|`FULL OUTER JOIN`| non-matching rows from both tables are returned in addition to matching rows. |
+|`CROSS JOIN`| produces cartesian product of whole tables, "join keys" are **not** specified.|

-`JOIN` without specified type implies `INNER`. Keyword `OUTER` can be safely omitted. Alternative syntax for `CROSS JOIN` is specifying multiple tables in [FROM clause](../../../sql-reference/statements/select/from.md) separated by commas.
+- `JOIN` without a type specified implies `INNER`.
+- The keyword `OUTER` can be safely omitted.
+- An alternative syntax for `CROSS JOIN` is specifying multiple tables in the [`FROM` clause](../../../sql-reference/statements/select/from.md) separated by commas.

-Additional join types available in ClickHouse:
+Additional join types available in ClickHouse are:

-- `LEFT SEMI JOIN` and `RIGHT SEMI JOIN`, a whitelist on "join keys", without producing a cartesian product.
-- `LEFT ANTI JOIN` and `RIGHT ANTI JOIN`, a blacklist on "join keys", without producing a cartesian product.
-- `LEFT ANY JOIN`, `RIGHT ANY JOIN` and `INNER ANY JOIN`, partially (for opposite side of `LEFT` and `RIGHT`) or completely (for `INNER` and `FULL`) disables the cartesian product for standard `JOIN` types.
-- `ASOF JOIN` and `LEFT ASOF JOIN`, joining sequences with a non-exact match. `ASOF JOIN` usage is described below.
-- `PASTE JOIN`, performs a horizontal concatenation of two tables.
+|`LEFT SEMI JOIN`, `RIGHT SEMI JOIN`| An allowlist on "join keys", without producing a cartesian product. |
+|`LEFT ANTI JOIN`, `RIGHT ANTI JOIN`| A denylist on "join keys", without producing a cartesian product. |
+|`LEFT ANY JOIN`, `RIGHT ANY JOIN`, `INNER ANY JOIN`| Partially (for opposite side of `LEFT` and `RIGHT`) or completely (for `INNER` and `FULL`) disables the cartesian product for standard `JOIN` types. |
+|`ASOF JOIN`, `LEFT ASOF JOIN`| Joining sequences with a non-exact match. `ASOF JOIN` usage is described below. |
+|`PASTE JOIN`| Performs a horizontal concatenation of two tables. |

 :::note
 When [join_algorithm](../../../operations/settings/settings.md#join_algorithm) is set to `partial_merge`, `RIGHT JOIN` and `FULL JOIN` are supported only with `ALL` strictness (`SEMI`, `ANTI`, `ANY`, and `ASOF` are not supported).
 :::

 ## Settings {#settings}

-The default join type can be overridden using [join_default_strictness](../../../operations/settings/settings.md#join_default_strictness) setting.
-
-The behavior of ClickHouse server for `ANY JOIN` operations depends on the [any_join_distinct_right_table_keys](../../../operations/settings/settings.md#any_join_distinct_right_table_keys) setting.
+The default join type can be overridden using [`join_default_strictness`](../../../operations/settings/settings.md#join_default_strictness) setting.

+The behavior of the ClickHouse server for `ANY JOIN` operations depends on the [`any_join_distinct_right_table_keys`](../../../operations/settings/settings.md#any_join_distinct_right_table_keys) setting.
 Use the `cross_to_inner_join_rewrite` setting to define the behavior when ClickHouse fails to rewrite a `CROSS JOIN` as an `INNER JOIN`. The default value is `1`, which allows the join to continue but it will be slower. Set `cross_to_inner_join_rewrite` to `0` if you want an error to be thrown, and set it to `2` to not run the cross joins but instead force a rewrite of all comma/cross joins. If the rewriting fails when the value is `2`, you will receive an error message stating "Please, try to simplify `WHERE` section".

-## ON Section Conditions {#on-section-conditions}
+## ON section conditions {#on-section-conditions}
+
+An `ON` section can contain several conditions combined using the `AND` and `OR` operators. Conditions specifying join keys must:
+- reference both left and right tables
+- use the equality operator

-An `ON` section can contain several conditions combined using the `AND` and `OR` operators. Conditions specifying join keys must refer both left and right tables and must use the equality operator. Other conditions may use other logical operators but they must refer either the left or the right table of a query.
+Other conditions may use other logical operators but they must reference either the left or the right table of a query.

-Rows are joined if the whole complex condition is met. If the conditions are not met, still rows may be included in the result depending on the `JOIN` type. Note that if the same conditions are placed in a `WHERE` section and they are not met, then rows are always filtered out from the result.
+Rows are joined if the whole complex condition is met. If the conditions are not met, rows may still be included in the result depending on the `JOIN` type. Note that if the same conditions are placed in a `WHERE` section and they are not met, then rows are always filtered out from the result.

 The `OR` operator inside the `ON` clause works using the hash join algorithm — for each `OR` argument with join keys for `JOIN`, a separate hash table is created, so memory consumption and query execution time grow linearly with an increase in the number of expressions `OR` of the `ON` clause.

 :::note
-If a condition refers columns from different tables, then only the equality operator (`=`) is supported so far.
+If a condition references columns from different tables, then only the equality operator (`=`) is supported so far.
 :::

 **Example**
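The `ON`-clause rules in this hunk (join keys vs. single-table conditions, `ON` vs. `WHERE`, and `OR`) can be illustrated with a short query sketch. The tables `t1(a, b)` and `t2(key, b, c)` are hypothetical, and the comments assume the default `join_use_nulls = 0`.

```sql
-- Hypothetical tables t1(a, b) and t2(key, b, c), used only for illustration.

-- Join key (t1.a = t2.key) plus single-table conditions inside ON.
-- With LEFT JOIN every t1 row is kept; a t1 row with no t2 row satisfying
-- the whole ON condition is returned with default values in the t2 columns.
SELECT t1.a, t2.key
FROM t1
LEFT JOIN t2 ON t1.a = t2.key AND t1.b > 0 AND t2.b > t2.c;

-- The same t2 filter moved to WHERE removes every row that fails it,
-- including t1 rows that had no match at all.
SELECT t1.a, t2.key
FROM t1
LEFT JOIN t2 ON t1.a = t2.key
WHERE t2.b > t2.c;

-- OR in ON: each OR branch carries its own join keys,
-- so a separate hash table is built for each branch.
SELECT t1.a, t2.key
FROM t1
INNER JOIN t2 ON t1.a = t2.key OR t1.b = t2.b;
```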
@@ -156,7 +158,7 @@ Query with `INNER` type of a join and conditions with `OR` and `AND`:

 By default, non-equal conditions are supported as long as they use columns from the same table.
 For example, `t1.a = t2.key AND t1.b > 0 AND t2.b > t2.c`, because `t1.b > 0` uses columns only from `t1` and `t2.b > t2.c` uses columns only from `t2`.
-However, you can try experimental support for conditions like `t1.a = t2.key AND t1.b > t2.key`, check out section below for more details.
+However, you can try experimental support for conditions like `t1.a = t2.key AND t1.b > t2.key`, check out the section below for more details.

 :::

@@ -174,7 +176,7 @@ Result:
 └───┴────┴─────┘
 ```

-## Join with inequality conditions for columns from different tables {#join-with-inequality-conditions-for-columns-from-different-tables}
+## JOIN with inequality conditions for columns from different tables {#join-with-inequality-conditions-for-columns-from-different-tables}

 Clickhouse currently supports `ALL/ANY/SEMI/ANTI INNER/LEFT/RIGHT/FULL JOIN` with inequality conditions in addition to equality conditions. The inequality conditions are supported only for `hash` and `grace_hash` join algorithms. The inequality conditions are not supported with `join_use_nulls`.

@@ -227,7 +229,7 @@ key4 f 2 3 4 0 0 \N

 ## NULL values in JOIN keys {#null-values-in-join-keys}

-The NULL is not equal to any value, including itself. It means that if a JOIN key has a NULL value in one table, it won't match a NULL value in the other table.
+`NULL` is not equal to any value, including itself. This means that if a `JOIN` key has a `NULL` value in one table, it won't match a `NULL` value in the other table.

 **Example**

@@ -263,9 +265,9 @@ SELECT A.name, B.score FROM A LEFT JOIN B ON A.id = B.id
 └─────────┴───────┘
 ```

-Notice that the row with `Charlie` from table `A` and the row with score 88 from table `B` are not in the result because of the NULL value in the JOIN key.
+Notice that the row with `Charlie` from table `A` and the row with score 88 from table `B` are not in the result because of the `NULL` value in the `JOIN` key.

-In case you want to match NULL values, use the `isNotDistinctFrom` function to compare the JOIN keys.
+In case you want to match `NULL` values, use the `isNotDistinctFrom` function to compare the `JOIN` keys.

 ```sql
 SELECT A.name, B.score FROM A LEFT JOIN B ON isNotDistinctFrom(A.id, B.id)
@@ -279,15 +281,15 @@ SELECT A.name, B.score FROM A LEFT JOIN B ON isNotDistinctFrom(A.id, B.id)
 └─────────┴───────┘
 ```

-## ASOF JOIN Usage {#asof-join-usage}
+## ASOF JOIN usage {#asof-join-usage}

 `ASOF JOIN` is useful when you need to join records that have no exact match.

-Algorithm requires the special column in tables. This column:
+This JOIN algorithm requires a special column in tables. This column:

 - Must contain an ordered sequence.
 - Can be one of the following types: [Int, UInt](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md), [Date](../../../sql-reference/data-types/date.md), [DateTime](../../../sql-reference/data-types/datetime.md), [Decimal](../../../sql-reference/data-types/decimal.md).
-- For `hash` join algorithm it can't be the only column in the `JOIN` clause.
+- For the `hash` join algorithm it can't be the only column in the `JOIN` clause.

 Syntax `ASOF JOIN ... ON`:

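A minimal `ASOF JOIN` sketch for the requirements listed in this hunk, assuming hypothetical `trades` and `quotes` tables. The equality condition on `symbol` is the exact-match key, and the `>=` condition on the ordered `ts` column picks, for each trade, the quote with the latest `ts` that is not after the trade.

```sql
-- Hypothetical tables, for illustration only:
--   trades(symbol String, ts DateTime, price Float64)
--   quotes(symbol String, ts DateTime, bid Float64)
-- For each trade, take the most recent quote for the same symbol
-- whose timestamp is at or before the trade timestamp.
SELECT t.symbol, t.ts, t.price, q.ts AS quote_ts, q.bid
FROM trades AS t
ASOF JOIN quotes AS q
    ON t.symbol = q.symbol AND t.ts >= q.ts;
```

Replacing `ASOF JOIN` with `LEFT ASOF JOIN` would also keep trades that have no earlier quote, with default values in the `quotes` columns.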
@@ -331,7 +333,7 @@ For example, consider the following tables:
 It's **not** supported in the [Join](../../../engines/table-engines/special/join.md) table engine.
 :::

-## PASTE JOIN Usage {#paste-join-usage}
+## PASTE JOIN usage {#paste-join-usage}

 The result of `PASTE JOIN` is a table that contains all columns from left subquery followed by all columns from the right subquery.
 The rows are matched based on their positions in the original tables (the order of rows should be defined).
@@ -357,7 +359,9 @@ PASTE JOIN
 │ 1 │ 0 │
 └───┴──────┘
 ```
-Note: In this case result can be nondeterministic if the reading is parallel. Example:
+
+Note: in this case result can be nondeterministic if the reading is parallel. For example:
-There are two ways to execute join involving distributed tables:
+There are two ways to execute a JOIN involving distributed tables:

 - When using a normal `JOIN`, the query is sent to remote servers. Subqueries are run on each of them in order to make the right table, and the join is performed with this table. In other words, the right table is formed on each server separately.
 - When using `GLOBAL ... JOIN`, first the requestor server runs a subquery to calculate the right table. This temporary table is passed to each remote server, and queries are run on them using the temporary data that was transmitted.

 Be careful when using `GLOBAL`. For more information, see the [Distributed subqueries](/sql-reference/operators/in#distributed-subqueries) section.

-## Implicit Type Conversion {#implicit-type-conversion}
+## Implicit type conversion {#implicit-type-conversion}

 `INNER JOIN`, `LEFT JOIN`, `RIGHT JOIN`, and `FULL JOIN` queries support the implicit type conversion for "join keys". However the query can not be executed, if join keys from the left and the right tables cannot be converted to a single type (for example, there is no data type that can hold all values from both `UInt64` and `Int64`, or `String` and `Int32`).

@@ -431,9 +435,9 @@ returns the set:
 └────┴──────┴───────────────┴─────────────────┘
 ```

-## Usage Recommendations {#usage-recommendations}
+## Usage recommendations {#usage-recommendations}

-### Processing of Empty or NULL Cells {#processing-of-empty-or-null-cells}
+### Processing of empty or NULL cells {#processing-of-empty-or-null-cells}

 While joining tables, the empty cells may appear. The setting [join_use_nulls](../../../operations/settings/settings.md#join_use_nulls) define how ClickHouse fills these cells.

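The effect of the `join_use_nulls` setting mentioned in this hunk can be seen by running the same `LEFT JOIN` with and without it. This is a sketch over hypothetical tables `a(id UInt32)` and `b(id UInt32, score UInt32)`.

```sql
-- Hypothetical tables a(id UInt32) and b(id UInt32, score UInt32).

-- Default (join_use_nulls = 0): unmatched rows get the column type's
-- default value, so b.score is 0 for ids that have no match in b.
SELECT a.id, b.score
FROM a
LEFT JOIN b ON a.id = b.id;

-- join_use_nulls = 1: unmatched rows get NULL instead, and b.score
-- becomes Nullable(UInt32) in the result.
SELECT a.id, b.score
FROM a
LEFT JOIN b ON a.id = b.id
SETTINGS join_use_nulls = 1;
```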
@@ -467,7 +471,7 @@ In some cases, it is more efficient to use [IN](../../../sql-reference/operators

 If you need a `JOIN` for joining with dimension tables (these are relatively small tables that contain dimension properties, such as names for advertising campaigns), a `JOIN` might not be very convenient due to the fact that the right table is re-accessed for every query. For such cases, there is a "dictionaries" feature that you should use instead of `JOIN`. For more information, see the [Dictionaries](../../../sql-reference/dictionaries/index.md) section.

-### Memory Limitations {#memory-limitations}
+### Memory limitations {#memory-limitations}

 By default, ClickHouse uses the [hash join](https://en.wikipedia.org/wiki/Hash_join) algorithm. ClickHouse takes the right_table and creates a hash table for it in RAM. If `join_algorithm = 'auto'` is enabled, then after some threshold of memory consumption, ClickHouse falls back to [merge](https://en.wikipedia.org/wiki/Sort-merge_join) join algorithm. For `JOIN` algorithms description see the [join_algorithm](../../../operations/settings/settings.md#join_algorithm) setting.

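The `join_algorithm = 'auto'` fallback described in this hunk can be requested for a single query through a `SETTINGS` clause. This is a sketch with hypothetical tables `events` and `users`. Whether `'auto'` or another value of the setting is the better choice depends on data sizes and server version.

```sql
-- Hypothetical tables events(user_id UInt64) and users(id UInt64, name String).
-- Begin with an in-memory hash join on the right table (users); with 'auto',
-- ClickHouse may switch to a merge-based algorithm once memory use
-- passes its threshold.
SELECT u.name, count() AS event_count
FROM events AS e
INNER JOIN users AS u ON e.user_id = u.id
GROUP BY u.name
SETTINGS join_algorithm = 'auto';
```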
@@ -521,3 +525,10 @@ LIMIT 10
 │ 722884 │ 77492 │ 11056 │
 └───────────┴────────┴────────┘
 ```
+
+## Related content {#related-content}
+
+- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Part 1](https://clickhouse.com/blog/clickhouse-fully-supports-joins)
+- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Under the Hood - Part 2](https://clickhouse.com/blog/clickhouse-fully-supports-joins-hash-joins-part2)
+- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Under the Hood - Part 3](https://clickhouse.com/blog/clickhouse-fully-supports-joins-full-sort-partial-merge-part3)
+- Blog: [ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Under the Hood - Part 4](https://clickhouse.com/blog/clickhouse-fully-supports-joins-direct-join-part4)