Join docs #1437
base: master
Changes from 9 commits
1ce08a4
4eba809
0fe349a
4e5e60b
b64b96c
6a6fcff
a205d04
63b420d
0642463
278b781
ca9e963
cf5313c
623238e
3ded9e1
1ffbb39
c206279
7a72ccd
3881f60
26efbdb
ebd7b3b
746baa9
6c7a875
4b2aefb
ac873c2
34512f4
9cce2a4
15306b4
75b9a95
77fc4a9
Large diffs are not rendered by default.
This file was deleted.
@@ -0,0 +1,233 @@
[//]: # (title: join)

<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.multiple.JoinSamples-->

Joins two [`DataFrame`](DataFrame.md) objects by join columns.

```kotlin
join(otherDf, type = JoinType.Inner) [ { joinColumns } ]

joinColumns: JoinDsl.(LeftDataFrame) -> Columns

interface JoinDsl: LeftDataFrame {

    val right: RightDataFrame

    fun DataColumn.match(rightColumn: DataColumn)
}
```

`joinColumns` is a [column selector](ColumnSelectors.md) that defines the column mapping for the join.

Related operations: [](multipleDataFrames.md)

## Examples

<!---FUN notebook_test_join_3-->

```kotlin
dfAges
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_3.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_5-->

```kotlin
dfCities
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_5.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_6-->

```kotlin
// INNER JOIN on differently named keys:
// Merge a row when dfAges.firstName == dfCities.name.
// With the given data all 3 names match → all rows merge.
dfAges.join(dfCities) { firstName match right.name }
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_6.html" width="100%" height="500px"></inline-frame>

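To make the row matching concrete, here is a minimal sketch written with plain Kotlin collections rather than the DataFrame API. It models an inner join on differently named keys, in the spirit of `firstName match right.name`. The helper `innerJoinOn`, the map-based row representation, and the sample values are all assumptions made for illustration; the actual contents of `dfAges` and `dfCities` are not reproduced here.

```kotlin
// Hypothetical helper, not part of the DataFrame library: rows are plain maps
// from column name to value.
fun innerJoinOn(
    left: List<Map<String, Any?>>,
    right: List<Map<String, Any?>>,
    leftKey: String,
    rightKey: String,
): List<Map<String, Any?>> =
    left.flatMap { l ->
        right
            .filter { r -> r[rightKey] == l[leftKey] }
            // Sketch choice: drop the right key column, since it repeats the left key.
            .map { r -> l + (r - rightKey) }
    }

fun main() {
    // Made-up sample rows, not the data behind dfAges and dfCities.
    val ages = listOf(
        mapOf("firstName" to "Alice", "age" to 14),
        mapOf("firstName" to "Bob", "age" to 45),
    )
    val cities = listOf(
        mapOf("name" to "Bob", "city" to "London"),
        mapOf("name" to "Dana", "city" to "Paris"),
    )
    // Only Bob appears on both sides, so one merged row is produced.
    println(innerJoinOn(ages, cities, "firstName", "name"))
    // prints [{firstName=Bob, age=45, city=London}]
}
```

Rows whose key has no counterpart on the other side (Alice and Dana in this sketch) are simply absent from the result, which is what distinguishes an inner join from the other join types.
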
If mapped columns have the same name, just select join columns from the left [`DataFrame`](DataFrame.md):

<!---FUN notebook_test_join_8-->

```kotlin
dfLeft
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_8.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_10-->

```kotlin
dfRight
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_10.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_11-->

```kotlin
// INNER JOIN on "name" only:
// Merge when left.name == right.name.
// Duplicate keys produce multiple merged rows (one per pairing).
dfLeft.join(dfRight) { name }
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_11.html" width="100%" height="500px"></inline-frame>

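The remark that duplicate keys produce one merged row per pairing can be illustrated with a small sketch in plain Kotlin. The helper `matchingPairs` and the key values are hypothetical and only model the pairing logic; they are not taken from `dfLeft`, `dfRight`, or the library.

```kotlin
// Hypothetical helper: lists every (leftIndex, rightIndex) pairing whose keys are equal.
fun matchingPairs(leftKeys: List<String>, rightKeys: List<String>): List<Pair<Int, Int>> =
    leftKeys.flatMapIndexed { i, leftKey ->
        rightKeys.withIndex()
            .filter { it.value == leftKey }
            .map { i to it.index }
    }

fun main() {
    // Made-up key columns: "Charlie" occurs twice on each side.
    val leftNames = listOf("Alice", "Charlie", "Charlie")
    val rightNames = listOf("Charlie", "Bob", "Charlie")
    // Each left "Charlie" pairs with each right "Charlie": 2 x 2 = 4 merged rows.
    println(matchingPairs(leftNames, rightNames))
    // prints [(1, 0), (1, 2), (2, 0), (2, 2)]
}
```

In general, a key that occurs m times on the left and n times on the right contributes m × n rows to an inner join.
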
If `joinColumns` is not specified, columns with the same name from both [`DataFrame`](DataFrame.md)
objects will be used as join columns:

> Review comment: You say

<!---FUN notebook_test_join_12-->

```kotlin
// INNER JOIN on all same-named columns ("name" and "city"):
// Merge when BOTH name AND city are equal; otherwise the row is dropped.
dfLeft.join(dfRight)
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_12.html" width="100%" height="500px"></inline-frame>

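As a rough illustration of how the default join columns could be derived, the sketch below intersects the two column-name lists. The helper `commonColumns` is hypothetical, and the column names other than `name` and `city` are invented for the example; the library's own resolution logic may differ in detail.

```kotlin
// Hypothetical helper: the column names present in both frames, in left order.
fun commonColumns(leftColumns: List<String>, rightColumns: List<String>): List<String> =
    leftColumns.filter { it in rightColumns }

fun main() {
    // Column names beyond "name" and "city" are made up for this sketch.
    val leftColumns = listOf("name", "age", "city")
    val rightColumns = listOf("name", "city", "isBusy")
    println(commonColumns(leftColumns, rightColumns)) // prints [name, city]
}
```
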
## Join types

Supported join types:
* `Inner` (default) — only matched rows from left and right [`DataFrame`](DataFrame.md) objects
* `Filter` — only matched rows from left [`DataFrame`](DataFrame.md)
* `Left` — all rows from left [`DataFrame`](DataFrame.md), mismatches from right [`DataFrame`](DataFrame.md) filled with `null`
* `Right` — all rows from right [`DataFrame`](DataFrame.md), mismatches from left [`DataFrame`](DataFrame.md) filled with `null`
* `Full` — all rows from left and right [`DataFrame`](DataFrame.md) objects, any mismatches filled with `null`
* `Exclude` — only mismatched rows from left [`DataFrame`](DataFrame.md)

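The six join types listed above can be modelled in a few lines of plain Kotlin. This is only a sketch under stated assumptions: `JoinKind` and `joinRows` are hypothetical names, rows are plain maps, the sample data is invented, and neither row order nor the treatment of `null` keys is claimed to match the DataFrame implementation.

```kotlin
// Hypothetical model of the six join types on rows stored as plain maps.
// Not the DataFrame implementation: row order and null-key handling are
// choices of this sketch.
enum class JoinKind { Inner, Filter, Left, Right, Full, Exclude }

fun joinRows(
    left: List<Map<String, Any?>>,
    right: List<Map<String, Any?>>,
    keys: List<String>,
    kind: JoinKind,
): List<Map<String, Any?>> {
    fun keyOf(row: Map<String, Any?>): List<Any?> = keys.map { row[it] }

    val allColumns = (left.flatMap { it.keys } + right.flatMap { it.keys }).distinct()
    // Ensure every column is present, filling missing ones with null.
    fun pad(row: Map<String, Any?>): Map<String, Any?> = allColumns.associateWith { row[it] }

    val rightByKey = right.groupBy { keyOf(it) }
    val leftKeys = left.map { keyOf(it) }.toSet()

    val matched = left.flatMap { l -> rightByKey[keyOf(l)].orEmpty().map { r -> pad(l + r) } }
    val leftOnly = left.filter { keyOf(it) !in rightByKey }
    val rightOnly = right.filter { keyOf(it) !in leftKeys }

    return when (kind) {
        JoinKind.Inner -> matched
        JoinKind.Filter -> left.filter { keyOf(it) in rightByKey } // left columns only
        JoinKind.Exclude -> leftOnly                                // left columns only
        JoinKind.Left -> matched + leftOnly.map { pad(it) }
        JoinKind.Right -> matched + rightOnly.map { pad(it) }
        JoinKind.Full -> matched + leftOnly.map { pad(it) } + rightOnly.map { pad(it) }
    }
}

fun main() {
    // Made-up rows; only the column names "name" and "city" come from the text.
    val left = listOf(
        mapOf("name" to "Alice", "city" to "London", "age" to 15),
        mapOf("name" to "Bob", "city" to "Dubai", "age" to 45),
    )
    val right = listOf(
        mapOf("name" to "Alice", "city" to "London", "isBusy" to true),
        mapOf("name" to "Charlie", "city" to "Milan", "isBusy" to false),
    )
    for (kind in JoinKind.values()) {
        println("$kind: ${joinRows(left, right, listOf("name", "city"), kind)}")
    }
}
```

With this data, `Filter` and `Exclude` return unmodified left rows, while `Left`, `Right`, and `Full` pad the unmatched side with `null`, which mirrors the wording of the list.
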
For every join type there is a shortcut operation:

```kotlin
df.innerJoin(otherDf) [ { joinColumns } ]
df.filterJoin(otherDf) [ { joinColumns } ]
df.leftJoin(otherDf) [ { joinColumns } ]
df.rightJoin(otherDf) [ { joinColumns } ]
df.fullJoin(otherDf) [ { joinColumns } ]
df.excludeJoin(otherDf) [ { joinColumns } ]
```

### Examples {id="examples_1"}

<!---FUN notebook_test_join_13-->

```kotlin
dfLeft
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_13.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_14-->

```kotlin
dfRight
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_14.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_15-->

```kotlin
// INNER JOIN:
// Keep only rows where (name, city) match on both sides.
// In this dataset both Charlies match twice (Moscow, Milan) → 2 merged rows.
// [Review comment] what do you mean "both charlies"? The result shows
dfLeft.innerJoin(dfRight) { name and city }
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_15.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_16-->

```kotlin
// FILTER JOIN:
// Keep ONLY left rows that have ANY match on (name, city).
// No right-side columns are added.
dfLeft.filterJoin(dfRight) { name and city }
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_16.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_17-->

```kotlin
// LEFT JOIN:
// Keep ALL left rows. If (name, city) matches, attach right columns;
// if not, right columns are null (e.g., Alice–London has no right match).
// [Review comment] *columns from the right dataframe. Also, Alice-London does have a match, Bob-Dubai does not, so
dfLeft.leftJoin(dfRight) { name and city }
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_17.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_18-->

```kotlin
// RIGHT JOIN:
// Keep ALL right rows. If no left match, left columns become null
// (e.g., Alice with city=null exists only on the right).
dfLeft.rightJoin(dfRight) { name and city }
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_18.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_19-->

```kotlin
// FULL JOIN:
// Keep ALL rows from both sides. Where there's no match on (name, city),
// the other side is filled with nulls.
// [Review comment] could you make all
// [Review comment] I still find the examples hard to follow, I'm wondering how we can make it as clear as possible :)
dfLeft.fullJoin(dfRight) { name and city }
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_19.html" width="100%" height="500px"></inline-frame>

<!---FUN notebook_test_join_20-->

```kotlin
// EXCLUDE JOIN:
// Keep ONLY left rows that have NO match on (name, city).
// Useful to find "unpaired" left rows.
dfLeft.excludeJoin(dfRight) { name and city }
```

<!---END-->

<inline-frame src="./resources/notebook_test_join_20.html" width="100%" height="500px"></inline-frame>

> Review comment: objects

> Review comment: ah you just copied this part from the original file