vignettes/datatable-joins.Rmd
Lines changed: 5 additions & 19 deletions
@@ -576,24 +576,18 @@ When performing non-equi joins (<, >, <=, >=), it's important to understand how
 - The left operand (`x` column) determines the column name in the result.
 - The right operand (`i` column) contributes values but does not retain its original name.
 - By default, `data.table` does not retain the `i` column used in the join condition unless explicitly requested.
-`In non-equi joins, the resulting column inherits the name from the left (x) table but contains values from the right (i) table.`
 
-This can cause confusion when `x` and `i` have different column names.
+In non-equi joins, the left side of the operator (e.g., `A` in `A >= B`) must be a column from `x`,
+and the right side (e.g., `B`) must be a column from `i`. Non-equi join does not support arbitrary expressions. For example, `on = .(x_col >= i_col)` is valid, but `on = .(x_col >= i_col + 1)` is not.
 
-**Important**: Non-equi join conditions must use column names from `x` and `i`, *not arbitrary expressions*.
-For example, `on = .(x_col >= i_col)` is valid, but `on = .(x_col >= i_col + 1)` is not.
-
-In non-equi joins, the left side of the operator (e.g., `A` in `A >= B`) *must be a column from `x`*,
-and the right side (e.g., `B`) *must be a column from `i`*.
-
-To use expressions, create temporary columns first (see example below).
+Arbitrary comparisons can be accomplished by create temporary columns first. For example:
 
 ```{r}
 x <- data.table(A = 1:5, value_x = letters[1:5])
 i <- data.table(B = c(2, 4, 5), value_i = LETTERS[1:3])
 x[i, on = .(A >= B)]
 ```
-In data.table, when using a non-equi join condition (>=, <, etc.), the column from x is retained in the result, while the column from i is not retained unless explicitly selected.
+```In data.table, when using a non-equi join condition (>=, <, etc.), the column from x is retained in the result, while the column from i is not retained unless explicitly selected.```
 
 Expected Output
 A value_x value_i
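
The added text says arbitrary comparisons need a temporary column, but the example in this hunk joins on the existing columns `A` and `B` only. A minimal sketch of the workaround, assuming the intended condition is `A >= B + 1`; the helper column name `B_plus_1` is hypothetical and does not come from the vignette:

```r
library(data.table)

x <- data.table(A = 1:5, value_x = letters[1:5])
i <- data.table(B = c(2, 4, 5), value_i = LETTERS[1:3])

# on = .(A >= B + 1) is not accepted, so materialise the expression first.
# B_plus_1 is a hypothetical helper column, used only for the join.
i[, B_plus_1 := B + 1]

# nomatch = NULL drops rows of i with no match in x (here B = 5, since no A >= 6).
res <- x[i, on = .(A >= B_plus_1), nomatch = NULL]

# Remove the helper column from i once the join is done.
i[, B_plus_1 := NULL]
```

With the default `nomatch = NA`, the unmatched row of `i` would instead be kept, with `NA` in `value_x`.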
@@ -604,15 +598,7 @@ Expected Output
 
 If multiple rows in x satisfy the join condition with a single row in i, those rows will be duplicated in the result.
 
-Notice that A appears in the result, but B from i is missing.
-**Note for SQL Users**: Unlike SQL, `data.table` non-equi joins:
-- Do not retain the `i` column used in the join condition unless explicitly selected.
-- Use the `x` column name in the result (e.g., `A` instead of `B` in `A >= B`).
-
-This is because B was only used for filtering and is not retained unless explicitly selected.
-However, columns from i that are not used in the join condition (e.g., value_i) are automatically included in the output by default, since data.table keeps all non-matching columns from i.
-
-`If you want to keep the B column from i, you need to explicitly select it in the result:`
+If you want to keep the B column from i, you need to explicitly select it in the result:
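
The hunk stops before the vignette's own code for this step. One way to do it, offered as a sketch rather than the vignette's example, is to name the output columns in `j` using the `x.` and `i.` prefixes, so that both sides of the join condition survive with their own values:

```r
library(data.table)

x <- data.table(A = 1:5, value_x = letters[1:5])
i <- data.table(B = c(2, 4, 5), value_i = LETTERS[1:3])

# x.A is the original value from x; i.B is the value from i used in the comparison.
res <- x[i, on = .(A >= B), .(A = x.A, B = i.B, value_x, value_i)]
```

Every row of `res` then satisfies `A >= B`, and `A` holds the values from `x` rather than the values of `B`.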