vignettes/datatable-joins.Rmd
Lines changed: 22 additions & 37 deletions
@@ -576,61 +576,46 @@ When performing non-equi joins (<, >, <=, >=), it's important to understand how
576
576
- The left operand (`x` column) determines the column name in the result.
577
577
- The right operand (`i` column) contributes values but does not retain its original name.
578
578
- By default, `data.table` does not retain the `i` column used in the join condition unless explicitly requested.
579
-
`In non-equi joins, the resulting column inherits the name from the left (x) table but contains values from the right (i) table.`
580
579
581
-
This can cause confusion when `x` and `i` have different column names.
580
+
In non-equi joins, the left side of the operator (e.g., `A` in `A >= B`) must be a column from `x`,
581
+
and the right side (e.g., `B`) must be a column from `i`. Non-equi joins do not support arbitrary expressions. For example, `on = .(x_col >= i_col)` is valid, but `on = .(x_col >= i_col + 1)` is not.
582
582
583
-
**Important**: Non-equi join conditions must use column names from `x` and `i`, *not arbitrary expressions*.
584
-
For example, `on = .(x_col >= i_col)` is valid, but `on = .(x_col >= i_col + 1)` is not.
585
-
586
-
In non-equi joins, the left side of the operator (e.g., `A` in `A >= B`) *must be a column from `x`*,
587
-
and the right side (e.g., `B`) *must be a column from `i`*.
588
-
589
-
To use expressions, create temporary columns first (see example below).
583
+
Arbitrary comparisons can be accomplished by creating temporary columns first. For example:
590
584
591
585
```{r}
592
-
x <- data.table(A = 1:5, value_x = letters[1:5])
593
-
i <- data.table(B = c(2, 4, 5), value_i = LETTERS[1:3])
594
-
x[i, on = .(A >= B)]
586
+
ProductReceived[ProductSales,
587
+
on = .(product_id, received_date <= sales_date)]
595
588
```
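The temporary-column workaround mentioned in the prose could look roughly like the sketch below. The tables `x` and `i`, their values, and the helper column `i_col_plus1` are hypothetical; only the `x_col`/`i_col` names echo the text.

```r
library(data.table)

# Hypothetical tables; names and values are illustrative only.
x <- data.table(x_col = 1:5, value_x = letters[1:5])
i <- data.table(i_col = c(2L, 4L), value_i = c("P", "Q"))

# `on = .(x_col >= i_col + 1)` is not accepted, so the expression is
# materialised as a helper column (hypothetical name) before joining.
i[, i_col_plus1 := i_col + 1L]

# The join condition now compares two plain columns.
res <- x[i, on = .(x_col >= i_col_plus1)]

# Drop the helper column from `i` once the join is done.
i[, i_col_plus1 := NULL]
```

In `res`, the column named `x_col` holds the values of the helper column from `i`, which matches the naming behaviour described in the bullets above.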
596
-
In data.table, when using a non-equi join condition (>=, <, etc.), the column from x is retained in the result, while the column from i is not retained unless explicitly selected.
589
+
```In data.table, when using a non-equi join condition (>=, <, etc.), the column from x is retained in the result, while the column from i is not retained unless explicitly selected.```
If multiple rows in x satisfy the join condition with a single row in i, those rows will be duplicated in the result.
606
598
607
-
Notice that A appears in the result, but B from i is missing.
608
-
**Note for SQL Users**: Unlike SQL, `data.table` non-equi joins:
609
-
- Do not retain the `i` column used in the join condition unless explicitly selected.
610
-
- Use the `x` column name in the result (e.g., `A` instead of `B` in `A >= B`).
611
-
612
-
This is because B was only used for filtering and is not retained unless explicitly selected.
613
-
However, columns from i that are not used in the join condition (e.g., value_i) are automatically included in the output by default, since data.table keeps all non-matching columns from i.
614
-
615
-
`If you want to keep the B column from i, you need to explicitly select it in the result:`
599
+
If you want to keep the `sales_date` column from `i`, you need to explicitly select it:
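A minimal sketch of explicit selection in `j`, assuming small stand-in versions of `ProductReceived` and `ProductSales` (the values and the `count`/`sold` columns are invented for illustration). The `x.` and `i.` prefixes refer to columns of the outer and inner table respectively.

```r
library(data.table)

# Hypothetical data; only the join column names follow the example above.
ProductReceived <- data.table(
  product_id    = c(1L, 1L, 2L),
  received_date = as.IDate(c("2024-01-01", "2024-01-10", "2024-01-05")),
  count         = c(10L, 20L, 30L)
)
ProductSales <- data.table(
  product_id = c(1L, 2L),
  sales_date = as.IDate(c("2024-01-08", "2024-01-06")),
  sold       = c(5L, 7L)
)

# Default result: the only date column is named `received_date`,
# but it holds the values of `sales_date` from ProductSales.
default_join <- ProductReceived[ProductSales,
  on = .(product_id, received_date <= sales_date)]

# Explicit selection keeps both dates under their own names.
both_dates <- ProductReceived[ProductSales,
  on = .(product_id, received_date <= sales_date),
  .(product_id    = i.product_id,
    received_date = x.received_date,
    sales_date    = i.sales_date,
    count,
    sold          = i.sold)]
```

Selecting through the prefixes avoids relying on which table an unprefixed join column resolves to inside `j`.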