vignettes/datatable-joins.Rmd
Lines changed: 14 additions & 28 deletions
@@ -577,45 +577,31 @@ When performing non-equi joins (<, >, <=, >=), column names are assigned as foll
577
577
- The right operand (`i` column) contributes values but does not retain its original name.
578
578
- By default, `data.table` does not retain the `i` column used in the join condition unless explicitly requested.
579
579
580
-
In non-equi joins, the left side of the operator (e.g., `A` in `A >= B`) must be a column from `x`,
581
-
and the right side (e.g., `B`) must be a column from `i`. Non-equi join does not support arbitrary expressions. For example, `on = .(x_col >= i_col)` is valid, but `on = .(x_col >= i_col + 1)` is not.
580
+
In non-equi joins, the left side of the operator (e.g., `x_int` in `x_int >= i_int`) must be a column from `x`, while the right side (e.g., `i_int`) must be a column from `i`. Non-equi joins do not support arbitrary expressions.
581
+
For example, `on = .(x_int >= i_int)` is valid, but `on = .(x_int >= i_int + 1)` is not.
582
582
583
-
Arbitrary comparisons can be accomplished by create temporary columns first. For example:
583
+
If you need to apply transformations, create a temporary column first.
584
584
585
585
```{r}
586
-
x <- data.table(A = 1:5, value_x = letters[1:5])
587
-
i <- data.table(B = c(2, 4, 5), value_i = LETTERS[1:3])
588
-
x[i, on = .(A >= B)]
586
+
x <- data.table(x_int = 2:4, lower = letters[1:3])
587
+
i <- data.table(i_int = c(2, 4, 5), UPPER = LETTERS[1:3])
588
+
x[i, on = .(x_int >= i_int)]
589
589
```
590
-
In data.table, when using a non-equi join condition (>=, <, etc.), the column from x is retained in the result, while the column from i is not retained unless explicitly selected.
590
+
Key Takeaways:
591
+
- The name of the output column (x_int) comes from x, but the values come from i_int in i.
592
+
- The last row contains NA because no rows in x match the last row in i (UPPER = "C").
593
+
- Multiple rows in x are returned to match the first row in i with UPPER = "A".
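
The advice to create a temporary column before joining on a transformed value can be sketched as follows. This is a minimal illustration, not part of the vignette change itself: the helper column name `i_int_plus1` is hypothetical, and the sketch assumes the same `x` and `i` tables used in the example.

```r
library(data.table)

x <- data.table(x_int = 2:4, lower = letters[1:3])
i <- data.table(i_int = c(2, 4, 5), UPPER = LETTERS[1:3])

# on = .(x_int >= i_int + 1) is not allowed, so materialize the
# transformed value as a (hypothetical) helper column in i first.
i[, i_int_plus1 := i_int + 1]

# Join on the helper column instead of an expression.
res <- x[i, on = .(x_int >= i_int_plus1)]

# Drop the helper column from i once the join is done.
i[, i_int_plus1 := NULL]

res
```

Because the helper column is an ordinary column of `i`, it can be removed right after the join without affecting the result already stored in `res`.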
591
594
592
-
Expected Output
593
-
```
594
-
A value_x value_i
595
-
1: 2 b A
596
-
2: 4 d B
597
-
3: 5 e C
598
-
4: 5 e C
599
-
```
600
-
If multiple rows in x satisfy the join condition with a single row in i, those rows will be duplicated in the result.
601
-
602
-
If you want to keep the B column from i, you need to explicitly select it in the result:
595
+
If you want to keep the i_int column from i, you need to explicitly select it in the result:
603
596
604
597
```{r}
605
-
x[i, on = .(A >= B), .(B, A, value_x, value_i)]
606
-
```
607
-
Updated Output
608
-
```
609
-
B A value_x value_i
610
-
1: 2 2 b A
611
-
2: 4 4 d B
612
-
3: 5 5 e C
613
-
4: 5 5 e C
598
+
x[i, on = .(x_int >= i_int), .(i_int, x_int, lower, UPPER)]
614
599
```
600
+
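
Because the unprefixed join column from `x` is filled with values from `i`, it may help to see the original `x` values next to them. The sketch below assumes the same `x` and `i` tables as above and uses the `x.` prefix in `j`; the output column name `x_actual` is hypothetical and chosen only for illustration.

```r
library(data.table)

x <- data.table(x_int = 2:4, lower = letters[1:3])
i <- data.table(i_int = c(2, 4, 5), UPPER = LETTERS[1:3])

# x.x_int refers to the values actually stored in x,
# while i_int is the value from i used in the comparison.
res_both <- x[i, on = .(x_int >= i_int),
              .(i_int, x_actual = x.x_int, lower, UPPER)]

res_both
```

Rows of `i` with no match in `x` still appear here, with `NA` in the columns that come from `x`.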
615
601
If you want to exclude unmatched rows, you should use nomatch = NULL:
616
602
617
603
```{r}
618
-
x[i, on = .(A >= B), .(B, A, value_x, value_i), nomatch = NULL]