Commit b556d13

committed
done
1 parent 1245e5e commit b556d13

1 file changed

+14
-28
lines changed

vignettes/datatable-joins.Rmd

Lines changed: 14 additions & 28 deletions
@@ -577,45 +577,31 @@ When performing non-equi joins (<, >, <=, >=), column names are assigned as foll
 - The right operand (`i` column) contributes values but does not retain its original name.
 - By default, `data.table` does not retain the `i` column used in the join condition unless explicitly requested.
 
-In non-equi joins, the left side of the operator (e.g., `A` in `A >= B`) must be a column from `x`,
-and the right side (e.g., `B`) must be a column from `i`. Non-equi join does not support arbitrary expressions. For example, `on = .(x_col >= i_col)` is valid, but `on = .(x_col >= i_col + 1)` is not.
+In non-equi joins, the left side of the operator (e.g., x_int in x_int >= i_int) must be a column from x, while the right side (e.g., i_int) must be a column from i. Non-equi joins do not support arbitrary expressions.
+For example, on = .(x_int >= i_int) is valid, but on = .(x_int >= i_int + 1) is not valid.
 
-Arbitrary comparisons can be accomplished by create temporary columns first. For example:
+If you need to apply transformations, create a temporary column first.
 
 ```{r}
-x <- data.table(A = 1:5, value_x = letters[1:5])
-i <- data.table(B = c(2, 4, 5), value_i = LETTERS[1:3])
-x[i, on = .(A >= B)]
+x <- data.table(x_int = 2:4, lower = letters[1:3])
+i <- data.table(i_int = c(2, 4, 5), UPPER = LETTERS[1:3])
+x[i, on = .(x_int >= i_int)]
 ```
-In data.table, when using a non-equi join condition (>=, <, etc.), the column from x is retained in the result, while the column from i is not retained unless explicitly selected.
+Key Takeaways:
+- The name of the output column (x_int) comes from x, but the values come from i_int in i.
+- The last row contains NA because no rows in x match the last row in i (UPPER = "C").
+- Multiple rows in x are returned to match the first row in i with UPPER = "A".
 
-Expected Output
-```
-A value_x value_i
-1: 2 b A
-2: 4 d B
-3: 5 e C
-4: 5 e C
-```
-If multiple rows in x satisfy the join condition with a single row in i, those rows will be duplicated in the result.
-
-If you want to keep the B column from i, you need to explicitly select it in the result:
+If you want to keep the i_int column from i, you need to explicitly select it in the result:
 
 ```{r}
-x[i, on = .(A >= B), .(B, A, value_x, value_i)]
-```
-Updated Output
-```
-B A value_x value_i
-1: 2 2 b A
-2: 4 4 d B
-3: 5 5 e C
-4: 5 5 e C
+x[i, on = .(x_int >= i_int), .(i_int, x_int, lower, UPPER)]
 ```
+
 If you want to exclude unmatched rows, you should use nomatch = NULL:
 
 ```{r}
-x[i, on = .(A >= B), .(B, A, value_x, value_i), nomatch = NULL]
+x[i, on = .(x_int >= i_int), .(i_int, x_int, lower, UPPER), nomatch = NULL]
 ```
 
 ## 5. Rolling join
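
Note on the hunk above: the added text says to create a temporary column when a transformation is needed, but none of the chunks in the diff show that step. A minimal sketch of what it could look like, reusing the `x` and `i` tables from the diff, follows. The helper column name `i_int_plus1` is hypothetical and does not appear in the vignette.

```r
library(data.table)

x <- data.table(x_int = 2:4, lower = letters[1:3])
i <- data.table(i_int = c(2, 4, 5), UPPER = LETTERS[1:3])

# `on = .(x_int >= i_int + 1)` is rejected, so materialise the expression
# as a real column of `i` first (hypothetical helper name: i_int_plus1).
i[, i_int_plus1 := i_int + 1]

# Join on the helper column; nomatch = NULL drops rows of `i` with no match in `x`.
res <- x[i, on = .(x_int >= i_int_plus1), nomatch = NULL]

# Remove the helper column from `i` once the join is done.
i[, i_int_plus1 := NULL]

print(res)
```

With these inputs only the first row of `i` (shifted value 3) finds matches, namely the rows of `x` where `x_int` is 3 or 4. Since `:=` modifies `i` by reference, the helper column is dropped afterwards to leave `i` as it was.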
