Commit 3f484c3

committed
corrected error
1 parent 6b2e25b commit 3f484c3

1 file changed

+22
-37
lines changed

vignettes/datatable-joins.Rmd

Lines changed: 22 additions & 37 deletions
@@ -576,61 +576,46 @@ When performing non-equi joins (<, >, <=, >=), it's important to understand how
 - The left operand (`x` column) determines the column name in the result.
 - The right operand (`i` column) contributes values but does not retain its original name.
 - By default, `data.table` does not retain the `i` column used in the join condition unless explicitly requested.
-`In non-equi joins, the resulting column inherits the name from the left (x) table but contains values from the right (i) table.`
 
-This can cause confusion when `x` and `i` have different column names.
+In non-equi joins, the left side of the operator (e.g., `A` in `A >= B`) must be a column from `x`,
+and the right side (e.g., `B`) must be a column from `i`. Non-equi join does not support arbitrary expressions. For example, `on = .(x_col >= i_col)` is valid, but `on = .(x_col >= i_col + 1)` is not.
 
-**Important**: Non-equi join conditions must use column names from `x` and `i`, *not arbitrary expressions*.
-For example, `on = .(x_col >= i_col)` is valid, but `on = .(x_col >= i_col + 1)` is not.
-
-In non-equi joins, the left side of the operator (e.g., `A` in `A >= B`) *must be a column from `x`*,
-and the right side (e.g., `B`) *must be a column from `i`*.
-
-To use expressions, create temporary columns first (see example below).
+Arbitrary comparisons can be accomplished by create temporary columns first. For example:
 
 ```{r}
-x <- data.table(A = 1:5, value_x = letters[1:5])
-i <- data.table(B = c(2, 4, 5), value_i = LETTERS[1:3])
-x[i, on = .(A >= B)]
+ProductReceived[ProductSales,
+on = .(product_id, received_date <= sales_date)]
 ```
-In data.table, when using a non-equi join condition (>=, <, etc.), the column from x is retained in the result, while the column from i is not retained unless explicitly selected.
+```In data.table, when using a non-equi join condition (>=, <, etc.), the column from x is retained in the result, while the column from i is not retained unless explicitly selected.```
 
 Expected Output
-A value_x value_i
-1: 2 b A
-2: 4 d B
-3: 5 e C
-4: 5 e C
+product_id received_date i.sales_date i.sales_count
+1: 2 2024-02-01 2024-02-05 100
+2: 2 2024-02-03 2024-02-05 100
+3: 2 2024-02-01 2024-02-10 150
 
 If multiple rows in x satisfy the join condition with a single row in i, those rows will be duplicated in the result.
 
-Notice that A appears in the result, but B from i is missing.
-**Note for SQL Users**: Unlike SQL, `data.table` non-equi joins:
-- Do not retain the `i` column used in the join condition unless explicitly selected.
-- Use the `x` column name in the result (e.g., `A` instead of `B` in `A >= B`).
-
-This is because B was only used for filtering and is not retained unless explicitly selected.
-However, columns from i that are not used in the join condition (e.g., value_i) are automatically included in the output by default, since data.table keeps all non-matching columns from i.
-
-`If you want to keep the B column from i, you need to explicitly select it in the result:`
+If you want to keep the received_date column from i, you need to explicitly select it:
 
 ```{r}
-x[i, on = .(A >= B), .(B, A, value_x, value_i)]
+ProductReceived[ProductSales,
+on = .(product_id, received_date <= sales_date),
+.(product_id, received_date, sales_date = i.sales_date, sales_count = i.sales_count)]
 ```
 Updated Output
-B A value_x value_i
-1: 2 2 b A
-2: 4 4 d B
-3: 5 5 e C
-4: 5 5 e C
-
-Now, B from i is explicitly retained in the final table.
+product_id received_date sales_date sales_count
+1: 2 2024-02-01 2024-02-05 100
+2: 2 2024-02-03 2024-02-05 100
+3: 2 2024-02-01 2024-02-10 150
 
-`important Consideration: nomatch = NULL in Non-Equi Joins`
 If you want to exclude unmatched rows, you should use nomatch = NULL:
 
 ```{r}
-x[i, on = .(A >= B), .(B, A, value_x, value_i), nomatch = NULL]
+ProductReceived[ProductSales,
+on = .(product_id, received_date <= sales_date),
+.(product_id, received_date, sales_date = i.sales_date, sales_count = i.sales_count),
+nomatch = NULL]
 ```
 
 ## 5. Rolling join
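
The changed text says that comparisons against an expression such as `i_col + 1` need a temporary column, but the example in the diff joins on existing columns only. A minimal sketch of that workaround follows, assuming two hypothetical tables `x` and `i` that reuse the placeholder names `x_col` and `i_col` from the vignette text. The data, the value columns, and the helper column `i_col_plus1` are invented for illustration and are not part of the commit.

```r
library(data.table)

# Hypothetical tables, named x and i to match the vignette's notation.
x <- data.table(x_col = c(1L, 3L, 5L, 7L), x_val = c("a", "b", "c", "d"))
i <- data.table(i_col = c(2L, 4L), i_val = c("P", "Q"))

# Per the vignette text, on = .(x_col >= i_col + 1) is not supported,
# so the expression is materialised as a helper column in i first.
i[, i_col_plus1 := i_col + 1L]

# Join on the helper column. The x. and i. prefixes pick the original
# columns from each side; nomatch = NULL drops rows of i with no match.
res <- x[i,
         .(x_col = x.x_col, i_col = i.i_col, x_val, i_val),
         on = .(x_col >= i_col_plus1),
         nomatch = NULL]

# Remove the helper column so i is left as it started.
i[, i_col_plus1 := NULL]

print(res)
```

Selecting `x.x_col` and `i.i_col` explicitly in `j` keeps both sides of the comparison visible, which avoids the naming confusion the section describes, where the join column carries the `x` name.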
