Commit 43b8bcb

doc updated reagrding non equi join
1 parent c29e313 commit 43b8bcb

1 file changed

+68
-0
lines changed

vignettes/datatable-joins.Rmd

Lines changed: 68 additions & 0 deletions
@@ -569,6 +569,74 @@ ProductReceivedProd2[ProductSalesProd2,
nomatch = NULL]
```

### 4.1 Handling Column Name Behavior in Non-Equi Joins

When performing non-equi joins (`<`, `>`, `<=`, `>=`), it is important to understand how columns are named in the result.

- The left operand (the `x` column) determines the column name in the result.
- The right operand (the `i` column) supplies the values but does not keep its own name.
- By default, the `i` join column does not appear under its own name unless it is explicitly selected in `j`.

In short, the join column in the result inherits its name from the left (`x`) table but contains values from the right (`i`) table. This can cause confusion when `x` and `i` use different column names.

**Important**: Non-equi join conditions must use column names from `x` and `i`, *not arbitrary expressions*.
For example, `on = .(x_col >= i_col)` is valid, but `on = .(x_col >= i_col + 1)` is not.

The left side of the operator (e.g., `A` in `A >= B`) *must be a column from `x`*,
and the right side (e.g., `B`) *must be a column from `i`*.

To use an expression in the condition, first store it in a temporary column and join on that column.
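
A minimal sketch of the temporary-column workaround, using the same example tables as the rest of this section. The helper column `B_plus_1` and the copy `i_tmp` are illustrative names chosen here; the join emulates the unsupported `on = .(A >= B + 1)`.

```{r}
library(data.table)

x <- data.table(A = 1:5, value_x = letters[1:5])
i <- data.table(B = c(2, 4, 5), value_i = LETTERS[1:3])

# Store the expression B + 1 in a helper column (hypothetical name) on a copy of i
i_tmp <- copy(i)[, B_plus_1 := B + 1]

# Join on the helper column; nomatch = NULL drops rows of i_tmp without a match
res_expr <- x[i_tmp, on = .(A >= B_plus_1), nomatch = NULL]
res_expr
```

Only `B_plus_1` takes part in the comparison, so the original `B` is an ordinary non-join column here and is returned alongside `value_i`.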

```{r}
library(data.table)

x <- data.table(A = 1:5, value_x = letters[1:5])
i <- data.table(B = c(2, 4, 5), value_i = LETTERS[1:3])

# For each row of i, return every row of x where A >= B
x[i, on = .(A >= B)]
```
The join column keeps the name of the `x` column (`A`) but holds the values of the `i` column (`B`); `B` does not appear under its own name unless it is explicitly selected.

Expected output:

       A value_x value_i
    1: 2       b       A
    2: 2       c       A
    3: 2       d       A
    4: 2       e       A
    5: 4       d       B
    6: 4       e       B
    7: 5       e       C

Each row of `i` is paired with every row of `x` that satisfies the condition, so a row of `i` is repeated once per match: `B = 2` matches four rows of `x` (`A` = 2, 3, 4, 5), `B = 4` matches two, and `B = 5` matches one, giving seven rows in total.
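
One way to check how many rows of `x` each row of `i` matches is to count the matches with `by = .EACHI`. The following is a sketch on the same example tables; `match_counts` is an illustrative name.

```{r}
library(data.table)

x <- data.table(A = 1:5, value_x = letters[1:5])
i <- data.table(B = c(2, 4, 5), value_i = LETTERS[1:3])

# One output row per row of i; N is the number of x rows with A >= B
match_counts <- x[i, on = .(A >= B), .N, by = .EACHI]
match_counts
```

The counts sum to the number of rows returned by the plain join `x[i, on = .(A >= B)]`.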

Notice that the result has a column named `A` but none named `B`, and that `A` holds the values of `B` (2, 4, 5) rather than the original values of `x$A`.

**Note for SQL Users**: Unlike SQL, `data.table` non-equi joins:

- Do not retain the `i` column used in the join condition unless explicitly selected.
- Use the `x` column name in the result (e.g., `A` instead of `B` in `A >= B`).

`B` is used only to evaluate the join condition, so it is not returned under its own name unless explicitly selected.
However, columns from `i` that are not used in the join condition (e.g., `value_i`) are included in the output by default, since `data.table` keeps all non-join columns from `i`.

To keep the `B` column from `i`, select it explicitly in `j`:

```{r}
x[i, on = .(A >= B), .(B, A, value_x, value_i)]
```

Updated output:

       B A value_x value_i
    1: 2 2       b       A
    2: 2 2       c       A
    3: 2 2       d       A
    4: 2 2       e       A
    5: 4 4       d       B
    6: 4 4       e       B
    7: 5 5       e       C

Now `B` from `i` is explicitly retained in the final table. `A` still shows the values taken from `i`, so the two columns are identical.
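
To see the original values of `x$A` next to `B`, the `x.` and `i.` prefixes can be used in `j` to refer to each table's columns explicitly. The following is a sketch on the same example tables; the output names `x_A` and `i_B` are illustrative.

```{r}
library(data.table)

x <- data.table(A = 1:5, value_x = letters[1:5])
i <- data.table(B = c(2, 4, 5), value_i = LETTERS[1:3])

# x.A is the matched value from x; i.B is the value from i used in the condition
res_both <- x[i, on = .(A >= B), .(x_A = x.A, i_B = i.B, value_x, value_i)]
res_both
```

Every row satisfies `x_A >= i_B`, which makes the join condition directly visible in the result.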
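
**Important consideration: `nomatch = NULL` in non-equi joins**

By default (`nomatch = NA`), a row of `i` with no match in `x` is kept, with `NA` in the columns that come from `x`. To exclude unmatched rows, use `nomatch = NULL`:

```{r}
x[i, on = .(A >= B), .(B, A, value_x, value_i), nomatch = NULL]
```

This ensures that only rows of `i` with at least one match in `x` are returned. In this example every row of `i` has a match, so the result is the same as before.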
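
The effect of `nomatch` is easier to see when some row of `i` has no match. The sketch below uses a hypothetical lookup table `i2` whose second row (`B = 6`) exceeds every value of `A`; `with_na` and `matched` are illustrative names.

```{r}
library(data.table)

x  <- data.table(A = 1:5, value_x = letters[1:5])
i2 <- data.table(B = c(2, 6), value_i = c("A", "Z"))  # hypothetical: B = 6 matches nothing

# Default: the unmatched row of i2 is kept, with NA in value_x
with_na <- x[i2, on = .(A >= B)]

# nomatch = NULL: the unmatched row of i2 is dropped
matched <- x[i2, on = .(A >= B), nomatch = NULL]

nrow(with_na)
nrow(matched)
```

The two results differ by exactly the one unmatched row of `i2`.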

## 5. Rolling join
