### 4.1 Handling Column Name Behavior in Non-Equi Joins

When performing non-equi joins (`<`, `>`, `<=`, `>=`), it is important to understand how columns are named in the result.

- The left operand (the `x` column) determines the column name in the result.
- The right operand (the `i` column) contributes the values but does not keep its original name.
- By default, `data.table` does not return the `i` column used in the join condition unless it is explicitly requested in `j`.

In other words, the join column in the result takes its name from the left (`x`) table but contains values from the right (`i`) table. This can cause confusion when `x` and `i` have different column names.

**Important**: Non-equi join conditions must use column names from `x` and `i`, *not arbitrary expressions*. For example, `on = .(x_col >= i_col)` is valid, but `on = .(x_col >= i_col + 1)` is not.

The left side of the operator (e.g., `A` in `A >= B`) *must be a column from `x`*, and the right side (e.g., `B`) *must be a column from `i`*.

To use an expression in a join condition, first store its result in a temporary column, then join on that column.

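The temporary-column workaround described above can be sketched as follows. This is a minimal illustration, not part of the original example: the table `i_tmp` and the helper column `B_plus_1` are hypothetical names introduced here, standing in for the unsupported condition `A >= B + 1`.

```{r}
library(data.table)

x <- data.table(A = 1:5, value_x = letters[1:5])
# i_tmp is a hypothetical copy of the lookup table used only in this sketch
i_tmp <- data.table(B = c(2, 4, 5), value_i = LETTERS[1:3])

# `on = .(A >= B + 1)` is not accepted, so materialise the expression first.
# B_plus_1 is a hypothetical helper column name.
i_tmp[, B_plus_1 := B + 1]

# Join on the helper column instead of the expression
res_tmp <- x[i_tmp, on = .(A >= B_plus_1)]
res_tmp
```

Because `B_plus_1` is the join column, the original `B` is treated as an ordinary non-join column of `i_tmp` and is carried into the result.
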
```{r}
library(data.table)

x <- data.table(A = 1:5, value_x = letters[1:5])
i <- data.table(B = c(2, 4, 5), value_i = LETTERS[1:3])

# For each row of i, return every row of x where A >= B
x[i, on = .(A >= B)]
```

With a non-equi join condition (`>=`, `<`, etc.), the join column in the result keeps the name from `x` (`A`), while the `B` column from `i` does not appear unless it is explicitly selected.

Expected output:

       A value_x value_i
    1: 2       b       A
    2: 2       c       A
    3: 2       d       A
    4: 2       e       A
    5: 4       d       B
    6: 4       e       B
    7: 5       e       C

If several rows of `x` satisfy the join condition for a single row of `i`, that row of `i` is repeated once for each match.

Notice that `A` appears in the result, holding the values of `B`, while `B` itself is missing.

**Note for SQL Users**: Unlike SQL, `data.table` non-equi joins:

- Do not retain the `i` column used in the join condition unless it is explicitly selected.
- Use the `x` column name in the result (e.g., `A` instead of `B` in `A >= B`).

This is because `B` is used only to evaluate the join condition. However, columns of `i` that are not part of the join condition (e.g., `value_i`) are included in the output by default, since `data.table` keeps all non-join columns from `i`.

To keep the `B` column from `i`, select it explicitly in `j`:

```{r}
# In j, B is looked up in i, so it can be returned next to the join column A
x[i, on = .(A >= B), .(B, A, value_x, value_i)]
```

Updated output:

       B A value_x value_i
    1: 2 2       b       A
    2: 2 2       c       A
    3: 2 2       d       A
    4: 2 2       e       A
    5: 4 4       d       B
    6: 4 4       e       B
    7: 5 5       e       C

Now `B` from `i` is explicitly retained in the result.

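Since the column named `A` carries the values of `B`, it can help to display the original `x` values next to them. A minimal sketch, assuming the `x.` and `i.` prefixes that `data.table` makes available in `j` during a join; the output names `x_A` and `i_B` are hypothetical labels chosen for this illustration:

```{r}
library(data.table)

x <- data.table(A = 1:5, value_x = letters[1:5])
i <- data.table(B = c(2, 4, 5), value_i = LETTERS[1:3])

# x.A refers to the actual values of A in x,
# i.B refers to the values of B in i that were used in the condition
res_both <- x[i, on = .(A >= B), .(x_A = x.A, i_B = i.B, value_x, value_i)]
res_both
```

Comparing `x_A` with `i_B` row by row makes it visible which rows of `x` satisfied `A >= B` for each row of `i`.
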
**Important consideration**: `nomatch = NULL` in non-equi joins

By default (`nomatch = NA`), rows of `i` without a match in `x` are kept, with `NA` in the columns that come from `x`. To exclude these unmatched rows, use `nomatch = NULL`:

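```{r}
# nomatch = NULL drops rows of i that match no row of x
x[i, on = .(A >= B), .(B, A, value_x, value_i), nomatch = NULL]
```

This ensures that only matching rows are returned. In this example every row of `i` has at least one match in `x`, so the result is the same as before.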
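
The effect of `nomatch` only becomes visible when some row of `i` has no match. The sketch below uses a hypothetical table `i_unmatched`, introduced for this illustration, whose last row (`B = 6`) matches no row of `x`:

```{r}
library(data.table)

x <- data.table(A = 1:5, value_x = letters[1:5])
# i_unmatched is a hypothetical table; B = 6 exceeds every value of A
i_unmatched <- data.table(B = c(4, 6), value_i = c("B", "D"))

# Default (nomatch = NA): the unmatched row of i is kept, with NA in value_x
with_na <- x[i_unmatched, on = .(A >= B)]

# nomatch = NULL: the unmatched row of i is dropped
matched_only <- x[i_unmatched, on = .(A >= B), nomatch = NULL]

nrow(with_na)
nrow(matched_only)
```
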