Commit 0f166cb

Simplified and Extended "Updating by Reference" Section in Joins Vignette (#6847)
* updated vignette
* corrected file
* introduced the necessary changes
* diff bw last and tail
* updated difference
* updated version
* corrected
* refined version
* reduced the size
* included examples
* updated
* updated section
* Various suggested improvements
* Some whitespace changes, remove more extraneous info
* More consolidation
* print for clarity

---------

Co-authored-by: Michael Chirico <[email protected]>
1 parent f80867b commit 0f166cb

1 file changed: +42 -17 lines changed

vignettes/datatable-joins.Rmd

Lines changed: 42 additions & 17 deletions
@@ -675,34 +675,59 @@ Products[c("banana","popcorn"),
 
 Products[!"popcorn",
          on = "name"]
-
 ```
 
-
-
 ### 6.2. Updating by reference
 
-The `:=` operator in `data.table` is used for updating or adding columns by reference. This means it modifies the original `data.table` without creating a copy, which is very memory-efficient, especially for large datasets. When used inside a `data.table`, `:=` allows you to **add new columns** or **modify existing ones** as part of your query.
+Use `:=` to modify columns **by reference** (no copy) during joins. General syntax: `x[i, on=, (cols) := val]`.
 
-Let's update our `Products` table with the latest price from `ProductPriceHistory`:
+**Simple One-to-One Update**
+
+Update `Products` with prices from `ProductPriceHistory`:
 
 ```{r}
-copy(Products)[ProductPriceHistory,
-               on = .(id = product_id),
-               j = `:=`(price = tail(i.price, 1),
-                        last_updated = tail(i.date, 1)),
-               by = .EACHI][]
+Products[ProductPriceHistory,
+         on = .(id = product_id),
+         price := i.price]
+
+Products
 ```
 
-In this operation:
+- `i.price` refers to price from `ProductPriceHistory`.
+- Modifies `Products` in-place.
 
-- The function copy creates a ***deep*** copy of the `Products` table, preventing modifications made by `:=` from changing the original table by reference.
-- We join `Products` with `ProductPriceHistory` based on `id` and `product_id`.
-- We update the `price` column with the latest price from `ProductPriceHistory`.
-- We add a new `last_updated` column to track when the price was last changed.
-- The `by = .EACHI` ensures that the `tail` function is applied for each product in `ProductPriceHistory`.
+**Grouped Updates with `.EACHI`**
 
-***
+Get last price/date for each product:
+
+```{r Updating_with_the_Latest_Record}
+Products[ProductPriceHistory,
+         on = .(id = product_id),
+         `:=`(price = last(i.price), last_updated = last(i.date)),
+         by = .EACHI]
+
+Products
+```
+
+- `by = .EACHI` groups by i's rows (1 group per ProductPriceHistory row).
+- `last()` returns last value
+
+**Efficient Right Join Update**
+
+Add product details to `ProductPriceHistory` without copying:
+
+```{r}
+cols <- setdiff(names(Products), "id")
+ProductPriceHistory[, (cols) :=
+  Products[.SD, on = .(id = product_id), .SD, .SDcols = cols]]
+setnafill(ProductPriceHistory, fill=0, cols="price") # Handle missing values
+
+ProductPriceHistory
+```
+
+- In `i`, `.SD` refers to `ProductPriceHistory`.
+- In `j`, `.SD` refers to `Products`.
+- `:=` and `setnafill()` both update `ProductPriceHistory` by reference.
 
 ## Reference
 
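Note on the revised section: the added chunks depend on the `Products` and `ProductPriceHistory` tables built earlier in the vignette, which this diff does not show. The sketch below is a minimal stand-alone version of the `.EACHI` update and the right-join update, assuming small hypothetical tables; the values and the choice to copy only `name` are illustrative assumptions, not the vignette's data.

```r
library(data.table)

# Hypothetical toy tables standing in for the vignette's data
Products <- data.table(
  id    = 1:3,
  name  = c("banana", "carrots", "popcorn"),
  price = c(0.63, 0.89, 2.99)
)
ProductPriceHistory <- data.table(
  product_id = c(1L, 1L, 2L),
  date       = as.IDate(c("2024-01-01", "2024-02-01", "2024-01-15")),
  price      = c(0.59, 0.65, 0.79)
)

# Update join with by = .EACHI: j runs once per row of ProductPriceHistory,
# in row order, so the last history row for a product is the one that sticks.
# Products without any history row (id 3 here) keep their price and get
# NA in the new last_updated column.
Products[ProductPriceHistory,
         on = .(id = product_id),
         `:=`(price = last(i.price), last_updated = last(i.date)),
         by = .EACHI]

# Right-join style update: add product details to the history table by
# reference. Only "name" is copied in this sketch (an assumption), so the
# history table's own price column is left untouched.
cols <- "name"
ProductPriceHistory[, (cols) :=
  Products[.SD, on = .(id = product_id), .SD, .SDcols = cols]]

print(Products)
print(ProductPriceHistory)
```

Because the history rows are applied in order, this sketch assumes `ProductPriceHistory` is already sorted by date within each product. Note also that the committed chunk uses `setdiff(names(Products), "id")`, which includes `price` and therefore overwrites the history table's own `price` column; restricting `cols` as above is one way to avoid that if it is not intended.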