Commit 844d97c

updated vignett
1 parent 4f84d3c commit 844d97c

1 file changed

+32
-17
lines changed

vignettes/datatable-joins.Rmd

Lines changed: 32 additions & 17 deletions
@@ -698,23 +698,38 @@ Products[!"popcorn",
698698

699699
The `:=` operator in `data.table` is used for updating or adding columns by reference. This means it modifies the original `data.table` without creating a copy, which is very memory-efficient, especially for large datasets. When used inside a `data.table`, `:=` allows you to **add new columns** or **modify existing ones** as part of your query.
700700

701-
Let's update our `Products` table with the latest price from `ProductPriceHistory`:
702-
703-
```{r}
704-
copy(Products)[ProductPriceHistory,
705-
on = .(id = product_id),
706-
j = `:=`(price = tail(i.price, 1),
707-
last_updated = tail(i.date, 1)),
708-
by = .EACHI][]
709-
```
710-
711-
In this operation:
712-
713-
- The function copy creates a ***deep*** copy of the `Products` table, preventing modifications made by `:=` from changing the original table by reference.
714-
- We join `Products` with `ProductPriceHistory` based on `id` and `product_id`.
715-
- We update the `price` column with the latest price from `ProductPriceHistory`.
716-
- We add a new `last_updated` column to track when the price was last changed.
717-
- The `by = .EACHI` ensures that the `tail` function is applied for each product in `ProductPriceHistory`.
701+
1) Let's update our `Products` table with the latest price from `ProductPriceHistory`:
702+
```{r Simple One-to-One Update}
703+
Products[ProductPriceHistory, on = .(id = product_id), price := i.price]
704+
```
705+
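- The `price` column in `Products` is updated with the `price` column from `ProductPriceHistory` (`i.price`).
706+
- `on = .(id = product_id)` restricts the update to rows whose IDs match. If several `ProductPriceHistory` rows match the same product, the value from the last matching row is the one that remains.
707+
- This method modifies `Products` in place, without copying the table.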
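The behaviour described in these bullets can be checked on a small example. The sketch below uses hypothetical stand-ins for the vignette's `Products` and `ProductPriceHistory` tables (all values are made up) and assumes only that the `data.table` package is installed. It prints the updated table and compares the object's address before and after the update.

```r
library(data.table)

# Hypothetical stand-ins for the vignette's tables; all values are made up.
Products <- data.table(
  id    = 1:3,
  name  = c("banana", "carrots", "popcorn"),
  price = c(0.63, 0.89, 2.99)
)
ProductPriceHistory <- data.table(
  product_id = c(1L, 1L, 2L),
  date       = as.Date(c("2024-01-01", "2024-02-01", "2024-01-15")),
  price      = c(0.59, 0.72, 0.95)
)

# Remember where the table lives so we can check it is not replaced by a copy.
addr_before <- address(Products)

# Product 1 matches two history rows; product 3 matches none.
Products[ProductPriceHistory, on = .(id = product_id), price := i.price]

print(Products)
print(identical(addr_before, address(Products)))
```

Product 1 has two matching history rows here, so the printed table shows which of them remains after the update, and product 3, which has no match, keeps its original price.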
708+
709+
2) If we need the latest price and date (instead of all matches), we can still use `:=` efficiently:
710+
```{r Updating with the Latest Record}
711+
Products[ProductPriceHistory,
712+
on = .(id = product_id),
713+
`:=`(price = last(i.price), last_updated = last(i.date)),
714+
by = .EACHI]
715+
```
716+
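- `last(i.price)` is meant to select only the latest price. This holds only if `ProductPriceHistory` is ordered by date, because the value that remains is the one from the last matching row.
717+
- A new `last_updated` column is added to track the date of the last update.
718+
- `by = .EACHI` evaluates `j` separately for each row of `ProductPriceHistory`, so `i.price` and `i.date` hold a single value in each evaluation.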
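Because the result above depends on the row order of `ProductPriceHistory`, one alternative is to reduce the history to its latest row per product first and then run a plain update join. The sketch below is one possible way to do that, not the vignette's method. The tables are hypothetical stand-ins with made-up values, and `LatestPrice` is a name introduced here for illustration.

```r
library(data.table)

# Hypothetical stand-ins for the vignette's tables; all values are made up.
Products <- data.table(
  id    = 1:2,
  name  = c("banana", "carrots"),
  price = c(0.63, 0.89)
)
ProductPriceHistory <- data.table(
  product_id = c(1L, 1L, 2L),
  date       = as.Date(c("2024-02-01", "2024-01-01", "2024-01-15")),  # deliberately unsorted
  price      = c(0.72, 0.59, 0.95)
)

# Keep the most recent row per product, regardless of the input row order.
# `LatestPrice` is an illustrative name, not part of the vignette.
LatestPrice <- ProductPriceHistory[order(date), .SD[.N], by = product_id]

# One row per product remains, so a plain update join is unambiguous.
Products[LatestPrice,
         on = .(id = product_id),
         `:=`(price = i.price, last_updated = i.date)]

print(Products)
```

Sorting inside the aggregation step keeps the original `ProductPriceHistory` untouched, and the update join no longer needs `by = .EACHI`.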
719+
720+
3) When we need to update `Products` with multiple columns from `ProductPriceHistory`:
721+
```{r Efficient Right Join Update }
722+
cols <- setdiff(names(ProductPriceHistory), 'product_id')
723+
Products[ProductPriceHistory,
724+
on = .(id = product_id),
725+
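         (cols) := mget(paste0("i.", cols))]  # the i. prefix takes values from ProductPriceHistory; a bare `price` would resolve to Products' own column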
726+
727+
```
728+
- Updates several columns in `Products` from `ProductPriceHistory` in a single call.
729+
- `mget()` retrieves the listed columns dynamically by name.
730+
- Because it updates in place, this method avoids allocating the new table that `Products <- ProductPriceHistory[Products, on=...]` creates.
731+
- Note: `:=` updates `Products` in place, but does not modify `ProductPriceHistory`.
732+
- Unlike a traditional RIGHT JOIN, `data.table` does not allow `i` (the right table) to be updated directly.
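A multi-column update of this kind can be sketched as follows. The tables are hypothetical stand-ins with made-up values, and the right-hand table is assumed to hold one row per product. The sketch uses the `i.` prefix so that a column present in both tables, such as `price`, is read from the right-hand table.

```r
library(data.table)

# Hypothetical stand-ins; all values are made up.
Products <- data.table(
  id    = 1:3,
  name  = c("banana", "carrots", "popcorn"),
  price = c(0.63, 0.89, 2.99)
)
LatestPrice <- data.table(
  product_id = c(1L, 2L),
  date       = as.Date(c("2024-02-01", "2024-01-15")),
  price      = c(0.72, 0.95)
)

# Every column of the right-hand table except the join key.
cols <- setdiff(names(LatestPrice), "product_id")

# Build the i.-prefixed names so shared column names resolve to LatestPrice.
Products[LatestPrice,
         on = .(id = product_id),
         (cols) := mget(paste0("i.", cols))]

print(Products)
```

Rows of `Products` without a match keep their existing values in columns that already existed and receive `NA` in the newly added ones.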
718733

719734
***
720735
