Commit 9724f41

committed
diff bw last and tail
1 parent bd69f6f commit 9724f41

1 file changed

+38
-18
lines changed

vignettes/datatable-joins.Rmd

Lines changed: 38 additions & 18 deletions
@@ -703,44 +703,64 @@ Products[!"popcorn",
703703
The `:=` operator in `data.table` is used for updating or adding columns by reference. This means it modifies the original `data.table` without creating a copy, which is very memory-efficient, especially for large datasets. When used inside a `data.table`, `:=` allows you to **add new columns** or **modify existing ones** as part of your query.
704704

705705
#### Let's update our `Products` table with the latest price from `ProductPriceHistory`:
706-
```{r Simple One-to-One Update}
706+
```{r Simple_One_to_One_Update}
707707
Products[ProductPriceHistory, on = .(id = product_id), price := i.price]
708708
```
709-
- The price column in Products is updated using the price column from ProductPriceHistory.
710-
- The on = .(id = product_id) ensures that updates happen based on matching IDs.
711-
- This method modifies Products in place, avoiding unnecessary copies.
709+
- The `price` column in `Products` is updated using the `price` column from `ProductPriceHistory`.
710+
- The `on = .(id = product_id)` ensures that updates happen based on matching IDs.
711+
- This method modifies `Products` in place, avoiding unnecessary copies.
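
To see the in-place update in isolation, here is a minimal sketch. The table contents are hypothetical stand-ins, not the vignette's data, and `address()` is used only to check that no copy of `Products` was made.

```r
library(data.table)

# Hypothetical stand-ins for the vignette's tables
Products <- data.table(
  id    = 1:3,
  name  = c("banana", "carrots", "popcorn"),
  price = c(0.63, 0.89, 2.99)
)
ProductPriceHistory <- data.table(
  product_id = c(1L, 2L),
  price      = c(0.59, 0.79)
)

# Remember where Products lives in memory before the update
addr_before <- address(Products)

# Update join: rows of Products that match on id take the price from i
Products[ProductPriceHistory, on = .(id = product_id), price := i.price]

# Unmatched rows (id 3) keep their old price; the object is the same one
print(Products)
identical(address(Products), addr_before)
```

The `i.` prefix matters here: both tables have a `price` column, and an unprefixed `price` in `j` would refer to the column already in `Products`.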
712712

713713
#### If we need to get the latest price and date (instead of all matches), we can still use := efficiently:
714-
```{r Updating with the Latest Record}
714+
```{r Updating_with_the_Latest_Record}
715715
Products[ProductPriceHistory,
716716
on = .(id = product_id),
717717
`:=`(price = last(i.price), last_updated = last(i.date)),
718718
by = .EACHI]
719719
```
720-
- last(i.price) ensures that only the latest price is selected.
721-
- last_updated column is added to track the last update date.
722-
- by = .EACHI ensures that the last price is picked for each product.
720+
- `last(i.price)` ensures that only the latest price is selected.
721+
- `last_updated` column is added to track the last update date.
722+
- `by = .EACHI` ensures that `last(i.price)` is applied separately for each product.
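
Note that `by = .EACHI` evaluates `j` once per row of `i`, so which price counts as "latest" depends on the row order of `ProductPriceHistory`. When that order is not guaranteed, one alternative is to reduce the history to one row per product first and then run a plain update join. The following is a sketch under assumed, hypothetical data (the `Latest` table is introduced only for illustration), not the vignette's own method.

```r
library(data.table)

# Hypothetical data: history rows are deliberately NOT sorted by date
Products <- data.table(id = 1:2, price = c(1.00, 2.00))
ProductPriceHistory <- data.table(
  product_id = c(1L, 2L, 1L, 2L),
  date       = as.IDate(c("2024-02-01", "2024-01-01", "2024-01-01", "2024-03-01")),
  price      = c(1.20, 2.10, 1.10, 2.30)
)

# Sort by date, then keep the last price and date seen for each product
Latest <- ProductPriceHistory[
  order(date),
  .(price = last(price), date = last(date)),
  by = product_id
]

# One row per product in i, so a plain update join is enough
Products[Latest, on = .(id = product_id),
         `:=`(price = i.price, last_updated = i.date)]

print(Products)
```

Sorting explicitly makes the result independent of how the history table happens to be stored.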
723723

724724
#### Understanding last() vs. tail()
725725

726-
- The key difference between last() and tail() is:
727-
- last(x): Returns the last element of x. Skips NAs when used on a data.table column.
728-
- tail(x, 1): Returns the last row, including NA if present.
726+
- The key difference between `last()` and `tail()` is:
727+
- `last(x)`: Returns the last element of `x`, including `NA` if it is the last element.
728+
- `tail(x, 1)`: Also returns the last element, but keeps the container type: for a list it returns a list of length 1 rather than the element itself.
729729

730-
In this case, last(i.price) ensures we get the latest non-NA price, whereas tail(i.price, 1) would return the last row even if it contains NA.
730+
```{r Example_Behavior}
731+
# Test 1: Simple vector with NA at the end
732+
x <- c(1, 2, 3, NA)
733+
last(x) # Returns NA
734+
tail(x, 1) # Returns NA
735+
736+
# Test 2: data.table grouping behavior
737+
dt <- data.table(group = c(1,1,2,2), value = c(10, NA, 20, NA))
738+
dt[, .(last_value = last(value)), by = group] # last() does not skip NA
739+
dt[, .(tail_value = tail(value, 1)), by = group] # tail() behaves similarly
740+
741+
# Test 3: Working with lists
742+
l <- list(a = 1, b = 2, c = 3)
743+
last(l) # Returns 3
744+
tail(l, 1) # Returns a list of length 1
745+
746+
# Test 4: Empty vector behavior
747+
z <- numeric(0)
748+
length(last(z)) # Returns 0
749+
length(tail(z, 1)) # Returns 0
750+
```
731751

732752
#### When we need to update Products with multiple columns from ProductPriceHistory
733-
```{r Efficient Right Join Update }
753+
```{r Efficient_Right_Join_Update }
734754
cols <- setdiff(names(ProductPriceHistory), 'product_id')
735755
Products[ProductPriceHistory,
736756
on = .(id = product_id),
737757
(cols) := mget(cols)]
738758
```
739-
- Efficiently updates multiple columns in Products from ProductPriceHistory.
740-
- mget(cols) retrieves multiple matching columns dynamically.
741-
- This method is faster and more memory-efficient than Products <- ProductPriceHistory[Products, on=...].
742-
- Note: := updates Products in place, but does not modify ProductPriceHistory.
743-
- Unlike traditional RIGHT JOIN, data.table does not allow i (right table) to be updated directly.
759+
- Efficiently updates multiple columns in `Products` from `ProductPriceHistory`.
760+
- `mget(cols)` retrieves multiple matching columns dynamically.
761+
- This method is faster and more memory-efficient than `Products <- ProductPriceHistory[Products, on=...]`.
762+
- Note: `:=` updates `Products` in place, but does not modify `ProductPriceHistory`.
763+
- Unlike traditional RIGHT JOIN, `data.table` does not allow i (right table) to be updated directly.
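
One point worth checking when column names overlap: an unprefixed name in `j` refers to the column in `Products` if one exists, so looking names up with an explicit `i.` prefix makes the source unambiguous. The sketch below uses hypothetical data and the commonly used `mget(paste0("i.", cols))` idiom. It is an illustration under those assumptions, not a replacement for the chunk above.

```r
library(data.table)

# Hypothetical data: `price` exists in both tables, `date` only in the history
Products <- data.table(id = 1:3, price = c(1.00, 2.00, 3.00))
ProductPriceHistory <- data.table(
  product_id = c(1L, 2L),
  date       = as.IDate(c("2024-02-01", "2024-03-01")),
  price      = c(1.20, 2.30)
)

# Every column of the history table except the join key
cols <- setdiff(names(ProductPriceHistory), "product_id")

# Look up i.date and i.price explicitly so the values come from the right table
Products[ProductPriceHistory, on = .(id = product_id),
         (cols) := mget(paste0("i.", cols))]

print(Products)             # id 3 has no match: price unchanged, date is NA
print(ProductPriceHistory)  # the right table itself is not modified
```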
744764

745765
***
746766
