Commit c98ba08

committed
updated version
1 parent 4a416fc commit c98ba08

2 files changed

+8
-41
lines changed

vignettes/datatable-joins.Rmd

Lines changed: 0 additions & 34 deletions
@@ -194,40 +194,6 @@ Products[ProductReceived,
194194
on = .(id = product_id)]
195195
```
196196

197-
#### 3.1.2. Using Named Lists for Explicit Joins
198-
In `data.table`, joins can be performed using unnamed lists `(list())` or named lists. Named lists provide greater clarity and reduce ambiguity when matching column names, especially when joining on multiple columns.
199-
200-
- Using Unnamed
201-
```{r}
202-
dt1 <- data.table(id = c(1, 2, 3), value = c("A", "B", "C"))
203-
dt2 <- data.table(id = c(2, 3, 4), info = c("X", "Y", "Z"))
204-
dt1[dt2, on = "id"]
205-
```
206-
- Using a named list for explicit joins:
207-
```{r}
208-
dt1[dt2, on = .(id = id)]
209-
```
210-
- Here, .() is a shorthand for list(), and explicitly naming the column (id = id) makes the join easier to understand.
211-
212-
##### Named Lists for Multiple Column Joins
213-
When joining on multiple columns, named lists prevent mismatches and make the query more readable:
214-
```{r}
215-
dt1 <- data.table(id = c(1, 2, 3), key1 = c("A", "B", "C"), value = c(10, 20, 30))
216-
dt2 <- data.table(id = c(2, 3, 4), key1 = c("B", "C", "D"), info = c("X", "Y", "Z"))
217-
218-
# Unnamed list approach (less readable)
219-
dt1[dt2, on = c("id", "key1")]
220-
221-
# Named list approach (explicit and clear)
222-
dt1[dt2, on = .(id = id, key1 = key1)]
223-
```
224-
This ensures that column names are explicitly matched, which is especially useful when working with complex datasets.
225-
226-
- When Should You Use Named Lists?
227-
There is potential ambiguity in column names.
228-
You are joining on multiple columns.
229-
You want to make your joins self-documenting and more readable.
230-
231197
#### 3.1.2. Alternatives to define the `on` argument
232198

233199
In all the prior examples we have passed the column names we want to match to the `on` argument, but `data.table` also has alternatives to that syntax.
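
As a rough illustration of what such alternatives can look like, the sketch below joins two small tables using the list form of `on` (the form shown in the hunk context above) and a named character vector. The tables `products` and `received` and their columns are hypothetical stand-ins, not the vignette's data.

```{r}
library(data.table)

# Hypothetical toy tables; the vignette's own Products / ProductReceived data are not reproduced here
products <- data.table(id = 1:3, name = c("a", "b", "c"))
received <- data.table(product_id = c(2L, 3L, 3L), count = c(5L, 7L, 1L))

# List form: the x column goes on the left, the i column on the right
by_list <- products[received, on = .(id = product_id)]

# Named character vector form: the name is the x column, the value is the i column
by_char <- products[received, on = c(id = "product_id")]

# Both spellings describe the same join
all.equal(by_list, by_char)
```

Both calls pair `id` in `products` with `product_id` in `received`; choosing between them is mostly a matter of whether the column names are known up front or built programmatically as strings.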

vignettes/datatable-secondary-indices-and-auto-indexing.Rmd

Lines changed: 8 additions & 7 deletions
@@ -192,7 +192,7 @@ flights[.("JFK", "LAX"), on = c("origin", "dest")][1:5]
192192
* Since the time to compute the secondary index is quite small, we don't have to use `setindex()`, unless, once again, the task involves repeated subsetting on the same column.
193193

194194
### b) Using named list elements in `i`
195-
When subsetting using the on argument, values in `i` are typically passed as unnamed elements. However, naming elements explicitly in `i` improves readability, especially when dealing with multiple keys.
195+
When subsetting using the `on` argument, values in `i` are typically passed as unnamed elements. However, naming elements explicitly in `i` improves readability, especially when dealing with multiple keys.
196196

197197
- Example: Standard subsetting using unnamed elements
198198
```{r}
@@ -204,20 +204,21 @@ While this syntax is concise, it may not be immediately clear which value corres
204204
```{r}
205205
flights[.(origin = "LGA", dest = "TPA"), max(arr_delay), on = c("origin", "dest")]
206206
```
207-
Here, naming the elements explicitly `(origin = "LGA", dest = "TPA")` makes it clear which variable each value corresponds to. This improves code maintainability, especially in complex queries.
207+
Naming elements explicitly `(origin = "LGA", dest = "TPA")` clarifies variable correspondence.
208208

209209
- Using named lists with multiple values
210210
When multiple values are passed, named elements further enhance clarity:
211-
```{r}
211+
```{r unnamed_elemts}
212212
flights[.("LGA", "JFK", "EWR"), mult = "last", on = c("origin", "dest"), nomatch = 0L]
213213
```
214-
- Named elements
215-
```{r}
214+
215+
```{r named_elements}
216216
flights[.(origin = c("LGA", "JFK", "EWR"), dest = "XNA"), mult = "last", on = c("origin", "dest"), nomatch = 0L]
217217
```
218218
- Impact of named elements on key order
219-
It's important to note that naming elements in `i` only affects ordering when `on` is specified. If `on` is not used, data.table will match values based on key order, regardless of the names used.
220-
219+
```{r}
220+
flights[.(dest = "TPA", origin = "LGA"), on = .(origin, dest)]
221+
```
221222
- When to use named list elements in `i`.
222223
Use named elements when working with multiple keys in `on`, as they improve readability.
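
The reordered example added in this hunk can be checked on a small table. The sketch below uses a hypothetical `toy_flights` table in place of the `flights` data and assumes that named elements in `i` are matched to the `on` columns by name, while unnamed elements are matched by position.

```{r}
library(data.table)

# Hypothetical toy table standing in for the `flights` data used in the vignette
toy_flights <- data.table(
  origin    = c("LGA", "LGA", "JFK", "EWR"),
  dest      = c("TPA", "XNA", "TPA", "TPA"),
  arr_delay = c(10, -5, 30, 0)
)

# Named elements in the same order as `on`
in_order <- toy_flights[.(origin = "LGA", dest = "TPA"), on = c("origin", "dest")]

# Named elements in the opposite order: still paired with `on` by name
reordered <- toy_flights[.(dest = "TPA", origin = "LGA"), on = c("origin", "dest")]

# Unnamed elements are paired by position, so swapping them looks up
# origin == "TPA" and dest == "LGA", which matches no row here
swapped_unnamed <- toy_flights[.("TPA", "LGA"), on = c("origin", "dest"), nomatch = 0L]

identical(in_order$arr_delay, reordered$arr_delay)
nrow(swapped_unnamed)
```

Under that assumption, naming the elements makes the call robust to reordering, whereas the unnamed form depends entirely on the order of the values matching the order of `on`.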
223224
