suggested improvements

venom1204 · venom1204 · commit a8a03e9d90a5 · 2025-03-21T12:11:42.000+05:30
diff --git a/vignettes/datatable-secondary-indices-and-auto-indexing.Rmd b/vignettes/datatable-secondary-indices-and-auto-indexing.Rmd
@@ -191,64 +191,39 @@ flights[.("JFK", "LAX"), on = c("origin", "dest")][1:5]
 
 * Since the time to compute the secondary index is quite small, we don't have to use `setindex()`, unless, once again, the task involves repeated subsetting on the same column.
 
-### b) Using named list elements in `i` 
-When subsetting using the `on` argument, values in `i` are typically passed as unnamed elements. However, naming elements explicitly in `i` improves readability, especially when dealing with multiple keys.
-
-- Example: Standard subsetting using unnamed elements
-```{r}
-flights[.("LGA", "TPA"), max(arr_delay), on = c("origin", "dest")]
-```
-While this syntax is concise, it may not be immediately clear which value corresponds to which key in `on`.
-
-- Subsetting using named elements in `i`
+* For clarity/readability, it might help to name the inputs in `i`, e.g.,
 ```{r}
-flights[.(origin = "LGA", dest = "TPA"), max(arr_delay), on = c("origin", "dest")]
+flights[.(origin = "JFK", dest = "LAX"), on = c("origin", "dest")]
 ```
-Naming elements explicitly `(origin = "LGA", dest = "TPA")` clarifies variable correspondence.
+This makes it clear which values correspond to which key.
 
-- Using named lists with multiple values
-When multiple values are passed, named elements further enhance clarity:
-```{r unnamed_elemts}
-flights[.("LGA", "JFK", "EWR"), mult = "last", on = c("origin", "dest"), nomatch = 0L]
-```
-
-```{r named_elements}
-flights[.(origin = c("LGA", "JFK", "EWR"), dest = "XNA"), mult = "last", on = c("origin", "dest"), nomatch = 0L]
-```
-- Impact of named elements on key order
-```{r} 
-flights[.(dest = "TPA", origin = "LGA"), on = .(origin, dest)]  
-```
-- When to use named list elements in `i`.
-when working with multiple keys in `on`, as it improves readability.
-
-### c) Select in `j`
+### b) Select in `j`
 
 All the operations we will discuss below are no different to the ones we already saw in the [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) vignette. Except we'll be using the `on` argument instead of setting keys.
 
 #### -- Return `arr_delay` column alone as a data.table corresponding to `origin = "LGA"` and `dest = "TPA"`
 
 ```{r}
-flights[.(origin = "LGA", dest = "TPA"), .(arr_delay), on = c("origin", "dest")]
+flights[.("LGA", "TPA"), .(arr_delay), on = c("origin", "dest")]
 ```
 
-### d) Chaining
+### c) Chaining
 
 #### -- On the result obtained above, use chaining to order the column in decreasing order.
 
 ```{r}
-flights[.(origin = "LGA", dest = "TPA"), .(arr_delay), on = c("origin", "dest")][order(-arr_delay)]
+flights[.("LGA", "TPA"), .(arr_delay), on = c("origin", "dest")][order(-arr_delay)]
 ```
 
-### e) Compute or *do* in `j`
+### d) Compute or *do* in `j`
 
 #### -- Find the maximum arrival delay corresponding to `origin = "LGA"` and `dest = "TPA"`.
 
 ```{r}
-flights[.(origin = "LGA", dest = "TPA"), max(arr_delay), on = c("origin", "dest")]
+flights[.("LGA", "TPA"), max(arr_delay), on = c("origin", "dest")]
 ```
 
-### f) *sub-assign* by reference using `:=` in `j`
+### e) *sub-assign* by reference using `:=` in `j`
 
 We have seen this example already in the vignettes [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html) and [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html). Let's take a look at all the `hours` available in the `flights` *data.table*:
 
@@ -271,7 +246,7 @@ flights[, sort(unique(hour))]
 
 * This is particularly a huge advantage of secondary indices. Previously, just to update a few rows of `hour`, we had to `setkey()` on it, which inevitably reorders the entire data.table. With `on`, the order is preserved, and the operation is much faster! Looking at the code, the task we wanted to perform is also quite clear.
 
-### g) Aggregation using `by`
+### f) Aggregation using `by`
 
 #### -- Get the maximum departure delay for each `month` corresponding to `origin = "JFK"`. Order the result by `month`
 
@@ -282,7 +257,7 @@ head(ans)
 
 * We would have had to set the `key` back to `origin, dest` again, if we did not use `on` which internally builds secondary indices on the fly.
 
-### h) The *mult* argument
+### g) The *mult* argument
 
 The other arguments including `mult` work exactly the same way as we saw in the [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) vignette. The default value for `mult` is "all". We can choose, instead only the "first" or "last" matching rows should be returned.
 
@@ -295,17 +270,17 @@ flights[c("BOS", "DAY"), on = "dest", mult = "first"]
 #### -- Subset only the last matching row where `origin` matches *"LGA", "JFK", "EWR"* and `dest` matches *"XNA"*
 
 ```{r}
-flights[.(origin = c("LGA", "JFK", "EWR"), dest = "XNA"), on = c("origin", "dest"), mult = "last"]
+flights[.(c("LGA", "JFK", "EWR"), "XNA"), on = c("origin", "dest"), mult = "last"]
 ```
 
-### i) The *nomatch* argument
+### h) The *nomatch* argument
 
 We can choose if queries that do not match should return `NA` or be skipped altogether using the `nomatch` argument.
 
 #### -- From the previous example, subset all rows only if there's a match
 
 ```{r}
-flights[.(origin = c("LGA", "JFK", "EWR"), dest = "XNA"), mult = "last", on = c("origin", "dest"), nomatch = NULL]
+flights[.(c("LGA", "JFK", "EWR"), "XNA"), mult = "last", on = c("origin", "dest"), nomatch = NULL]
 ```
 
 * There are no flights connecting "JFK" and "XNA". Therefore, that row is skipped in the result.