updated

venom1204 · venom1204 · commit aa0ecc0ad01b · 2025-06-04T21:30:57.000Z
diff --git a/man/setkey.Rd b/man/setkey.Rd
@@ -39,20 +39,6 @@ ordering vector of the dataset's rows according to the provided columns. This or
 is stored as an attribute of the \code{data.table} and the dataset retains the original order
 of rows in memory. See the \href{../doc/datatable-secondary-indices-and-auto-indexing.html}{\code{vignette("datatable-secondary-indices-and-auto-indexing")}} for more details.
 
-\subsection{Key vs. Index Subsetting}{
-When using \code{setkey}:
-\itemize{
-  \item Subsetting can omit \code{on} (e.g., \code{DT[.(value)]})
-  \item Data is physically reordered in RAM
-}
-When using \code{setindex}:
-\itemize{
-  \item Must specify \code{on} (e.g., \code{DT[.(value), on = "col"]})
-  \item Multiple indices can coexist via \code{setindexv(x, list(cols))}
-  \item Original row order is preserved
-}
-}
-
 \code{key} returns the \code{data.table}'s key if it exists; \code{NULL} if none exists.
 
 \code{haskey} returns \code{TRUE}/\code{FALSE} if the \code{data.table} has a key.
@@ -88,6 +74,20 @@ The sort is \emph{stable}; i.e., the order of ties (if any) is preserved.
 For character vectors, \code{data.table} takes advantage of R's internal global string cache, also exported as \code{\link{chorder}}.
 }
 
+\section{Key vs. Index Subsetting}{
+When using \code{setkey}:
+\itemize{
+  \item Subsetting can omit \code{on} (e.g., \code{DT[.(value)]})
+  \item Data is physically reordered in RAM
+}
+When using \code{setindex}:
+\itemize{
+  \item Must specify \code{on} (e.g., \code{DT[.(value), on = "col"]})
+  \item Multiple indices can coexist via \code{setindexv(x, list(cols))}
+  \item Original row order is preserved
+}
+}
+
 \section{Good practice}{
 In general, it's good practice to use column names rather than numbers. This is
 why \code{setkey} and \code{setkeyv} only accept column names.
@@ -157,14 +157,22 @@ indices(DT)           # get indices single vector
 indices(DT, vectors = TRUE) # get indices list
 
 # Keyed subsetting (no 'on' needed)
+DT = data.table(A = 5:1, B = letters[5:1])
 setkey(DT, B)
 DT[.("e")]  # Returns row where B = "e"
 
 # Indexed subsetting (requires 'on')
+DT = data.table(A = 5:1, B = letters[5:1])
 setindex(DT, B)
 DT[.("e"), on = "B"]  # Works
 # DT[.("e")]           # Would error (missing 'on')
 
+DT = data.table(a = c(1, 2, 1, 2), b = c("x", "x", "y", "y"), val = 1:4)
+# Set an index on a single column
+setindex(DT, a)
+# Set multiple indices at once using setindexv
+setindexv(DT, list("a", c("a", "b")))
+
 # Setting multiple indices at once
 DT = data.table(A = 5:1, B = letters[5:1], C = 10:6)
 setindexv(DT, list(c("A", "B"), c("B", "C")))
diff --git a/vignettes/datatable-secondary-indices-and-auto-indexing.Rmd b/vignettes/datatable-secondary-indices-and-auto-indexing.Rmd
@@ -64,46 +64,34 @@ Secondary indices are similar to `keys` in *data.table*, except for two major di
 
 #### 1.1 Keyed vs. Indexed Subsetting
 
-While both **keys** and **indices** enable fast binary search subsetting, they differ critically in usage:
+While both **keys** and **indices** enable fast binary search subsetting, they differ significantly in usage:
 
 **Keyed subsetting** (implicit column matching)
 ```{r}
-DT <- data.table(a = c(TRUE, FALSE), b = 1:2)
+DT = data.table(a = c(TRUE, FALSE), b = 1:2)
 setkey(DT, a)                # Set key
 DT[.(TRUE)]                  # No 'on' needed - key is used
 ```
 
 **Indexed subsetting** (explicit column specification)
 
 ```{r}
-DT <- data.table(a = c(TRUE, FALSE), b = 1:2)
+DT = data.table(a = c(TRUE, FALSE), b = 1:2)
 setindex(DT, a)              # Set index only (no reorder)
 DT[.(TRUE), on = "a"]        # Must specify 'on'
 ```
 
-Always use `on`with indices to avoid errors. Keys establish permanent implicit matching, while indices require explicit direction for each operation.
+For join-like operations such as `DT[.(values)]` on an indexed table, the `on=` argument is required to specify the join column(s). Unlike `setkey`, `setindex` does not establish implicit matching, so the columns must be named explicitly in each call.
 
-#### **Hierarchy of Lookup Operations**
+#### Hierarchy of Lookup Operations
 
 When `on` is specified, `data.table` uses this lookup order:
 
 - Existing keys matching `on` columns
 
 - Pre-built secondary indices matching `on` columns
 
-- Temporary auto-generated index (if none exist)
-
-**Example**: Error When Omitting on with Indices
-
-```{r}
-DT <- data.table(a = c(TRUE, FALSE), b = 1:2)
-setindex(DT, a)
-# DT[.(TRUE)] # Uncommenting this line causes: Error - must specify 'on'
-```
-
-- Use `DT[.(value)]` when the table is keyed by the relevant column(s).
-
-- Use `DT[.(value), on = "col"]` when using secondary indices or when no key is set.
+- Auto-generated secondary index (created if none exist; its optimized ordering is used internally for efficient joins)
 
 ### b) Set and get secondary indices