Commit 7eb0c02

Update vignettes/datatable-faq.Rmd
Co-authored-by: Benjamin Schwendinger <[email protected]>
1 parent ef07281 commit 7eb0c02

1 file changed: +1 -1 lines changed

vignettes/datatable-faq.Rmd

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@ For consistency so that when you use data.table in functions that accept varying
You may have heard that it is generally bad practice to refer to columns by number rather than name, though. If your colleague comes along and reads your code later they may have to hunt around to find out which column is number 5. If you or they change the column ordering higher up in your R program, you may produce wrong results with no warning or error if you forget to change all the places in your code which refer to column number 5. That is your fault not R's or data.table's. It's really really bad. Please don't do it. It's the same mantra as professional SQL developers have: never use `select *`, always explicitly select by column name to at least try to be robust to future changes.

- Say column 5 is named `"region"` and you really must extract that column as a vector not a data.table. It is more robust to use the column name and write `DT$region` or `DT[["region"]]`; i.e., the same as base R. Using base R's `$` and `[[` on data.table is encouraged. Not when combined with `<-` to assign (use `:=` instead for that) but just to select a single column by name they are encouraged.A key difference, however, is that DT$col may return a reference, while DT[, col] always returns a copy. This can have important consequences and is explained in the vignette("datatable-reference-semantics", package="data.table").
+ Say column 5 is named `"region"` and you really must extract that column as a vector not a data.table. It is more robust to use the column name and write `DT$region` or `DT[["region"]]`; i.e., the same as base R. Using base R's `$` and `[[` on data.table is encouraged. Not when combined with `<-` to assign (use `:=` instead for that) but just to select a single column by name they are encouraged. A key difference, however, is that DT$col may return a reference, while DT[, col] always returns a copy. This can have important consequences and is explained in the `vignette("datatable-reference-semantics", package="data.table")`.
There are some circumstances where referring to a column by number seems like the only way, such as a sequence of columns. In these situations just like data.frame, you can write `DT[, 5:10]` and `DT[,c(1,4,10)]`. However, again, it is more robust (to future changes in your data's number of and ordering of columns) to use a named range such as `DT[,columnRed:columnViolet]` or name each one `DT[,c("columnRed","columnOrange","columnYellow")]`. It is harder work up front, but you will probably thank yourself and your colleagues might thank you in the future. At least you can say you tried your best to write robust code if something does go wrong.
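
For readers of this hunk, the advice in the surrounding context lines can be sketched roughly as follows. This is not part of the commit; the table `DT` and its contents are hypothetical, and only the column names `region`, `columnRed`, `columnOrange` and `columnYellow` are borrowed from the vignette text. It assumes a reasonably recent data.table is installed.

```r
library(data.table)

# Hypothetical table, made up for illustration only
DT <- data.table(
  id           = 1:3,
  columnRed    = c(1, 2, 3),
  columnOrange = c(4, 5, 6),
  columnYellow = c(7, 8, 9),
  region       = c("N", "S", "E")
)

# Extract a single column as a vector by name, as in base R
r1 <- DT$region
r2 <- DT[["region"]]

# Select several columns by name instead of by position
by_range <- DT[, columnRed:columnYellow]
by_names <- DT[, c("columnRed", "columnOrange", "columnYellow")]

# Move two columns to the front; the remaining columns keep their order
setcolorder(DT, c("region", "id"))

# Positional selection now silently picks different columns,
# while the name-based selection still returns the same ones
by_position   <- DT[, 2:4]
by_name_again <- DT[, columnRed:columnYellow]
```

After the reordering, `by_position` no longer covers the three colour columns, whereas `by_name_again` does, which is the robustness argument the FAQ text is making.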
