* Update data.table.R
* Update tests.Rraw
* Update data.table.R
* Update tests.Rraw
* Update datatable-reference-semantics.Rmd
* Update assign.Rd
* Update NEWS.md
* Update NEWS.md
* Update data.table.R
* Update tests.Rraw
* Update tests.Rraw
* Update data.table.R
* Update tests.Rraw
* replace iris with raw dataset
* Update tests.Rraw
* update replace_names_sd and made .SD := not work
* change .SD to names(.SD)
* update typo; change .SD to names(.SD)
* update to names(.SD)
* include names(.SD) and fx to .SD usage
I may have gone too far. There's no use of `(cols) := ...` now, but there is at least a reference to the other vignette.
* Updates news to names(.SD)
* Update typo.
* tweak NEWS
* minor grammar
* jans comment
* jan's comment (ii)
* added "footnote"
* Add is.name(e[[2L]])
* Put tests above Add new tests here
* added test to test names(.SD(2))
* include .SDcols in example for assign
* included .SDcols = function example
* test 2138 is greater than 2137
* bad merge
* Make updates per Michael's comments.
* Added test where .SD is used as well as some columns not in .SD.
* Mention count of reactions in issue
* small copy-edit
* more specific
* specify LHS/RHS
* Simplify implementation to probe for names(.SD) and new test
* fine-tune comment
---------
Co-authored-by: Michael Chirico <[email protected]>
NEWS.md: 2 additions & 0 deletions
@@ -22,6 +22,8 @@
5. `transpose` gains `list.cols=` argument, [#5639](https://github.com/Rdatatable/data.table/issues/5639). Use this to return output with list columns and avoid type promotion (an exception is `factor` columns which are promoted to `character` for consistency between `list.cols=TRUE` and `list.cols=FALSE`). This is convenient for creating a row-major representation of a table. Thanks to @MLopez-Ibanez for the request, and Benjamin Schwendinger for the PR.

+4. Using `dt[, names(.SD) := lapply(.SD, fx)]` now works, [#795](https://github.com/Rdatatable/data.table/issues/795) -- one of our [most-requested issues (see #3189)](https://github.com/Rdatatable/data.table/issues/3189). Thanks to @brodieG for the report, 20 or so others for chiming in, and @ColeMiller1 for the PR.
+
## BUG FIXES

1. `unique()` returns a copy in the case when `nrow(x) <= 1` instead of a mutable alias, [#5932](https://github.com/Rdatatable/data.table/pull/5932). This is consistent with existing `unique()` behavior when the input has no duplicates but more than one row. Thanks to @brookslogan for the report and @dshemetov for the fix.
Let's clean up again and convert our newly-made factor columns back into character columns. This time we will make use of `.SDcols` accepting a function to decide which columns to include. In this case, `is.factor()` is applied to each column, and only the columns for which it returns `TRUE` are included. For more on the **S**ubset of the **D**ata, there is also an [SD Usage vignette](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-sd-usage.html).
+
+Sometimes, it is also nice to keep track of the columns that we transform. That way, even after we convert our columns, we can still refer to the specific columns we updated.
* We also could have used `(factor_cols)` on the `LHS` instead of `names(.SD)`.
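As a minimal sketch of the two LHS forms mentioned above, the following uses a small made-up table (`DT`, `carrier`, and `origin` are hypothetical names, not the vignette's data) and assumes a data.table version that includes the `names(.SD) :=` change from this commit:

```r
library(data.table)

# Hypothetical toy table (not the vignette's data).
DT <- data.table(
  id      = 1:3,
  carrier = factor(c("AA", "UA", "AA")),
  origin  = factor(c("JFK", "LGA", "JFK"))
)

# Record which columns are factors before converting them.
factor_cols <- names(DT)[sapply(DT, is.factor)]

# Form 1: names(.SD) on the LHS; .SDcols = is.factor picks the factor columns.
DT1 <- copy(DT)
DT1[, names(.SD) := lapply(.SD, as.character), .SDcols = is.factor]

# Form 2: the recorded names, wrapped in parentheses, on the LHS.
DT2 <- copy(DT)
DT2[, (factor_cols) := lapply(.SD, as.character), .SDcols = factor_cols]

# Both forms should leave the tables in the same state.
stopifnot(isTRUE(all.equal(DT1, DT2)))
```

Because `factor_cols` was recorded before the conversion, the same columns can still be addressed by name afterwards, even though `is.factor` would no longer select them.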
+
## 3. `:=` and `copy()`

`:=` modifies the input object by reference. Apart from the features we have discussed already, sometimes we might want to use the update-by-reference feature for its side effect. At other times it may not be desirable to modify the original object, in which case we can use the `copy()` function, as we will see in a moment.
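A minimal sketch of that difference, using a made-up one-column table (`DT`, `alias`, and `DT2` are illustrative names only):

```r
library(data.table)

DT <- data.table(x = 1:3)

# Plain assignment does not copy: both names refer to the same table,
# so := through `alias` also changes DT.
alias <- DT
alias[, y := x * 2L]
stopifnot("y" %in% names(DT))

# copy() creates an independent table; := on the copy leaves DT untouched.
DT2 <- copy(DT)
DT2[, z := x + 1L]
stopifnot(!("z" %in% names(DT)))
```

The first half is the side effect that `:=` is designed to have; the second half is the case where the original object has to be preserved.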
vignettes/datatable-sd-usage.Rmd: 24 additions & 28 deletions
@@ -77,7 +77,15 @@ The first way to impact what `.SD` is is to limit the _columns_ contained in `.S
Pitching[ , .SD, .SDcols = c('W', 'L', 'G')]
```

-This is just for illustration and was pretty boring. But even this simply usage lends itself to a wide variety of highly beneficial / ubiquitous data manipulation operations:
+This is just for illustration and was pretty boring. In addition to accepting a character vector, `.SDcols` also accepts:
+
+1. any function such as `is.character` to filter _columns_
+2. the function^{*} `patterns()` to filter _column names_ by regular expression
+3. integer and logical vectors
+
+*see `?patterns` for more details
+
+This simple usage lends itself to a wide variety of highly beneficial / ubiquitous data manipulation operations:

## Column Type Conversion

@@ -91,52 +99,40 @@ We notice that the following columns are stored as `character` in the `Teams` da
If you're confused by the use of `sapply` here, note that it's quite similar for base R `data.frames`:
-
-```{r identify_factors_as_df}
-setDF(Teams) # convert to data.frame for illustration
-sapply(Teams[ , fkt], is.character)
-setDT(Teams) # convert back to data.table
-```
-
-The key to understanding this syntax is to recall that a `data.table` (as well as a `data.frame`) can be considered as a `list` where each element is a column -- thus, `sapply`/`lapply` applies the `FUN` argument (in this case, `is.character`) to each _column_ and returns the result as `sapply`/`lapply` usually would.
-
-The syntax to now convert these columns to `factor` is very similar -- simply add the `:=` assignment operator:
+The syntax to now convert these columns to `factor` is simple:
# print out the first column to demonstrate success
head(unique(Teams[[fkt[1L]]]))
```

-Note that we must wrap `fkt` in parentheses `()` to force `data.table` to interpret this as column names, instead of trying to assign a column named `'fkt'`.
+Note:

-Actually, the `.SDcols` argument is quite flexible; above, we supplied a `character` vector of column names. In other situations, it is more convenient to supply an `integer` vector of column _positions_ or a `logical` vector dictating include/exclude for each column. `.SDcols` even accepts regular expression-based pattern matching.
+1. The `:=` is an assignment operator to update the `data.table` in place without making a copy. See [reference semantics](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-reference-semantics.html) for more.
+2. The LHS, `names(.SD)`, indicates which columns we are updating - in this case we update the entire `.SD`.
+3. The RHS, `lapply()`, loops through each column of the `.SD` and converts the column to a factor.
+4. We use `.SDcols` to select only the columns whose names match the pattern `teamID`.
+
+Again, the `.SDcols` argument is quite flexible; above, we supplied `patterns` but we could have also supplied `fkt` or any `character` vector of column names. In other situations, it is more convenient to supply an `integer` vector of column _positions_ or a `logical` vector dictating include/exclude for each column. Finally, the use of a function to filter columns is very helpful.
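The notes above can be sketched on a made-up miniature of a `Teams`-like table. The values and the `by_*` variable names are hypothetical, and the last call assumes a data.table version that includes the `names(.SD) :=` change from this commit:

```r
library(data.table)

# Hypothetical miniature of a Teams-like table (made-up values).
Teams <- data.table(
  teamID   = c("BOS", "NYA", "BOS"),
  teamIDBR = c("BOS", "NYY", "BOS"),
  W        = c(90L, 95L, 88L)
)

# Different ways of describing the same two columns to .SDcols.
by_pattern  <- names(Teams[, .SD, .SDcols = patterns("teamID")])
by_name     <- names(Teams[, .SD, .SDcols = c("teamID", "teamIDBR")])
by_function <- names(Teams[, .SD, .SDcols = is.character])
by_position <- names(Teams[, .SD, .SDcols = 1:2])
by_logical  <- names(Teams[, .SD, .SDcols = c(TRUE, TRUE, FALSE)])

# LHS names(.SD): the columns being updated (all of .SD).
# RHS lapply(): converts each column of .SD to factor.
Teams[, names(.SD) := lapply(.SD, factor), .SDcols = patterns("teamID")]
```

All five selectors pick the same columns in this toy table. The positional and logical forms depend on column order, so they are the most fragile of the five.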
For example, we could do the following to convert all `factor` columns to `character`:
** A proviso to the above: _explicitly_ using column numbers (like `DT[ , (1) := rnorm(.N)]`) is bad practice and can lead to silently corrupted code over time if column positions change. Even implicitly using numbers can be dangerous if we don't keep smart/strict control over the ordering of when we create the numbered index and when we use it.