Description
Currently, dt[, (cols) := list(...), by = group] silently recycles list(...) when replacing the values of cols. If length(list) < length(cols), the list is recycled; if length(list) > length(cols), the extra elements of the list are silently dropped, as demonstrated below.
When by = group is absent, the lengths are checked:
library(data.table)
dt <- data.table(id = 1:10)
xn <- 1:3
xcols <- paste0("x", xn)
dt[, (xcols) := list(10, 20)]
#> Error in `[.data.table`(dt, , `:=`((xcols), list(10, 20))): Supplied 3 columns to be assigned 2 items. Please see NEWS for v1.12.2.
However, if by = group is used, the list is recycled:
library(data.table)
dt <- data.table(id = 1:10)
dt[, group := sample(1:2, .N, replace = TRUE)]
xn <- 1:3
xcols <- paste0("x", xn)
dt[, (xcols) := list(10, 20), by = group]
dt
#> id group x1 x2 x3
#> 1: 1 2 10 20 10
#> 2: 2 2 10 20 10
#> 3: 3 2 10 20 10
#> 4: 4 2 10 20 10
#> 5: 5 2 10 20 10
#> 6: 6 2 10 20 10
#> 7: 7 1 10 20 10
#> 8: 8 2 10 20 10
#> 9: 9 2 10 20 10
#> 10: 10 2 10 20 10
And if the list is longer than cols, the extra elements are silently dropped:
library(data.table)
dt <- data.table(id = 1:10)
dt[, group := sample(1:2, .N, replace = TRUE)]
xn <- 1:3
xcols <- paste0("x", xn)
dt[, (xcols) := list(40, 30, 20, 10), by = group]
dt
#> id group x1 x2 x3
#> 1: 1 1 40 30 20
#> 2: 2 1 40 30 20
#> 3: 3 2 40 30 20
#> 4: 4 2 40 30 20
#> 5: 5 2 40 30 20
#> 6: 6 1 40 30 20
#> 7: 7 1 40 30 20
#> 8: 8 2 40 30 20
#> 9: 9 2 40 30 20
#> 10: 10 1 40 30 20
Personally, I almost never want this recycling behavior. When it occurs, it usually means something is wrong with my code.
Consider the following example where list(...) is produced by lapply(.SD, ...). If the function is inlined and a bit complicated, one often forgets to write .SDcols.
library(data.table)
set.seed(123)
dt <- data.table(id = 1:10)
dt[, group := sample(1:2, .N, replace = TRUE)]
xn <- 1:3
xcols <- paste0("x", xn)
for (i in xn) {
  dt[, xcols[[i]] := runif(.N)]
}
dt[, (xcols) := lapply(.SD, function(x) {
  x / sd(x)
}), by = group]
dt
#> id group x1 x2 x3
#> 1: 1 1 0.2672612 2.7645427 3.2041655
#> 2: 2 1 0.5345225 1.3098014 2.4955128
#> 3: 3 1 0.8017837 1.9576795 2.3071378
#> 4: 4 2 2.3421602 1.5214175 4.9351189
#> 5: 5 1 1.3363062 0.2973764 2.3618854
#> 6: 6 2 3.5132403 2.3907258 3.5168344
#> 7: 7 2 4.0987803 0.6538253 2.7005050
#> 8: 8 2 4.6843204 0.1117471 2.9490603
#> 9: 9 1 2.4053512 0.9474491 1.0415680
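#> 10: 10 1 2.6726124 2.7578116 0.5299108
Incorrect results are silently produced. Without .SDcols, .SD also contains id, so lapply() returns four elements for three columns: x1 receives the scaled id, x2 the scaled x1, x3 the scaled x2, and the fourth element is dropped. The following are the correct results with .SDcols added.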
library(data.table)
set.seed(123)
dt <- data.table(id = 1:10)
dt[, group := sample(1:2, .N, replace = TRUE)]
xn <- 1:3
xcols <- paste0("x", xn)
for (i in xn) {
  dt[, xcols[[i]] := runif(.N)]
}
dt[, (xcols) := lapply(.SD, function(x) {
  x / sd(x)
}), by = group, .SDcols = xcols]
dt
#> id group x1 x2 x3
#> 1: 1 1 2.7645427 3.2041655 2.5018371
#> 2: 2 1 1.3098014 2.4955128 2.3440794
#> 3: 3 1 1.9576795 2.3071378 1.7943807
#> 4: 4 2 1.5214175 4.9351189 2.9399476
#> 5: 5 1 0.2973764 2.3618854 0.0639438
#> 6: 6 2 2.3907258 3.5168344 1.7658739
#> 7: 7 2 0.6538253 2.7005050 2.8031711
#> 8: 8 2 0.1117471 2.9490603 0.7998165
#> 9: 9 1 0.9474491 1.0415680 0.8266013
#> 10: 10 1 2.7578116 0.5299108 0.6017398
I suggest that recycling of list(...) across columns be made consistent with the behavior data.table has already adopted for row recycling: only accepting zero, one, or .N elements.
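Until something like this is enforced, a user-side guard can approximate the check. The following is only a minimal sketch: check_cols() is a hypothetical helper, not part of data.table, and its error message merely imitates the one shown above. It assumes the rule "one element, or exactly one element per target column" and does not handle the zero-length case.

```r
library(data.table)

# Hypothetical helper (NOT a data.table function): return `values` unchanged
# if it has one element or exactly one element per target column;
# otherwise stop with an error instead of recycling or truncating.
check_cols <- function(values, cols) {
  n <- length(values)
  if (n != 1L && n != length(cols)) {
    stop(sprintf("Supplied %d columns to be assigned %d items.",
                 length(cols), n), call. = FALSE)
  }
  values
}

set.seed(123)
dt <- data.table(id = 1:10)
dt[, group := sample(1:2, .N, replace = TRUE)]
xcols <- paste0("x", 1:3)
for (col in xcols) {
  dt[, (col) := runif(.N)]
}

# .SDcols forgotten: .SD holds id, x1, x2, x3 (four elements for three
# columns), so the helper raises an error rather than letting it pass.
res <- try(
  dt[, (xcols) := check_cols(lapply(.SD, function(x) x / sd(x)), xcols),
     by = group],
  silent = TRUE
)
inherits(res, "try-error")

# With .SDcols the lengths match and the assignment goes through.
dt[, (xcols) := check_cols(lapply(.SD, function(x) x / sd(x)), xcols),
   by = group, .SDcols = xcols]
```

Wrapping the right-hand side this way keeps the check next to the assignment it protects, but it has to be repeated at every call site, which is why a check inside `:=` itself would be preferable.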