dt[, (cols) := list(...), by = group] should not silently recycle list #4022

Currently, `dt[, (cols) := list(...), by = group]` seems to silently recycle `list(...)` when replacing the values of `cols`. If `length(list) < length(cols)`, `list` is recycled; if `length(list) > length(cols)`, the redundant elements of `list` are silently dropped, as demonstrated below.

When `by = group` is absent, the lengths are checked:

```r
library(data.table)
dt <- data.table(id = 1:10)
xn <- 1:3
xcols <- paste0("x", xn)
dt[, (xcols) := list(10, 20)]
#> Error in `[.data.table`(dt, , `:=`((xcols), list(10, 20))): Supplied 3 columns to be assigned 2 items. Please see NEWS for v1.12.2.
```

However, if `by = group` is used, `list` is recycled:

```r
library(data.table)
dt <- data.table(id = 1:10)
dt[, group := sample(1:2, .N, replace = TRUE)]
xn <- 1:3
xcols <- paste0("x", xn)
dt[, (xcols) := list(10, 20), by = group]
dt
#>     id group x1 x2 x3
#>  1:  1     2 10 20 10
#>  2:  2     2 10 20 10
#>  3:  3     2 10 20 10
#>  4:  4     2 10 20 10
#>  5:  5     2 10 20 10
#>  6:  6     2 10 20 10
#>  7:  7     1 10 20 10
#>  8:  8     2 10 20 10
#>  9:  9     2 10 20 10
#> 10: 10     2 10 20 10
```

Similarly, if `list` is longer than `cols`, the extra elements are silently dropped:

```r
library(data.table)
dt <- data.table(id = 1:10)
dt[, group := sample(1:2, .N, replace = TRUE)]
xn <- 1:3
xcols <- paste0("x", xn)
dt[, (xcols) := list(40, 30, 20, 10), by = group]
dt
#>     id group x1 x2 x3
#>  1:  1     1 40 30 20
#>  2:  2     1 40 30 20
#>  3:  3     2 40 30 20
#>  4:  4     2 40 30 20
#>  5:  5     2 40 30 20
#>  6:  6     1 40 30 20
#>  7:  7     1 40 30 20
#>  8:  8     2 40 30 20
#>  9:  9     2 40 30 20
#> 10: 10     1 40 30 20
```

Personally, I find this recycling behavior almost always unwanted. When it occurs, it usually means something is wrong with my code.

Consider the following example where `list(...)` is produced by `lapply(.SD, ...)`. If the function is inlined and a bit complicated, one often forgets to write `.SDcols`.

```r
library(data.table)
set.seed(123)
dt <- data.table(id = 1:10)
dt[, group := sample(1:2, .N, replace = TRUE)]
xn <- 1:3
xcols <- paste0("x", xn)
for (i in xn) {
  dt[, xcols[[i]] := runif(.N)]
}
dt[, (xcols) := lapply(.SD, function(x) {
  x / sd(x)
}), by = group]
dt
#>     id group        x1        x2        x3
#>  1:  1     1 0.2672612 2.7645427 3.2041655
#>  2:  2     1 0.5345225 1.3098014 2.4955128
#>  3:  3     1 0.8017837 1.9576795 2.3071378
#>  4:  4     2 2.3421602 1.5214175 4.9351189
#>  5:  5     1 1.3363062 0.2973764 2.3618854
#>  6:  6     2 3.5132403 2.3907258 3.5168344
#>  7:  7     2 4.0987803 0.6538253 2.7005050
#>  8:  8     2 4.6843204 0.1117471 2.9490603
#>  9:  9     1 2.4053512 0.9474491 1.0415680
#> 10: 10     1 2.6726124 2.7578116 0.5299108
```

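Undesired/incorrect results are silently produced: without `.SDcols`, `.SD` contains `id`, `x1`, `x2`, and `x3`, so `lapply` returns four elements. The first three (computed from `id`, `x1`, and `x2`) are assigned to `x1`, `x2`, and `x3`, and the fourth is dropped. The following are the correct results with `.SDcols` added.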

```r
library(data.table)
set.seed(123)
dt <- data.table(id = 1:10)
dt[, group := sample(1:2, .N, replace = TRUE)]
xn <- 1:3
xcols <- paste0("x", xn)
for (i in xn) {
  dt[, xcols[[i]] := runif(.N)]
}
dt[, (xcols) := lapply(.SD, function(x) {
  x / sd(x)
}), by = group, .SDcols = xcols]
dt
#>     id group        x1        x2        x3
#>  1:  1     1 2.7645427 3.2041655 2.5018371
#>  2:  2     1 1.3098014 2.4955128 2.3440794
#>  3:  3     1 1.9576795 2.3071378 1.7943807
#>  4:  4     2 1.5214175 4.9351189 2.9399476
#>  5:  5     1 0.2973764 2.3618854 0.0639438
#>  6:  6     2 2.3907258 3.5168344 1.7658739
#>  7:  7     2 0.6538253 2.7005050 2.8031711
#>  8:  8     2 0.1117471 2.9490603 0.7998165
#>  9:  9     1 0.9474491 1.0415680 0.8266013
#> 10: 10     1 2.7578116 0.5299108 0.6017398
```
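
To make the length mismatch visible, here is a minimal sketch that counts the columns of `.SD` per group with and without `.SDcols`. The toy data (a deterministic `group` column and columns built directly in `data.table()`) is only for illustration and differs from the examples above.

```r
library(data.table)

# Toy data: one id column, one grouping column, three value columns
dt <- data.table(
  id    = 1:10,
  group = rep(1:2, each = 5),
  x1    = runif(10),
  x2    = runif(10),
  x3    = runif(10)
)
xcols <- paste0("x", 1:3)

# Without .SDcols, .SD holds every non-grouping column: id, x1, x2, x3
n_all <- dt[, .(n = length(.SD)), by = group]

# With .SDcols, .SD is restricted to the three target columns
n_sel <- dt[, .(n = length(.SD)), by = group, .SDcols = xcols]

print(n_all$n)  # 4 for each group
print(n_sel$n)  # 3 for each group
```

So in the grouped assignment without `.SDcols`, a four-element list is matched against three target columns.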

I suggest that `list(...)` recycling should be consistent with the behavior `data.table` has already adopted with row recycling: only accepting zero, one, or `.N` elements.
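
In the meantime, a user-side guard can catch the mismatch. The following is only a sketch: `check_lengths()` is a hypothetical helper, not part of `data.table`, and its error message merely imitates the one produced in the ungrouped case.

```r
library(data.table)

# Hypothetical helper (not a data.table function): fail unless `values`
# has exactly one element per target column, otherwise pass it through.
check_lengths <- function(values, cols) {
  if (length(values) != length(cols)) {
    stop(sprintf("Supplied %d columns to be assigned %d items.",
                 length(cols), length(values)), call. = FALSE)
  }
  values
}

dt <- data.table(
  id    = 1:10,
  group = rep(1:2, each = 5),
  x1    = runif(10),
  x2    = runif(10),
  x3    = runif(10)
)
xcols <- paste0("x", 1:3)

# .SDcols forgotten: .SD has 4 columns, so the guard stops with an error
res <- tryCatch(
  dt[, (xcols) := check_lengths(lapply(.SD, function(x) x / sd(x)), xcols),
     by = group],
  error = function(e) conditionMessage(e)
)
print(res)

# .SDcols supplied: lengths match and the assignment goes through
dt[, (xcols) := check_lengths(lapply(.SD, function(x) x / sd(x)), xcols),
   by = group, .SDcols = xcols]
```

The guard runs inside `j`, so it is evaluated once per group and fails on the first group before any value is assigned.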
