Skip to content

Commit bdbe15a

Browse files
Revert new rules on list(NULL) assignment (#6759)
* revert #6167 (new rules on list(NULL) assignment) * restore last missing line * restore tests to working state on current master
1 parent b739ab2 commit bdbe15a

File tree

6 files changed

+15
-61
lines changed

6 files changed

+15
-61
lines changed

NEWS.md

Lines changed: 1 addition & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -131,28 +131,7 @@ rowwiseDT(
131131
132132
18. Grouped queries on keyed tables no longer return an incorrectly keyed result if the _ad hoc_ `by=` list has some function call (in particular, a function which happens to return a strictly decreasing function of the keys), e.g. `by=.(a = rev(a))`, [#5583](https://github.com/Rdatatable/data.table/issues/5583). Thanks @AbrJA for the report and @MichaelChirico for the fix.
133133
134-
19. Assigning `list(NULL)` to a list column now replaces the column with `list(NULL)`, instead of deleting the column [#5558](https://github.com/Rdatatable/data.table/issues/5558). This behavior is now consistent with base `data.frame`. Thanks @tdhock for the report and @joshhwuu for the fix. This is due to a fundamental ambiguity from both allowing list columns _and_ making the use of `list()` to wrap `j=` arguments optional. We think that the code behaves as expected in all cases now. See the below for some illustration:
135-
136-
```r
137-
DT = data.table(L=list(1L), i=2L, c='a')
138-
139-
DT[, i := NULL] # delete i
140-
DT[, L := NULL] # delete L
141-
142-
DT[, i := list(NULL)] # overwrite: identical(DT$i, list(NULL))
143-
# ^ ** THIS IS A CHANGE FROM PREVIOUS BEHAVIOR WHICH WOULD DELETE i **
144-
DT[, L := list(NULL)] # assignment: identical(DT$L, list(NULL))
145-
146-
DT[, i := .(3L)] # assignment: identical(DT$i, 3L)
147-
DT[, i := .('a')] # overwrite: identical(DT$i, 'a')
148-
DT[, L := .(list(NULL))] # assignment: identical(DT$L, list(NULL))
149-
150-
DT[, c('L', 'i') := list(NULL, NULL)] # delete L,i
151-
DT[, c('L', 'i') := list(list(NULL), 3L)] # assignment: identical(DT$L, list(NULL)), identical(DT$i, 3L)
152-
DT[, c('L', 'i') := list(NULL, 3L)] # delete L, assign to i
153-
DT[, c('L', 'i') := list(list(NULL), NULL)] # assign to L, delete i
154-
```
155-
20. An integer overflow in `fread()` with lines longer than `2^(31/2)` bytes is prevented, [#6729](https://github.com/Rdatatable/data.table/issues/6729). The typical impact was no worse than a wrong initial allocation size, corrected later. Thanks to @TaikiSan21 for the report and @aitap for the fix.
134+
19. An integer overflow in `fread()` with lines longer than `2^(31/2)` bytes is prevented, [#6729](https://github.com/Rdatatable/data.table/issues/6729). The typical impact was no worse than a wrong initial allocation size, corrected later. Thanks to @TaikiSan21 for the report and @aitap for the fix.
156135
157136
## NOTES
158137

R/data.table.R

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1145,8 +1145,7 @@ replace_dot_alias = function(e) {
11451145
}
11461146
}
11471147
names(jsub)=""
1148-
# dont wrap the RHS in list if it is a singular NULL and if not creating a new column
1149-
if (length(jsub[-1L]) == 1L && as.character(jsub[-1L]) == 'NULL' && all(lhs %chin% names_x)) jsub[[1L]]=as.name("identity") else jsub[[1L]]=as.name("list")
1148+
jsub[[1L]]=as.name("list")
11501149
}
11511150
av = all.vars(jsub,TRUE)
11521151
if (!is.atomic(lhs)) stopf("LHS of := must be a symbol, or an atomic vector (column names or positions).")

inst/tests/tests.Rraw

Lines changed: 13 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -15447,7 +15447,7 @@ L = list(1:3, NULL, 4:6)
1544715447
test(2058.18, length(L), 3L)
1544815448
test(2058.19, as.data.table(L), data.table(V1=1:3, V2=4:6)) # V2 not V3 # no
1544915449
DT = data.table(a=1:3, b=c(4,5,6))
15450-
test(2058.20, DT[,b:=list(NULL)], data.table(a=1:3, b=list(NULL))) # no
15450+
test(2058.20, DT[,b:=list(NULL)], data.table(a=1:3)) # no
1545115451

1545215452
# rbindlist improved error message, #3638
1545315453
DT = data.table(a=1)
@@ -20782,29 +20782,33 @@ test(2303.2, DT[, .(N=1L), by=.(b=rev(a))], data.table(b=2:1, N=1L)) # ensure no
2078220782
DT = data.table(a=2:3, b=1:0, key=c('a', 'b'))
2078320783
test(2303.3, DT[, .N, by=.(ab=a^b, d=c(1L, 1L))], data.table(ab=c(2, 1), d=1L, N=1L))
2078420784

20785+
# NB: these tests have been edited in light of #6740 to be regression tests
20786+
# preventing existing behavior from breaking in 1.17.0 while we decide
20787+
# whether a breaking change is warranted & how to proceed. The specific tests
20788+
# with different behavior under #5558 001,002,005,006,011,012,015,016.
2078520789
# tests for new consistent replacement of list columns with list(NULL), #5558
2078620790
# replacement of a list column with list(NULL) in a single-row data.table, using different assignment methods
2078720791
DT = data.table(L=list("A"), i=1L)
2078820792
ans = data.table(L=list(NULL), i=1L)
2078920793
# test using replacement with $ operator
2079020794
DT$L = list(NULL)
20791-
test(2304.001, DT, ans)
20795+
test(2304.001, DT, within(ans, rm('L')))
2079220796
DT = data.table(L=list("A"), i=1L)
2079320797
# standard form with := operator
20794-
test(2304.002, copy(DT)[, L := list(NULL)], ans)
20798+
test(2304.002, copy(DT)[, L := list(NULL)], within(ans, rm('L')))
2079520799
# functional form with := operator
2079620800
test(2304.003, copy(DT)[, `:=`(L=list(NULL))], ans)
2079720801
# functional form with 'let' alias
2079820802
test(2304.004, copy(DT)[, let(L=list(NULL))], ans)
2079920803
# using set()
20800-
test(2304.005, set(copy(DT), j="L", value=list(NULL)), ans)
20804+
test(2304.005, set(copy(DT), j="L", value=list(NULL)), within(ans, rm('L')))
2080120805

2080220806
# replacement of multiple list columns with list(NULL) in a single-row data.table, using different assignment methods
2080320807
DT = data.table(L1=list("A"), L2=list("B"), i=1L)
2080420808
ans = data.table(L1=list(NULL), L2=list(NULL), i=1L)
2080520809
DT$L1 = list(NULL)
2080620810
DT$L2 = list(NULL)
20807-
test(2304.006, DT, ans)
20811+
test(2304.006, DT, within(ans, rm('L1', 'L2')))
2080820812
DT = data.table(L1=list("A"), L2=list("B"), i=1L)
2080920813
# standard form with := operator
2081020814
test(2304.007, copy(DT)[, c("L1", "L2") := list(list(NULL), list(NULL))], ans)
@@ -20820,23 +20824,23 @@ DT = data.table(L=list("A", "B"), i=1L)
2082020824
ans = data.table(L=list(NULL, NULL), i=1L)
2082120825
# test using replacement with $ operator
2082220826
DT$L = list(NULL)
20823-
test(2304.011, DT, ans)
20827+
test(2304.011, DT, within(ans, rm('L')))
2082420828
DT = data.table(L=list("A", "B"), i=1L)
2082520829
# standard form with := operator
20826-
test(2304.012, copy(DT)[, L := list(NULL)], ans)
20830+
test(2304.012, copy(DT)[, L := list(NULL)], within(ans, rm('L')))
2082720831
# functional form with := operator
2082820832
test(2304.013, copy(DT)[, `:=`(L=list(NULL))], ans)
2082920833
# functional form with 'let' alias
2083020834
test(2304.014, copy(DT)[, let(L=list(NULL))], ans)
2083120835
# using set()
20832-
test(2304.015, set(copy(DT), j="L", value=list(NULL)), ans)
20836+
test(2304.015, set(copy(DT), j="L", value=list(NULL)), within(ans, rm('L')))
2083320837

2083420838
# replacement of multiple list columns with list(NULL) in a multi-row data.table, using different assignment methods
2083520839
DT = data.table(L1=list("A", "B"), L2=list("B", "C"), i=1L)
2083620840
ans = data.table(L1=list(NULL, NULL), L2=list(NULL, NULL), i=1L)
2083720841
DT$L1 = list(NULL)
2083820842
DT$L2 = list(NULL)
20839-
test(2304.016, DT, ans)
20843+
test(2304.016, DT, within(ans, rm('L1', 'L2')))
2084020844
DT = data.table(L1=list("A", "B"), L2=list("B", "C"), i=1L)
2084120845
# standard form with := operator
2084220846
test(2304.017, copy(DT)[, c("L1", "L2") := list(list(NULL), list(NULL))], ans)

man/assign.Rd

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -78,8 +78,6 @@ When \code{LHS} is a factor column and \code{RHS} is a character vector with ite
7878
Unlike \code{<-} for \code{data.frame}, the (potentially large) LHS is not coerced to match the type of the (often small) RHS. Instead the RHS is coerced to match the type of the LHS, if necessary. Where this involves double precision values being coerced to an integer column, a warning is given when fractional data is truncated. It is best to get the column types correct up front and stick to them. Changing a column type is possible but deliberately harder: provide a whole column as the RHS. This RHS is then \emph{plonked} into that column slot and we call this \emph{plonk syntax}, or \emph{replace column syntax} if you prefer. By needing to construct a full length vector of a new type, you as the user are more aware of what is happening and it is clearer to readers of your code that you really do intend to change the column type; e.g., \code{DT[, colA:=as.integer(colA)]}. A plonk occurs whenever you provide a RHS value to `:=` which is \code{nrow} long. When a column is \emph{plonked}, the original column is not updated by reference because that would entail updating every single element of that column whereas the plonk is just one column pointer update.
7979
8080
\code{data.table}s are \emph{not} copied-on-change by \code{:=}, \code{setkey} or any of the other \code{set*} functions. See \code{\link{copy}}.
81-
82-
While in most cases standard and functional form of \code{:=} are interchangeable, there are some minor differences in the way that \code{RHS} is handled. In the functional form, \code{:=} operator behaves like an alias to \code{list}. This means that when \code{RHS} is a list, \code{LHS} is assigned a list. Avoid this by using the standard form when \code{RHS} is a list, or use a vector. See \href{../doc/datatable-reference-semantics.html}{\code{vignette("datatable-reference-semantics")}} for examples.
8381
}
8482
8583
\section{Advanced (internals):}{It is easy to see how \emph{sub-assigning} to existing columns is done internally. Removing columns by reference is also straightforward by modifying the vector of column pointers only (using memmove in C). However adding (new) columns is more tricky as to how the \code{data.table} can be grown \emph{by reference}: the list vector of column pointers is \emph{over-allocated}, see \code{\link{truelength}}. By defining \code{:=} in \code{j} we believe update syntax is natural, and scales, but it also bypasses \code{[<-} dispatch and allows \code{:=} to update by reference with no copies of any part of memory at all.

src/assign.c

Lines changed: 0 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -549,11 +549,6 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values)
549549
for (int i=0; i<length(cols); ++i) {
550550
coln = INTEGER(cols)[i]-1;
551551
SEXP thisvalue = RHS_list_of_columns ? VECTOR_ELT(values, i) : values;
552-
// if values is list(NULL), then replace with a list of NULLs instead of deleting, #5558
553-
if (RHS_list_of_columns && length(values)==1 && TYPEOF(VECTOR_ELT(values, 0))==NILSXP && coln < oldncol) {
554-
SET_VECTOR_ELT(dt, coln, targetcol=allocNAVector(VECSXP, length(VECTOR_ELT(dt, coln))));
555-
continue;
556-
}
557552
if (TYPEOF(thisvalue)==NILSXP) {
558553
if (!isNull(rows)) internal_error(__func__, "earlier error 'When deleting columns, i should not be provided' did not happen"); // # nocov
559554
ndelete++;

vignettes/datatable-reference-semantics.Rmd

Lines changed: 0 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -118,33 +118,12 @@ Note that the code above explains how `:=` can be used. They are not working exa
118118

119119
* On the other hand, (b) is handy if you would like to jot some comments down for later.
120120

121-
* In general, (a) and (b) should produce the same result, however there are minor differences in certain cases due to the implementation details of `:=`. Specifically, when using (b), the `:=` operator behaves like an alias to `list`, therefore if `RHS` is a `list` in functional form the column will become a list. See example below.
122-
123121
* The result is returned *invisibly*.
124122

125123
* Since `:=` is available in `j`, we can combine it with `i` and `by` operations just like the aggregation operations we saw in the previous vignette.
126124

127125
#
128126

129-
Although in most cases they are the same, there is a minor difference between the standard and functional forms mentioned above. Let's see an example to understand this.
130-
```{r}
131-
DT = data.table(a = list('A', 'B', 'C'))
132-
l = list(1L:3L)
133-
134-
DT[, new_int := l] # Standard form, new column is integer
135-
136-
DT[, `:=`(new_list = l)] # Functional form, new column is a list of integer vectors
137-
138-
DT[, new_list := list(l)] # Same as above, we can see that `:=` is an alias to list in functional form
139-
140-
# This can be avoided by using a vector instead of a list:
141-
v = 1L:3L
142-
143-
DT[, new_int := v] # Standard form, new column is integer
144-
145-
DT[, `:=`(new_int = v)] # Functional form, new column is integer
146-
```
147-
148127
In the two forms of `:=` shown above, note that we don't assign the result back to a variable. Because we don't need to. The input *data.table* is modified by reference. Let's go through examples to understand what we mean by this.
149128

150129
For the rest of the vignette, we will work with `flights` *data.table*.

0 commit comments

Comments
 (0)