Commit 1aa92bc

Consistent Replacement of List Column with NULL (#6167)
Co-authored-by: Michael Chirico <[email protected]>
1 parent 6641ca0 commit 1aa92bc

6 files changed: 331 additions and 2 deletions

NEWS.md

Lines changed: 22 additions & 0 deletions
@@ -133,6 +133,28 @@ rowwiseDT(
 
 19. Grouped queries on keyed tables no longer return an incorrectly keyed result if the _ad hoc_ `by=` list has some function call (in particular, a function which happens to return a strictly decreasing function of the keys), e.g. `by=.(a = rev(a))`, [#5583](https://github.com/Rdatatable/data.table/issues/5583). Thanks @AbrJA for the report and @MichaelChirico for the fix.
 
+20. Assigning `list(NULL)` to a list column now replaces the column with `list(NULL)` instead of deleting the column, [#5558](https://github.com/Rdatatable/data.table/issues/5558). This behavior is now consistent with base `data.frame`. Thanks @tdhock for the report and @joshhwuu for the fix. The previous inconsistency is due to a fundamental ambiguity from both allowing list columns _and_ making the use of `list()` to wrap `j=` arguments optional. We think that the code behaves as expected in all cases now. See below for some illustrations:
+
+```r
+DT = data.table(L=list(1L), i=2L, c='a')
+
+DT[, i := NULL] # delete i
+DT[, L := NULL] # delete L
+
+DT[, i := list(NULL)] # overwrite: identical(DT$i, list(NULL))
+# ^ ** THIS IS A CHANGE FROM PREVIOUS BEHAVIOR WHICH WOULD DELETE i **
+DT[, L := list(NULL)] # assignment: identical(DT$L, list(NULL))
+
+DT[, i := .(3L)] # assignment: identical(DT$i, 3L)
+DT[, i := .('a')] # overwrite: identical(DT$i, 'a')
+DT[, L := .(list(NULL))] # assignment: identical(DT$L, list(NULL))
+
+DT[, c('L', 'i') := list(NULL, NULL)] # delete L,i
+DT[, c('L', 'i') := list(list(NULL), 3L)] # assignment: identical(DT$L, list(NULL)), identical(DT$i, 3L)
+DT[, c('L', 'i') := list(NULL, 3L)] # delete L, assign to i
+DT[, c('L', 'i') := list(list(NULL), NULL)] # assign to L, delete i
+```
+
 ## NOTES
 
 1. There is a new vignette on joins! See `vignette("datatable-joins")`. Thanks to Angel Feliz for authoring it! Feedback welcome. This vignette has been highly requested since 2017: [#2181](https://github.com/Rdatatable/data.table/issues/2181).
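
The lines in the NEWS illustration are meant to be read independently rather than run in sequence, since the first two delete the columns the later lines refer to. A minimal runnable sketch of the same distinctions, assuming a data.table build that includes this commit and starting from a fresh `copy()` each time (the variable names `deleted`, `overwritten`, and `mixed` are only illustrative):

```r
library(data.table)

DT = data.table(L=list("A", "B"), i=1L)

# A bare NULL still deletes the column.
deleted = copy(DT)[, L := NULL]
stopifnot(identical(names(deleted), "i"))

# list(NULL) on an existing column now overwrites it with NULL entries.
overwritten = copy(DT)[, L := list(NULL)]
stopifnot(length(overwritten$L) == 2L, all(vapply(overwritten$L, is.null, logical(1L))))

# Multi-column form: a NULL element deletes L, while i receives a new value.
mixed = copy(DT)[, c("L", "i") := list(NULL, 3L)]
stopifnot(identical(names(mixed), "i"), identical(mixed$i, c(3L, 3L)))
```

On builds that predate this commit, the second case would be expected to delete `L` instead, so the second `stopifnot()` would fail there.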

R/data.table.R

Lines changed: 2 additions & 1 deletion
@@ -1153,7 +1153,8 @@ replace_dot_alias = function(e) {
 }
 }
 names(jsub)=""
-jsub[[1L]]=as.name("list")
+# don't wrap the RHS in list if it is a singular NULL and if not creating a new column
+if (length(jsub[-1L]) == 1L && as.character(jsub[-1L]) == 'NULL' && all(lhs %chin% names_x)) jsub[[1L]]=as.name("identity") else jsub[[1L]]=as.name("list")
 }
 av = all.vars(jsub,TRUE)
 if (!is.atomic(lhs)) stopf("LHS of := must be a symbol, or an atomic vector (column names or positions).")
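
The hunk above swaps the head of the captured functional-form call between `list` and `identity`. The following base R sketch shows that idea in isolation. It is not the package's code: `rewrite_head()` is a hypothetical helper, the NULL check is simplified to `is.null(jsub[[2L]])`, `%in%` stands in for `%chin%`, and the argument names are dropped with `NULL` rather than `""`.

```r
# Hypothetical helper mimicking the head rewrite for `:=`(name = value) calls.
rewrite_head = function(jsub, names_x) {
  lhs = names(jsub)[-1L]   # target column names taken from the argument names
  names(jsub) = NULL       # drop argument names before evaluation
  # exactly one RHS, it is a literal NULL, and every target column already exists
  single_null = length(jsub) == 2L && is.null(jsub[[2L]])
  jsub[[1L]] = as.name(if (single_null && all(lhs %in% names_x)) "identity" else "list")
  jsub
}

existing = rewrite_head(quote(`:=`(L = NULL)), c("L", "i"))  # L is an existing column
new_col  = rewrite_head(quote(`:=`(D = NULL)), c("L", "i"))  # D is not

eval(existing)  # evaluates identity(NULL), i.e. a bare NULL
eval(new_col)   # evaluates list(NULL)
```

Evaluating to a bare `NULL` for an existing column presumably keeps `` `:=`(L=NULL) `` a deletion now that `list(NULL)` means replacement, which is what tests 2304.053 and 2304.054 expect.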

inst/tests/tests.Rraw

Lines changed: 279 additions & 1 deletion
@@ -15447,7 +15447,7 @@ L = list(1:3, NULL, 4:6)
 test(2058.18, length(L), 3L)
 test(2058.19, as.data.table(L), data.table(V1=1:3, V2=4:6)) # V2 not V3 # no
 DT = data.table(a=1:3, b=c(4,5,6))
-test(2058.20, DT[,b:=list(NULL)], data.table(a=1:3)) # no
+test(2058.20, DT[,b:=list(NULL)], data.table(a=1:3, b=list(NULL))) # no
 
 # rbindlist improved error message, #3638
 DT = data.table(a=1)
@@ -20764,3 +20764,281 @@ test(2303.1, DT[, .N, by=.(b=rev(a))], data.table(b=2:1, N=1L))
 test(2303.2, DT[, .(N=1L), by=.(b=rev(a))], data.table(b=2:1, N=1L)) # ensure no interaction with GForce
 DT = data.table(a=2:3, b=1:0, key=c('a', 'b'))
 test(2303.3, DT[, .N, by=.(ab=a^b, d=c(1L, 1L))], data.table(ab=c(2, 1), d=1L, N=1L))
+
+# tests for new consistent replacement of list columns with list(NULL), #5558
+# replacement of a list column with list(NULL) in a single-row data.table, using different assignment methods
+DT = data.table(L=list("A"), i=1L)
+ans = data.table(L=list(NULL), i=1L)
+# test using replacement with $ operator
+DT$L = list(NULL)
+test(2304.001, DT, ans)
+DT = data.table(L=list("A"), i=1L)
+# standard form with := operator
+test(2304.002, copy(DT)[, L := list(NULL)], ans)
+# functional form with := operator
+test(2304.003, copy(DT)[, `:=`(L=list(NULL))], ans)
+# functional form with 'let' alias
+test(2304.004, copy(DT)[, let(L=list(NULL))], ans)
+# using set()
+test(2304.005, set(copy(DT), j="L", value=list(NULL)), ans)
+
+# replacement of multiple list columns with list(NULL) in a single-row data.table, using different assignment methods
+DT = data.table(L1=list("A"), L2=list("B"), i=1L)
+ans = data.table(L1=list(NULL), L2=list(NULL), i=1L)
+DT$L1 = list(NULL)
+DT$L2 = list(NULL)
+test(2304.006, DT, ans)
+DT = data.table(L1=list("A"), L2=list("B"), i=1L)
+# standard form with := operator
+test(2304.007, copy(DT)[, c("L1", "L2") := list(list(NULL), list(NULL))], ans)
+# functional form with := operator
+test(2304.008, copy(DT)[, `:=`(L1=list(NULL), L2=list(NULL))], ans)
+# functional form with 'let' alias
+test(2304.009, copy(DT)[, let(L1=list(NULL), L2=list(NULL))], ans)
+# using set()
+test(2304.010, set(copy(DT), j=c("L1", "L2"), value=list(list(NULL), list(NULL))), ans)
+
+# replacement of a list column with list(NULL) in a multi-row data.table, using different assignment methods
+DT = data.table(L=list("A", "B"), i=1L)
+ans = data.table(L=list(NULL, NULL), i=1L)
+# test using replacement with $ operator
+DT$L = list(NULL)
+test(2304.011, DT, ans)
+DT = data.table(L=list("A", "B"), i=1L)
+# standard form with := operator
+test(2304.012, copy(DT)[, L := list(NULL)], ans)
+# functional form with := operator
+test(2304.013, copy(DT)[, `:=`(L=list(NULL))], ans)
+# functional form with 'let' alias
+test(2304.014, copy(DT)[, let(L=list(NULL))], ans)
+# using set()
+test(2304.015, set(copy(DT), j="L", value=list(NULL)), ans)
+
+# replacement of multiple list columns with list(NULL) in a multi-row data.table, using different assignment methods
+DT = data.table(L1=list("A", "B"), L2=list("B", "C"), i=1L)
+ans = data.table(L1=list(NULL, NULL), L2=list(NULL, NULL), i=1L)
+DT$L1 = list(NULL)
+DT$L2 = list(NULL)
+test(2304.016, DT, ans)
+DT = data.table(L1=list("A", "B"), L2=list("B", "C"), i=1L)
+# standard form with := operator
+test(2304.017, copy(DT)[, c("L1", "L2") := list(list(NULL), list(NULL))], ans)
+# functional form with := operator
+test(2304.018, copy(DT)[, `:=`(L1=list(NULL), L2=list(NULL))], ans)
+# functional form with 'let' alias
+test(2304.019, copy(DT)[, let(L1=list(NULL), L2=list(NULL))], ans)
+# using set()
+test(2304.020, set(copy(DT), j=c("L1", "L2"), value=list(list(NULL), list(NULL))), ans)
+
+# Adding an empty list column to a single-row data.table, using different assignment methods
+DT = data.table(L=list("A"), i=1L)
+ans = data.table(L=list("A"), i=1L, D=list(NULL))
+warn = "Tried to assign NULL to column 'D', but this column does not exist to remove"
+# try to add a new empty list by list(NULL) with := in standard form, warns and does not change
+test(2304.021, copy(DT)[, D := list(NULL)], DT, warning=warn)
+test(2304.022, set(copy(DT), j="D", value=NULL), DT, warning=warn)
+test(2304.023, set(copy(DT), j="D", value=list(NULL)), DT, warning=warn)
+# add a new column by wrapping list(NULL), consistent with old behavior
+DT$D = list(list(NULL))
+test(2304.024, DT, ans)
+DT = data.table(L=list("A"), i=1L)
+# test adding empty list column in standard form with := operator
+test(2304.025, copy(DT)[, D := .(list(NULL))], ans)
+# functional form with := operator
+test(2304.026, copy(DT)[, `:=`(D=list(NULL))], ans)
+# functional form with 'let' alias
+test(2304.027, copy(DT)[, let(D=list(NULL))], ans)
+# using set()
+test(2304.028, set(copy(DT), j="D", value=list(list(NULL))), ans)
+
+# Adding multiple empty list columns to a single-row data.table, using different assignment methods
+DT = data.table(L=list("A"), i=1L)
+ans = data.table(L=list("A"), i=1L, D=list(NULL), R=list(NULL))
+DT$D = list(list(NULL))
+DT$R = list(list(NULL))
+test(2304.029, DT, ans)
+DT = data.table(L=list("A"), i=1L)
+# standard form with := operator
+test(2304.030, copy(DT)[, c("D", "R") := .(list(NULL))], ans)
+test(2304.031, copy(DT)[, c("D", "R") := .(list(NULL), list(NULL))], ans)
+# functional form with := operator
+test(2304.032, copy(DT)[, `:=`(D=list(NULL), R=list(NULL))], ans)
+# functional form with 'let' alias
+test(2304.033, copy(DT)[, let(D=list(NULL), R=list(NULL))], ans)
+# using set()
+test(2304.034, set(copy(DT), j=c("D", "R"), value=list(list(NULL))), ans)
+test(2304.035, set(copy(DT), j=c("D", "R"), value=list(list(NULL), list(NULL))), ans)
+
+# Adding an empty list column to a multi-row data.table, using different assignment methods
+DT = data.table(L=list("A", "B"), i=1L)
+ans = data.table(L=list("A", "B"), i=1L, D=list(NULL, NULL))
+warn = "Tried to assign NULL to column 'D', but this column does not exist to remove"
+# try to add a new empty list by list(NULL) with := in standard form, warns and does not change
+test(2304.036, copy(DT)[, D := list(NULL)], DT, warning=warn)
+test(2304.037, set(copy(DT), j="D", value=NULL), DT, warning=warn)
+test(2304.038, set(copy(DT), j="D", value=list(NULL)), DT, warning=warn)
+# add a new column by wrapping list(NULL), consistent with old behavior
+DT$D = list(list(NULL))
+test(2304.039, DT, ans)
+DT = data.table(L=list("A", "B"), i=1L)
+# test adding empty list column in standard form with := operator
+test(2304.040, copy(DT)[, D := .(list(NULL))], ans)
+# functional form with := operator
+test(2304.041, copy(DT)[, `:=`(D=list(NULL))], ans)
+# functional form with 'let' alias
+test(2304.042, copy(DT)[, let(D=list(NULL))], ans)
+# using set()
+test(2304.043, set(copy(DT), j="D", value = list(list(NULL))), ans)
+
+# Adding multiple empty list columns to a multi-row data.table, using different assignment methods
+DT = data.table(L=list("A", "B"), i=1L)
+ans = data.table(L=list("A", "B"), i=1L, D=list(NULL, NULL), R=list(NULL, NULL))
+DT$D = list(list(NULL))
+DT$R = list(list(NULL))
+test(2304.044, DT, ans)
+DT = data.table(L=list("A", "B"), i=1L)
+# standard form with := operator
+test(2304.045, copy(DT)[, c("D", "R") := .(list(NULL))], ans)
+test(2304.046, copy(DT)[, c("D", "R") := .(list(NULL), list(NULL))], ans)
+# functional form with := operator
+test(2304.047, copy(DT)[, `:=`(D=list(NULL), R=list(NULL))], ans)
+# functional form with 'let' alias
+test(2304.048, copy(DT)[, let(D=list(NULL), R=list(NULL))], ans)
+# using set()
+test(2304.049, set(copy(DT), j=c("D", "R"), value=list(list(NULL))), ans)
+test(2304.050, set(copy(DT), j=c("D", "R"), value=list(list(NULL), list(NULL))), ans)
+
+# Removal of a list column in a single-row data.table, using different assignment methods
+# NOTE: There is only one way to remove columns now, by assigning to NULL
+DT = data.table(L=list("A"), i=1L)
+ans = data.table(i=1L)
+# test removing a list column by assigning to NULL
+DT$L = NULL
+test(2304.051, DT, ans)
+DT = data.table(L=list("A"), i=1L)
+# standard form with := operator
+test(2304.052, copy(DT)[, L := NULL], ans)
+# functional form with := operator
+test(2304.053, copy(DT)[, `:=`(L=NULL)], ans)
+# functional form with 'let' alias
+test(2304.054, copy(DT)[, let(L=NULL)], ans)
+# using set()
+test(2304.055, set(copy(DT), j="L", value=NULL), ans)
+
+# Removal of multiple list columns in a single-row data.table, using different assignment methods
+DT = data.table(L1=list("A"), L2=list("B"), i=1L)
+# test removing two list columns by assigning to NULL
+DT$L1 = NULL
+DT$L2 = NULL
+test(2304.056, DT, ans)
+DT = data.table(L1=list("A"), L2=list("B"), i=1L)
+# standard form with := operator
+test(2304.057, copy(DT)[, c("L1", "L2") := NULL], ans)
+test(2304.058, copy(DT)[, c("L1", "L2") := .(NULL, NULL)], ans)
+# functional form with := operator
+test(2304.059, copy(DT)[, `:=`(L1=NULL, L2=NULL)], ans)
+# functional form with 'let' alias
+test(2304.060, copy(DT)[, let(L1=NULL, L2=NULL)], ans)
+# using set()
+test(2304.061, set(copy(DT), j=c("L1", "L2"), value=NULL), ans)
+test(2304.062, set(copy(DT), j=c("L1", "L2"), value=list(NULL, NULL)), ans)
+
+# Removal of a list column in a multi-row data.table, using different assignment methods
+DT = data.table(L=list("A", "B"), i=1L)
+ans = data.table(i=c(1L, 1L))
+# test removing a list column by assigning to NULL
+DT$L = NULL
+test(2304.063, DT, ans)
+DT = data.table(L=list("A", "B"), i=1L)
+# standard form with := operator
+test(2304.064, copy(DT)[, L := NULL], ans)
+# functional form with := operator
+test(2304.065, copy(DT)[, `:=`(L=NULL)], ans)
+# functional form with 'let' alias
+test(2304.066, copy(DT)[, let(L=NULL)], ans)
+# using set()
+test(2304.067, set(copy(DT), j="L", value=NULL), ans)
+
+# Removal of multiple list columns in a multi-row data.table, using different assignment methods
+DT = data.table(L1=list("A", "B"), L2=list("B", "C"), i=1L)
+# test removing two list columns by assigning to NULL
+DT$L1 = NULL
+DT$L2 = NULL
+test(2304.068, DT, ans)
+DT = data.table(L1=list("A", "B"), L2=list("B", "C"), i=1L)
+# standard form with := operator
+test(2304.069, copy(DT)[, c("L1", "L2") := NULL], ans)
+test(2304.070, copy(DT)[, c("L1", "L2") := .(NULL, NULL)], ans)
+# functional form with := operator
+test(2304.071, copy(DT)[, `:=`(L1=NULL, L2=NULL)], ans)
+# functional form with 'let' alias
+test(2304.072, copy(DT)[, let(L1=NULL, L2=NULL)], ans)
+# using set()
+test(2304.073, set(copy(DT), j=c("L1", "L2"), value=NULL), ans)
+test(2304.074, set(copy(DT), j=c("L1", "L2"), value=list(NULL, NULL)), ans)
+
+# Combining queries (add/remove/replace columns in the same query) for a single-row data.table
+
+# test for adding a new empty list column D and removing column L in the same query
+DT = data.table(L=list("A"), i=1L)
+ans = data.table(i=1L, D=list(NULL))
+test(2304.075, copy(DT)[, c("L", "D") := list(NULL, list(NULL))], ans)
+test(2304.076, copy(DT)[, `:=`(L=NULL, D=list(NULL))], ans)
+test(2304.077, copy(DT)[, let(L=NULL, D=list(NULL))], ans)
+test(2304.078, set(copy(DT), j=c("L", "D"), value=list(NULL, list(NULL))), ans)
+
+# test for adding a new empty list column D and replacing column L with empty list in the same query
+DT = data.table(L=list("A"), i=1L)
+ans = data.table(L=list(NULL), i=1L, D=list(NULL))
+test(2304.079, copy(DT)[, c("L", "D") := list(list(NULL), list(NULL))], ans)
+test(2304.080, copy(DT)[, `:=`(L=list(NULL), D=list(NULL))], ans)
+test(2304.081, copy(DT)[, let(L=list(NULL), D=list(NULL))], ans)
+test(2304.082, set(copy(DT), j=c("L", "D"), value=list(list(NULL), list(NULL))), ans)
+
+# test for replacing column L with an empty list and removing list column D in the same query
+DT = data.table(L=list("A"), D=list("B"), i=1L)
+ans = data.table(L=list(NULL), i=1L)
+test(2304.083, copy(DT)[, c("L", "D") := list(list(NULL), NULL)], ans)
+test(2304.084, copy(DT)[, `:=`(L=list(NULL), D=NULL)], ans)
+test(2304.085, copy(DT)[, let(L=list(NULL), D=NULL)], ans)
+test(2304.086, set(copy(DT), j=c("L", "D"), value=list(list(NULL), NULL)), ans)
+
+# test for combining add, replace, remove in the same query
+DT = data.table(L=list("A"), D=list("B"), i=1L)
+ans = data.table(L=list(NULL), i=1L, E=list(NULL))
+test(2304.087, copy(DT)[, c("L", "D", "E") := list(list(NULL), NULL, list(NULL))], ans)
+test(2304.088, copy(DT)[, `:=`(L=list(NULL), D=NULL, E=list(NULL))], ans)
+test(2304.089, copy(DT)[, let(L=list(NULL), D=NULL, E=list(NULL))], ans)
+test(2304.090, set(copy(DT), j=c("L", "D", "E"), value=list(list(NULL), NULL, list(NULL))), ans)
+
+# sub-assignment of list column with list(NULL) in a multi-row data.table, using different assignment methods
+DT = data.table(L=list("A", "B"), i=1L)
+ans = data.table(L=list("A", NULL), i=1L)
+# test using replacement with $ operator
+DT$L[2L] = list(NULL)
+test(2304.091, DT, ans)
+DT = data.table(L=list("A", "B"), i=1L)
+# standard form with := operator
+test(2304.092, copy(DT)[2L, L := list(list(NULL))], ans)
+# functional form with := operator
+test(2304.093, copy(DT)[2L, `:=`(L=list(NULL))], ans)
+# functional form with 'let' alias
+test(2304.094, copy(DT)[2L, let(L=list(NULL))], ans)
+# using set()
+test(2304.095, set(copy(DT), i=2L, j="L", value=list(list(NULL))), ans)
+
+# sub-assignment of multiple list columns with list(NULL) in a multi-row data.table, using different assignment methods
+DT = data.table(L1=list("A", "B"), L2=list("B", "C"), i=1L)
+ans = data.table(L1=list("A", NULL), L2=list("B", NULL), i=1L)
+DT$L1[2L] = list(NULL)
+DT$L2[2L] = list(NULL)
+test(2304.096, DT, ans)
+DT = data.table(L1=list("A", "B"), L2=list("B", "C"), i=1L)
+# standard form with := operator
+test(2304.097, copy(DT)[2L, c("L1", "L2") := list(list(NULL), list(NULL))], ans)
+# functional form with := operator
+test(2304.098, copy(DT)[2L, `:=`(L1=list(NULL), L2=list(NULL))], ans)
+# functional form with 'let' alias
+test(2304.099, copy(DT)[2L, let(L1=list(NULL), L2=list(NULL))], ans)
+# using set()
+test(2304.100, set(copy(DT), i=2L, j=c("L1", "L2"), value=list(list(NULL), list(NULL))), ans)
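
Two of the less obvious cases exercised by these tests are row-level sub-assignment and the warning raised when `list(NULL)` targets a column that does not exist. The sketch below reproduces both outside the `test()` harness, assuming a data.table build that includes this commit; `sub` and `msg` are illustrative names only.

```r
library(data.table)

DT = data.table(L=list("A", "B"), i=1L)

# Row-level replacement: in the standard form the NULL is wrapped twice,
# once for "list of columns" and once for the list column's value.
sub = copy(DT)[2L, L := list(list(NULL))]
stopifnot(identical(sub$L[[1L]], "A"), is.null(sub$L[[2L]]))

# list(NULL) aimed at a column that does not exist: capture the warning text.
msg = tryCatch(
  { copy(DT)[, D := list(NULL)]; NA_character_ },
  warning = function(w) conditionMessage(w)
)
stopifnot(grepl("does not exist to remove", msg, fixed = TRUE))
```

To add an empty list column rather than trigger the warning, the tests wrap the value once more, as in `D := .(list(NULL))`.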

man/assign.Rd

Lines changed: 2 additions & 0 deletions
@@ -78,6 +78,8 @@ When \code{LHS} is a factor column and \code{RHS} is a character vector with ite
 Unlike \code{<-} for \code{data.frame}, the (potentially large) LHS is not coerced to match the type of the (often small) RHS. Instead the RHS is coerced to match the type of the LHS, if necessary. Where this involves double precision values being coerced to an integer column, a warning is given when fractional data is truncated. It is best to get the column types correct up front and stick to them. Changing a column type is possible but deliberately harder: provide a whole column as the RHS. This RHS is then \emph{plonked} into that column slot and we call this \emph{plonk syntax}, or \emph{replace column syntax} if you prefer. By needing to construct a full length vector of a new type, you as the user are more aware of what is happening and it is clearer to readers of your code that you really do intend to change the column type; e.g., \code{DT[, colA:=as.integer(colA)]}. A plonk occurs whenever you provide a RHS value to `:=` which is \code{nrow} long. When a column is \emph{plonked}, the original column is not updated by reference because that would entail updating every single element of that column whereas the plonk is just one column pointer update.
 
 \code{data.table}s are \emph{not} copied-on-change by \code{:=}, \code{setkey} or any of the other \code{set*} functions. See \code{\link{copy}}.
+
+While in most cases the standard and functional forms of \code{:=} are interchangeable, there are some minor differences in the way that \code{RHS} is handled. In the functional form, the \code{:=} operator behaves like an alias for \code{list}. This means that when \code{RHS} is a list, \code{LHS} is assigned a list. Avoid this by using the standard form when \code{RHS} is a list, or use a vector. See \href{../doc/datatable-reference-semantics.html}{\code{vignette("datatable-reference-semantics")}} for examples.
 }
 
 \section{Advanced (internals):}{It is easy to see how \emph{sub-assigning} to existing columns is done internally. Removing columns by reference is also straightforward by modifying the vector of column pointers only (using memmove in C). However adding (new) columns is more tricky as to how the \code{data.table} can be grown \emph{by reference}: the list vector of column pointers is \emph{over-allocated}, see \code{\link{truelength}}. By defining \code{:=} in \code{j} we believe update syntax is natural, and scales, but it also bypasses \code{[<-} dispatch and allows \code{:=} to update by reference with no copies of any part of memory at all.
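
The difference between the two forms described in the added paragraph can be seen with a non-NULL list on the right-hand side. This is a small sketch assuming a reasonably recent data.table; the column name `x` and the objects `std` and `fun` are illustrative.

```r
library(data.table)

DT = data.table(a = 1:2)

# Standard form: the outer list() is read as "list of columns",
# so x becomes an ordinary integer column.
std = copy(DT)[, x := list(3:4)]

# Functional form: := itself plays the role of list(),
# so the inner list() is the column value and x becomes a list column.
fun = copy(DT)[, `:=`(x = list(3:4))]

stopifnot(is.integer(std$x), is.list(fun$x))
```

This is the same asymmetry that makes `` `:=`(D=list(NULL)) `` add an empty list column in the tests, while the standard form needs `D := .(list(NULL))`.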

src/assign.c

Lines changed: 5 additions & 0 deletions
@@ -549,6 +549,11 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values)
 for (int i=0; i<length(cols); ++i) {
 coln = INTEGER(cols)[i]-1;
 SEXP thisvalue = RHS_list_of_columns ? VECTOR_ELT(values, i) : values;
+// if values is list(NULL), then replace with a list of NULLs instead of deleting, #5558
+if (RHS_list_of_columns && length(values)==1 && TYPEOF(VECTOR_ELT(values, 0))==NILSXP && coln < oldncol) {
+SET_VECTOR_ELT(dt, coln, targetcol=allocNAVector(VECSXP, length(VECTOR_ELT(dt, coln))));
+continue;
+}
 if (TYPEOF(thisvalue)==NILSXP) {
 if (!isNull(rows)) internal_error(__func__, "earlier error 'When deleting columns, i should not be provided' did not happen"); // # nocov
 ndelete++;
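
The new branch in `assign()` can be modelled in plain R to make the decision explicit. This is only an illustrative model, not a translation of the C code: `replace_or_delete()` is a hypothetical function operating on a plain named list of columns, and it covers just the single existing-column case.

```r
# Hypothetical R model of the new branch; `cols` is a named list of equal-length columns.
replace_or_delete = function(cols, j, values) {
  if (is.list(values) && length(values) == 1L && is.null(values[[1L]])) {
    # values is list(NULL): overwrite with one NULL per row instead of deleting
    cols[[j]] = vector("list", length(cols[[j]]))
  } else if (is.null(values)) {
    # bare NULL: delete the column
    cols[[j]] = NULL
  }
  cols
}

cols = list(L = list("A", "B"), i = c(1L, 1L))

replaced = replace_or_delete(cols, "L", list(NULL))
removed  = replace_or_delete(cols, "L", NULL)

stopifnot(identical(replaced$L, list(NULL, NULL)), identical(names(removed), "i"))
```

In the C code the `coln < oldncol` condition restricts the replacement to columns that already exist, which the model assumes by only being called with an existing name.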