Commit bc7f5af

czeildi and ben-schwen authored
Keep existing key(s) in [ even when new column(s) are added before them (#7391)
* add tests and fix
* extract DT setup from test
* add news item
* do not use \(x) fun style
* remove unnecessary local()
* apply formatting suggestion
* add test case for join result and multiple keys
* clarify test comment
* style change in fix
* rephrase news

---------

Co-authored-by: Benjamin Schwendinger <[email protected]>
1 parent 738779b commit bc7f5af

3 files changed, +21 -1 lines changed

NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -340,6 +340,8 @@ See [#2611](https://github.com/Rdatatable/data.table/issues/2611) for details. T
 
 20. `forderv` could segfault on keys with long runs of identical bytes (e.g., many duplicate columns) because the single-group branch tail-recursed radix-by-radix until the C stack ran out, [#4300](https://github.com/Rdatatable/data.table/issues/4300). This is a major problem since sorting is extensively used in `data.table`. Thanks @quantitative-technologies for the report and @ben-schwen for the fix.
 
+21. `[` now preserves existing key(s) when new columns are added before them, instead of incorrectly setting a new column as key, [#7364](https://github.com/Rdatatable/data.table/issues/7364). Thanks @czeildi for the bug report and the fix.
+
 ### NOTES
 
 1. The following in-progress deprecations have proceeded:

R/data.table.R

Lines changed: 1 addition & 1 deletion
@@ -1448,7 +1448,7 @@ replace_dot_alias = function(e) {
 if (SD_only)
   jvnames = jnames = sdvars
 else
-  jnames = as.character(Filter(is.name, jsub)[-1L])
+  jnames = vapply_1c(jsub, function(x) if (is.name(x)) as.character(x) else NA_character_)[-1L]
 key_idx = chmatch(key, jnames)
 missing_keys = which(is.na(key_idx))
 if (length(missing_keys) && missing_keys[1L] == 1L) return(NULL)
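
The one-line change above can be illustrated outside of `data.table`. The sketch below is not part of the commit: it assumes `jsub` is the unevaluated `list(...)` call built from `j`, uses the `j` expression from test 2344.01, and substitutes base `vapply()` and `match()` for the internal helpers `vapply_1c()` and `chmatch()`. It shows that dropping non-symbol entries shifts the position of `V1`, while replacing them with `NA` keeps each name aligned with its output column.

```r
# Hypothetical stand-in for the captured j expression: DT[, .(V2 = -V1, V1)]
jsub = quote(list(V2 = -V1, V1))
key = "V1"

# Old approach: Filter() drops the non-symbol entry `-V1`,
# so `V1` moves from position 2 to position 1.
old_jnames = as.character(Filter(is.name, jsub)[-1L])
old_key_idx = match(key, old_jnames)

# New approach (base vapply() used here in place of the internal vapply_1c()):
# non-symbol entries become NA, so positions still match the output columns.
new_jnames = vapply(
  jsub,
  function(x) if (is.name(x)) as.character(x) else NA_character_,
  character(1L),
  USE.NAMES = FALSE
)[-1L]
new_key_idx = match(key, new_jnames)

print(old_key_idx)
print(new_key_idx)
```

With the old expression the key is matched to the first output column, which is the newly created `V2`; with the new expression it is matched to the second column, where `V1` actually sits.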

inst/tests/tests.Rraw

Lines changed: 18 additions & 0 deletions
@@ -21837,3 +21837,21 @@ DT[, V1000 := 20:1]
 test(2343.1, forderv(DT, by=names(DT), sort=FALSE, retGrp=TRUE), forderv(DT, by=c("V1", "V1000"), sort=FALSE, retGrp=TRUE))
 x = c(rep(0, 7e5), 1e6)
 test(2343.2, forderv(list(x)), integer(0))
+
+# Keep key when new column added before existing key in j
+# Incorrect key can lead to incorrect join result #7364
+DT = data.table(V1 = 1:2, key = "V1")
+test(2344.00, key(DT[, .(V2 = c("b", "a"), V1)]), "V1")
+test(2344.01, key(DT[, .(V2 = -V1, V1)]), "V1")
+
+d1 = data.table(V1 = c(1L, 0L, 1L), V2 = c("a", "a", "b"), key = "V2")
+d2 = d1[, .(V1, label = c("one", "zero", "one"), V2)]
+r = d2[data.table(label = "one"), on = "label", allow.cartesian = TRUE]
+test(2344.02, nrow(r), 2L)
+# join result of keyed input is the same as unkeyed input
+test(2344.03, setkey(d1[, .(V1, label = c("one", "zero", "one"), V2)][data.table(label = "one"), on = "label", allow.cartesian = TRUE], NULL),
+  setkey(d1, NULL)[, .(V1, label = c("one", "zero", "one"), V2)][data.table(label = "one"), on = "label", allow.cartesian = TRUE])
+
+# keep sub-key in case of multiple keys, even with new columns and changing column order
+DT = data.table(V1 = 1:2, V2 = 3:4, V3 = 5:6, key = c("V1", "V2", "V3"))
+test(2344.04, key(DT[, .(V4 = c("b", "a"), V2, V5 = c("y", "x"), V1)]), c("V1", "V2"))
