Skip to content

Commit 2273004

Browse files
authored
Merge branch 'Rdatatable:master' into error-enhance
2 parents e7ce3ee + bfccc23 commit 2273004

File tree

5 files changed

+36
-28
lines changed

5 files changed

+36
-28
lines changed

NEWS.md

Lines changed: 12 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -109,31 +109,29 @@ rowwiseDT(
109109

110110
8. Fixed possible segfault in `setDT(df); attr(df, key) <- value; set(df, ...)`, i.e. adding columns to an object with `set()` that was converted to data.table with `setDT()` and later had attributes add with `attr<-`, [#6410](https://github.com/Rdatatable/data.table/issues/6410). Thanks to @hongyuanjia for the report and @ben-schwen for the PR. Note that `setattr()` should be preferred for adding attributes to a data.table.
111111

112-
9. `setDT()` no longer modifies the class of other names bound to the origin data.frame, e.g., in `DF1 <- data.frame(a=1); DF2 <- DF1; setDT(DF2)`, `DF1`'s class will not change. [#4784](https://github.com/Rdatatable/data.table/issues/4784). Thanks @OfekShilon for the report and fix.
112+
9. `DT[1, on=NULL]` now works for returning the first row, [#6579](https://github.com/Rdatatable/data.table/issues/6579). Thanks to @Kodiologist for the report and @tdhock for the PR.
113113

114-
10. `DT[1, on=NULL]` now works for returning the first row, [#6579](https://github.com/Rdatatable/data.table/issues/6579). Thanks to @Kodiologist for the report and @tdhock for the PR.
114+
10. `tables()` now returns the correct size for data.tables over 2GiB, [#6607](https://github.com/Rdatatable/data.table/issues/6607). Thanks to @vlulla for the report and the PR.
115115

116-
11. `tables()` now returns the correct size for data.tables over 2GiB, [#6607](https://github.com/Rdatatable/data.table/issues/6607). Thanks to @vlulla for the report and the PR.
116+
11. `rbindlist(l, use.names=TRUE)` can now handle different encodings for the column names in different entries of `l`, [#5452](https://github.com/Rdatatable/data.table/issues/5452). Thanks to @MEO265 for the report, and Benjamin Schwendinger for the fix.
117117

118-
12. `rbindlist(l, use.names=TRUE)` can now handle different encodings for the column names in different entries of `l`, [#5452](https://github.com/Rdatatable/data.table/issues/5452). Thanks to @MEO265 for the report, and Benjamin Schwendinger for the fix.
118+
12. Added a `data.frame` method for `format_list_item()` to fix error printing data.tables with columns containing 1-column data.frames, [#6592](https://github.com/Rdatatable/data.table/issues/6592). Thanks to @r2evans for the bug report and fix.
119119

120-
13. Added a `data.frame` method for `format_list_item()` to fix error printing data.tables with columns containing 1-column data.frames, [#6592](https://github.com/Rdatatable/data.table/issues/6592). Thanks to @r2evans for the bug report and fix.
121-
122-
14. Auto-printing gets some substantial improvements
120+
13. Auto-printing gets some substantial improvements
123121
- Suppression in `knitr` documents is now done by implementing a method for `knit_print` instead of looking up the call stack, [#6589](https://github.com/Rdatatable/data.table/pull/6589). The old way was fragile and wound up broken by some implementation changes in {knitr}. Thanks to @jangorecki for the report [#6509](https://github.com/Rdatatable/data.table/issues/6509) and @aitap for the fix.
124122
- `print()` methods for S3 subclasses of data.table (e.g. an object of class `c("my.table", "data.table", "data.frame")`) no longer print where plain data.tables wouldn't, e.g. `myDT[, y := 2]`, [#3029](https://github.com/Rdatatable/data.table/issues/3029). The improved detection of auto-printing scenarios has the added benefit of _allowing_ print in highly explicit statements like `print(DT[, y := 2])`, obviating our recommendation since v1.9.6 to append `[]` to signal "please print me".
125123
126-
15. Joins of `integer64` and `double` columns succeed when the `double` column has lossless `integer64` representation, [#4167](https://github.com/Rdatatable/data.table/issues/4167) and [#6625](https://github.com/Rdatatable/data.table/issues/6625). Previously, this only worked when the double column had lossless _32-bit_ integer representation. Thanks @MichaelChirico for the reports and fix.
124+
14. Joins of `integer64` and `double` columns succeed when the `double` column has lossless `integer64` representation, [#4167](https://github.com/Rdatatable/data.table/issues/4167) and [#6625](https://github.com/Rdatatable/data.table/issues/6625). Previously, this only worked when the double column had lossless _32-bit_ integer representation. Thanks @MichaelChirico for the reports and fix.
127125
128-
16. `DT[order(...)]` better matches `base::order()` behavior by (1) recognizing the `method=` argument (and erroring since this is not supported) and (2) accepting a vector of `TRUE`/`FALSE` in `decreasing=` as an alternative to using `-a` to convey "sort `a` decreasing", [#4456](https://github.com/Rdatatable/data.table/issues/4456). Thanks @jangorecki for the FR and @MichaelChirico for the PR.
126+
15. `DT[order(...)]` better matches `base::order()` behavior by (1) recognizing the `method=` argument (and erroring since this is not supported) and (2) accepting a vector of `TRUE`/`FALSE` in `decreasing=` as an alternative to using `-a` to convey "sort `a` decreasing", [#4456](https://github.com/Rdatatable/data.table/issues/4456). Thanks @jangorecki for the FR and @MichaelChirico for the PR.
129127
130-
17. Assignment with `:=` to an S4 slot of an under-allocated data.table now works, [#6704](https://github.com/Rdatatable/data.table/issues/6704). Thanks @MichaelChirico for the report and fix.
128+
16. Assignment with `:=` to an S4 slot of an under-allocated data.table now works, [#6704](https://github.com/Rdatatable/data.table/issues/6704). Thanks @MichaelChirico for the report and fix.
131129
132-
18. `as.data.table()` method for `data.frame`s (especially those with extended classes) is more consistent with `as.data.frame()` with respect to rention of attributes, [#5699](https://github.com/Rdatatable/data.table/issues/5699). Thanks @jangorecki for the report and fix.
130+
17. `as.data.table()` method for `data.frame`s (especially those with extended classes) is more consistent with `as.data.frame()` with respect to rention of attributes, [#5699](https://github.com/Rdatatable/data.table/issues/5699). Thanks @jangorecki for the report and fix.
133131
134-
19. Grouped queries on keyed tables no longer return an incorrectly keyed result if the _ad hoc_ `by=` list has some function call (in particular, a function which happens to return a strictly decreasing function of the keys), e.g. `by=.(a = rev(a))`, [#5583](https://github.com/Rdatatable/data.table/issues/5583). Thanks @AbrJA for the report and @MichaelChirico for the fix.
132+
18. Grouped queries on keyed tables no longer return an incorrectly keyed result if the _ad hoc_ `by=` list has some function call (in particular, a function which happens to return a strictly decreasing function of the keys), e.g. `by=.(a = rev(a))`, [#5583](https://github.com/Rdatatable/data.table/issues/5583). Thanks @AbrJA for the report and @MichaelChirico for the fix.
135133
136-
20. Assigning `list(NULL)` to a list column now replaces the column with `list(NULL)`, instead of deleting the column [#5558](https://github.com/Rdatatable/data.table/issues/5558). This behavior is now consistent with base `data.frame`. Thanks @tdhock for the report and @joshhwuu for the fix. This is due to a fundamental ambiguity from both allowing list columns _and_ making the use of `list()` to wrap `j=` arguments optional. We think that the code behaves as expected in all cases now. See the below for some illustration:
134+
19. Assigning `list(NULL)` to a list column now replaces the column with `list(NULL)`, instead of deleting the column [#5558](https://github.com/Rdatatable/data.table/issues/5558). This behavior is now consistent with base `data.frame`. Thanks @tdhock for the report and @joshhwuu for the fix. This is due to a fundamental ambiguity from both allowing list columns _and_ making the use of `list()` to wrap `j=` arguments optional. We think that the code behaves as expected in all cases now. See the below for some illustration:
137135
138136
```r
139137
DT = data.table(L=list(1L), i=2L, c='a')
@@ -154,7 +152,7 @@ rowwiseDT(
154152
DT[, c('L', 'i') := list(NULL, 3L)] # delete L, assign to i
155153
DT[, c('L', 'i') := list(list(NULL), NULL)] # assign to L, delete i
156154
```
157-
21. An integer overflow in `fread()` with lines longer than `2^(31/2)` bytes is prevented, [#6729](https://github.com/Rdatatable/data.table/issues/6729). The typical impact was no worse than a wrong initial allocation size, corrected later. Thanks to @TaikiSan21 for the report and @aitap for the fix.
155+
20. An integer overflow in `fread()` with lines longer than `2^(31/2)` bytes is prevented, [#6729](https://github.com/Rdatatable/data.table/issues/6729). The typical impact was no worse than a wrong initial allocation size, corrected later. Thanks to @TaikiSan21 for the report and @aitap for the fix.
158156
159157
## NOTES
160158

R/data.table.R

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1188,7 +1188,7 @@ replace_dot_alias = function(e) {
11881188
# ok=-1 which will trigger setalloccol with verbose in the next
11891189
# branch, which again calls _selfrefok and returns the message then
11901190
if ((ok<-selfrefok(x, verbose=FALSE))==0L) # ok==0 so no warning when loaded from disk (-1) [-1 considered TRUE by R]
1191-
if (is.data.table(x)) warningf("Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.")
1191+
if (is.data.table(x)) warningf("A shallow copy of this data.table was taken so that := can add or remove %d columns by reference. At an earlier point, this data.table was copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. It's also not unusual for data.table-agnostic packages to produce tables affected by this issue. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.", length(newnames))
11921192
# !is.data.table for DF |> DT(,:=) tests 2212.16-19 (#5113) where a shallow copy is routine for data.frame
11931193
if ((ok<1L) || (truelength(x) < ncol(x)+length(newnames))) {
11941194
DT = x # in case getOption contains "ncol(DT)" as it used to. TODO: warn and then remove
@@ -2918,9 +2918,6 @@ setDT = function(x, keep.rownames=FALSE, key=NULL, check.names=FALSE) {
29182918
}
29192919
}
29202920

2921-
# Done to avoid affecting other copies of x when we setattr() below (#4784)
2922-
x = .shallow(x)
2923-
29242921
rn = if (!identical(keep.rownames, FALSE)) rownames(x) else NULL
29252922
setattr(x, "row.names", .set_row_names(nrow(x)))
29262923
if (check.names) setattr(x, "names", make.names(names(x), unique=TRUE))

inst/tests/tests.Rraw

Lines changed: 21 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -7066,7 +7066,7 @@ ee = new.env()
70667066
ee$DT = data.frame(x=1L, y=1:3)
70677067
setattr(ee$DT, 'class', c("data.table", "data.frame"))
70687068
test(1482.1, truelength(ee$DT), 0L) # make sure that the simulated environment is right.
7069-
test(1482.2, ee$DT[, z := 3:1], data.table(x=1L, y=1:3, z=3:1), warning="Invalid .internal.selfref detected and")
7069+
test(1482.2, ee$DT[, z := 3:1], data.table(x=1L, y=1:3, z=3:1), warning="A shallow copy of this data.table was taken")
70707070
test(1482.3, truelength(ee$DT), 1027L)
70717071
test(1482.4, ee$DT[, za := 4:6], data.table(x=1L, y=1:3, z=3:1, za=4:6))
70727072
test(1482.5, truelength(ee$DT), 1027L) # should have used spare slot i.e. no increase in tl
@@ -14942,7 +14942,7 @@ test(2037.1, foo(DT), output='Please remember to always setDT()')
1494214942
# no assignment was made to DT
1494314943
test(2037.2, names(DT), 'a')
1494414944
# _selrefok() verbose message was duplicated
14945-
test(2037.3, unname(table(unlist(strsplit(capture.output(foo(DT)), '\n|\\s+')))['ptr']), 1L)
14945+
test(2037.3, foo(DT), output="data.table internal attributes", notOutput="data.table internal attributes.*data.table internal attributes")
1494614946

1494714947
# `between` invalid args, and verbose #3516
1494814948
test(2038.01, between(1:5, 2, 4, incbounds=423), error="incbounds must be TRUE or FALSE")
@@ -20650,13 +20650,14 @@ test(2294.72,
2065020650
label = list(character = "C3", VCharA = "Total", integer = 2L))),
2065120651
warning = "For the following variables, the 'label' value was already in the data: [VCharB (label: C3), VIntA (label: 2)]")
2065220652

20653+
# tests 1-3 disabled -- fix for #4784 causes various breaking changes, at least partially covered by 2295.4+.
2065320654
# setDT no longer leaks class modification to origin copy, #4784
20654-
d1 = data.frame(a=1, row.names='b')
20655-
d2 = d1
20656-
setDT(d2)
20657-
test(2295.1, !is.data.table(d1))
20658-
test(2295.2, rownames(d1), 'b')
20659-
test(2295.3, is.data.table(d2))
20655+
# d1 = data.frame(a=1, row.names='b')
20656+
# d2 = d1
20657+
# setDT(d2)
20658+
# test(2295.1, !is.data.table(d1))
20659+
# test(2295.2, rownames(d1), 'b')
20660+
# test(2295.3, is.data.table(d2))
2066020661
# Ensure against regression noted in #6725
2066120662
x = data.frame(a=1)
2066220663
e = environment()
@@ -20669,6 +20670,18 @@ e = new.env(parent=topenv())
2066920670
e$x = data.frame(a=1)
2067020671
foo('x', e)
2067120672
test(2295.5, is.data.table(e$x))
20673+
# More regressions noted in #6735
20674+
baz = function(x) setDT(x)
20675+
foo = function(x) {
20676+
bar = function() baz(x)
20677+
x = data.frame(a=1)
20678+
bar()
20679+
is.data.table(x)
20680+
}
20681+
test(2295.6, foo())
20682+
x = data.frame(a=1)
20683+
baz(x)
20684+
test(2295.7, is.data.table(x))
2067220685

2067320686
# #6588: .checkTypos used to give arbitrary strings to stopf as the first argument
2067420687
test(2296, d2[x %no such operator% 1], error = '%no such operator%')

man/all.equal.data.table.Rd

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@
2424
}
2525

2626
\item{check.attributes}{
27-
A logical indicating whether or not to check attributes, will apply not only to data.table but also attributes of the columns. It will skip \code{c("row.names",".internal.selfref")} data.table attributes.
27+
A logical indicating whether or not to check attributes. Note that this will apply not only to the data.tables, but also to attributes of the columns. \code{"row.names"} and any internal data.table attributes are always skipped.
2828
}
2929

3030
\item{ignore.col.order}{

src/assign.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -119,7 +119,7 @@ static int _selfrefok(SEXP x, Rboolean checkNames, Rboolean verbose) {
119119
}
120120
p = R_ExternalPtrAddr(v);
121121
if (p==NULL) {
122-
if (verbose) Rprintf(_(".internal.selfref ptr is NULL. This is expected and normal for a data.table loaded from disk. Please remember to always setDT() immediately after loading to prevent unexpected behavior. If this table was not loaded from disk or you've already run setDT(), please report to data.table issue tracker.\n"));
122+
if (verbose) Rprintf(_("The data.table internal attributes of this table are invalid. This is expected and normal for a data.table loaded from disk. Please remember to always setDT() immediately after loading to prevent unexpected behavior. If this table was not loaded from disk or you've already run setDT(), please report to the data.table issue tracker.\n"));
123123
return -1;
124124
}
125125
if (!isNull(p)) internal_error(__func__, ".internal.selfref ptr is neither NULL nor R_NilValue"); // # nocov

0 commit comments

Comments
 (0)