Commit 03c647f

vlulla, ben-schwen, and MichaelChirico authored
List size for large data.tables (#6609)
* List size for large data.tables

Fixes #6607

* Edits based on guidance from @MichaelChirico
* Explicitly list the size as a numeric instead of integer. Modified NEWS and DESCRIPTION files to include details pertaining to the PR.
* Better phrasing of the bug fix. Great suggestion by @ben-schwen!
* Update inst/tests/other.Rraw
* more specific about 2GiB size for "large"

---------

Co-authored-by: Benjamin Schwendinger <[email protected]>
Co-authored-by: Michael Chirico <[email protected]>
1 parent e308dcb commit 03c647f

4 files changed, +12 -2 lines changed

DESCRIPTION

Lines changed: 2 additions & 1 deletion
@@ -97,5 +97,6 @@ Authors@R: c(
     person("Vincent", "Runge", role="ctb"),
     person("Christian", "Wia", role="ctb"),
     person("Elise", "Maigné", role="ctb"),
-    person("Vincent", "Rocher", role="ctb")
+    person("Vincent", "Rocher", role="ctb"),
+    person("Vijay", "Lulla", role="ctb")
   )

NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -105,6 +105,8 @@ rowwiseDT(
 
 10. `DT[1, on=NULL]` now works for returning the first row, [#6579](https://github.com/Rdatatable/data.table/issues/6579). Thanks to @Kodiologist for the report and @tdhock for the PR.
 
+11. `tables()` now returns the correct size for data.tables over 2GiB, [#6607](https://github.com/Rdatatable/data.table/issues/6607). Thanks to @vlulla for the report and the PR.
+
 ## NOTES
 
 1. Tests run again when some Suggests packages are missing, [#6411](https://github.com/Rdatatable/data.table/issues/6411). Thanks @aadler for the note and @MichaelChirico for the fix.

R/tables.R

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ type_size = function(DT) {
   # for speed and ram efficiency, a lower bound by not descending into character string lengths or list items
   # if a more accurate and higher estimate is needed then user can pass object.size or alternative to mb=
   # in case number of columns is very large (e.g. 1e6 columns) then we use a for() to avoid allocation of sapply()
-  ans = 0L
+  ans = 0.0
   lookup = c("raw"=1L, "integer"=4L, "double"=8L, "complex"=16L)
   for (i in seq_along(DT)) {
     col = DT[[i]]
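
The one-line change above matters because R integers are 32-bit: an integer accumulator that passes `.Machine$integer.max` (2^31 - 1) becomes `NA` with a warning, while a double accumulator keeps counting. The sketch below is a simplified stand-in for the loop, not the package's actual `type_size()` body; the names `n_rows`, `n_cols`, `col_bytes`, `ans_int`, and `ans_dbl` are illustrative assumptions, and the per-column size counts only the 8-byte payload of a double column.

```r
# Simplified stand-in for the accumulation in type_size(); NOT the package's code.
# All variable names here are hypothetical and chosen for illustration.
n_rows    = 20000000L      # rows per column, matching the new test
n_cols    = 15L            # number of double columns, matching the new test
col_bytes = 8L * n_rows    # 160,000,000 bytes per column; still fits in an integer

# Integer accumulator (ans = 0L): the running total exceeds
# .Machine$integer.max on the 14th column and turns into NA.
ans_int = 0L
for (i in seq_len(n_cols)) ans_int = suppressWarnings(ans_int + col_bytes)

# Double accumulator (ans = 0.0): integer + double is promoted to double,
# so the total is kept exactly.
ans_dbl = 0.0
for (i in seq_len(n_cols)) ans_dbl = ans_dbl + col_bytes

is.na(ans_int)       # TRUE
ans_dbl == 2.4e9     # TRUE
typeof(ans_dbl)      # "double"
```

Starting the accumulator as a double is enough on its own, because every later `ans + <integer>` inherits the double type; the `lookup` values can stay integer.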

inst/tests/other.Rraw

Lines changed: 7 additions & 0 deletions
@@ -766,3 +766,10 @@ if (loaded[["nanotime"]]) {
   # respect dec=',' for nanotime, related to #6446, corresponding to tests 2281.*
   test(31, fwrite(data.table(as.nanotime(.POSIXct(0))), dec=',', sep=';'), output="1970-01-01T00:00:00,000000000Z")
 }
+
+# tables() with large environment #6607
+.e <- new.env() ## to not populate the .GlobalEnv
+.e[["DT"]] <- as.data.table(lapply(1:15,function(i) runif(20e6)))
+res <- tables(env=.e)
+test(32, res[, .(NAME,NROW,NCOL,MB)], data.table(NAME="DT",NROW=20000000L,NCOL=15L,MB=2288.0))
+rm(.e, res)
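
As a rough check on why the test builds 15 columns of 20e6 doubles, the arithmetic below estimates the payload size against the 2GiB mark named in the NEWS entry. It is a back-of-the-envelope sketch that counts only 8 bytes per double value and ignores any per-column or per-object overhead; the variable names are illustrative assumptions, not identifiers from the package.

```r
# Back-of-the-envelope payload size of the test table (hypothetical names).
bytes_per_double = 8
n_rows = 20e6
n_cols = 15

payload_bytes = bytes_per_double * n_rows * n_cols   # 2.4e9 bytes
two_gib       = 2^31                                 # 2,147,483,648 bytes

payload_bytes > two_gib   # TRUE: the table is past the 32-bit integer range

# Smallest number of such columns whose payload exceeds 2GiB
min_cols = floor(two_gib / (bytes_per_double * n_rows)) + 1
min_cols                  # 14
```

Under these assumptions 14 columns would already cross the threshold, so 15 leaves a small margin.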
