Commit f0abdc4

Merge branch 'master' into dataCodeIntegration3
2 parents 8d20e2c + 345e336 commit f0abdc4

11 files changed: 135 additions, 58 deletions

.github/workflows/pkgup.yaml

Lines changed: 3 additions & 10 deletions
@@ -12,7 +12,7 @@ on:
     branches:
       - 'master'
 
-name: pkgdown-deploy
+name: pkgup-deploy
 
 jobs:
   build:
@@ -47,16 +47,9 @@ jobs:
       - name: manual
         if: github.ref == 'refs/heads/master'
         run: |
-          cp -R ${{ env.R_LIBS_USER }} library
-          R CMD INSTALL --library="library" $(ls -1t data.table_*.tar.gz | head -n 1) --html
-          mkdir -p doc/html
-          cp $(R RHOME)/doc/html/{left.jpg,up.jpg,Rlogo.svg,R.css,index.html} doc/html
-          Rscript -e 'utils::make.packages.html("library", docdir="doc")'
-          sed -i "s|file://|../..|g" doc/html/packages.html
+          R CMD INSTALL $(ls -1t data.table_*.tar.gz | head -n 1)
           mkdir -p public
-          mv doc public/doc
-          cp -r --parents library/*/{html,help,doc,demo,DESCRIPTION,README,NEWS,README.md,NEWS.md} public 2>/dev/null || :
-          sed -i 's|"/doc/html/|"/data.table/doc/html/|g' public/library/data.table/doc/index.html 2>/dev/null || :
+          Rscript -e 'tools::pkg2HTML("data.table", out="public/index.html")'
       - name: repo
         if: github.ref == 'refs/heads/master'
         run: |

DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ Version: 1.17.99
 Title: Extension of `data.frame`
 Depends: R (>= 3.4.0)
 Imports: methods
-Suggests: bit64 (>= 4.0.0), bit (>= 4.0.4), R.utils, xts, zoo (>= 1.8-1), yaml, knitr, markdown
+Suggests: bit64 (>= 4.0.0), bit (>= 4.0.4), R.utils (>= 2.13.0), xts, zoo (>= 1.8-1), yaml, knitr, markdown
 Description: Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.
 License: MPL-2.0 | file LICENSE
 URL: https://r-datatable.com, https://Rdatatable.gitlab.io/data.table, https://github.com/Rdatatable/data.table

NAMESPACE

Lines changed: 1 addition & 0 deletions
@@ -215,6 +215,7 @@ S3method(format_list_item, data.frame)
 export(fdroplevels, setdroplevels)
 S3method(droplevels, data.table)
 export(frev)
+export(.selfref.ok)
 
 # sort_by added in R 4.4.0, #6662, https://stat.ethz.ch/pipermail/r-announce/2024/000701.html
 if (getRversion() >= "4.4.0") S3method(sort_by, data.table)

NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -311,6 +311,8 @@
 
 6. Using a double vector in `set()`'s `i=` and/or `j=` no longer throws a warning about preferring integer, [#6594](https://github.com/Rdatatable/data.table/issues/6594). While it may improve efficiency to use integer, there's no guarantee it's an improvement and the difference is likely to be minimal. The coercion will still be reported under `datatable.verbose=TRUE`. For package/production use cases, static analyzers such as `lintr::implicit_integer_linter()` can also report when numeric literals should be rewritten as integer literals.
 
+7. In rare situations a data.table object may lose its internal attribute that holds a self-reference. New helper function `.selfref.ok()` tests just that. It is only intended for technical use cases. See manual for examples.
+
 ## data.table [v1.17.8](https://github.com/Rdatatable/data.table/milestone/41) (6 July 2025)
 
 1. Internal functions used to signal errors are now marked as non-returning, silencing a compiler warning about potentially unchecked allocation failure. Thanks to Prof. Brian D. Ripley for the report and @aitap for the fix, [#7070](https://github.com/Rdatatable/data.table/pull/7070).

R/fread.R

Lines changed: 3 additions & 0 deletions
@@ -119,6 +119,9 @@ yaml=FALSE, tmpdir=tempdir(), tz="UTC")
     if ((w <- endsWithAny(file, c(".gz", ".bgz",".bz2"))) || (gzsig <- is_gzip(file_signature)) || is_bzip(file_signature)) {
       if (!requireNamespace("R.utils", quietly = TRUE))
         stopf("To read %s files directly, fread() requires 'R.utils' package which cannot be found. Please install 'R.utils' using 'install.packages('R.utils')'.", if (w<=2L || gzsig) "gz" else "bz2") # nocov
+      # not worth doing a behavior test here, so just use getRversion().
+      if (packageVersion("R.utils") < "2.13.0" && base::getRversion() >= "4.5.0")
+        stopf("Reading compressed files in fread requires R.utils version 2.13.0 or higher. Please upgrade R.utils.") # nocov
       FUN = if (w<=2L || gzsig) gzfile else bzfile
       decompFile = tempfile(tmpdir=tmpdir)
       on.exit(unlink(decompFile), add=TRUE)
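
Note on the version guard above: it relies on R version objects comparing component by component, even when the right-hand side is a plain string. A minimal base-R sketch of that behaviour follows; `needs_upgrade` and its `installed` argument are hypothetical names used only for illustration and are not part of the commit.

```r
# Version objects compare numerically, component by component, even against strings.
needs_upgrade = function(installed, r_version = getRversion()) {
  # `installed` is a hypothetical stand-in for packageVersion("R.utils")
  package_version(installed) < "2.13.0" && r_version >= "4.5.0"
}

needs_upgrade("2.12.2", r_version = numeric_version("4.5.1"))  # TRUE
needs_upgrade("2.13.0", r_version = numeric_version("4.5.1"))  # FALSE
needs_upgrade("2.12.2", r_version = numeric_version("4.4.3"))  # FALSE

# Plain strings would be compared character by character and give the wrong answer:
"2.9.0" < "2.13.0"                    # FALSE
package_version("2.9.0") < "2.13.0"   # TRUE
```

This is why the guard can be written directly against `packageVersion()` and `getRversion()` without parsing version strings by hand.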

R/helpers.R

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+# here we have helpers functions that DT exports, on contrary to utils.R file
+
+# convert char to factor retaining order #4837
+fctr = function(x, levels=unique(x), ..., sort=FALSE, rev=FALSE) {
+  if (!isTRUEorFALSE(sort))
+    stopf("argument 'sort' must be TRUE or FALSE")
+  if (!isTRUEorFALSE(rev))
+    stopf("argument 'rev' must be TRUE or FALSE")
+  if (sort) levels = sort(levels)
+  if (rev) levels = frev(levels)
+  factor(x, levels=levels, ...)
+}
+
+# add a function for validating data.tables that might need setDT #7329
+.selfref.ok = function(x) {
+  if (!is.data.table(x))
+    stopf(".selfref.ok expects data.table class object.")
+  selfrefok(x, verbose=FALSE) > 0L
+}
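
For readers unfamiliar with `fctr`, here is a rough base-R sketch of how its `sort` and `rev` arguments reorder factor levels. `fctr_sketch` is a hypothetical stand-in, not the package function: it uses `base::rev` where the package uses `frev`, and `stopifnot` in place of the internal `isTRUEorFALSE`/`stopf` helpers.

```r
# Hypothetical base-R approximation of fctr(); not the package implementation.
fctr_sketch = function(x, levels = unique(x), ..., sort = FALSE, rev = FALSE) {
  stopifnot(isTRUE(sort) || isFALSE(sort), isTRUE(rev) || isFALSE(rev))
  if (sort) levels = base::sort(levels)  # sort first ...
  if (rev) levels = base::rev(levels)    # ... then optionally reverse
  factor(x, levels = levels, ...)
}

x = c("b", "c", "a", "b")
levels(fctr_sketch(x))                          # "b" "c" "a" (order of appearance)
levels(fctr_sketch(x, sort = TRUE))             # "a" "b" "c"
levels(fctr_sketch(x, rev = TRUE))              # "a" "c" "b"
levels(fctr_sketch(x, sort = TRUE, rev = TRUE)) # "c" "b" "a"
```

The default `levels=unique(x)` is what keeps the order of first appearance, in contrast to `factor(x)`, which sorts the levels.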

R/utils.R

Lines changed: 0 additions & 11 deletions
@@ -220,17 +220,6 @@ rss = function() { #5515 #5517
   # nocov end
 }
 
-# convert char to factor retaining order #4837
-fctr = function(x, levels=unique(x), ..., sort=FALSE, rev=FALSE) {
-  if (!isTRUEorFALSE(sort))
-    stopf("argument 'sort' must be TRUE or FALSE")
-  if (!isTRUEorFALSE(rev))
-    stopf("argument 'rev' must be TRUE or FALSE")
-  if (sort) levels = sort(levels)
-  if (rev) levels = frev(levels)
-  factor(x, levels=levels, ...)
-}
-
 formula_vars = function(f, x) { # .formula2varlist is not API and seems to have appeared after R-4.2, #6841
   terms <- terms(f)
   setNames(

inst/tests/froll.Rraw

Lines changed: 1 addition & 1 deletion
@@ -1798,7 +1798,7 @@ if (use.fork) {
   setDTthreads(2)
   test(6010.763, frollapply(c(1, 9), N=1L, FUN=identity), c(1,9)) ## good only because threads >= input
   test(6010.764, frollapply(c(1, 5, 9), N=1L, FUN=identity), c(5,5,9)) ## unexpected again
-  is.ok = function(x) {stopifnot(is.data.table(x)); capture.output(print(attr(x, ".internal.selfref", TRUE)))!="<pointer: (nil)>"}
+  is.ok = function(x) {stopifnot(is.data.table(x)); out=capture.output(print(attr(x, ".internal.selfref", TRUE))); out!="<pointer: (nil)>" && out!="<pointer: 0x0>"}
   ans = frollapply(1:2, 2, data.table, simplify=FALSE) ## default: fill=NA
   test(6010.770, is.ok(ans[[2L]])) ## frollapply detected DT and fixed
   ans = frollapply(1:2, 2, data.table, fill=data.table(NA)) ## fill type match
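
The updated `is.ok` compares the printed pointer against two spellings, presumably because the text printed for a null pointer is platform dependent. The sketch below shows this with base R and the methods package only. Using `methods::new("externalptr")` as a source of a null external pointer is an assumption made for illustration, and no exact output is asserted.

```r
# Assumption: methods::new("externalptr") yields an external pointer with a null address.
null_ptr = methods::new("externalptr")

# Capture what print() shows for it; the exact text depends on the platform's
# pointer formatting, which is why the test accepts more than one spelling.
printed = utils::capture.output(print(null_ptr))
printed

# The two spellings the test treats as "null".
looks_null = printed %in% c("<pointer: (nil)>", "<pointer: 0x0>")
looks_null
```

On a platform that prints yet another spelling, `looks_null` would be `FALSE`, which is the weakness of string matching that the new `.selfref.ok()` helper avoids.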

inst/tests/tests.Rraw

Lines changed: 64 additions & 34 deletions
@@ -147,6 +147,20 @@ if (!test_longdouble) {
 tt = Sys.getenv("TZ", unset=NA)
 TZnotUTC = !identical(tt,"") && !is_utc(tt)
 
+## nice utf8 testing code added in #7210 - to be called from inside local() !
+utf8_check_expr = quote(l10n_info()$`UTF-8` || {
+  lc_ctype = Sys.getlocale('LC_CTYPE')
+  lc_wantctype = 'en_US.UTF-8'
+  # Japanese multibyte characters require utf8. As of 2025, we're likely to be already running in a UTF-8 locale, but if not, try this setlocale() call as a last chance.
+  # Unfortunately, there is no guaranteed, portable way of switching to UTF-8 US English.
+  # Avoid the warning upon possible failure, #7210.
+  lc_newctype = suppressWarnings(Sys.setlocale('LC_CTYPE', lc_wantctype))
+  if (identical(lc_newctype, lc_wantctype)) {
+    on.exit(Sys.setlocale('LC_CTYPE', lc_ctype))
+    TRUE
+  } else FALSE
+})
+
 # generate simple error messages from base that are checked against in our tests. this helps
 # protect us against these messages evolving in base in the future, and against these messages
 # potentially not being produced in English.
@@ -17638,7 +17652,11 @@ test(2194.5, endsWithAny(NA_character_, 'a'), FALSE)
 test(2194.6, endsWithAny(character(), 'a'), error="Internal error.*types or lengths incorrect")
 # file used in encoding tests
 txt = readLines(testDir("issue_563_fread.txt"))
-test(2194.7, endsWithAny(txt, 'B'), error="Internal error.*types or lengths incorrect") # txt is length 5
+local(if (eval(utf8_check_expr)) {
+  test(2194.7, endsWithAny(txt, 'B'), error="Internal error.*types or lengths incorrect") # txt is length 5
+} else {
+  cat("Test 2194.7 skipped because it needs a UTF-8 locale.\n")
+})
 test(2194.8, endsWith('abcd', 'd'), error="Internal error.*use endsWithAny")
 
 # uniqueN(x, by=character()) was internal error, #4594
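
The `local(if (eval(utf8_check_expr)) ...)` idiom stores a check once as an unevaluated expression and evaluates it later inside each `local()` block. A minimal base-R sketch of the mechanism is shown below; `check_expr` and `marker` are hypothetical names, and the locale switching and `on.exit()` handling of the real expression are deliberately left out.

```r
# quote() stores the expression without evaluating it, so `marker` need not exist yet.
check_expr = quote(nchar(marker) > 0L)

# Inside local(), eval() evaluates the stored expression in the local environment,
# where `marker` is defined, and the if/else picks the "run" or "skip" branch.
result = local({
  marker = "x"
  if (eval(check_expr)) "ran" else "skipped"
})
result  # "ran"

# The same stored expression can be reused in another local() block.
result2 = local({
  marker = ""
  if (eval(check_expr)) "ran" else "skipped"
})
result2  # "skipped"
```

Keeping the check in a single quoted expression is what lets the commit replace the duplicated locale code in several test blocks with one shared definition.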
@@ -18619,19 +18637,8 @@ test(2252.1, dt[, b:=2L], error = "\\[ was called on a data.table.*not data.tabl
 test(2252.2, dt[, let(b=2L)], error = "\\[ was called on a data.table.*not data.table-aware.*'let'")
 rm(.datatable.aware)
 
-# tests for trunc.char handling wide characters # 5096
-local(if (l10n_info()$`UTF-8` || {
-  lc_ctype = Sys.getlocale('LC_CTYPE')
-  lc_wantctype = 'en_US.UTF-8'
-  # Japanese multibyte characters require utf8. As of 2025, we're likely to be already running in a UTF-8 locale, but if not, try this setlocale() call as a last chance.
-  # Unfortunately, there is no guaranteed, portable way of switching to UTF-8 US English.
-  # Avoid the warning upon possible failure, #7210.
-  lc_newctype = suppressWarnings(Sys.setlocale('LC_CTYPE', lc_wantctype))
-  if (identical(lc_newctype, lc_wantctype)) {
-    on.exit(Sys.setlocale('LC_CTYPE', lc_ctype))
-    TRUE
-  } else FALSE
-}) {
+# tests for trunc.char handling wide characters #5096
+local(if (eval(utf8_check_expr)) {
   accented_a = "\u0061\u0301"
   ja_ichi = "\u4E00"
   ja_ni = "\u4E8C"
@@ -20820,17 +20827,21 @@ x = data.table(a=1, b=2L)
 y = data.table(c=1.5, d=1L)
 test(2297.31, y[x, on=.(c == a, d == a), nomatch=NULL], output="Empty data.table (0 rows and 3 cols): c,d,b")
 
-# rbindlist(l, use.names=TRUE) should handle different colnames encodings #5452
-x = data.table(a = 1, b = 2, c = 3)
-y = data.table(x = 4, y = 5, z = 6)
-# a-umlaut, o-umlaut, u-umlaut
-setnames(x , c("\u00e4", "\u00f6", "\u00fc"))
-setnames(y , iconv(c("\u00f6", "\u00fc", "\u00e4"), from = "UTF-8", to = "latin1"))
-test(2298.1, rbindlist(list(x,y), use.names=TRUE), data.table("\u00e4"=c(1,6), "\u00f6"=c(2,4), "\u00fc"=c(3,5)))
-test(2298.2, rbindlist(list(y,x), use.names=TRUE), data.table("\u00f6"=c(4,2), "\u00fc"=c(5,3), "\u00e4"=c(6,1)))
-set(y, j="\u00e4", value=NULL)
-test(2298.3, rbindlist(list(x,y), use.names=TRUE, fill=TRUE), data.table("\u00e4"=c(1,NA), "\u00f6"=c(2,4), "\u00fc"=c(3,5)))
-test(2298.4, rbindlist(list(y,x), use.names=TRUE, fill=TRUE), data.table("\u00f6"=c(4,2), "\u00fc"=c(5,3), "\u00e4"=c(NA,1)))
+local(if (eval(utf8_check_expr)) {
+  # rbindlist(l, use.names=TRUE) should handle different colnames encodings #5452
+  x = data.table(a = 1, b = 2, c = 3)
+  y = data.table(x = 4, y = 5, z = 6)
+  # a-umlaut, o-umlaut, u-umlaut
+  setnames(x , c("\u00e4", "\u00f6", "\u00fc"))
+  setnames(y , iconv(c("\u00f6", "\u00fc", "\u00e4"), from = "UTF-8", to = "latin1"))
+  test(2298.1, rbindlist(list(x,y), use.names=TRUE), data.table("\u00e4"=c(1,6), "\u00f6"=c(2,4), "\u00fc"=c(3,5)))
+  test(2298.2, rbindlist(list(y,x), use.names=TRUE), data.table("\u00f6"=c(4,2), "\u00fc"=c(5,3), "\u00e4"=c(6,1)))
+  set(y, j="\u00e4", value=NULL)
+  test(2298.3, rbindlist(list(x,y), use.names=TRUE, fill=TRUE), data.table("\u00e4"=c(1,NA), "\u00f6"=c(2,4), "\u00fc"=c(3,5)))
+  test(2298.4, rbindlist(list(y,x), use.names=TRUE, fill=TRUE), data.table("\u00f6"=c(4,2), "\u00fc"=c(5,3), "\u00e4"=c(NA,1)))
+} else {
+  cat("Tests 2298.* skipped because they need a UTF-8 locale.\n")
+})
 
 # #6592: printing nested single-column frames
 test(2299.01, format_list_item(data.frame(a=1)), output="<data.frame[1x1]>")
@@ -21589,18 +21600,20 @@ test(2335.5, isoyear("2019-12-30"), 2020L)
 test(2335.6, isoyear(as.Date("2019-12-30")), 2020L)
 
 # t1-t2 for Date/IDate should be consistent, modulo storage mode #4979
-t1 = as.IDate("2025-07-01")
-t2 = as.IDate("2025-06-01")
-test(2336.1, all.equal(as.Date(t1) - as.Date(t2), t1 - t2))
-test(2336.2, all.equal(as.Date(t2) - as.Date(t1), t2 - t1))
-test(2336.3, all.equal(as.Date(t1) - t2, t1 - t2))
-test(2336.4, all.equal(as.Date(t2) - t1, t2 - t1))
-test(2336.5, all.equal(t1 - as.Date(t2), t1 - t2))
-test(2336.6, all.equal(t2 - as.Date(t1), t2 - t1))
+if (base::getRversion() >= "4.3.0") { ## follow up of #7213, see #7321
+  t1 = as.IDate("2025-07-01")
+  t2 = as.IDate("2025-06-01")
+  test(2336.1, all.equal(as.Date(t1) - as.Date(t2), t1 - t2))
+  test(2336.2, all.equal(as.Date(t2) - as.Date(t1), t2 - t1))
+  test(2336.3, all.equal(as.Date(t1) - t2, t1 - t2))
+  test(2336.4, all.equal(as.Date(t2) - t1, t2 - t1))
+  test(2336.5, all.equal(t1 - as.Date(t2), t1 - t2))
+  test(2336.6, all.equal(t2 - as.Date(t1), t2 - t1))
+}
 
 # fwrite: allow dec=',' with single column, #7227
 test(2337.1, fwrite(data.table(1), dec=","), NULL)
-if (getRversion() >= "4.0.0") { # rely on stopifnot(named = ...) for correct message
+if (base::getRversion() >= "4.0.0") { # rely on stopifnot(named = ...) for correct message
   test(2337.2, fwrite(data.table(0.1, 0.2), dec=",", sep=","), error = "dec and sep must be distinct")
 }
 test(2337.3, is.null(fwrite(data.table(c(0.1, 0.2)), dec=",", sep="\t")))
@@ -21655,3 +21668,20 @@ local({
   ...123 = 'a'
   test(2339.11, DT[, .....123], DT)
 })
+
+# Add a function for validating data.tables that might need setDT #7329
+test(2340.00, .selfref.ok(data.frame(V1=1L)), error=".selfref.ok expects data.table class object")
+d1 = structure(list(V1=1L), class=c("data.table","data.frame"))
+test(2340.01, .selfref.ok(d1), FALSE)
+setDT(d1)
+test(2340.02, .selfref.ok(d1), TRUE)
+saveRDS(d1, f<-tempfile())
+d2 = readRDS(f)
+test(2340.03, .selfref.ok(d2), FALSE)
+invisible(file.remove(f))
+setDT(d2)
+test(2340.04, .selfref.ok(d2), TRUE)
+d3 = unserialize(serialize(d2, NULL))
+test(2340.05, .selfref.ok(d3), FALSE)
+setDT(d3)
+test(2340.06, .selfref.ok(d3), TRUE)

man/.selfref.ok.Rd

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,36 @@
1+
\name{.selfref.ok}
2+
\alias{.selfref.ok}
3+
\title{Tests self reference of a data.table}
4+
\description{
5+
In rare situations, as presented in examples below, a data.table object may lose its internal attribute that holds a self-reference. This function tests just that.
6+
7+
It is not expected that many end users will have need for this highly technical function about data.table internals.
8+
}
9+
\usage{
10+
.selfref.ok(x)
11+
}
12+
\arguments{
13+
\item{x}{ A data.table. }
14+
}
15+
\value{
16+
\code{TRUE} if self reference attribute is properly set, \code{FALSE} otherwise.
17+
}
18+
\examples{
19+
d1 = structure(list(V1=1L), class=c("data.table","data.frame"))
20+
.selfref.ok(d1)
21+
setDT(d1)
22+
.selfref.ok(d1)
23+
24+
saveRDS(d1, f<-tempfile())
25+
d2 = readRDS(f)
26+
.selfref.ok(d2)
27+
invisible(file.remove(f))
28+
setDT(d2)
29+
.selfref.ok(d2)
30+
31+
d3 = unserialize(serialize(d2, NULL))
32+
.selfref.ok(d3)
33+
setDT(d3)
34+
.selfref.ok(d3)
35+
}
36+
\keyword{ data }
