
Commit 2d806f9

Merge branch 'master' into frollapply-interrupt
And fix a silly typo.
2 parents 7ce469b + 8d26f4e commit 2d806f9

29 files changed: +639 -576 lines changed

.gitattributes

Lines changed: 13 additions & 0 deletions
@@ -8,3 +8,16 @@
 # want to preserve the line endings robustly
 inst/tests/** -text
 inst/tests/*.Rraw text eol=lf linguist-language=R
+# Ensure GitHub linguist only considers source code files
+# for language statistics
+docs/** linguist-documentation
+man/** linguist-documentation
+tests/** linguist-vendored
+vignettes/** linguist-vendored
+po/** linguist-vendored
+.github/** linguist-vendored
+.ci/** linguist-vendored
+.dev/** linguist-vendored
+.devcontainer/** linguist-vendored
+.graphics/** linguist-vendored
+.Rproj.user/** linguist-vendored

.github/workflows/R-CMD-check-occasional.yaml

Lines changed: 1 addition & 0 deletions
@@ -42,6 +42,7 @@ jobs:
 
 env:
 GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
+RUN_ALL_DATATABLE_TESTS: yes
 
 steps:
 - name: Set locale

.gitlab-ci.yml

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ variables:
 TZ: "UTC" ## to avoid 'Failed to create bus connection' from timedatectl via Sys.timezone() on Docker with R 3.4.
 ## Setting TZ for all GLCI jobs to isolate them from timezone. We could have a new GLCI job to test under
 ## a non-UTC timezone, although, that's what we do routinely in dev.
+RUN_ALL_DATATABLE_TESTS: "yes" ## run optional tests in CI
 R_REL_VERSION: "4.5" # only raise when RTOOLS for REL is available
 R_REL_WIN_BIN: "https://cloud.r-project.org/bin/windows/base/old/4.5.0/R-4.5.0-win.exe"
 R_DEV_VERSION: "4.6"

NEWS.md

Lines changed: 8 additions & 0 deletions
@@ -346,6 +346,10 @@ See [#2611](https://github.com/Rdatatable/data.table/issues/2611) for details. T
 
 23. `fread()` auto-detects separators for single-column files consisting solely of quoted values (e.g. `"this_that"\n"2025-01-01 00:00:01"`), [#7366](https://github.com/Rdatatable/data.table/issues/7366). Thanks @arunsrinivasan for the report and @ben-schwen for the fix.
 
+24. Rolling functions now ensure there is no nested parallelism. It could have happened for vectorized input and `adaptive=TRUE`, [#7352](https://github.com/Rdatatable/data.table/issues/7352). Thanks @jangorecki for the fix.
+
+25. By-group operations on missing rows (e.g. `foo[c(i, NA), bar, by=grp]`) now avoid leaving in data from the previous groups, [#7442](https://github.com/Rdatatable/data.table/issues/7442). Thanks @aitap for the report and the fix.
+
 ### NOTES
 
 1. The following in-progress deprecations have proceeded:
@@ -371,6 +375,8 @@ See [#2611](https://github.com/Rdatatable/data.table/issues/2611) for details. T
 
 7. In rare situations a data.table object may lose its internal attribute that holds a self-reference. New helper function `.selfref.ok()` tests just that. It is only intended for technical use cases. See manual for examples.
 
+8. Retain important information in the error message about the source of the error when `i=` fails, e.g. pointing to `charToDate()` failing in `DT[date_col == "20250101"]`, [#7444](https://github.com/Rdatatable/data.table/issues/7444). Thanks @jan-swissre for the report and @MichaelChirico for the fix.
+
 ## data.table [v1.17.8](https://github.com/Rdatatable/data.table/milestone/41) (6 July 2025)
 
 1. Internal functions used to signal errors are now marked as non-returning, silencing a compiler warning about potentially unchecked allocation failure. Thanks to Prof. Brian D. Ripley for the report and @aitap for the fix, [#7070](https://github.com/Rdatatable/data.table/pull/7070).
@@ -544,6 +550,8 @@ rowwiseDT(
 
 22. `fread()` could fail to read Mac CSV files (with `\r` line endings) if the file contained any `\n` character, such as a final `\r\n`. This was fixed by detecting the predominant line ending in a sample of the file, [#4186](https://github.com/Rdatatable/data.table/issues/4186). Thanks to @MPagel for the report and @ben-schwen for the fix.
 
+23. By reference assignments (':=') with functions that modified the data.table by reference e.g. (`foo=function(DT){modify(DT);return(1L)}`, `DT[,a:=foo(DT)]`) returned a malformed data.table due to the modification of the targeted named column index ("a") during the j expression evaluation [#6768](https://github.com/Rdatatable/data.table/issues/6768). Thanks @AntonNM for the report and fix.
+
 ### NOTES
 
 1. There is a new vignette on joins! See `vignette("datatable-joins")`. Thanks to Angel Feliz for authoring it! Feedback welcome. This vignette has been highly requested since 2017: [#2181](https://github.com/Rdatatable/data.table/issues/2181).

R/data.table.R

Lines changed: 54 additions & 34 deletions
@@ -111,7 +111,7 @@ replace_dot_alias = function(e) {
 )
 idx = regexpr(missing_obj_regex, err_str, perl=TRUE)
 if (idx == -1L)
-stopf("%s", err_str, domain=NA) # Don't use stopf() directly, since err_str might have '%', #6588
+stop(err) # Pass 'err' to retain call site data (#7444); beware also #6588
 start = attr(idx, "capture.start", exact=TRUE)[ , "obj_name"]
 used = substr(
 err_str,
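
The switch to `stop(err)` in this hunk leans on a base R behaviour: re-signalling a caught condition object keeps its original call, while raising a fresh error from the message text records the handler as the call site. A minimal base R sketch of that difference follows; `fail_inner`, `rethrow_object` and `rethrow_message` are hypothetical names used only for illustration and are not part of data.table.

```r
# Hypothetical stand-in for a helper that fails while `i=` is evaluated.
fail_inner <- function() stop("cannot parse 100% of dates")

# Re-signal the caught condition object itself.
rethrow_object <- function() {
  tryCatch(fail_inner(), error = function(e) stop(e))
}

# Build a fresh error from the message text only.
rethrow_message <- function() {
  tryCatch(fail_inner(), error = function(e) stop(conditionMessage(e)))
}

err_object  <- tryCatch(rethrow_object(),  error = function(e) e)
err_message <- tryCatch(rethrow_message(), error = function(e) e)

# The re-signalled object still names the call that actually failed;
# the rebuilt error names the handler instead.
kept_call <- identical(deparse(conditionCall(err_object)), "fail_inner()")
lost_call <- !identical(deparse(conditionCall(err_message)), "fail_inner()")
```

Both variants leave a literal `%` in the message untouched, since neither passes the text through a format string.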
@@ -1189,7 +1189,6 @@ replace_dot_alias = function(e) {
 } else if (is.numeric(lhs)) {
 m = as.integer(lhs)
 if (any(m<1L | ncol(x)<m)) stopf("LHS of := appears to be column positions but are outside [1,ncol] range. New columns can only be added by name.")
-lhs = names_x[m]
 } else
 stopf("LHS of := isn't column names ('character') or positions ('integer' or 'numeric')")
 if (!anyNA(m)) {
@@ -1214,44 +1213,16 @@ replace_dot_alias = function(e) {
 return(invisible(x))
 }
 } else {
-# Adding new column(s). TO DO: move after the first eval in case the jsub has an error.
+# Adding new column(s). Allocation for columns and recalculation of target cols moved after the jval = eval(jsub)
+# in case of error or by-reference modifications to the DT
 newnames=setdiff(lhs, names_x)
 m[is.na(m)] = ncol(x)+seq_along(newnames)
 cols = as.integer(m)
 # don't pass verbose to selfrefok here -- only activated when
-# ok=-1 which will trigger setalloccol with verbose in the next
-# branch, which again calls _selfrefok and returns the message then
+# ok=-1 which will trigger setalloccol with verbose after
+# the jval = eval(jsub, ...)
 if ((ok<-selfrefok(x, verbose=FALSE))==0L) # ok==0 so no warning when loaded from disk (-1) [-1 considered TRUE by R]
 if (is.data.table(x)) warningf("A shallow copy of this data.table was taken so that := can add or remove %d columns by reference. At an earlier point, this data.table was copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. It's also not unusual for data.table-agnostic packages to produce tables affected by this issue. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.", length(newnames))
-# !is.data.table for DF |> DT(,:=) tests 2212.16-19 (#5113) where a shallow copy is routine for data.frame
-if ((ok<1L) || (truelength(x) < ncol(x)+length(newnames))) {
-DT = x # in case getOption contains "ncol(DT)" as it used to. TODO: warn and then remove
-n = length(newnames) + eval(getOption("datatable.alloccol")) # TODO: warn about expressions and then drop the eval()
-# i.e. reallocate at the size as if the new columns were added followed by setalloccol().
-name = substitute(x)
-if (is.name(name) && ok && verbose) { # && NAMED(x)>0 (TO DO) # ok here includes -1 (loaded from disk)
-catf("Growing vector of column pointers from truelength %d to %d. A shallow copy has been taken, see ?setalloccol. Only a potential issue if two variables point to the same data (we can't yet detect that well) and if not you can safely ignore this. To avoid this message you could setalloccol() first, deep copy first using copy(), wrap with suppressWarnings() or increase the 'datatable.alloccol' option.\n", truelength(x), n)
-# #1729 -- copying to the wrong environment here can cause some confusion
-if (ok == -1L) catf("Note that the shallow copy will assign to the environment from which := was called. That means for example that if := was called within a function, the original table may be unaffected.\n")
-
-# Verbosity should not issue warnings, so cat rather than warning.
-# TO DO: Add option 'datatable.pedantic' to turn on warnings like this.
-
-# TO DO ... comments moved up from C ...
-# Note that the NAMED(dt)>1 doesn't work because .Call
-# always sets to 2 (see R-ints), it seems. Work around
-# may be possible but not yet working. When the NAMED test works, we can drop allocwarn argument too
-# because that's just passed in as FALSE from [<- where we know `*tmp*` isn't really NAMED=2.
-# Note also that this growing will happen for missing columns assigned NULL, too. But so rare, we
-# don't mind.
-}
-setalloccol(x, n, verbose=verbose) # always assigns to calling scope; i.e. this scope
-if (is.name(name)) {
-assign(as.character(name),x,parent.frame(),inherits=TRUE)
-} else if (.is_simple_extraction(name)) {
-.reassign_extracted_table(name, x)
-} # TO DO: else if env$<- or list$<-
-}
 }
 }
 }
@@ -1411,6 +1382,55 @@ replace_dot_alias = function(e) {
 }
 
 if (!is.null(lhs)) {
+# Re-matches characters names in the lhs after jval to account for jsub's that modify the columns of the data.table (#6768)
+# Replaces numerical lhs with respective names_x
+if(is.character(lhs)){
+m = chmatch(lhs, names_x)
+if(!anyNA(m)) {
+# updates by reference to existing columns
+cols = as.integer(m)
+newnames = NULL
+} else {
+# Adding new column(s).
+newnames = setdiff(lhs, names_x)
+m[is.na(m)] = ncol(x) + seq_along(newnames)
+cols = as.integer(m)
+# ok <- selfrefok above called without verbose -- only activated when
+# ok=-1 which will trigger setalloccol with verbose in the next
+# branch, which again calls _selfrefok and returns the message then
+# !is.data.table for DF |> DT(,:=) tests 2212.16-19 (#5113) where a shallow copy is routine for data.frame
+if ((ok<1L) || (truelength(x) < ncol(x)+length(newnames))) {
+DT = x # in case getOption contains "ncol(DT)" as it used to. TODO: warn and then remove
+n = length(newnames) + eval(getOption("datatable.alloccol")) # TODO: warn about expressions and then drop the eval()
+# i.e. reallocate at the size as if the new columns were added followed by setalloccol().
+name = substitute(x)
+if (is.name(name) && ok && verbose) { # && NAMED(x)>0 (TO DO) # ok here includes -1 (loaded from disk)
+catf("Growing vector of column pointers from truelength %d to %d. A shallow copy has been taken, see ?setalloccol. Only a potential issue if two variables point to the same data (we can't yet detect that well) and if not you can safely ignore this. To avoid this message you could setalloccol() first, deep copy first using copy(), wrap with suppressWarnings() or increase the 'datatable.alloccol' option.\n", truelength(x), n)
+# #1729 -- copying to the wrong environment here can cause some confusion
+if (ok == -1L) catf("Note that the shallow copy will assign to the environment from which := was called. That means for example that if := was called within a function, the original table may be unaffected.\n")
+
+# Verbosity should not issue warnings, so cat rather than warning.
+# TO DO: Add option 'datatable.pedantic' to turn on warnings like this.
+
+# TO DO ... comments moved up from C ...
+# Note that the NAMED(dt)>1 doesn't work because .Call
+# always sets to 2 (see R-ints), it seems. Work around
+# may be possible but not yet working. When the NAMED test works, we can drop allocwarn argument too
+# because that's just passed in as FALSE from [<- where we know `*tmp*` isn't really NAMED=2.
+# Note also that this growing will happen for missing columns assigned NULL, too. But so rare, we
+# don't mind.
+}
+setalloccol(x, n, verbose=verbose) # always assigns to calling scope; i.e. this scope
+if (is.name(name)) {
+assign(as.character(name),x,parent.frame(),inherits=TRUE)
+} else if (.is_simple_extraction(name)) {
+.reassign_extracted_table(name, x)
+} # TO DO: else if env$<- or list$<-
+}
+}
+} else if (is.numeric(lhs)) {
+lhs = names_x[m]
+}
 # TODO?: use set() here now that it can add new columns. Then remove newnames and alloc logic above.
 .Call(Cassign,x,irows,cols,newnames,jval)
 return(suppPrint(x))
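
The comments added in the hunk above describe re-matching the `:=` targets only after `jval` has been evaluated, so that a `j` expression which itself adds columns cannot invalidate a position computed earlier (#6768). The toy model below sketches that ordering in base R. It is an assumption-laden illustration, not data.table internals: the "table" is an environment holding a named list, and `assign_late` and `sneaky` are hypothetical names.

```r
# Toy "table": an environment holding a named list of columns.
tbl <- new.env()
tbl$cols <- list(x = 1:3)

# Hypothetical helper: evaluate j first, then look up where `name` belongs.
assign_late <- function(tbl, name, j) {
  value <- j(tbl)                          # j may add columns as a side effect
  pos <- match(name, names(tbl$cols))      # resolve the target only afterwards
  if (is.na(pos)) pos <- length(tbl$cols) + 1L
  tbl$cols[[pos]] <- value
  names(tbl$cols)[pos] <- name
  invisible(tbl)
}

# A j expression that adds another column before returning its value.
sneaky <- function(tbl) {
  tbl$cols$tmp <- 0L
  1L
}

assign_late(tbl, "a", sneaky)
col_names <- names(tbl$cols)
```

Had the position of `a` been fixed before calling `sneaky`, it would have been 2, which is the slot that `tmp` ends up occupying; resolving it afterwards places `a` in slot 3.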

R/frollapply.R

Lines changed: 6 additions & 22 deletions
@@ -297,38 +297,22 @@ frollapply = function(X, N, FUN, ..., by.column=TRUE, fill=NA, align=c("right","
 tight = function(i, dest, src, n) FUN(.Call(CmemcpyDT, dest, src, i, n), ...)
 }
 } else {
-#has.growable = base::getRversion() >= "3.4.0"
-## this is now always TRUE
-## we keep this branch, it may be useful when getting rid of SET_GROWABLE_BIT and SETLENGTH #6180
-has.growable = TRUE
-cpy = if (has.growable) function(x) .Call(Csetgrowable, copy(x)) else copy
+cpy = function(x) .Call(CcopyAsGrowable, x)
 ansMask = function(len, n) {
 mask = seq_len(len) >= n
 mask[is.na(mask)] = FALSE ## test 6010.206
 mask
 }
 if (by.column) {
-allocWindow = function(x, n) x[seq_len(max(n, na.rm=TRUE))]
-if (has.growable) {
-tight = function(i, dest, src, n) FUN(.Call(CmemcpyVectoradaptive, dest, src, i, n), ...) # CmemcpyVectoradaptive handles k[i]==0
-} else {
-tight = function(i, dest, src, n) {stopf("internal error: has.growable should be TRUE, implement support for n==0"); FUN(src[(i-n[i]+1L):i], ...)} # nocov
-}
+allocWindow = function(x, n) cpy(x[seq_len(max(n, na.rm=TRUE))])
+tight = function(i, dest, src, n) FUN(.Call(CmemcpyVectoradaptive, dest, src, i, n), ...) # CmemcpyVectoradaptive handles k[i]==0
 } else {
 if (!list.df) {
-allocWindow = function(x, n) x[seq_len(max(n, na.rm=TRUE)), , drop=FALSE]
-} else {
-allocWindow = function(x, n) lapply(x, `[`, seq_len(max(n)))
-}
-if (has.growable) {
-tight = function(i, dest, src, n) FUN(.Call(CmemcpyDTadaptive, dest, src, i, n), ...) # CmemcpyDTadaptive handles k[i]==0
+allocWindow = function(x, n) cpy(x[seq_len(max(n, na.rm=TRUE)), , drop=FALSE])
 } else {
-if (!list.df) { # nocov
-tight = function(i, dest, src, n) {stopf("internal error: has.growable should be TRUE, implement support for n==0"); FUN(src[(i-n[i]+1L):i, , drop=FALSE], ...)} # nocov
-} else {
-tight = function(i, dest, src, n) {stopf("internal error: has.growable should be TRUE, implement support for n==0"); FUN(lapply(src, `[`, (i-n[i]+1L):i), ...)} # nocov
-}
+allocWindow = function(x, n) cpy(lapply(x, `[`, seq_len(max(n))))
 }
+tight = function(i, dest, src, n) FUN(.Call(CmemcpyDTadaptive, dest, src, i, n), ...) # CmemcpyDTadaptive handles k[i]==0
 }
 }
 ## prepare templates for errors and warnings
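
The adaptive branch above computes a result only where `ansMask` is `TRUE`, and its comments note that a window width of zero must be handled. A plain R model of those two rules might look like the sketch below. It assumes right alignment and a numeric vector, copies a fresh window each time instead of reusing a growable buffer as the C routines do, and `adaptive_roll` is a hypothetical name.

```r
# Hypothetical helper: a plain-R model of an adaptive rolling window.
adaptive_roll <- function(x, n, FUN, fill = NA_real_) {
  stopifnot(length(x) == length(n))
  # Same rule as ansMask in the diff: compute only where the window is
  # complete, and treat a missing width as incomplete.
  mask <- seq_along(x) >= n
  mask[is.na(mask)] <- FALSE
  out <- rep(fill, length(x))
  for (i in which(mask)) {
    # seq.int(..., length.out =) gives an empty window when n[i] == 0,
    # whereas (i - n[i] + 1L):i would count downwards.
    window <- x[seq.int(i - n[i] + 1L, length.out = n[i])]
    out[i] <- FUN(window)
  }
  out
}

x <- c(1, 2, 3, 4, 5)
n <- c(1L, 2L, 3L, 2L, 0L)
res <- adaptive_roll(x, n, sum)

# Incomplete or missing widths fall back to `fill`.
res_fill <- adaptive_roll(c(10, 20, 30), c(2L, NA, 1L), sum)
```

Here `res` is `1 3 6 7 0`: the last window is empty, so `sum` returns 0 rather than reading the elements at positions 6 and 5.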

R/test.data.table.R

Lines changed: 8 additions & 2 deletions
@@ -1,6 +1,12 @@
 test.data.table = function(script="tests.Rraw", verbose=FALSE, pkg=".", silent=FALSE, showProgress=interactive()&&!silent, testPattern=NULL,
-memtest=Sys.getenv("TEST_DATA_TABLE_MEMTEST", 0L), memtest.id=NULL) {
-stopifnot(isTRUEorFALSE(verbose), isTRUEorFALSE(silent), isTRUEorFALSE(showProgress))
+memtest=Sys.getenv("TEST_DATA_TABLE_MEMTEST", 0L), memtest.id=NULL, optional=FALSE) {
+stopifnot(isTRUEorFALSE(verbose), isTRUEorFALSE(silent), isTRUEorFALSE(showProgress), isTRUEorFALSE(optional))
+
+# Skip optional tests unless RUN_ALL_DATATABLE_TESTS is set
+if (optional && Sys.getenv("RUN_ALL_DATATABLE_TESTS") != "yes") {
+return(invisible(TRUE))
+}
+
 memtest = as.integer(memtest)
 stopifnot(length(memtest)==1L, memtest %in% 0:2)
 memtest.id = as.integer(memtest.id)
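
The guard added above skips a script marked `optional=TRUE` unless `RUN_ALL_DATATABLE_TESTS` is exactly `"yes"`, which is the value both CI configurations in this commit set. The decision can be isolated in base R as sketched below; `should_run` is a hypothetical helper, and unlike the real function it returns the decision rather than returning early with `invisible(TRUE)`.

```r
# Hypothetical helper mirroring the guard added to test.data.table().
should_run <- function(optional = FALSE) {
  stopifnot(is.logical(optional), length(optional) == 1L, !is.na(optional))
  # Optional scripts run only when the variable is exactly "yes";
  # an unset variable reads as "" and therefore skips them.
  !(optional && Sys.getenv("RUN_ALL_DATATABLE_TESTS") != "yes")
}

Sys.setenv(RUN_ALL_DATATABLE_TESTS = "yes")
with_flag <- should_run(optional = TRUE)

Sys.unsetenv("RUN_ALL_DATATABLE_TESTS")
without_flag <- should_run(optional = TRUE)
required <- should_run(optional = FALSE)
```

Non-optional scripts are unaffected by the variable, so existing calls that omit `optional` behave as before.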

README.md

Lines changed: 2 additions & 2 deletions
@@ -10,7 +10,7 @@
 [![CRAN usage](https://jangorecki.gitlab.io/rdeps/data.table/CRAN_usage.svg?sanitize=true)](https://gitlab.com/jangorecki/rdeps)
 [![BioC usage](https://jangorecki.gitlab.io/rdeps/data.table/BioC_usage.svg?sanitize=true)](https://gitlab.com/jangorecki/rdeps)
 [![indirect usage](https://jangorecki.gitlab.io/rdeps/data.table/indirect_usage.svg?sanitize=true)](https://gitlab.com/jangorecki/rdeps)
-[![Powered by NumFOCUS](https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A )](http://numfocus.org)
+[![Powered by NumFOCUS](https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A )](https://numfocus.org)
 <!-- badges: end -->
 
 `data.table` provides a high-performance version of [base R](https://www.r-project.org/about.html)'s `data.frame` with syntax and feature enhancements for ease of use, convenience and programming speed.
@@ -108,7 +108,7 @@ A list of packages that significantly support, extend, or make use of `data.tabl
 
 - click the **Watch** button at the top and right of GitHub project page
 - read [NEWS file](https://github.com/Rdatatable/data.table/blob/master/NEWS.md)
-- follow [#rdatatable](https://twitter.com/hashtag/rdatatable) and the [r_data_table](https://x.com/r_data_table) account on X/Twitter
+- follow [#rdatatable](https://x.com/hashtag/rdatatable) and the [r_data_table](https://x.com/r_data_table) account on X/Twitter
 - follow [#rdatatable](https://fosstodon.org/tags/rdatatable) and the [r_data_table account](https://fosstodon.org/@r_data_table) on fosstodon
 - follow the [data.table community page](https://www.linkedin.com/company/data-table-community) on LinkedIn
 - watch recent [Presentations](https://github.com/Rdatatable/data.table/wiki/Presentations)
