
Commit dacb594

Merge branch 'master' into todoRename
2 parents: 25ccc1b + e0d40b1

23 files changed (+990, -882 lines changed)

.ci/linters/r/eval_parse_linter.R

Lines changed: 8 additions & 0 deletions
```diff
@@ -0,0 +1,8 @@
+eval_parse_linter = make_linter_from_xpath(
+"//SYMBOL_FUNCTION_CALL[text() = 'parse']
+/ancestor::expr
+/preceding-sibling::expr[SYMBOL_FUNCTION_CALL[text() = 'eval']]
+/parent::expr
+",
+"Avoid eval(parse()); build the language directly, possibly using substitute2()."
+)
```
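The linter flags `eval(parse(text = ...))`, where R source is pasted together as a string and then parsed. As a rough base-R illustration of the alternative it recommends (the option name `demo.linter.verbose` is made up for this sketch, and `as.call()` stands in for data.table's `substitute2()`), the same call can be built directly as a language object:

```r
# Hypothetical option name, used only for this illustration
opt_name <- "demo.linter.verbose"
opt_value <- FALSE

# Pattern flagged by eval_parse_linter: paste source text, then parse and evaluate it
eval(parse(text = paste0("options(", opt_name, "=", deparse(opt_value), ")")))
from_text <- getOption(opt_name)
options(setNames(list(NULL), opt_name))  # unset again before trying the alternative

# Alternative: build the call object directly, with no string round-trip or quoting
cl <- as.call(c(as.name("options"), setNames(list(opt_value), opt_name)))
print(cl)  # options(demo.linter.verbose = FALSE)
eval(cl)
from_call <- getOption(opt_name)

options(setNames(list(NULL), opt_name))  # clean up
```

Building the call keeps `opt_value` as a real R object, so values that do not deparse cleanly are passed through unchanged.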

NAMESPACE

Lines changed: 1 addition & 1 deletion
```diff
@@ -153,7 +153,7 @@ if (getRversion() >= "3.6.0") {
 
 # IDateTime support:
 export(as.IDate,as.ITime,IDateTime)
-export(second,minute,hour,yday,wday,mday,week,isoweek,month,quarter,year,yearmon,yearqtr)
+export(second,minute,hour,yday,wday,mday,week,isoweek,isoyear,month,quarter,year,yearmon,yearqtr)
 
 S3method("[", ITime)
 S3method("+", IDate)
```

NEWS.md

Lines changed: 24 additions & 7 deletions
```diff
@@ -10,7 +10,16 @@
 
 ### NEW FEATURES
 
-1. New `sort_by()` method for data.tables, [#6662](https://github.com/Rdatatable/data.table/issues/6662). It uses `forder()` to improve upon the data.frame method and also match `DT[order(...)]` behavior with respect to locale. Thanks @rikivillalba for the suggestion and PR.
+1. New `sort_by()` method for data.tables, [#6662](https://github.com/Rdatatable/data.table/issues/6662). It uses `forder()` to improve upon the data.frame method and also matches `DT[order(...)]` behavior with respect to locale. Thanks @rikivillalba for the suggestion and PR.
+
+```r
+DT = data.table(a=c(1L, 2L, 1L), b=c(3L, 1L, 2L))
+sort_by(DT, ~a + b)
+# a b
+# 1: 1 2
+# 2: 1 3
+# 3: 2 1
+```
 
 2. `melt()` now supports using `patterns()` with `id.vars`, [#6867](https://github.com/Rdatatable/data.table/issues/6867). Thanks to Toby Dylan Hocking for the suggestion and PR.
 
```
```diff
@@ -56,6 +65,10 @@
 
 13. New `mergelist()` and `setmergelist()` similarly work _a la_ `Reduce()` to recursively merge a `list` of data.tables, [#599](https://github.com/Rdatatable/data.table/issues/599). Different join modes (_left_, _inner_, _full_, _right_, _semi_, _anti_, and _cross_) are supported through the `how` argument; duplicate handling goes through the `mult` argument. `setmergelist()` carefully avoids copies where one is not needed, e.g. in a 1:1 left join. Thanks Patrick Nicholson for the FR (in 2013!), @jangorecki for the PR, and @MichaelChirico for extensive reviews and fine-tuning.
 
+14. `fcoalesce()` and `setcoalesce()` gain `nan` argument to control whether `NaN` values should be treated as missing (`nan=NA`, the default) or non-missing (`nan=NaN`), [#4567](https://github.com/Rdatatable/data.table/issues/4567). This provides full compatibility with `nafill()` behavior. Thanks to @ethanbsmith for the feature request and @Mukulyadav2004 for the implementation.
+
+15. New function `isoyear()` has been implemented as a complement to `isoweek()`, returning the ISO 8601 year corresponding to a given date, [#7154](https://github.com/Rdatatable/data.table/issues/7154). Thanks to @ben-schwen and @MichaelChirico for the suggestion and @venom1204 for the implementation.
+
 ### BUG FIXES
 
 1. `fread()` no longer warns on certain systems on R 4.5.0+ where the file owner can't be resolved, [#6918](https://github.com/Rdatatable/data.table/issues/6918). Thanks @ProfFancyPants for the report and PR.
```
```diff
@@ -84,6 +97,10 @@
 
 13. Reference to `.SD` in `...` arguments to `lapply()`, e.g. ``lapply(list_of_tables, `[`, j=.SD[1L])`` is evaluated correctly, [#2982](https://github.com/Rdatatable/data.table/issues/2982). Thanks @franknarf1 for the report and @MichaelChirico for the fix.
 
+14. Filling columns of class Date with POSIXct (and vice versa) using `shift()` now yields a clear, informative error message specifying the class mismatch, [#5218](https://github.com/Rdatatable/data.table/issues/5218). Thanks @ashbaldry for the report and @ben-schwen for the fix.
+
+15. `split.data.table()` output list elements retain the S3 class of the generating data.table, e.g. in `l=split(x, ...)` if `x` has class `my_class`, so will `l[[1]]` and so on, [#7105](https://github.com/Rdatatable/data.table/issues/7105). Thanks @m-muecke for the bug report and @MichaelChirico for the fix.
+
 ### NOTES
 
 1. The following in-progress deprecations have proceeded:
```
```diff
@@ -105,21 +122,21 @@
 
 5. A GitHub Actions workflow is now in place to warn the entire maintainer team, as well as any contributor following the GitHub repository, when the package is at risk of archival on CRAN [#7008](https://github.com/Rdatatable/data.table/issues/7008). Thanks @tdhock for the original report and @Bisaloo and @TysonStanley for the fix.
 
-# data.table [v1.17.8](https://github.com/Rdatatable/data.table/milestone/41) (6 July 2025)
+## data.table [v1.17.8](https://github.com/Rdatatable/data.table/milestone/41) (6 July 2025)
 
 1. Internal functions used to signal errors are now marked as non-returning, silencing a compiler warning about potentially unchecked allocation failure. Thanks to Prof. Brian D. Ripley for the report and @aitap for the fix, [#7070](https://github.com/Rdatatable/data.table/pull/7070).
 
-# data.table [v1.17.6](https://github.com/Rdatatable/data.table/milestone/40) (15 June 2025)
+## data.table [v1.17.6](https://github.com/Rdatatable/data.table/milestone/40) (15 June 2025)
 
 1. On a heavily loaded machine, a `forder` thread could try to perform a zero-length copy from a null pointer, which was de-facto harmless but is against the C standard and was caught by additional CRAN checks, [#7051](https://github.com/Rdatatable/data.table/issues/7051). Thanks to @helske for the report and @aitap for the PR.
 
-# data.table [v1.17.4](https://github.com/Rdatatable/data.table/milestone/39) (25 May 2025)
+## data.table [v1.17.4](https://github.com/Rdatatable/data.table/milestone/39) (25 May 2025)
 
 1. The C code now avoids passing invalid data pointers from 0-length vectors to `memcpy()`, which previously caused undefined behaviour. Thanks to Prof. Brian D. Ripley for the report and Michael Chirico for the fix, [#6911](https://github.com/Rdatatable/data.table/pull/6911).
 
-# data.table [v1.17.2](https://github.com/Rdatatable/data.table/milestone/38) (7 May 2025)
+## data.table [v1.17.2](https://github.com/Rdatatable/data.table/milestone/38) (7 May 2025)
 
-## BUG FIXES
+### BUG FIXES
 
 1. `fwrite(compress="gzip")` once again produces a gzip header when the column names are missing or disabled, [@6852](https://github.com/Rdatatable/data.table/issues/6852). Thanks @maxscheiber for the report and @aitap for the fix.
 
```
```diff
@@ -135,7 +152,7 @@
 
 7. `as.data.table()` now properly handles keys: specifying keys sets them, omitting keys preserves existing ones, and setting `key=NULL` clears them, [#6859](https://github.com/Rdatatable/data.table/issues/6859). Thanks @brookslogan for the report and @Mukulyadav2004 for the fix.
 
-## NOTES
+### NOTES
 
 1. Continued work to remove non-API C functions, [#6180](https://github.com/Rdatatable/data.table/issues/6180). Thanks Ivan Krylov for the PRs and for writing a clear and concise guide about the R API: https://aitap.codeberg.page/R-api/.
 
```

R/IDateTime.R

Lines changed: 1 addition & 1 deletion
```diff
@@ -355,7 +355,7 @@ isoweek = function(x) as.integer(format(as.IDate(x), "%V"))
 # nearest_thurs = as.IDate(7L * (as.integer(x + 3L) %/% 7L))
 # year_start = as.IDate(format(nearest_thurs, '%Y-01-01'))
 # 1L + (nearest_thurs - year_start) %/% 7L
-
+isoyear = function(x) as.integer(format(as.IDate(x), "%G"))
 
 month = function(x) convertDate(as.IDate(x), "month")
 quarter = function(x) convertDate(as.IDate(x), "quarter")
```
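The new `isoyear()` relies on the `%G` (ISO 8601 week-based year) format conversion. The minimal base-R sketch below uses `as.Date()` in place of `as.IDate()` and hypothetical helper names. It cross-checks `%G` against the nearest-Thursday arithmetic in the commented-out lines of this hunk: the ISO year of a date is the calendar year of the Thursday in its Monday-to-Sunday ISO week.

```r
# Hypothetical helpers; data.table's isoyear() uses as.IDate() rather than as.Date()
isoyear_fmt <- function(x) as.integer(format(as.Date(x), "%G"))

isoyear_thursday <- function(x) {
  d <- as.integer(as.Date(x))  # days since 1970-01-01, which was a Thursday
  # Thursday of the same ISO week, following the commented-out nearest_thurs line
  nearest_thurs <- as.Date(7L * ((d + 3L) %/% 7L), origin = "1970-01-01")
  as.integer(format(nearest_thurs, "%Y"))
}

dates <- c("2021-01-01", "2024-12-30", "2024-06-15")
isoyear_fmt(dates)                        # 2020 2025 2024
isoyear_thursday(dates)                   # 2020 2025 2024
as.integer(format(as.Date(dates), "%Y"))  # 2021 2024 2024: calendar year differs near year boundaries
```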

R/between.R

Lines changed: 2 additions & 2 deletions
```diff
@@ -30,8 +30,8 @@ between = function(x, lower, upper, incbounds=TRUE, NAbounds=TRUE, check=FALSE,
 }
 if (is.i64(x)) {
 if (!requireNamespace("bit64", quietly=TRUE)) stopf("trying to use integer64 class when 'bit64' package is not installed") # nocov
-if (!is.i64(lower) && is.numeric(lower)) lower = bit64::as.integer64(lower)
-if (!is.i64(upper) && is.numeric(upper)) upper = bit64::as.integer64(upper)
+if (!is.i64(lower) && (is.integer(lower) || fitsInInt64(lower))) lower = bit64::as.integer64(lower)
+if (!is.i64(upper) && (is.integer(upper) || fitsInInt64(upper))) upper = bit64::as.integer64(upper)
 }
 is.supported = function(x) is.numeric(x) || is.character(x) || is.px(x)
 if (is.supported(x) && is.supported(lower) && is.supported(upper)) {
```
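This change narrows when a numeric `lower` or `upper` bound is coerced to `integer64`. Previously any numeric bound was converted; now a double is converted only if `fitsInInt64()` accepts it, presumably so that a bound such as `1.5` is not truncated during conversion. `fitsInInt64()` is not shown in this diff, so the base-R helper below is a hypothetical approximation of that kind of check, not its implementation:

```r
# Hypothetical stand-in for data.table's fitsInInt64(); the real rules may differ
fits_in_int64_sketch <- function(x) {
  if (!is.double(x)) return(FALSE)
  # NA is allowed; other values must be finite, whole, and strictly inside +/- 2^63
  ok <- is.na(x) | (is.finite(x) & x == trunc(x) & abs(x) < 2^63)
  all(ok)
}

fits_in_int64_sketch(c(10, -3, NA))  # TRUE: whole numbers within range
fits_in_int64_sketch(1.5)            # FALSE: not a whole number
fits_in_int64_sketch(2^63)           # FALSE: outside the signed 64-bit range
```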

R/data.table.R

Lines changed: 25 additions & 24 deletions
```diff
@@ -97,34 +97,32 @@ replace_dot_alias = function(e) {
 }
 
 .checkTypos = function(err, ref) {
+err_str <- conditionMessage(err)
 # a slightly wonky workaround so that this still works in non-English sessions, #4989
 # generate this at run time (as opposed to e.g. onAttach) since session language is
 # technically OK to update (though this should be rare), and since it's low-cost
 # to do so here because we're about to error anyway.
-missing_obj_fmt = gsub(
-"'missing_datatable_variable____'",
+missing_obj_regex = gsub(
+"'____missing_datatable_variable____'",
 "'(?<obj_name>[^']+)'",
-tryCatch(eval(parse(text="missing_datatable_variable____")), error=identity)$message
-# eval(parse()) to avoid "no visible binding for global variable" note from R CMD check
-# names starting with _ don't parse, so no leading _ in the name
+# expression() to avoid "no visible binding for global variable" note from R CMD check
+conditionMessage(tryCatch(eval(quote(`____missing_datatable_variable____`)), error=identity)),
+fixed=TRUE
 )
-idx = regexpr(missing_obj_fmt, err$message, perl=TRUE)
-if (idx > 0L) {
-start = attr(idx, "capture.start", exact=TRUE)[ , "obj_name"]
-used = substr(
-err$message,
-start,
-start + attr(idx, "capture.length", exact=TRUE)[ , "obj_name"] - 1L
-)
-found = agrep(used, ref, value=TRUE, ignore.case=TRUE, fixed=TRUE)
-if (length(found)) {
-stopf("Object '%s' not found. Perhaps you intended %s", used, brackify(found))
-} else {
-stopf("Object '%s' not found amongst %s", used, brackify(ref))
-}
+idx = regexpr(missing_obj_regex, err_str, perl=TRUE)
+if (idx == -1L)
+stopf("%s", err_str, domain=NA) # Don't use stopf() directly, since err_str might have '%', #6588
+start = attr(idx, "capture.start", exact=TRUE)[ , "obj_name"]
+used = substr(
+err_str,
+start,
+start + attr(idx, "capture.length", exact=TRUE)[ , "obj_name"] - 1L
+)
+found = agrep(used, ref, value=TRUE, ignore.case=TRUE, fixed=TRUE)
+if (length(found)) {
+stopf("Object '%s' not found. Perhaps you intended %s", used, brackify(found))
 } else {
-# Don't use stopf() directly, since err$message might have '%', #6588
-stopf("%s", err$message, domain=NA)
+stopf("Object '%s' not found amongst %s", used, brackify(ref))
 }
 }
 
```
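The reworked `.checkTypos()` turns R's own (possibly translated) "object not found" message into a regular expression with a named capture group, then uses it to pull the missing name out of the user's error. A minimal standalone sketch of that idea follows; the function and probe names are hypothetical, and it returns the name instead of re-signalling an error:

```r
# Hypothetical standalone version of the message-template trick
missing_object_name <- function(err) {
  msg <- conditionMessage(err)
  # Ask R for its own, possibly translated, "object ... not found" message
  probe_msg <- conditionMessage(
    tryCatch(eval(quote(`____probe_name____`)), error = identity)
  )
  # Replace the quoted probe name with a named capture group (fixed=TRUE: literal text)
  pattern <- gsub("'____probe_name____'", "'(?<obj_name>[^']+)'", probe_msg, fixed = TRUE)
  idx <- regexpr(pattern, msg, perl = TRUE)
  if (idx == -1L) return(NA_character_)  # a different kind of error: nothing to extract
  start <- attr(idx, "capture.start")[, "obj_name"]
  len <- attr(idx, "capture.length")[, "obj_name"]
  substr(msg, start, start + len - 1L)
}

err <- tryCatch(eval(quote(regin)), error = identity)
used <- missing_object_name(err)
used
# Fuzzy suggestion step, with the same agrep() arguments as the diff
agrep(used, c("region", "amount"), value = TRUE, ignore.case = TRUE, fixed = TRUE)
```

Because the pattern is derived at run time from R's own message, the extraction keeps working when R's messages are translated, provided the translation keeps straight single quotes around the name.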
```diff
@@ -2493,7 +2491,7 @@ Ops.data.table = function(e1, e2 = NULL)
 }
 
 split.data.table = function(x, f, drop = FALSE, by, sorted = FALSE, keep.by = TRUE, flatten = TRUE, ..., verbose = getOption("datatable.verbose")) {
-if (!is.data.table(x)) stopf("x argument must be a data.table")
+if (!is.data.table(x)) internal_error("x argument to split.data.table must be a data.table") # nocov
 stopifnot(is.logical(drop), is.logical(sorted), is.logical(keep.by), is.logical(flatten))
 # split data.frame way, using `f` and not `by` argument
 if (!missing(f)) {
```
```diff
@@ -2568,8 +2566,11 @@ split.data.table = function(x, f, drop = FALSE, by, sorted = FALSE, keep.by = TR
 setattr(ll, "names", nm)
 # handle nested split
 if (flatten || length(by) == 1L) {
-for (x in ll) .Call(C_unlock, x)
-lapply(ll, setDT)
+for (xi in ll) .Call(C_unlock, xi)
+out = lapply(ll, setDT)
+# TODO(#2000): just let setDT handle this
+if (!identical(old_class <- class(x), c("data.table", "data.frame"))) for (xi in out) setattr(xi, "class", old_class)
+out
 # alloc.col could handle DT in list as done in: c9c4ff80bdd4c600b0c4eff23b207d53677176bd
 } else if (length(by) > 1L) {
 lapply(ll, split.data.table, drop=drop, by=by[-1L], sorted=sorted, keep.by=keep.by, flatten=flatten)
```
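Renaming the loop variable from `x` to `xi` matters because the new code reads `class(x)` after the loop. In R, a `for` loop assigns its variable in the calling function's environment, so `for (x in ll)` would overwrite the `x` argument with the last list element. A minimal base-R illustration, with hypothetical function names:

```r
# Loop variable reuses the argument name: after the loop, x is the last element
class_after_shadowing <- function(x, pieces) {
  for (x in pieces) NULL
  class(x)
}

# Distinct loop variable, as in the new split.data.table() code
class_after_renaming <- function(x, pieces) {
  for (xi in pieces) NULL
  class(x)
}

obj <- structure(list(), class = "my_class")
pieces <- list(1L, "a")
class_after_shadowing(obj, pieces)  # "character"
class_after_renaming(obj, pieces)   # "my_class"
```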

R/onLoad.R

Lines changed: 23 additions & 25 deletions
```diff
@@ -73,31 +73,29 @@
 # In fread and fwrite we have moved back to using getOption's default argument since it is unlikely fread and fread will be called in a loop many times, plus they
 # are relatively heavy functions where the overhead in getOption() would not be noticed. It's only really [.data.table where getOption default bit.
 # Improvement to base::getOption() now submitted (100x; 5s down to 0.05s): https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17394
-opts = c(
-"datatable.verbose"="FALSE", # datatable.<argument name>
-"datatable.optimize"="Inf", # datatable.<argument name>
-"datatable.print.nrows"="100L", # datatable.<argument name>
-"datatable.print.topn"="5L", # datatable.<argument name>
-"datatable.print.class"="TRUE", # for print.data.table
-"datatable.print.rownames"="TRUE", # for print.data.table
-"datatable.print.colnames"="'auto'", # for print.data.table
-"datatable.print.keys"="TRUE", # for print.data.table
-"datatable.print.trunc.cols"="FALSE", # for print.data.table
-"datatable.show.indices"="FALSE", # for print.data.table
-"datatable.allow.cartesian"="FALSE", # datatable.<argument name>
-"datatable.join.many"="TRUE", # mergelist, [.data.table #4383 #914
-"datatable.dfdispatchwarn"="TRUE", # not a function argument
-"datatable.warnredundantby"="TRUE", # not a function argument
-"datatable.alloccol"="1024L", # argument 'n' of alloc.col. Over-allocate 1024 spare column slots
-"datatable.auto.index"="TRUE", # DT[col=="val"] to auto add index so 2nd time faster
-"datatable.use.index"="TRUE", # global switch to address #1422
-"datatable.prettyprint.char" = NULL, # FR #1091
-"datatable.old.matrix.autoname"="TRUE", # #7145: how data.table(x=1, matrix(1)) is auto-named set to change
-NULL
-)
-for (i in setdiff(names(opts),names(options()))) {
-eval(parse(text=paste0("options(",i,"=",opts[i],")")))
-}
+opts = list(
+datatable.verbose=FALSE, # datatable.<argument name>
+datatable.optimize=Inf, # datatable.<argument name>
+datatable.print.nrows=100L, # datatable.<argument name>
+datatable.print.topn=5L, # datatable.<argument name>
+datatable.print.class=TRUE, # for print.data.table
+datatable.print.rownames=TRUE, # for print.data.table
+datatable.print.colnames='auto', # for print.data.table
+datatable.print.keys=TRUE, # for print.data.table
+datatable.print.trunc.cols=FALSE, # for print.data.table
+datatable.show.indices=FALSE, # for print.data.table
+datatable.allow.cartesian=FALSE, # datatable.<argument name>
+datatable.join.many=TRUE, # mergelist, [.data.table #4383 #914
+datatable.dfdispatchwarn=TRUE, # not a function argument
+datatable.warnredundantby=TRUE, # not a function argument
+datatable.alloccol=1024L, # argument 'n' of alloc.col. Over-allocate 1024 spare column slots
+datatable.auto.index=TRUE, # DT[col=="val"] to auto add index so 2nd time faster
+datatable.use.index=TRUE, # global switch to address #1422
+datatable.prettyprint.char=NULL, # FR #1091
+datatable.old.matrix.autoname=TRUE # #7145: how data.table(x=1, matrix(1)) is auto-named set to change
+)
+opts = opts[!names(opts) %chin% names(options())]
+options(opts)
 
 # Test R behaviour that changed in v3.1 and is now depended on
 x = 1L:3L
```
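The new `.onLoad()` code stores defaults as real R values in a list and sets only the options the user has not already set, with a single `options()` call instead of one `eval(parse())` per option. A minimal sketch of the same pattern, assuming made-up `demoPkg.*` option names and using base `%in%` where the package uses `%chin%`:

```r
# Hypothetical option names; the package itself uses datatable.* options
defaults <- list(
  demoPkg.verbose = FALSE,
  demoPkg.print.nrows = 100L,
  demoPkg.optimize = Inf
)

set_missing_options <- function(defaults) {
  # Drop any option the user has already set, then set the rest in one call
  unset <- defaults[!names(defaults) %in% names(options())]
  options(unset)
  invisible(names(unset))
}

options(demoPkg.verbose = TRUE)  # simulate a preference set before the package loads
newly_set <- set_missing_options(defaults)
getOption("demoPkg.verbose")     # TRUE: the user's value is preserved
getOption("demoPkg.optimize")    # Inf, stored directly rather than parsed from "Inf"
```

A list entry whose value is `NULL` (like `datatable.prettyprint.char=NULL`) does not create an option, since setting an option to `NULL` leaves it unset.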

R/wrappers.R

Lines changed: 2 additions & 2 deletions
```diff
@@ -2,8 +2,8 @@
 # Very small (e.g. one line) R functions that just call C.
 # One file wrappers.R to avoid creating lots of small .R files.
 
-fcoalesce = function(...) .Call(Ccoalesce, list(...), FALSE)
-setcoalesce = function(...) .Call(Ccoalesce, list(...), TRUE)
+fcoalesce = function(..., nan=NA) .Call(Ccoalesce, list(...), FALSE, nan_is_na(nan))
+setcoalesce = function(..., nan=NA) .Call(Ccoalesce, list(...), TRUE, nan_is_na(nan))
 
 fifelse = function(test, yes, no, na=NA) .Call(CfifelseR, test, yes, no, na)
 fcase = function(..., default=NA) {
```
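Per the NEWS entry, `nan=NA` (the default) treats `NaN` as missing and `nan=NaN` keeps `NaN` as a value. The real work happens in C (`Ccoalesce`), and `nan_is_na()` is an internal helper whose definition is not shown here. The following pure-R sketch, with a hypothetical name and for double vectors only, illustrates the intended semantics rather than the package implementation:

```r
# Hypothetical pure-R illustration of the nan= semantics; not data.table's code
coalesce_sketch <- function(..., nan = NA) {
  stopifnot(length(nan) == 1L, is.na(nan))  # accept only NA or NaN
  nan_is_missing <- !is.nan(nan)            # nan=NA -> TRUE, nan=NaN -> FALSE
  args <- list(...)
  out <- args[[1L]]
  for (y in args[-1L]) {
    # Positions still considered missing; NaN counts only when nan_is_missing
    miss <- if (nan_is_missing) is.na(out) else is.na(out) & !is.nan(out)
    out[miss] <- rep_len(y, length(out))[miss]
  }
  out
}

x <- c(1, NA, NaN, 4)
coalesce_sketch(x, 0)             # 1 0 0 4
coalesce_sketch(x, 0, nan = NaN)  # 1 0 NaN 4
```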

inst/tests/nafill.Rraw

Lines changed: 3 additions & 2 deletions
```diff
@@ -114,8 +114,9 @@ test(3.02, setnafill(list(copy(x)), "locf", fill=0L), list(x))
 test(3.03, setnafill(x, "locf"), error="in-place update is supported only for list")
 test(3.04, nafill(letters[1:5], fill=0), error="must be numeric type, or list/data.table")
 test(3.05, setnafill(list(letters[1:5]), fill=0), error="must be numeric type, or list/data.table")
-test(3.06, nafill(x, fill=1:2), error="fill must be a vector of length 1")
-test(3.07, nafill(x, fill="asd"), x, warning=c("Coercing.*character.*integer","NAs introduced by coercion"))
+test(3.06, nafill(x, fill=1:2), error="fill must be a vector of length 1.*fcoalesce")
+test(3.07, nafill(x, "locf", fill=1:2), error="fill must be a vector of length 1.*x\\.$")
+test(3.08, nafill(x, fill="asd"), x, warning=c("Coercing.*character.*integer","NAs introduced by coercion"))
 
 # colnamesInt helper
 dt = data.table(a=1, b=2, d=3)
```
