Commit 572d67f

Merge branch 'master' into parserFunctionRename
2 parents 01fb33c + 1a5003b commit 572d67f

9 files changed: +685 -28 lines changed

GOVERNANCE.md

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-Governance for the R data.table project
+# Governance for the R data.table project

 # Purpose and scope

@@ -121,7 +121,7 @@ Funds acquired by the data.table project will be disbursed at the discretion of

 # Code of conduct

-The full Code of Conduct can be found [here](CODE_OF_CONDUCT.md), including details for reporting violations.
+The full Code of Conduct can be found [here](.github/CODE_OF_CONDUCT.md), including details for reporting violations.

 ## Reporting Responsibility

NEWS.md

Lines changed: 6 additions & 0 deletions
@@ -8,6 +8,10 @@

 1. `data.table(x=1, <expr>)`, where `<expr>` is an expression resulting in a 1-column matrix without column names, will eventually have names `x` and `V2`, not `x` and `V1`, consistent with `data.table(x=1, <expr>)` where `<expr>` results in an atomic vector, for example `data.table(x=1, cbind(1))` and `data.table(x=1, 1)` will both have columns named `x` and `V2`. In this release, the matrix case continues to be named `V1`, but the new behavior can be activated by setting `options(datatable.old.matrix.autoname)` to `FALSE`. See point 5 under Bug Fixes for more context; this change will provide more internal consistency as well as more consistency with `data.frame()`.

+### BREAKING CHANGE
+
+1. `dcast()` now errors when `fun.aggregate` returns length != 1 (consistent with documentation), regardless of `fill`, [#6629](https://github.com/Rdatatable/data.table/issues/6629). Previously, when `fill` was not `NULL`, `dcast` warned and returned an undefined result. This change has been planned since 1.16.0 (25 Aug 2024).
+
 ### NEW FEATURES

 1. New `sort_by()` method for data.tables, [#6662](https://github.com/Rdatatable/data.table/issues/6662). It uses `forder()` to improve upon the data.frame method and also matches `DT[order(...)]` behavior with respect to locale. Thanks @rikivillalba for the suggestion and PR.
@@ -107,6 +111,8 @@

 18. `fwrite` now allows `dec` to be the same as `sep` for edge cases where only one will be written, e.g. 0-row or 1-column tables. [#7227](https://github.com/Rdatatable/data.table/issues/7227). Thanks @MichaelChirico for the report and @venom1204 for the fix.

+19. Ellipsis elements like `..1` are correctly excluded when searching for variables in "up-a-level" syntax inside `[`, [#5460](https://github.com/Rdatatable/data.table/issues/5460). Thanks @ggrothendieck for the report and @MichaelChirico for the fix.
+
 ### NOTES

 1. The following in-progress deprecations have proceeded:

R/data.table.R

Lines changed: 4 additions & 3 deletions
@@ -264,7 +264,8 @@ replace_dot_alias = function(e) {
 if (!missing(j)) {
 jsub = replace_dot_alias(jsub)
 root = root_name(jsub)
-av = all.vars(jsub)
+# exclude '..1' etc. for #5460
+av = grepv("^[.][.](?:[.]|[0-9]+)$", all.vars(jsub), invert=TRUE)
 all..names = FALSE
 if ((.is_withFALSE_range(jsub, x, root, av)) ||
 (root %chin% c("-","!") && jsub[[2L]] %iscall% '(' && jsub[[2L]][[2L]] %iscall% ':') || ## x[, !(V8:V10)]
@@ -1297,8 +1298,8 @@ replace_dot_alias = function(e) {
 SDenv = new.env(parent=parent.frame())

 syms = all.vars(jsub)
-syms = syms[ startsWith(syms, "..") ]
-syms = syms[ substr(syms, 3L, 3L) != "." ] # exclude ellipsis
+syms = syms[startsWith(syms, "..")]
+syms = grepv("^[.][.](?:[.]|[0-9]+)$", syms, invert=TRUE) # exclude ellipsis and '..n' ellipsis elements
 for (sym in syms) {
 if (sym %chin% names_x) {
 # if "..x" exists as column name, use column, for backwards compatibility; e.g. package socialmixr in rev dep checks #2779
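
The pattern `^[.][.](?:[.]|[0-9]+)$` used in both hunks of R/data.table.R matches the bare ellipsis `...` and numbered ellipsis elements such as `..1`, while leaving ordinary "up-a-level" names such as `..x` untouched. The following is a minimal Python sketch of that filter, not the R implementation; the function name `drop_ellipsis_symbols` is hypothetical and chosen only for illustration.

```python
import re

# Same pattern as in the diff: '..' followed by either a single '.' or digits only.
ELLIPSIS_RE = re.compile(r"^[.][.](?:[.]|[0-9]+)$")


def drop_ellipsis_symbols(names):
    """Return the names that are neither '...' nor '..N' ellipsis elements.

    Hypothetical helper: it mirrors grepv(pattern, names, invert=TRUE).
    """
    return [name for name in names if not ELLIPSIS_RE.match(name)]


if __name__ == "__main__":
    print(drop_ellipsis_symbols(["a", "...", "..1", "..x", "...123"]))
```

Names like `...123` survive the filter because the third dot must be the last character for the `[.]` branch to match, which is what the edge-case tests 2339.09 to 2339.11 rely on.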

R/fcast.R

Lines changed: 1 addition & 5 deletions
@@ -186,11 +186,7 @@ dcast.data.table = function(data, formula, fun.aggregate = NULL, sep = "_", ...,
 maybe_err = function(list.of.columns) {
 if (!all(lengths(list.of.columns) == 1L)) {
 msg = gettext("Aggregating functions should take a vector as input and return a single value (length=1), but they do not, so the result is undefined. Please fix by modifying your function so that a single value is always returned.")
-if (is.null(fill)) { # TODO change to always stopf #6329
-stop(msg, domain=NA, call. = FALSE)
-} else {
-warning(msg, domain=NA, call. = FALSE)
-}
+stop(msg, domain=NA, call. = FALSE)
 }
 list.of.columns
 }
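
After this change the length check in `maybe_err` no longer depends on `fill`: any aggregated result whose length is not 1 is an error. A rough Python analogue of the simplified helper is sketched below, assuming the aggregated results arrive as a list of sequences. The name `check_scalar_results` and the use of `ValueError` are illustrative choices, not part of data.table.

```python
def check_scalar_results(list_of_columns):
    """Raise if any aggregated result does not have exactly one element.

    Hypothetical analogue of maybe_err(): there is no warning branch,
    the check always fails hard.
    """
    if not all(len(column) == 1 for column in list_of_columns):
        raise ValueError(
            "Aggregating functions should take a vector as input and "
            "return a single value (length=1)."
        )
    return list_of_columns
```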

inst/tests/tests.Rraw

Lines changed: 45 additions & 10 deletions
@@ -13701,6 +13701,10 @@ test(1962.054, forder(DT, ), 3:1)

 test(1962.055, fsort(as.double(DT$a), internal = TRUE),
 error = 'Internal code should not be being called on type double')
+# coverage for fsort #7239
+set.seed(1)
+x = as.double(sample(0:(2^20), size = 1e6, replace = TRUE))
+test(1962.0551, sort(x), fsort(x))

 l = as.list(DT)
 test(1962.056, setorder(l, a), error = 'x must be a data.frame or data.table')
@@ -21564,16 +21568,16 @@ test(2333, as.expression(data.table(a = 1))[["a"]], 1)

 # regression test for hexdigits subscript overrun (uint8_t wraps over 255, unsigned overflow is well defined in c)
 local({
-f = tempfile()
-on.exit(unlink(f))
-# the line is likely invalid in current encoding, so disable any translation, #7209
-# test.data.table() sets options(encoding="UTF-8"), so go the long way around.
-ff = file(f, encoding = "")
-tryCatch(
-writeLines(c('a', rep('0x1.ffffp0', 10000L), `Encoding<-`('0x1.ff\x9fp0', 'bytes'), rep('0x1.ffffp0', 20000L)), ff),
-finally = close(ff)
-)
-test(2334, names(fread(f)), "a")
+f = tempfile()
+on.exit(unlink(f))
+# the line is likely invalid in current encoding, so disable any translation, #7209
+# test.data.table() sets options(encoding="UTF-8"), so go the long way around.
+ff = file(f, encoding = "")
+tryCatch(
+writeLines(c('a', rep('0x1.ffffp0', 10000L), `Encoding<-`('0x1.ff\x9fp0', 'bytes'), rep('0x1.ffffp0', 20000L)), ff),
+finally = close(ff)
+)
+test(2334, names(fread(f)), "a")
 })

 # Tests for new isoyear() helper (complement to isoweek) #7154
@@ -21620,3 +21624,34 @@ local({
 test(2338.9, {fwrite(dd, f, forceDecimal=FALSE); fread(f)}, di)
 })

+# '..1' is ...elt(1), not a variable named '1', when doing "up-a-level" search (#5460)
+DT = data.table(a=1.0)
+sqrt_dot_sym = function(...) sqrt(..1)
+test(2339.01, lapply(DT, function(...) sqrt(..1)), list(a=1.0))
+test(2339.02, lapply(DT, sqrt_dot_sym), list(a=1.0))
+test(2339.03, DT[, lapply(.SD, function(...) sqrt(..1))], data.table(a=1.0))
+test(2339.04, DT[, lapply(.SD, sqrt_dot_sym)], data.table(a=1.0))
+sqrt_elt_sym = function(...) sqrt(...elt(1))
+# TODO(R>=3.5.0): run this unconditionally
+if (!inherits(tryCatch(sqrt_elt_sym(1), error=identity), "error")) {
+test(2339.05, lapply(DT, sqrt_elt_sym), list(a=1.0))
+test(2339.06, lapply(DT, function(...) sqrt(...elt(1L))), list(a=1.0))
+test(2339.07, DT[, lapply(.SD, sqrt_elt_sym)], data.table(a=1.0))
+test(2339.08, DT[, lapply(.SD, function(...) sqrt(...elt(1L)))], data.table(a=1.0))
+}
+# edge case of weird, though technically valid, names:
+# _not_ '..N' style and distinct from '...'
+# `123`='a'; DT[, ..123] doesn't work, but that's fine & easy to work around
+DT = data.table(a = 4L)
+local({
+`.123` = 'a'
+test(2339.09, DT[, ...123], DT)
+})
+local({
+..123 = 'a'
+test(2339.10, DT[, ....123], DT)
+})
+local({
+...123 = 'a'
+test(2339.11, DT[, .....123], DT)
+})

man/fread.Rd

Lines changed: 9 additions & 0 deletions
@@ -118,6 +118,15 @@ Currently, the \code{yaml} setting is somewhat inflexible with respect to incorp

 When \code{input} begins with http://, https://, ftp://, ftps://, or file://, \code{fread} detects this and \emph{downloads} the target to a temporary file (at \code{tempfile()}) before proceeding to read the file as usual. URLS (ftps:// and https:// as well as ftp:// and http://) paths are downloaded with \code{download.file} and \code{method} set to \code{getOption("download.file.method")}, defaulting to \code{"auto"}; and file:// is downloaded with \code{download.file} with \code{method="internal"}. NB: this implies that for file://, even files found on the current machine will be "downloaded" (i.e., hard-copied) to a temporary file. See \code{\link{download.file}} for more details.

+\bold{Automatic Decompression:}
+
+In many cases, \code{fread} can automatically detect and decompress files with common compression extensions directly, without needing an explicit connection object or shell commands. This works by checking the file extension.
+
+\itemize{
+\item \code{.gz} and \code{.bz2} are supported out of the box.
+\item \code{.zip} is also supported. If the archive contains a single data file, \code{fread} will read it. If the archive contains multiple files, \code{fread} will produce an error.
+}
+
 \bold{Shell commands:}

 \code{fread} accepts shell commands for convenience. The input command is run and its output written to a file in \code{tmpdir} (\code{\link{tempdir}()} by default) to which \code{fread} is applied "as normal". The details are platform dependent -- \code{system} is used on UNIX environments, \code{shell} otherwise; see \code{\link[base]{system}}.
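
The extension-based dispatch described in the new "Automatic Decompression" paragraph can be sketched in Python as follows. This only mirrors the documented behavior (`.gz`, `.bz2`, and single-file `.zip` archives); how `fread` implements it internally is not shown in this diff, and `read_text_auto` is a hypothetical helper that assumes UTF-8 text.

```python
import bz2
import gzip
import zipfile


def read_text_auto(path):
    """Return the text in `path`, choosing a decompressor from the extension.

    Hypothetical helper for illustration; assumes UTF-8 encoded content.
    """
    if path.endswith(".gz"):
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            return handle.read()
    if path.endswith(".bz2"):
        with bz2.open(path, "rt", encoding="utf-8") as handle:
            return handle.read()
    if path.endswith(".zip"):
        with zipfile.ZipFile(path) as archive:
            # Ignore directory entries; only real files count as data files.
            members = [m for m in archive.namelist() if not m.endswith("/")]
            if len(members) != 1:
                raise ValueError(
                    "expected exactly one file in the archive, found %d"
                    % len(members)
                )
            return archive.read(members[0]).decode("utf-8")
    # No recognized compression extension: read as plain text.
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()
```

Dispatching on the extension alone keeps the check cheap, at the cost of missing compressed files with unusual names; in that case an explicit connection or shell command remains the fallback, as the surrounding documentation notes.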

src/fsort.c

Lines changed: 6 additions & 8 deletions
@@ -1,6 +1,6 @@
 #include "data.table.h"

-#define INSERT_THRESH 200 // TODO: expose via api and test
+static const int INSERT_THRESH = 200; // TODO: expose via api and test

 static void dinsert(double *x, const int n) { // TODO: if and when twiddled, double => ull
 if (n<2) return;
@@ -43,17 +43,16 @@ static void dradix_r( // single-threaded recursive worker
 return;
 }

-uint64_t cumSum=0;
-for (uint64_t i=0; cumSum<n; ++i) { // cumSum<n better than i<width as may return early
-uint64_t tmp;
-if ((tmp=counts[i])) { // don't cumulate through 0s, important below to save a wasteful memset to zero
+for (uint64_t i = 0, cumSum = 0; cumSum < n; i++) { // cumSum<n better than i<width as may return early
+uint64_t tmp = counts[i];
+if (tmp) { // don't cumulate through 0s, important below to save a wasteful memset to zero
 counts[i] = cumSum;
 cumSum += tmp;
 }
 } // leaves cumSum==n && 0<i && i<=width

 tmp=in;
-for (uint64_t i=0; i<n; ++i) { // go forwards not backwards to give cpu pipeline better chance
+for (uint64_t i = 0; i<n; ++i) { // go forwards not backwards to give cpu pipeline better chance
 int thisx = (*(uint64_t *)tmp - minULL) >> fromBit & mask;
 working[ counts[thisx]++ ] = *tmp;
 tmp++;
@@ -71,8 +70,7 @@ static void dradix_r( // single-threaded recursive worker
 return;
 }

-cumSum=0;
-for (int i=0; cumSum<n; ++i) { // again, cumSum<n better than i<width as it can return early
+for (uint64_t i = 0, cumSum = 0; cumSum < n; i++) { // again, cumSum<n better than i<width as it can return early
 if (counts[i] == 0) continue;
 uint64_t thisN = counts[i] - cumSum; // undo cummulate; i.e. diff
 if (thisN <= INSERT_THRESH) {
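
The two `cumSum < n` loops in `dradix_r` turn per-bucket counts into start offsets (skipping empty buckets and stopping as soon as all `n` items are accounted for), then later recover each bucket's size as a difference. The Python sketch below shows one such pass on non-negative integers. It is a simplified illustration of the idea visible in these hunks, not a port of `fsort.c`; `bucket_pass` and its parameters are hypothetical names.

```python
def bucket_pass(values, shift, mask):
    """Stable single-digit bucket pass; returns (sorted_by_digit, bucket_sizes).

    Hypothetical helper: `values` are non-negative ints and the digit is
    (value >> shift) & mask.
    """
    n = len(values)
    counts = [0] * (mask + 1)
    for value in values:
        counts[(value >> shift) & mask] += 1

    # Counts -> start offsets. Empty buckets stay 0, and the loop ends early
    # once the running total reaches n.
    cum, i = 0, 0
    while cum < n:
        tmp = counts[i]
        if tmp:
            counts[i] = cum
            cum += tmp
        i += 1

    # Scatter forwards; afterwards counts[b] holds the END offset of bucket b.
    out = [None] * n
    for value in values:
        bucket = (value >> shift) & mask
        out[counts[bucket]] = value
        counts[bucket] += 1

    # Undo the cumulation: size = end offset minus the running total.
    sizes = {}
    cum, i = 0, 0
    while cum < n:
        if counts[i] != 0:
            this_n = counts[i] - cum
            sizes[i] = this_n
            cum += this_n
        i += 1
    return out, sizes
```

Because a non-empty bucket always ends at an offset of at least 1, a zero entry after the scatter can only mean an empty bucket, which is why the second loop can skip zeros without a separate reset of the counts.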
