Skip to content

Commit 6c8dd59

Browse files
authored
Merge branch 'Rdatatable:master' into issue5829
2 parents b333d52 + 5bbc4d5 commit 6c8dd59

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

103 files changed

+1620
-1364
lines changed

.ci/.lintr.R

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,13 @@ linters = c(dt_linters, all_linters(
1616
# system_time_linter = undesirable_function_linter(c(
1717
# system.time = "Only run timings in benchmark.Rraw"
1818
# )),
19+
undesirable_function_linter(c(
20+
cat = "Use catf to enable translations",
21+
message = "Use messagef to avoid fragmented translations.",
22+
warning = "Use warningf to avoid fragmented translations.",
23+
stop = "Use stopf to avoid fragmented translations.",
24+
NULL
25+
)),
1926
# undesirable_function_linter(modify_defaults(
2027
# default_undesirable_functions,
2128
# ifelse = "Use fifelse instead.",

.ci/README.md

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -12,11 +12,15 @@ Test jobs:
1212
- `test-lin-rel-cran` - `--as-cran` on Linux, strict test for final status of `R CMD check`.
1313
- `test-lin-dev-gcc-strict-cran` - `--as-cran` on Linux, `r-devel` built with `-enable-strict-barrier --disable-long-double`, test for compilation warnings, test for new NOTEs/WARNINGs from `R CMD check`.
1414
- `test-lin-dev-clang-cran` - same as `gcc-strict` job but R built with `clang` and no `--enable-strict-barrier --disable-long-double` flags.
15-
- `test-lin-310-cran` - R 3.1.0 on Linux, stated R dependency version.
15+
- `test-lin-ancient-cran` - Stated R dependency version (currently 3.4.0) on Linux.
16+
- `test-lin-dev-san` - `r-devel` on Linux built with `clang -fsanitize=address,undefined` (including LeakSanitizer), test for sanitizer output in tests and examples.
1617
- `test-win-rel` - `r-release` on Windows.
1718
- `test-win-dev` - `r-devel` on Windows.
1819
- `test-win-old` - `r-oldrel` on Windows.
19-
- `test-mac-rel` - macOS build not yet available, see [#3326](https://github.com/Rdatatable/data.table/issues/3326) for status
20+
- `test-mac-rel` - `r-release` on macOS.
21+
- `test-mac-old` - `r-oldrel` on macOS.
22+
23+
The CI steps for the tests are [required](https://github.com/Rdatatable/data.table/blob/55eb0f160b169398d51f138131c14a66c86e5dc9/.ci/publish.R#L162-L168) to be named according to the pattern `test-(lin|win|mac)-<R version>[-<suffix>]*`, where `<R version>` is `rel`, `dev`, `old`, `ancient`, or three digits comprising an R version (e.g. `362` corresponding to R-3.6.2).
2024

2125
Tests jobs are allowed to fail, summary and logs of test jobs are later published at _CRAN-like checks_ page, see artifacts below.
2226

.ci/atime/tests.R

Lines changed: 33 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,5 @@
1+
pval.thresh <- 0.001 # to reduce false positives.
2+
13
# Test case adapted from https://github.com/Rdatatable/data.table/issues/6105#issue-2268691745 which is where the issue was reported.
24
# https://github.com/Rdatatable/data.table/pull/6107 fixed performance across 3 ways to specify a column as Date, and we test each individually.
35
extra.args.6107 <- c(
@@ -13,6 +15,7 @@ for (extra.arg in extra.args.6107){
1315
tmp_csv = tempfile()
1416
fwrite(DT, tmp_csv)
1517
},
18+
FasterIO = "60a01fa65191c44d7997de1843e9a1dfe5be9f72", # First commit of the PR (https://github.com/Rdatatable/data.table/pull/6925/commits) that reduced time usage
1619
Slow = "e9087ce9860bac77c51467b19e92cf4b72ca78c7", # Parent of the merge commit (https://github.com/Rdatatable/data.table/commit/a77e8c22e44e904835d7b34b047df2eff069d1f2) of the PR (https://github.com/Rdatatable/data.table/pull/6107) that fixes the issue
1720
Fast = "a77e8c22e44e904835d7b34b047df2eff069d1f2") # Merge commit of the PR (https://github.com/Rdatatable/data.table/pull/6107) that fixes the issue
1821
this.test$expr = str2lang(sprintf("data.table::fread(tmp_csv, %s)", extra.arg))
@@ -117,12 +120,29 @@ test.list <- atime::atime_test_list(
117120
file.path("src", "init.c"),
118121
paste0("R_init_", Package_regex),
119122
paste0("R_init_", gsub("[.]", "_", new.Package_)))
123+
# allow compilation on new R versions where 'Calloc' is not defined
124+
pkg_find_replace(
125+
file.path("src", "*.c"),
126+
"\\b(Calloc|Free|Realloc)\\b",
127+
"R_\\1")
120128
pkg_find_replace(
121129
"NAMESPACE",
122130
sprintf('useDynLib\\("?%s"?', Package_regex),
123131
paste0('useDynLib(', new.Package_))
124132
},
125133

134+
# Constant overhead improvement https://github.com/Rdatatable/data.table/pull/6925
135+
# Test case adapted from https://github.com/Rdatatable/data.table/pull/7022#discussion_r2107900643
136+
"fread disk overhead improved in #6925" = atime::atime_test(
137+
N = 2^seq(0, 20), # smaller N because we are doing multiple fread calls.
138+
setup = {
139+
fwrite(iris[1], iris.csv <- tempfile())
140+
},
141+
expr = replicate(N, data.table::fread(iris.csv)),
142+
Fast = "60a01fa65191c44d7997de1843e9a1dfe5be9f72", # First commit of the PR (https://github.com/Rdatatable/data.table/pull/6925/commits) that reduced time usage
143+
Slow = "e25ea80b793165094cea87d946d2bab5628f70a6" # Parent of the first commit (https://github.com/Rdatatable/data.table/commit/60a01fa65191c44d7997de1843e9a1dfe5be9f72)
144+
),
145+
126146
# Performance regression discussed in https://github.com/Rdatatable/data.table/issues/4311
127147
# Test case adapted from https://github.com/Rdatatable/data.table/pull/4440#issuecomment-632842980 which is the fix PR.
128148
"shallow regression fixed in #4440" = atime::atime_test(
@@ -172,8 +192,9 @@ test.list <- atime::atime_test_list(
172192
# Fixed in https://github.com/Rdatatable/data.table/pull/4558
173193
"DT[by] fixed in #4558" = atime::atime_test(
174194
setup = {
195+
N9 <- as.integer(N * 0.9)
175196
d <- data.table(
176-
id = sample(c(seq.int(N * 0.9), sample(N * 0.9, N * 0.1, TRUE))),
197+
id = sample(c(seq.int(N9), sample(N9, N-N9, TRUE))),
177198
v1 = sample(5L, N, TRUE),
178199
v2 = sample(5L, N, TRUE)
179200
)
@@ -246,5 +267,15 @@ test.list <- atime::atime_test_list(
246267
Before = "f339aa64c426a9cd7cf2fcb13d91fc4ed353cd31", # Parent of the first commit https://github.com/Rdatatable/data.table/commit/fcc10d73a20837d0f1ad3278ee9168473afa5ff1 in the PR https://github.com/Rdatatable/data.table/pull/6393/commits with major change to fwrite with gzip.
247268
PR = "3630413ae493a5a61b06c50e80d166924d2ef89a"), # Close-to-last merge commit in the PR.
248269

249-
tests=extra.test.list)
270+
# Test case created directly using the atime code below (not adapted from any other benchmark), based on the PR, Removes unnecessary data.table call from as.data.table.array https://github.com/Rdatatable/data.table/pull/7010
271+
"as.data.table.array improved in #7010" = atime::atime_test(
272+
setup = {
273+
dims = c(N, 1, 1)
274+
arr = array(seq_len(prod(dims)), dim=dims)
275+
},
276+
expr = data.table:::as.data.table.array(arr, na.rm=FALSE),
277+
Slow = "73d79edf8ff8c55163e90631072192301056e336", # Parent of the first commit in the PR (https://github.com/Rdatatable/data.table/commit/8397dc3c993b61a07a81c786ca68c22bc589befc)
278+
Fast = "8397dc3c993b61a07a81c786ca68c22bc589befc"), # Commit in the PR (https://github.com/Rdatatable/data.table/pull/7019/commits) that removes inefficiency
279+
280+
tests=extra.test.list)
250281
# nolint end: undesirable_operator_linter.

.ci/linters/c/alloc_linter.R

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@ alloc_linter = function(c_obj) {
66
lines = c_obj$lines
77
# Be a bit more precise to avoid mentions in comments, and allow
88
# malloc(0) to be used for convenience (e.g. #6757)
9-
alloc_lines = grep(R"{=\s*([(]\w+\s*[*][)])?[mc]alloc[(][^0]}", lines)
9+
alloc_lines = grep(R"{=\s*([(]\s*\w+\s*[*]\s*[)])?\s*[mc]alloc[(][^0]}", lines)
1010
if (!length(alloc_lines)) return()
1111
# int *tmp=(int*)malloc(...); or just int tmp=malloc(...);
1212
alloc_keys = lines[alloc_lines] |>
@@ -31,7 +31,7 @@ alloc_linter = function(c_obj) {
3131
cat("FILE: ", c_obj$path, "; LINES: ", head(bad_lines_idx, 1L), "-", tail(bad_lines_idx, 1L), "\n", sep="")
3232
writeLines(lines[bad_lines_idx])
3333
cat(strrep("-", max(nchar(lines[bad_lines_idx]))), "\n", sep="")
34-
stop("Expected the malloc()/calloc() usage above to be followed immediately by error checking.", call.=FALSE)
34+
stop("Expected the malloc()/calloc() usage above to be followed immediately by error checking (using '!', not '==NULL').", call.=FALSE)
3535
}
3636
})
3737
}

.ci/linters/c/cocci_linter.R

Lines changed: 14 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1,20 +1,23 @@
11
cocci_linter = if (!nzchar(Sys.which("spatch"))) function(...) {} else function(c_obj) {
2-
bad <- FALSE
2+
bad = FALSE
3+
tmp = tempfile(fileext = '.c')
4+
on.exit(unlink(tmp))
5+
writeLines(c_obj$preprocessed, tmp)
36
for (spfile in list.files(".ci/linters/cocci", full.names = TRUE)) {
4-
# Coccinelle parser gets confused sometimes, so ignore stderr and the exit code
5-
out = suppressWarnings(system2(
7+
out = system2(
68
"spatch",
7-
shQuote(c(
8-
"--sp-file", spfile, c_obj$path, "--recursive-includes",
9-
"-I", R.home("include"), "-I", "src"
10-
)),
9+
shQuote(c("--sp-file", spfile, tmp)),
1110
stdout = TRUE, stderr = FALSE
12-
))
11+
)
1312
if (length(out) > 0) {
14-
cat(sprintf("In file '%s', Coccinelle patch '%s' recommends the following changes:\n", c_obj$path, spfile))
13+
cat(sprintf("In file '%s', Coccinelle linter '%s' located the following problems:\n", c_obj$path, spfile))
1514
writeLines(out)
16-
bad <- TRUE
15+
bad = TRUE
16+
}
17+
if (!is.null(status <- attr(out, 'status'))) {
18+
cat(sprintf("While working on file '%s', Coccinelle linter '%s' failed with exit code %d:\n", c_obj$path, spfile, status))
19+
bad = TRUE
1720
}
1821
}
19-
if (bad) stop("Please apply the changes above or fix the linter")
22+
if (bad) stop("Please investigate the problems above.")
2023
}
Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
1+
@malloc_return_value_cast expression@
2+
type T;
3+
expression E;
4+
@@
5+
- (T)
6+
malloc(E)
7+
8+
@calloc_realloc_return_value_cast expression@
9+
type T;
10+
expression E1, E2;
11+
identifier alloc =~ "^(c|re)alloc$";
12+
@@
13+
- (T)
14+
alloc(E1, E2)

.dev/CRAN_Release.cmd

Lines changed: 7 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -95,17 +95,6 @@ grep "PROTECT_PTR" ./src/*.c
9595
# No use of long long, instead use int64_t. TODO
9696
# grep "long long" ./src/*.c
9797

98-
// No use of llu, lld, zd or zu
99-
grep -nE "(llu|lld|zd|zu)" src/*.[hc]
100-
// Comment moved here from fread.c on 19 Nov 2019
101-
// [Moved from fread.c on 19 Nov 2019] On Windows variables of type `size_t` cannot be printed
102-
// with "%zu" in the `snprintf()` function. For those variables we used to cast them into
103-
// `unsigned long long int` before printing, and defined (llu) to make the cast shorter.
104-
// We're now observing warnings from gcc-8 with -Wformat-extra-args, #4062. So
105-
// now we're more strict and cast to [u]int64_t and use PRIu64/PRId64 from <inttypes.h>
106-
// In many cases the format specifier is passed to our own macro (e.g. DTPRINT) or to Rprintf(),
107-
// error() etc, and even if they don't call sprintf() now, they could in future.
108-
10998
# No tabs in C or R code (sorry, Richard Hendricks)
11099
grep -P "\t" ./R/*.R
111100
grep -P "\t" ./src/*.c
@@ -235,23 +224,23 @@ system.time(test.data.table(script="*.Rraw")) # apx 8h = froll 3h + nafill 1m +
235224

236225

237226
###############################################
238-
# R 3.3.0 (stated dependency)
227+
# R 3.4.0 (stated dependency)
239228
###############################################
240229

241230
### ONE TIME BUILD
242231
sudo apt-get -y build-dep r-base
243232
cd ~/build
244-
wget http://cran.stat.ucla.edu/src/base/R-3/R-3.3.0.tar.gz
245-
tar xvf R-3.3.0.tar.gz
246-
cd R-3.3.0
233+
wget http://cran.stat.ucla.edu/src/base/R-3/R-3.4.0.tar.gz
234+
tar xvf R-3.4.0.tar.gz
235+
cd R-3.4.0
247236
CFLAGS="-fcommon" FFLAGS="-fallow-argument-mismatch" ./configure --without-recommended-packages
248237
make
249-
alias R330=~/build/R-3.3.0/bin/R
238+
alias R340=~/build/R-3.4.0/bin/R
250239
### END ONE TIME BUILD
251240

252241
cd ~/GitHub/data.table
253-
R330 CMD INSTALL ./data.table_1.16.99.tar.gz
254-
R330
242+
R340 CMD INSTALL ./data.table_1.16.99.tar.gz
243+
R340
255244
require(data.table)
256245
test.data.table(script="*.Rraw")
257246

.dev/README.md

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,25 @@
11
# data.table developer
22

3+
Inside this repository we provide some tools to improves development experience. Most notable is the `cc()` helper function that recompiles C sources, reloads R sources, and runs tests.
4+
5+
Typical development workflow will then look like:
6+
7+
0. `git checkout -b [branch]`
8+
1. edit package files
9+
2. run `R`
10+
3. call `cc(TRUE)`
11+
4. (if needed) go to point 1.
12+
13+
Once we (and tests) are satisfied with changes, we then run complete package checks:
14+
15+
0. in shell terminal
16+
1. run `make build`
17+
2. run `make check`
18+
3. (optionally) run on `r-devel`, e.g. `R=~/build/R-devel/bin/R make check`
19+
4. (optionally) run on ancient R, e.g. `R=~/build/R-340/bin/R make check`
20+
5. `git commit -m '[changes description]'`
21+
6. `git push [remote] [branch]`
22+
323
## Setup
424

525
To use the optional helper function `cc()`, one needs to set up the project path and source `.dev/cc.R` to use `cc()` conveniently. This works through creating an additional `.Rprofile` in the `data.table` directory.

.dev/lsan.supp

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
leak:libfontconfig.so

.devcontainer/r-ancient-gcc/Dockerfile

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
FROM registry.gitlab.com/jangorecki/dockerfiles/r-3.3.0
1+
FROM registry.gitlab.com/jangorecki/dockerfiles/r-3.4.0
22

33
RUN apt-get -qq update \
44
&& apt-get install -y --no-install-recommends git

0 commit comments

Comments
 (0)