Changes from 16 commits
30 commits
ebd152d
modular optimization paths - init
ben-schwen Oct 28, 2025
71b21ab
make linter happy
ben-schwen Oct 29, 2025
8a9e727
move tests
ben-schwen Oct 30, 2025
04e5782
add lapply(list(col1, col2, ...), fun) pattern
ben-schwen Oct 30, 2025
a8dde19
turn on optimization
ben-schwen Oct 31, 2025
67f2874
add type conversion support to GForce
ben-schwen Nov 1, 2025
2876ebe
remove stale branch
ben-schwen Nov 1, 2025
c445c38
add tests
ben-schwen Nov 2, 2025
5410e31
update man
ben-schwen Nov 2, 2025
dece1c6
merge tests
ben-schwen Nov 2, 2025
5e1789d
polish test fun
ben-schwen Nov 2, 2025
62f1c48
add arithmetic
ben-schwen Nov 2, 2025
c47ec27
add AST walker and update tests
ben-schwen Nov 2, 2025
1d324d6
add tests
ben-schwen Nov 2, 2025
6b54c1e
Merge branch 'master' into modular_gforce
ben-schwen Nov 2, 2025
22cf35e
add NEWS
ben-schwen Nov 2, 2025
25a7e2e
make function name in massageSD more expressive
ben-schwen Nov 3, 2025
eb8056c
rename levels argument to optimization
ben-schwen Nov 3, 2025
4544398
update docs
ben-schwen Nov 3, 2025
d40edb8
restore test nums
ben-schwen Nov 3, 2025
5e7efb7
remove double tests
ben-schwen Nov 3, 2025
3826927
simplify tests
ben-schwen Nov 3, 2025
982343f
phrasing
ben-schwen Nov 4, 2025
996b28c
Merge remote-tracking branch 'refs/remotes/origin/modular_gforce' int…
ben-schwen Nov 4, 2025
1e6ad03
use mget for all vector params
ben-schwen Nov 4, 2025
9e1297e
rename optimization parameter
ben-schwen Nov 4, 2025
f6981d6
rename optimization parameter also in test
ben-schwen Nov 4, 2025
9fc4734
add optimize param checks
ben-schwen Nov 4, 2025
6aaea51
Merge branch 'master' into modular_gforce
ben-schwen Nov 4, 2025
c07999a
remove trailing ws
ben-schwen Nov 4, 2025
11 changes: 10 additions & 1 deletion NEWS.md
@@ -296,7 +296,16 @@ See [#2611](https://github.com/Rdatatable/data.table/issues/2611) for details. T
# user system elapsed
# 0.028 0.000 0.005
```

20. `fread()` now supports the `comment.char` argument to skip trailing comments or comment-only lines, consistent with `read.table()`, [#856](https://github.com/Rdatatable/data.table/issues/856). The default remains `comment.char = ""` (no comment parsing) for backward compatibility and performance, in contrast to `read.table(comment.char = "#")`. Thanks to @arunsrinivasan and many others for the suggestion and @ben-schwen for the implementation.

21. GForce and `lapply` optimization detection has been refactored to use modular optimization paths and an AST (Abstract Syntax Tree) walker for improved maintainability and extensibility. The new architecture separates optimization detection into distinct, composable phases, so that future optimizations can be added without touching unrelated detection code. Thanks to @grantmcdermott, @jangorecki, @MichaelChirico, and @HughParsonage for the suggestions and @ben-schwen for the implementation.

This rewrite also introduces several new optimizations:
- `Map` calls are now optimized in the same way as `lapply` (e.g., `Map(fun, .SD)` -> `list(fun(col1), fun(col2), ...)`) [#5336](https://github.com/Rdatatable/data.table/issues/5336)
- `lapply` optimization works without `.SD` (e.g., `lapply(list(col1, col2), fun)` -> `list(fun(col1), fun(col2))`) [#5032](https://github.com/Rdatatable/data.table/issues/5032)
- Type conversion support in GForce expressions (e.g., `sum(as.numeric(x))`) [#2934](https://github.com/Rdatatable/data.table/issues/2934)
- Arithmetic operation support in GForce (e.g., `max(x) - min(x)`) [#3815](https://github.com/Rdatatable/data.table/issues/3815)
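
The four expression shapes listed above can be tried on a small table. The following is only a sketch: `DT` and its columns `g`, `x`, `y` are made up for illustration, and the results are the same whether or not an optimized path is taken. Whether GForce or the `lapply`/`Map` rewrite actually fires depends on the installed data.table version and on `options(datatable.optimize=)`; adding `verbose=TRUE` to a query is the usual way to check.

```r
library(data.table)

# Hypothetical toy data; the column names are assumptions, not taken from the PR.
DT = data.table(g = c("a", "a", "b"), x = c(1L, 5L, 3L), y = c(2, 4, 6))

# Arithmetic on grouped aggregates (#3815)
r1 = DT[, max(x) - min(x), by = g]

# Type conversion inside an aggregate (#2934)
r2 = DT[, sum(as.numeric(x)), by = g]

# lapply over an explicit list of columns rather than .SD (#5032)
r3 = DT[, lapply(list(x, y), sum), by = g]

# Map over .SD (#5336)
r4 = DT[, Map(sum, .SD), by = g]
```

Comparing each result under `datatable.optimize=0L` and a higher level is a simple way to confirm that an optimization does not change the answer.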

### BUG FIXES

702 changes: 436 additions & 266 deletions R/data.table.R

Large diffs are not rendered by default.

29 changes: 28 additions & 1 deletion R/test.data.table.R
@@ -361,7 +361,34 @@ gc_mem = function() {
# nocov end
}

test = function(num,x,y=TRUE,error=NULL,warning=NULL,message=NULL,output=NULL,notOutput=NULL,ignore.warning=NULL,options=NULL,env=NULL) {
test = function(num,x,y=TRUE,error=NULL,warning=NULL,message=NULL,output=NULL,notOutput=NULL,ignore.warning=NULL,options=NULL,env=NULL,levels=NULL) {
  # if levels is provided, test across multiple optimization levels
  if (!is.null(levels)) {
    cl = match.call()
    cl$levels = NULL # Remove levels from the recursive call

    vector_params = c("error", "warning", "message", "output", "notOutput", "ignore.warning")
    # Check if y was explicitly provided (not just the default)
    y_provided = !missing(y)
    compare = !y_provided && length(levels)>1L && !any(vapply_1b(vector_params, function(p) length(get(p, envir=environment())) > 0L))

    for (i in seq_along(levels)) {
      cl$num = num + (i - 1L) * 1e-6
      opt_level = list(datatable.optimize = levels[i])
      cl$options = if (!is.null(options)) c(as.list(options), opt_level) else opt_level
      for (p in vector_params) {
        val = get(p, envir=environment())
        if (length(val) > 0L) {
          cl[[p]] = val[((i - 1L) %% length(val)) + 1L] # cycle through values if fewer than levels
        }
      }

      if (compare && i == 1L) cl$y = eval(cl$x, parent.frame())
      eval(cl, parent.frame()) # actual test call
    }
    return(invisible())
  }

if (!is.null(env)) {
old = Sys.getenv(names(env), names=TRUE, unset=NA)
to_unset = !lengths(env)
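
The `levels` branch added to `test()` above re-dispatches the original call once per optimization level by editing the result of `match.call()`. A stripped-down sketch of that pattern, outside data.table, might look like the following. `run_check` and `seen` are hypothetical names used only for illustration; the real `test()` takes more arguments and actually applies the options, whereas this stand-in merely records what each re-dispatched call received.

```r
# Hypothetical recorder so the effect of each re-dispatched call can be inspected.
seen = new.env()
seen$calls = list()

run_check = function(num, x, output = NULL, options = NULL, levels = NULL) {
  if (!is.null(levels)) {
    cl = match.call()
    cl$levels = NULL  # drop levels so the re-dispatched call takes the plain path
    for (i in seq_along(levels)) {
      cl$num = num + (i - 1L) * 1e-6                         # unique sub-number per level
      cl$options = list(datatable.optimize = levels[i])      # one level per call
      if (length(output) > 0L)
        cl$output = output[((i - 1L) %% length(output)) + 1L] # recycle if fewer values than levels
      eval(cl, parent.frame())
    }
    return(invisible())
  }
  # Plain path: record the arguments instead of running a real test.
  seen$calls[[length(seen$calls) + 1L]] =
    list(num = num, output = output, level = options$datatable.optimize)
  invisible()
}

run_check(637.1, 1 + 1, output = c("first", "second"), levels = c(0L, 1L, 2L))
```

After this call, `seen$calls` holds three entries: the levels are 0, 1 and 2, the test numbers are offset by multiples of `1e-6`, and the two `output` values are recycled across the three levels.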
25 changes: 7 additions & 18 deletions inst/tests/benchmark.Rraw
@@ -190,24 +190,13 @@ DT = data.table(A=1:10,B=rnorm(10),C=paste("a",1:100010,sep=""))
test(301.1, nrow(DT[,sum(B),by=C])==100010)

# Test := by key, and that := to the key by key unsets the key. Make it non-trivial in size too.
local({
old = options(datatable.optimize=0L); on.exit(options(old))
set.seed(1)
DT = data.table(a=sample(1:100, 1e6, replace=TRUE), b=sample(1:1000, 1e6, replace=TRUE), key="a")
test(637.1, DT[, m:=sum(b), by=a][1:3], data.table(a=1L, b=c(156L, 808L, 848L), m=DT[J(1), sum(b)], key="a"))
test(637.2, key(DT[J(43L), a:=99L]), NULL)
setkey(DT, a)
test(637.3, key(DT[, a:=99L, by=a]), NULL)
})
local({
options(datatable.optimize=2L); on.exit(options(old))
set.seed(1)
DT = data.table(a=sample(1:100, 1e6, replace=TRUE), b=sample(1:1000, 1e6, replace=TRUE), key="a")
test(638.1, DT[, m:=sum(b), by=a][1:3], data.table(a=1L, b=c(156L, 808L, 848L), m=DT[J(1), sum(b)], key="a"))
test(638.2, key(DT[J(43L), a:=99L]), NULL)
setkey(DT,a)
test(638.3, key(DT[, a:=99L, by=a]), NULL)
})
set.seed(1)
DT = data.table(a=sample(1:100, 1e6, replace=TRUE), b=sample(1:1000, 1e6, replace=TRUE), key="a")
opt = c(0L,2L)
test(637.1, levels=opt, copy(DT)[, m:=sum(b), by=a][1:3], data.table(a=1L, b=c(156L, 808L, 848L), m=DT[J(1), sum(b)], key="a"))
test(637.2, levels=opt, key(copy(DT)[J(43L), a:=99L]), NULL)
setkey(DT, a)
test(637.3, levels=opt, key(copy(DT)[, a:=99L, by=a]), NULL)

# Test X[Y] slowdown, #2216
# Many minutes in 1.8.2! Now well under 1s, but 10s for very wide tolerance for CRAN. We'd like CRAN to tell us if any changes
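
The rewritten 637.x tests wrap `DT` in `copy()` before using `:=`. A likely reason, stated here as an assumption rather than the author's explanation, is that `:=` modifies a data.table by reference, so evaluating the same expression once per optimization level would otherwise run against an already-modified table. A minimal sketch of that reference semantics, using a made-up table `D`:

```r
library(data.table)

# Hypothetical toy table; D and its columns are illustrative names.
D = data.table(a = 1:3)

# := on a copy leaves the original untouched ...
D2 = copy(D)[, b := a * 2L]
unchanged = !("b" %in% names(D))

# ... whereas := directly on D adds the column in place.
D[, b := a * 2L]
changed = "b" %in% names(D)
```

With `copy()`, each level in `levels=opt` starts from the same keyed input, which keeps the per-level results comparable.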