Skip to content

Commit 8ea8f72

Browse files
ben-schwenMichaelChiricoOfekShilonAnirban166davidbudzynski
authored
implement frev as fast base::rev alternative (#5907)
* add macro version * write explicit parallel version * copy attributes * add tests * add to NAMESPACE * add to tests * copy names * add man page * update man * fix typos * update tests * add coverage * add benchmark example * coverage * NEWS * trim NEWS * update NEWS * add bit64 * update naming in NEWS * 1.15.0 on CRAN. Bump to 1.15.99 * Fix transform slowness (#5493) * Fix 5492 by limiting the costly deparse to `nlines=1` * Implementing PR feedbacks * Added inside * Fix typo in name * Idiomatic use of inside * Separating the deparse line limit to a different PR --------- Co-authored-by: Michael Chirico <[email protected]> * Improvements to the introductory vignette (#5836) * Added my improvements to the intro vignette * Removed two lines I added extra as a mistake earlier * Requested changes * Vignette typo patch (#5402) * fix typos and grammatical mistakes * fix typos and punctuation * remove double spaces where it wasn't necessary * fix typos and adhere to British English spelling * fix typos * fix typos * add missing closing bracket * fix typos * review fixes * Update vignettes/datatable-benchmarking.Rmd Co-authored-by: Michael Chirico <[email protected]> * Update vignettes/datatable-benchmarking.Rmd Co-authored-by: Michael Chirico <[email protected]> * Apply suggestions from code review benchmarking Co-authored-by: Michael Chirico <[email protected]> * remove unnecessary [ ] from datatable-keys-fast-subset.Rmd * Update vignettes/datatable-programming.Rmd Co-authored-by: Michael Chirico <[email protected]> * Update vignettes/datatable-reshape.Rmd Co-authored-by: Michael Chirico <[email protected]> * One last batch of fine-tuning --------- Co-authored-by: Michael Chirico <[email protected]> Co-authored-by: Michael Chirico <[email protected]> * Improved handling of list columns with NULL entries (#4250) * Updated documentation for rbindlist(fill=TRUE) * Print NULL entries of list as NULL * Added news item * edit NEWS, use '[NULL]' not 'NULL' * fix test * split NEWS item * add example --------- Co-authored-by: Michael Chirico <[email protected]> Co-authored-by: Michael Chirico <[email protected]> Co-authored-by: Benjamin Schwendinger <[email protected]> * clarify that list input->unnamed list output (#5383) * clarify that list input->unnamed list output * Add example where make.names is used * mention role of make.names * fix subsetting issue in split.data.table (#5368) * fix subsetting issue in split.data.table * add a test * drop=FALSE on inner [ * switch to 3.2.0 R dep (#5905) * Allow early exit from check for eval/evalq in cedta (#5660) * Allow early exit from check for eval/evalq in cedta Done in the browser+untested, please take a second look :) * Use %chin% * nocov new code * frollmax1: frollmax, frollmax adaptive, left adaptive support (#5889) * frollmax exact, buggy fast, no fast adaptive * frollmax fast fixing bugs * frollmax man to fix CRAN check * frollmax fast adaptive non NA, dev * froll docs, adaptive left * no frollmax fast adaptive * frollmax adaptive exact NAs handling * PR summary in news * copy-edit changes from reviews Co-authored-by: Benjamin Schwendinger <[email protected]> * Apply suggestions from code review Co-authored-by: Michael Chirico <[email protected]> Co-authored-by: Benjamin Schwendinger <[email protected]> * comment requested by Michael * update NEWS file * Apply suggestions from code review Co-authored-by: Michael Chirico <[email protected]> * Apply suggestions from code review Co-authored-by: Michael Chirico <[email protected]> * add comment requested by Michael * add comment about int iterator for loop over k-1 obs * extra comments * Revert "extra comments" This reverts commit 03af0e3. * add comments to frollmax and frollsum * typo fix --------- Co-authored-by: Michael Chirico <[email protected]> Co-authored-by: Benjamin Schwendinger <[email protected]> * Friendlier error in assignment with trailing comma (#5467) * friendlier error in assignment with trailing comma e.g. `DT[, `:=`(a = 1, b = 2,)`. WIP. Need to add tests and such, but editing from browser before I forget. * Another pass * include unnamed indices on RHS too * tests * NEWS * test numbering * explicit example in NEWS * Link to ?read.delim in ?fread to give a closer analogue of expected behavior (#5635) * fread is similar to read.delim (#5634) * Use ?read.csv / ?read.delim --------- Co-authored-by: Michael Chirico <[email protected]> Co-authored-by: Michael Chirico <[email protected]> * Run GHA jobs on 1-15-99 dev branch (#5909) * prohibit matrix * readd deleted line * Make declarations static for covr (#5910) * reorder code * return invisible if inplace * cut to 1 line * use isTRUE for copy=NA * speedup strings and lists * add Hughs comments * add coverage * dedup INTSXP LGLSXP * make tests lighter * rm altrep include * change testnum * remove altrep * remove duplicated tests * mostly fix botched merge * migrate NEWS item * revert bad search+replace * update NEWS wording * add small body * add additional test cases * rerun benchmarks single threaded * update doc * remove unnecessary assignment * change to frev/setrev * add symbol for setrev * update docs * update NEWS * add details about attributes * drop attributes except names, class and levels * update docs * vestigial copy= reference * use parity tests * change tests to capture behavior after side-effects * change man * allow matrix frev * rotate idiom * use frev and setrev internally * item numbering * block future rev() usage * setrev->setfrev * dont use setfrev on caller-owned object * update wording that was contrasting frev+setfrev; tweak examples * mention that levels are retained * add names test * remove temp variables * fix test ordering * move temporary variables --------- Co-authored-by: Michael Chirico <[email protected]> Co-authored-by: Ofek <[email protected]> Co-authored-by: Ani <[email protected]> Co-authored-by: David Budzynski <[email protected]> Co-authored-by: Michael Chirico <[email protected]> Co-authored-by: Scott Ritchie <[email protected]> Co-authored-by: Jan Gorecki <[email protected]> Co-authored-by: Manuel López-Ibáñez <[email protected]>
1 parent a9b28b3 commit 8ea8f72

File tree

12 files changed

+219
-11
lines changed

12 files changed

+219
-11
lines changed

.ci/.lintr.R

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@ linters = c(dt_linters, all_linters(
2121
message = "Use messagef to avoid fragmented translations.",
2222
warning = "Use warningf to avoid fragmented translations.",
2323
stop = "Use stopf to avoid fragmented translations.",
24+
rev = "Use frev internally, or setfrev if by-reference is safe.",
2425
NULL
2526
)),
2627
# undesirable_function_linter(modify_defaults(

NAMESPACE

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -208,6 +208,7 @@ S3method(format_list_item, data.frame)
208208

209209
export(fdroplevels, setdroplevels)
210210
S3method(droplevels, data.table)
211+
export(frev)
211212

212213
# sort_by added in R 4.4.0, #6662, https://stat.ethz.ch/pipermail/r-announce/2024/000701.html
213214
if (getRversion() >= "4.4.0") S3method(sort_by, data.table)

NEWS.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -46,6 +46,8 @@
4646

4747
10. `data.table()` and `as.data.table()` with `keep.rownames=TRUE` now extract row names from named vectors, matching `data.frame()` behavior. Names from the first named vector in the input are used to create the row names column (default name `"rn"` or custom name via `keep.rownames="column_name"`), [#1916](https://github.com/Rdatatable/data.table/issues/1916). Thanks to @richierocks for the feature request and @Mukulyadav2004 for the implementation.
4848

49+
11. New `frev(x)` as a faster analogue to `base::rev()` for atomic vectors/lists, [#5885](https://github.com/Rdatatable/data.table/issues/5885). Twice as fast as `base::rev()` on large inputs, and faster with more threads. Thanks to Benjamin Schwendinger for suggesting and implementing.
50+
4951
### BUG FIXES
5052

5153
1. `fread()` no longer warns on certain systems on R 4.5.0+ where the file owner can't be resolved, [#6918](https://github.com/Rdatatable/data.table/issues/6918). Thanks @ProfFancyPants for the report and PR.

R/as.data.table.R

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -36,11 +36,11 @@ as.data.table.table = function(x, keep.rownames=FALSE, key=NULL, ...) {
3636
# prevent #4179 & just cut out here
3737
if (any(dim(x) == 0L)) return(null.data.table())
3838
# Fix for bug #43 - order of columns are different when doing as.data.table(with(DT, table(x, y)))
39-
val = rev(dimnames(provideDimnames(x)))
39+
val = frev(dimnames(provideDimnames(x)))
4040
if (is.null(names(val)) || !any(nzchar(names(val))))
41-
setattr(val, 'names', paste0("V", rev(seq_along(val))))
41+
setattr(val, 'names', paste0("V", frev(seq_along(val))))
4242
ans = data.table(do.call(CJ, c(val, sorted=FALSE)), N = as.vector(x), key=key)
43-
setcolorder(ans, c(rev(head(names(ans), -1L)), "N"))
43+
setcolorder(ans, c(frev(head(names(ans), -1L)), "N"))
4444
ans
4545
}
4646

@@ -104,18 +104,18 @@ as.data.table.array = function(x, keep.rownames=FALSE, key=NULL, sorted=TRUE, va
104104
dnx[nulldnx] = lapply(dx[nulldnx], seq_len) #3636
105105
dnx
106106
} else dnx
107-
val = rev(val)
107+
setfrev(val)
108108
if (is.null(names(val)) || !any(nzchar(names(val))))
109-
setattr(val, 'names', paste0("V", rev(seq_along(val))))
109+
setattr(val, 'names', paste0("V", frev(seq_along(val))))
110110
if (value.name %chin% names(val))
111-
stopf("Argument 'value.name' should not overlap with column names in result: %s", brackify(rev(names(val))))
111+
stopf("Argument 'value.name' should not overlap with column names in result: %s", brackify(frev(names(val))))
112112
N = NULL
113113
ans = do.call(CJ, c(val, sorted=FALSE))
114114
set(ans, j="N", value=as.vector(x))
115115
if (isTRUE(na.rm))
116116
ans = ans[!is.na(N)]
117117
setnames(ans, "N", value.name)
118-
dims = rev(head(names(ans), -1L))
118+
dims = frev(head(names(ans), -1L))
119119
setcolorder(ans, c(dims, value.name))
120120
if (isTRUE(sorted) && is.null(key)) key = dims
121121
setkeyv(ans, key)

R/bmerge.R

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -110,7 +110,7 @@ bmerge = function(i, x, icols, xcols, roll, rollends, nomatch, mult, ops, verbos
110110
}
111111
if (x_merge_type=="integer64" || i_merge_type=="integer64") {
112112
nm = c(iname, xname)
113-
if (x_merge_type=="integer64") { w=i; wc=icol; wclass=i_merge_type; } else { w=x; wc=xcol; wclass=x_merge_type; nm=rev(nm) } # w is which to coerce
113+
if (x_merge_type=="integer64") { w=i; wc=icol; wclass=i_merge_type; } else { w=x; wc=xcol; wclass=x_merge_type; setfrev(nm) } # w is which to coerce
114114
if (wclass=="integer" || (wclass=="double" && fitsInInt64(w[[wc]]))) {
115115
from_detail = if (wclass == "double") gettext(" (which has integer64 representation, e.g. no fractions)") else ""
116116
coerce_col(w, wc, wclass, "integer64", nm[1L], nm[2L], from_detail, verbose=verbose)

R/utils.R

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -86,7 +86,7 @@ which.last = function(x)
8686
if (!is.logical(x)) {
8787
stopf("x not boolean")
8888
}
89-
length(x) - match(TRUE, rev(x)) + 1L
89+
length(x) - match(TRUE, frev(x)) + 1L
9090
}
9191

9292
require_bit64_if_needed = function(DT) {
@@ -226,7 +226,7 @@ fctr = function(x, levels=unique(x), ..., sort=FALSE, rev=FALSE) {
226226
if (!isTRUEorFALSE(rev))
227227
stopf("argument 'rev' must be TRUE or FALSE")
228228
if (sort) levels = sort(levels)
229-
if (rev) levels = rev(levels)
229+
if (rev) levels = frev(levels)
230230
factor(x, levels=levels, ...)
231231
}
232232

R/wrappers.R

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,3 +21,6 @@ fitsInInt32 = function(x) .Call(CfitsInInt32R, x)
2121
fitsInInt64 = function(x) .Call(CfitsInInt64R, x)
2222

2323
coerceAs = function(x, as, copy=TRUE) .Call(CcoerceAs, x, as, copy)
24+
25+
frev = function(x) .Call(Cfrev, x, TRUE)
26+
setfrev = function(x) invisible(.Call(Cfrev, x, FALSE))

inst/tests/tests.Rraw

Lines changed: 61 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -71,6 +71,7 @@ if (exists("test.data.table", .GlobalEnv, inherits=FALSE)) {
7171
setcoalesce = data.table:::setcoalesce
7272
setdiff_ = data.table:::setdiff_
7373
setreordervec = data.table:::setreordervec
74+
setfrev = data.table:::setfrev
7475
shallow = data.table:::shallow # until exported
7576
.shallow = data.table:::.shallow
7677
split.data.table = data.table:::split.data.table
@@ -4227,7 +4228,7 @@ test(1159.6, setDT(x), error="Argument 'x' to 'setDT' should be a")
42274228
x <- list(1, 2:3)
42284229
test(1159.7, setDT(x), error="All elements in argument 'x' to 'setDT'")
42294230

4230-
# test 1160 was for setrev, now removed
4231+
# test 1160 was for setfrev, now removed
42314232

42324233
# tests for setreordervec
42334234
# integer
@@ -21429,3 +21430,62 @@ test(2330.8, as.data.table(list(M), keep.rownames=TRUE), data.table(rn=rep("", 3
2142921430

2143021431
# .SD reference in '...' passed to lapply(FUN=) is recognized as data.table
2143121432
test(2331, lapply(list(data.table(a=1:2)), `[`, j=.SD[1L]), list(data.table(a=1L)))
21433+
21434+
# 5885 implement frev
21435+
d = c(NA, NaN, Inf, -Inf)
21436+
test(2332.00, frev(c(FALSE, NA)), rev(c(FALSE, NA)))
21437+
test(2332.01, frev(c(0L, NA)), rev(c(0L, NA)))
21438+
test(2332.02, frev(d), rev(d))
21439+
test(2332.03, frev(c(NA, 1, 0+2i)), rev(c(NA, 1, 0+2i)))
21440+
test(2332.04, frev(as.raw(0:1)), rev(as.raw(0:1)))
21441+
test(2332.05, frev(NULL), rev(NULL))
21442+
test(2332.06, frev(character(5)), rev(character(5)))
21443+
test(2332.07, frev(integer(0)), rev(integer(0)))
21444+
test(2332.08, frev(list(1, "a")), rev(list(1, "a")))
21445+
test(2332.09, {x=c(0L, NA); setfrev(x); x}, c(NA, 0L))
21446+
test(2332.10, {x=d; setfrev(x); x}, c(-Inf, Inf, NaN, NA))
21447+
test(2332.11, {x=c(NA, 1, 0+2i); setfrev(x); x}, c(0+2i, 1, NA))
21448+
test(2332.12, {x=as.raw(0:1); setfrev(x); x}, as.raw(1:0))
21449+
test(2332.13, {x=NULL; setfrev(x); x}, NULL)
21450+
test(2332.14, {x=character(5); setfrev(x); x}, character(5))
21451+
test(2332.15, {x=integer(0); setfrev(x); x}, integer(0))
21452+
test(2332.16, {x=list(1, "a"); setfrev(x); x}, list("a", 1))
21453+
test(2332.17, frev(1:1e2), rev(1:1e2))
21454+
# copy arguments
21455+
x = 1:3
21456+
test(2332.21, {frev(x); x}, 1:3)
21457+
test(2332.22, {setfrev(x); x}, 3:1)
21458+
test(2332.23, address(x) == address(setfrev(x)))
21459+
test(2332.24, address(x) != address(frev(x)))
21460+
# do not alter on subsets
21461+
test(2332.25, {setfrev(x[1:2]); x}, 1:3)
21462+
# levels
21463+
f = as.factor(letters)
21464+
test(2332.31, frev(f), rev(f))
21465+
test(2332.32, frev(as.IDate(1:10)), as.IDate(10:1))
21466+
test(2332.33, frev(as.IDate(1:10)), as.IDate(10:1))
21467+
# names
21468+
x = c(a=1L, b=2L, c=3L)
21469+
test(2332.41, frev(x), rev(x))
21470+
test(2332.42, setfrev(x), x)
21471+
x = c(a=1L, b=2L, c=3L)
21472+
test(2332.43, {frev(x); names(x)}, c("a","b","c"))
21473+
# attributes
21474+
x = structure(1:10, class = c("IDate", "Date"), att = 1L)
21475+
test(2332.51, attr(frev(x), "att"), attr(rev(x), "att"))
21476+
test(2332.52, class(frev(x)), class(rev(x)))
21477+
test(2332.53, attr(setfrev(x), "att"), 1L)
21478+
test(2332.54, class(setfrev(x)), c("IDate", "Date"))
21479+
x = structure(integer(0), att = 1L)
21480+
test(2332.55, attr(frev(x), "att"), attr(rev(x), "att"))
21481+
# errors
21482+
test(2332.61, frev(data.table()), error="should not be data.frame or data.table")
21483+
test(2332.62, frev(expression(1)), error="is not supported by frev")
21484+
if (test_bit64) {
21485+
x = as.integer64(c(1, NA, 3))
21486+
test(2332.71, frev(x), rev(x))
21487+
test(2332.72, setfrev(x), x)
21488+
}
21489+
# support rotate idiom
21490+
M1 = M2 = matrix(1:4, nrow=2)
21491+
test(2332.81, {M1[]=frev(M1); M1}, {M2[]=rev(M2); M2})

man/frev.Rd

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
\name{frev}
2+
\alias{frev}
3+
\alias{rev}
4+
\title{Fast reverse}
5+
\description{
6+
Similar to \code{\link[base]{rev}} but \emph{faster}.
7+
}
8+
9+
\usage{
10+
frev(x)
11+
}
12+
\arguments{
13+
\item{x}{ An atomic \code{vector} or \code{list}. }
14+
}
15+
16+
\details{
17+
Similar to \code{\link[base]{rev}}, \code{frev} only retains three attributes: names, class, and factor levels.
18+
}
19+
20+
\value{
21+
Returns the input reversed.
22+
}
23+
24+
\examples{
25+
# on vectors
26+
x = setNames(1:10, letters[1:10])
27+
frev(x)
28+
29+
# list
30+
frev(list(1, "a", TRUE))
31+
}
32+
\keyword{ data }

src/data.table.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -278,6 +278,7 @@ bool isDataTable(SEXP x);
278278
bool isRectangularList(SEXP x);
279279
bool perhapsDataTable(SEXP x);
280280
SEXP perhapsDataTableR(SEXP x);
281+
SEXP frev(SEXP x, SEXP copyArg);
281282
NORET void internal_error(const char *call_name, const char *format, ...);
282283

283284
// types.c

0 commit comments

Comments
 (0)