Commit 6082852

add sort_by.data.table
1 parent 4a2474b commit 6082852

4 files changed: +24 -0 lines changed

NAMESPACE

Lines changed: 2 additions & 0 deletions
@@ -206,3 +206,5 @@ S3method(format_list_item, data.frame)
 
 export(fdroplevels, setdroplevels)
 S3method(droplevels, data.table)
+
+S3method(sort_by, data.table)

NEWS.md

Lines changed: 3 additions & 0 deletions
@@ -69,6 +69,9 @@ rowwiseDT(
 
 6. `fread()` gains `logicalYN` argument to read columns consisting only of strings `Y`, `N` as `logical` (as opposed to character), [#4563](https://github.com/Rdatatable/data.table/issues/4563). The default is controlled by option `datatable.logicalYN`, itself defaulting to `FALSE`, for back-compatibility -- some smaller tables (especially sharded tables) might inadvertently read a "true" string column as `logical` and cause bugs. This is particularly important for tables with a column named `y` or `n` -- automatic header detection under `logicalYN=TRUE` will see these values in the first row as being "data" as opposed to column names. A parallel option was not included for `fwrite()` at this time -- users looking for a compact representation of logical columns can still use `fwrite(logical01=TRUE)`. We also opted for now to check only `Y`, `N` and not `Yes`/`No`/`YES`/`NO`.
 
+7. Base R generic `sort_by()` (new in R 4.4.0) is implemented for data.table's. It internally uses data.table's `forder()` instead of base R `order()` for efficiency. Hence, it uses C-locale as data.table's conventional sorting (suggested by @rikivillalba).
+
+
 ## BUG FIXES
 
 1. `fwrite()` respects `dec=','` for timestamp columns (`POSIXct` or `nanotime`) with sub-second accuracy, [#6446](https://github.com/Rdatatable/data.table/issues/6446). Thanks @kav2k for pointing out the inconsistency and @MichaelChirico for the PR.

R/data.table.R

Lines changed: 12 additions & 0 deletions
@@ -2532,6 +2532,18 @@ split.data.table = function(x, f, drop = FALSE, by, sorted = FALSE, keep.by = TR
 }
 }
 
+sort_by.data.table <- function (x, y, ...)
+{
+  if (!cedta()) return(NextMethod()) # nocov
+  if (inherits(y, "formula"))
+    y <- .formula2varlist(y, x)
+  if (!is.list(y))
+    y <- list(y)
+  # use forder instead of base 'order'
+  o <- do.call(forder, c(unname(y), list(...)))
+  x[o, , drop = FALSE]
+}
+
 # TO DO, add more warnings e.g. for by.data.table(), telling user what the data.table syntax is but letting them dispatch to data.frame if they want
 
 copy = function(x) {

inst/tests/tests.Rraw

Lines changed: 7 additions & 0 deletions
@@ -20686,6 +20686,13 @@ test(2299.10, data.table(a=1), output="a\
 test(2299.11, data.table(a=list(data.frame(b=1))), output="a\n1: <data.frame[1x1]>")
 test(2299.12, data.table(a=list(data.table(b=1))), output="a\n1: <data.table[1x1]>")
 
+# sort_by.data.table
+DT1 = data.table(a = c(1, 3, 2, NA, 3) , b = 4:0)
+DT2 = data.table(a = c("c", "a", "B")) # data.table uses C-locale and should sort_by if cedta()
+test(2300.01, sort_by(DT1, ~ a + b), data.table(a = c(1,2,3,3,NA), b = c(4L,2L,0L,3L,1L)))
+test(2300.02, sort_by(DT1, ~ I(a + b)), data.table(a = c(3,2,1,3,NA), b = c(0L,2L,4L,3L,1L)))
+test(2300.03, sort_by(DT2, ~ a), data.table(a = c("B", "a", "c")))
+
 if (test_bit64) {
   # Join to integer64 doesn't require integer32 representation, just integer64, #6625
   i64_val = .Machine$integer.max + 1
